“There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors.” — Phil Karlton++?
Concurrency is hard. It is a world of trade-offs, and you often see junior engineers who think they have found the one true solution: add a caching layer to fix the performance problem.
I have been there. So many of us have. We need to offer predictable and reliable scalability, and to do that we need to minimize bottlenecks. If we get this wrong, though, we spend more time managing replication and invalidation, and we have just created a potentially worse and more complex problem.
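To make the trap concrete, here is a minimal sketch of a read-through cache in Python. Everything in it is hypothetical and invented for illustration (the `Database` and `CachedDatabase` classes, the `invalidate` flag, the `price` key); it is not taken from any real system. It shows the cache saving a trip to the slow store, and then quietly serving a stale value once a write skips invalidation.

```python
class Database:
    """Hypothetical stand-in for the slow source of truth."""

    def __init__(self):
        self.rows = {"price": 10}
        self.reads = 0  # counts how often we pay for a slow read

    def read(self, key):
        self.reads += 1
        return self.rows[key]

    def write(self, key, value):
        self.rows[key] = value


class CachedDatabase:
    """Hypothetical read-through cache in front of Database."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        # Only go to the database on a cache miss.
        if key not in self.cache:
            self.cache[key] = self.db.read(key)
        return self.cache[key]

    def write(self, key, value, invalidate=True):
        self.db.write(key, value)
        if invalidate:
            # Drop the cached copy so the next read sees the new value.
            self.cache.pop(key, None)


db = Database()
cached = CachedDatabase(db)

# Two reads, one trip to the database: the "performance fix".
cached.read("price")
cached.read("price")
reads_after_two_lookups = db.reads

# A write that forgets to invalidate leaves a stale value behind.
cached.write("price", 12, invalidate=False)
stale = cached.read("price")

# A write that invalidates forces the next read back to the database.
cached.write("price", 15)
fresh = cached.read("price")
```

The speed-up is real, but every write path now has to remember the cache exists. Miss one and the system keeps answering confidently with old data.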
As hard as it is to architect a solution that maps to product requirements and SLAs, I often think about how the same problem lies at the heart of organizations.
We have come a long way in how we try to scale humans:
But we may have a ways to go, even in 2016:
How can we predictably and reliably scale?
It is also tempting to grasp for the “simple” solution. If you have grown up around Silicon Valley, for example, you may think that the answer is talent density and small teams.
You can’t outgrow those pizza teams, you just need to split them up and you are all set!
However, it may not be that simple. This works just fine if teams don’t need to communicate or rely on each other. Normally that isn’t the case. If team A relies on team B for some infrastructure (defined here as “the stuff you don’t want to worry about to get your work done”), then team A can be bottlenecked on team B’s velocity. If team B has multiple client teams, then it has to prioritize and work out how to serve them all.
You can easily get frustrated here, especially if there is a lack of transparency around how prioritization happens. Do you have a way to offer up headcount or resources so that you aren’t just sitting there asking “when are you done?” but are diving in to help? Is that even possible given the skill set needed?
That is often the rub. Humans can’t scale in the same way as CPUs, because each one of us is so different, and each one of us has very different software onboard 😉
It is so very easy to get into a logjam. Let’s look at a hypothetical example that you may have seen yourself.
Your company has a team that provides infrastructure as a service. In general you want to use shared resources, especially if that team has domain knowledge that you lack, and if supporting you lifts all of the other ships that they support.
It turns out that they want to support you, but they don’t have the resources to jump on this work. Now begins the prioritization game. Can you collectively get support for this work to be done?
There is nothing worse than being the small fry when it comes to these discussions. Let’s say you are a global company, and you are sharing a central payment service. A small country has a payment solution that is The Gold Standard where they are, so it is crucial to get it integrated into the Global Solution. But then along comes the Big Gorilla: the country that is so large that a single compliance feature for it goes through the ROI analysis and knocks the small guy’s solution below the line. Ouch.
Well, screw it, just go off and do a custom build! We can move fast! Talent density ftw! It feels great while you do it, but you may have created a new cache to invalidate. Fast forward a little and you notice that a slew of other groups have done the same thing. Team members have moved around, and some of those systems are hard to keep alive, let alone extend with new features.
How will this play out in the long run? It may still make sense to take this path though, and as someone once told me:
“The life blood of a company is momentum.”
This is why scaling companies can be hard. It will be wasteful and you need to be OK with that. It will be messy and you need to deal with that. You can build a framework to help you make the decisions that you need to make as they come up, and then you need to paddle like hell.
p.s. I am looking forward to seeing what comes out of Scaling Teams. There is a dearth of content on engineering management.