When building software products you are at times doing work akin to chemistry, and at others more like biology.
What do I mean by that? Take unit testing vs. integration testing as an example. Unit testing is very much like chemistry in that you isolate the environment so you can analyze and measure the output of specific reactions and experiments. This narrows what you need to understand at any one time, and in software it lets you regression-test your experiments as they change (since you are changing the reaction every time you change some code).
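As a minimal sketch of the "chemistry" view, here is a unit test that isolates one function by swapping its real dependency for a fixed stand-in. The names `price_with_tax` and `fixed_rate` are hypothetical and exist only for illustration.

```python
from typing import Callable


def price_with_tax(net: float, tax_rate_lookup: Callable[[str], float], region: str) -> float:
    """Code under test: the tax lookup is passed in so a test can control it."""
    rate = tax_rate_lookup(region)
    return round(net * (1 + rate), 2)


def fixed_rate(region: str) -> float:
    """Test double: always returns 20%, so no real tax service is involved."""
    return 0.20


def test_price_with_tax() -> None:
    # With the environment pinned down, only the function's own logic can vary.
    assert price_with_tax(100.0, fixed_rate, "anywhere") == 120.0
```

Re-running the same test after every code change is the regression part: the reaction changed, the measurement did not.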
In the real world your code isn’t that isolated and this is where integration testing comes in. It is akin to biology in that you are testing the entire system, which gives you a global view at that scope. It allows you to run multiple copies of the system and tweak each one (e.g. change the settings that the system runs with and see how performance is impacted). The beauty of software is that you can run a huge number of concurrent systems, randomize variables, and pull together the results to see what wins. Brute Force Baybee. Would you feel totally at ease with that approach though? Or, would you like to understand why?
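The brute-force approach can be sketched as a random search over settings. Everything here is an assumption made for illustration: `run_system` is a toy stand-in with a made-up latency formula, not a real integration run, and the `workers` and `batch_size` settings are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Settings:
    workers: int
    batch_size: int


def run_system(settings: Settings, rng: random.Random) -> float:
    """Toy stand-in for a whole-system run; returns an invented latency in ms."""
    queueing = 200.0 / settings.workers   # fewer workers, longer queues
    batching = 0.5 * settings.batch_size  # bigger batches, longer waits
    contention = 2.0 * settings.workers   # more workers, more contention
    noise = rng.uniform(0.0, 5.0)         # run-to-run variation
    return queueing + batching + contention + noise


def random_search(trials: int, seed: int = 0) -> List[Tuple[Settings, float]]:
    """Try random settings and return (settings, latency) pairs, best first."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        settings = Settings(workers=rng.randint(1, 32), batch_size=rng.randint(1, 64))
        results.append((settings, run_system(settings, rng)))
    return sorted(results, key=lambda pair: pair[1])


best_settings, best_latency = random_search(trials=200)[0]
```

This tells you which settings won, but nothing about why they won, which is exactly the unease the questions above are pointing at.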
To do real science you need both views. Unfortunately, I feel like computer scientists focus on the chemistry and the math when we need a much larger focus on the biology and the practical engineering disciplines.
We too often merge together anyone who writes code as software engineers or software developers, but in reality there are sub-disciplines and a spectrum between app developers and engineers. We often run into problems when we have the wrong people in the wrong roles, but the current misstep du jour seems to be in the world of microservices.
I have seen epic failures in microservice deployments, and the root cause is generally that everyone is focused on the individual services (the chemistry) and not giving adequate attention to the overall system (the biology). It is easy to see how this happens, and I talked about some of the reasons in a recent video interview.
With monolithic applications, teams could easily be stepping on each other's toes and causing a lot of pain. But now with microservices each team can work in its own space and not have to worry about what the other teams are doing. Loose dependencies are good dependencies!
The key issue though is that no silver bullet can get you away from doing the hard work. Software is hard. You need to rigorously take problems and break them down mercilessly until you are left with the smallest, understandable pieces in your bare hands. To do this at scale you need a team larger than one, so you need to start to communicate. Now you need to understand each other and come up with contracts that each side can agree on. Getting these specified correctly and concretely is tough, but that isn’t the end. You need to define SLAs, and how you will work together. Good partners will be inclusive. They won’t be in CYA mode, but will instead realize that you are in this together. You will run each other's tests against your systems to make sure that you aren’t breaking your customers.
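Running each other's tests can be as simple as the consuming team publishing the response shape it relies on, and the providing team checking every release against it. The sketch below assumes a hypothetical order service; `ORDER_CONTRACT`, `get_order_v1`, and `get_order_v2` are invented names, and real contract-testing tools do far more than this.

```python
from typing import Any, Dict, List

# Written by the consuming team: the fields and types it depends on.
ORDER_CONTRACT: Dict[str, type] = {"order_id": str, "total_cents": int, "currency": str}


def contract_violations(response: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return a readable list of ways the response breaks the contract."""
    problems = []
    for field, expected in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected):
            actual = type(response[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


def get_order_v1(order_id: str) -> Dict[str, Any]:
    """Provider's current release (hypothetical)."""
    return {"order_id": order_id, "total_cents": 1250, "currency": "USD"}


def get_order_v2(order_id: str) -> Dict[str, Any]:
    """Provider's next release: 'total_cents' was renamed, which breaks the consumer."""
    return {"order_id": order_id, "total": 12.50, "currency": "USD"}


assert contract_violations(get_order_v1("o-1"), ORDER_CONTRACT) == []
assert contract_violations(get_order_v2("o-1"), ORDER_CONTRACT) == ["missing field: total_cents"]
```

The provider finds out about the break in its own build, before the customer does.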
Work will be done on the system that houses and allows for the communication between these systems. How are services versioned, rolled out and accepted? How is everything correlated in the system? How do we handle back pressure? There are many questions that a team needs to have answers for.
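To make two of those questions concrete, here is one possible sketch of correlation and back pressure, using only the Python standard library. `BoundedInbox` is a hypothetical name, and rejecting work when full is just one back-pressure policy among several (blocking, shedding old work, and so on).

```python
import queue
import uuid
from typing import Optional


class BoundedInbox:
    """A service inbox that refuses new work when full instead of growing forever."""

    def __init__(self, capacity: int) -> None:
        self._queue: "queue.Queue[dict]" = queue.Queue(maxsize=capacity)

    def offer(self, payload: str, correlation_id: Optional[str] = None) -> Optional[str]:
        """Enqueue a message; return its correlation id, or None if the inbox is full."""
        # Reuse the caller's id so one request can be traced across services.
        cid = correlation_id or str(uuid.uuid4())
        try:
            self._queue.put_nowait({"correlation_id": cid, "payload": payload})
        except queue.Full:
            return None  # back pressure: the caller must slow down or retry later
        return cid

    def take(self) -> dict:
        """Remove and return the oldest message; raises queue.Empty if there is none."""
        return self._queue.get_nowait()


inbox = BoundedInbox(capacity=2)
assert inbox.offer("a", correlation_id="req-1") == "req-1"
assert inbox.offer("b") is not None
assert inbox.offer("c") is None  # full: the sender is told to back off
```

The important part is less the mechanism than the agreement: every team needs to know what a rejection means and what to do about it.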
When it goes wrong, microservices feel like a step backwards. Everyone is pushing new code and breaking each other. The system is unpredictable and down all of the time and everyone points at each other. People may start to think that they have been sold a new one and should go back to their Old Faithful POS monolith. They are probably wrong. The problem wasn’t the shiny bullet, it was how everyone went about using it. It can be turned around if work is put into the infrastructure, process, and incentives.
I can tell if something isn’t going to end well when I see individual teams going out for a beer after they finished a sprint rather than the entire team going out to celebrate users having success with the product in production.
In one of the worst examples of this, API granularity was based on team size rather than the natural composition and abstraction of the domain. This led to far too many endpoints and dependencies between them all.
So maybe it isn’t just that we need to account for the biology of the systems (although we need to). Perhaps we also need to focus more on the team biology.
How much time is spent on the biology of your systems?