
Dion Almaer

Software, Development, Products


Software Development

“I have never had a fight with my wife!” The Importance of Resilience

August 25, 2015


I overheard a conversation that you have probably heard a variant of yourself. A bloke was so proud of the fact that he hadn’t had a serious confrontation with his spouse to date. Ah, the perfect union.

While some joined him in appreciation, I had to hide the real thoughts going through my head:

“Oh crap, he may be lucky and truly have the perfect situation, or when something does come to a boil, they have never practiced the art of disagreement. They have never worked through a tricky situation.”

Resilient Software

This reminded me of a similar conversation that I had a while back, where I heard an admin act so very proud that one machine had been up for over a year. That scared the hell out of me too. It meant that a restart hadn’t been tested in over a year, and can you imagine the magic and cruft that had built up? That is one of the reasons why folks are excited about immutable servers, or at the very least having systems that get built up from scratch.

I am building a new application, and not only should it be mobile first, it should also be offline first. The majority of experiences should probably be architected in the same way. You notice the opposite these days, and often from applications that were built before the mobile revolution. It is often much harder to bolt on offline capability after the fact.

When you build an offline first client you tend to get some great side benefits:

  • If you are working on local data you can keep a responsive UI (as long as you are smart about keeping work off the main thread!)
  • You can progressively enhance when online
  • For example, when DuoLingo matches your typed-in answer, a simple match can occur locally while a more complex match can be kicked off online (if the client is online).
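
The local-then-online matching described above can be sketched in a few lines of Python. This is a minimal illustration, not how DuoLingo actually works: `check_answer`, `local_match`, and the `remote_match` callback are hypothetical names, and the remote matcher stands in for whatever richer server-side check an application might have.

```python
from typing import Callable, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def local_match(answer: str, expected: str) -> bool:
    """Cheap check that always works, even with no network."""
    return normalize(answer) == normalize(expected)


def check_answer(
    answer: str,
    expected: str,
    remote_match: Optional[Callable[[str, str], bool]] = None,
) -> bool:
    """Try the local match first, then enhance with the remote matcher.

    remote_match is a hypothetical stand-in for a call to a server-side
    matcher. Pass None when the client is offline.
    """
    if local_match(answer, expected):
        return True
    if remote_match is None:
        return False
    try:
        return remote_match(answer, expected)
    except OSError:
        # Network failures fall back to the local verdict.
        return False
```

The core behavior works with no network at all; the remote call only ever upgrades a local "no" to a "yes".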

You can also get into a situation where you are out of sync. The client and server see differing versions of reality.

Shift+Reload

In retrospect we had it lucky in the Web 1.0 days. We could purely server render and the client was a dumb terminal that recreated itself on every request. As browsers got richer caching we needed to give users the nuclear option: Shift+Reload. Not exactly user friendly, but it sure came in handy (and still does!).

These days we need to make sure that our rich clients aren’t getting corrupt. For our clients to work well offline they have local state and data to work on. We have all been frustrated when there is a bug that you can’t easily recover from.

Borken Downloads

One example for me is installing applications on my devices. As I type I have gotten into the situation where my download is hung, yet I have no way to kick start it or even delete it. There is something so very infuriating when this happens, when a version of a shift+reload doesn’t fix the situation. Just yesterday a coworker and I created the same projects on Asana because we didn’t see that the other had already done so. It took forever for me to see his version, and be able to clean it up.

It is tough to get this right. We are trying to do the right thing for the user by caching and keeping their application responsive, yet we should take some hints and have systems to help out. If a user is killing and restarting their application, that is the modern shift+reload, is it not?
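
One way to give a client a route back to a clean state is to checksum the locally stored data and throw it away when it does not validate. The sketch below assumes the state is a small JSON-serializable dict; `encode_state` and `decode_state` are hypothetical helpers, and a real client would refetch from the server after a reset.

```python
import hashlib
import json
from typing import Optional


def encode_state(state: dict) -> str:
    """Serialize state together with a SHA-256 checksum of its payload."""
    payload = json.dumps(state, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return json.dumps({"sha256": digest, "payload": payload})


def decode_state(raw: Optional[str]) -> dict:
    """Return the stored state, or a clean empty state if it is missing or corrupt."""
    try:
        wrapper = json.loads(raw)
        payload = wrapper["payload"]
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if digest != wrapper["sha256"]:
            raise ValueError("checksum mismatch")
        return json.loads(payload)
    except (ValueError, KeyError, TypeError, AttributeError):
        # The "nuclear option": ignore the bad data and start clean.
        return {}
```

Calling `decode_state` on every launch means that killing and restarting the application really does act as a reset when the local copy is damaged.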

Micro Services

In theory the birth of micro services and trying to hide the complexity of functionality behind nice clean decoupled APIs helps us with resiliency. In practice I have seen this turn out to be a real mess. It isn’t the fault of the practice, but rather the implementation details. Here is what I have seen go wrong:

The scope of the services isn’t defined correctly

  • In one example the scope seemed to be a function of team size vs. the natural composition of the functionality

Inter-dependency killers

  • There were separate small services, but they all depended on each other. The result was a lot of communication around “a new version of service X was deployed and it broke service Y in QA”

No view of the system as a whole

  • When something goes wrong, how are you made aware? How do you then find where the problem is? Due to not having enough of a view on the whole system it can be hard to get this information. You have to explicitly spend time on the seams

Poor exception handling

  • I hate it when all errors and exceptions are treated as equal. This results in socket closed exceptions being thrown into the mix when they weren’t errors at all... the client just disconnected and it was fine! As soon as you get this wrong you get a sea of information that you can’t trust, and the killer errors can go unnoticed. I have seen shocking bugs live in production for far too long due to this :/
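
As a rough sketch of treating errors unequally, the Python below maps exception types to log levels so that routine client disconnects stay out of the error stream. The grouping is an assumption for illustration; which exceptions count as benign depends on the service.

```python
import logging

logger = logging.getLogger("service")

# Assumption: these exception types mean "the client went away", not "we broke".
BENIGN_DISCONNECTS = (BrokenPipeError, ConnectionResetError, ConnectionAbortedError)


def classify(exc: BaseException) -> int:
    """Return the logging level an exception deserves."""
    if isinstance(exc, BENIGN_DISCONNECTS):
        return logging.DEBUG
    if isinstance(exc, TimeoutError):
        return logging.WARNING
    return logging.ERROR


def report(exc: BaseException) -> int:
    """Log the exception at its classified level and return that level."""
    level = classify(exc)
    logger.log(level, "%s: %s", type(exc).__name__, exc)
    return level
```

With something like this in place, an alert on ERROR-level entries is not drowned out by every closed socket.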

Finger pointing

  • The worst situations occur when you have constant finger pointing. Something is wrong in the system but each team is arguing about what is actually broken. Services teams point at each other, and point at the network guys, who point at the infrastructure guys who... point back to the services folk!

Spending time up front to get ahead of this is vital. Certain platforms shine here too. Erlang is known for holding resiliency as its core tenet. Various reactive platforms do well, but although these can make life better for you, you still need to care.

I have often had to hold my nose and do the impure. I have set up proxy layers that do automatic retries when the core backend should have been fixed. This is risky, because you can end up increasing the traffic and causing even more issues, but if done right it can save your bacon.
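
A hedged sketch of such a retry layer in Python: retries are capped and spaced out with exponential backoff plus jitter, which is one common way to limit the extra traffic. `call_with_retries` and `FlakyBackend` are hypothetical names, and the defaults are placeholders rather than recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() on connection-level failures, backing off between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # Give up so the caller sees the real failure.
            # Exponential backoff with jitter spreads retries out instead
            # of hammering a struggling backend in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


class FlakyBackend:
    """Hypothetical backend that fails a set number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def fetch(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("backend unavailable")
        return "ok"
```

For example, `call_with_retries(FlakyBackend(failures=2).fetch)` succeeds on the third call, while a backend that keeps failing still surfaces its error after `max_attempts` tries.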

Have you gone through and spec’d the SLA needed for various services? So often we see a least common denominator when it is better to split things out. As an example, if you look at an API that gives you information on a product (description, price, availability, reviews, images, etc.) you may want an up-to-date price, but those reviews? Not so much. You can probably deal just fine without that one review that just came in. In this case you probably want to say the equivalent of:

“try to get the latest reviews, but if they don’t come back in Xms then use the last grabbed... and when that call comes back update the cache for next time maybe, cool?”
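
That sentence maps fairly directly onto code. The Python sketch below uses asyncio and is an illustration only, not how hapi implements it: `StaleFallbackCache`, `fetch`, `timeout_s`, and the `demo` data are all assumed names and values. Error handling for a failed background fetch is left out to keep it short.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class StaleFallbackCache:
    """Serve fresh data when it arrives in time, otherwise the last good copy."""

    def __init__(self, fetch: Callable[[str], Awaitable[Any]], timeout_s: float) -> None:
        self._fetch = fetch
        self._timeout_s = timeout_s
        self._last_good: Dict[str, Any] = {}
        self._pending: Dict[str, "asyncio.Task[Any]"] = {}

    async def _refresh(self, key: str) -> Any:
        value = await self._fetch(key)
        self._last_good[key] = value  # Update the cache for next time.
        return value

    def cached(self, key: str) -> Any:
        """Return the last good value without triggering a fetch."""
        return self._last_good.get(key)

    async def get(self, key: str) -> Any:
        task = self._pending.get(key)
        if task is None or task.done():
            task = asyncio.create_task(self._refresh(key))
            self._pending[key] = task
        try:
            # shield() keeps the fetch running even if we stop waiting for it.
            return await asyncio.wait_for(asyncio.shield(task), self._timeout_s)
        except asyncio.TimeoutError:
            if key in self._last_good:
                return self._last_good[key]
            # Nothing cached yet, so wait for the real answer.
            return await task


async def demo() -> list:
    delays = iter([0.0, 0.2])
    versions = iter(["reviews v1", "reviews v2"])

    async def fetch(key: str) -> str:
        await asyncio.sleep(next(delays))
        return next(versions)

    cache = StaleFallbackCache(fetch, timeout_s=0.05)
    first = await cache.get("sku-1")   # Fast fetch: fresh value.
    second = await cache.get("sku-1")  # Slow fetch: falls back to the stale value.
    await asyncio.sleep(0.3)           # Let the slow fetch finish in the background.
    return [first, second, cache.cached("sku-1")]
```

Running `asyncio.run(demo())` shows the three stages: a fresh answer, a stale answer served when the backend is slow, and the cache quietly updated once the slow call returns.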

It isn’t a surprise that hapi, which Eran Hammer and his team started with me at Walmart, does this pretty well thanks to a box for a cat, as well as handling microservices in general.


For a great modern experience that is fast and works well for your users, chances are that you should:

  • Build an offline first client, but give it enough intelligence to be able to handle corruption and get back to a clean state of health, even with nuclear options
  • Build a services tier that assumes failure at each tier, and that can deal with that failure gracefully
  • Progressively enhance the experience on both the client and the server to make sure the core service always works, but that it can also turn on features and tweaks when available.

And as soon as you have something running, start taking your code to counselling so the system can get good at dealing with disagreements and disruption 😉

Delivering software on time is important, but not most important

July 14, 2015

https://twitter.com/kartar/status/619587592300969984

Reliable software delivery is welcome, and an ideal trait of a great product engineering team. Most would trade off a slightly slower pace of delivery to gain predictability.

This level of execution is tough to come by. The team needs to learn to work well together but that isn’t enough. Just as not all teams are equal, not all problems are equal either. You may be able to get into a predictable rhythm when it comes to estimating the time it will take to deliver a screen when the API is already stable, but if there are more unknowns (usually the case) it gets harder. And then there is true R&D. You can’t predict the unknown, so if a team doesn’t understand how they are going to solve a problem then your estimate could be wildly off.

The thing is: that is OK! This is how creative work happens!

One frustration I have with The Business wanting fixed deadlines is that they rarely appear to have time to understand any of the nuance, risk, and unknown. They want a date, even if it is a false sense of security and doesn’t represent reality. Tools such as LiquidPlanner that try to put in as much of the uncertainty as possible can help visualize this nuance. If you are giving an absolute amount of work then chances are you are very wrong. Favor ranges over absolutes and push to get people thinking in that way. Understand how any fuzzy prediction gets clearer as it gets closer (like the weather!).

This is often a tough sell, which I have always found interesting given that something like 90% of projects have had the goal posts moved, often near the end. Teams can be scared to change the date because “we don’t want to be flaky” so they keep holding their breath and hoping that heroics or luck will save the day. Sometimes they do, but do you want to run your business (or life) that way?

I feel awful when an engineer saves the day through heroics as it means that I didn’t do my job. Great engineers have done this countless times in my teams, and I celebrate them whilst feeling the personal frustration.

We naturally want to keep commitments, and being good partners is very important, but transparency and doing the work as a team is more critical than keeping false views. Transparency allows for understanding and an easier changing of scope and incremental tweaks along the journey.

Then you get to one of the worst sins: shipping to hit the date. Teams persuade themselves that the risk is worth it, that it is “good enough”, and prioritize the commitment to partners over customers. This tends to burn you though, as what really gets remembered? The quality of the product. If it is buggy, or suffers downtime, the team will be scrambling and paying the price for some time. There will be pressure to ship something, especially if it has slipped a couple of times, but what is actually remembered is the product and how well it performs. That extra couple of weeks of testing and polish may be critical, so we shouldn’t take the talk of “MVP” as meaning “ship stuff that isn’t tested” (MVP is about the feature set, not the quality).

A crazy Black Friday rush

Some dates are more sacred than others. If you are in retail you understand that there is a bit of a difference in shipping functionality in October vs. February. You have to be ready for Black Friday and the holiday season. This means that your processes need to change accordingly. Not only do you need to account for those periods where the “dates can’t move” (and thus scope etc. has to), but you also need to minimize the importance of dates at other times during the year to give the teams a freaking break.

I know that someone promised some feature to some team for a March 1st ship date. You know what, software happens, and the fact that it ships in April isn’t the end of the world.

Be proud of what you ship, and how the customers experience you, and prioritize that above politics. The politics will probably take care of themselves: when the product does ship and it does well, the stakeholders will appreciate it and will forget that it was a little late. The customers (and stakeholders) won’t forget the product that shipped on time but blew up in everyone’s face.


Getting close to Pluto!

I have had this post in the queue for a while, but it felt like I should finish it up and post it on the day that NASA reaches Pluto 72 seconds ahead of schedule on a 9 year mission.

That is both impressive, and showcases the trade offs needed to get precision.

What is also interesting is that we can get the drone to Pluto, billions of miles away, but we can’t keep up the website that talks about it. Huh!

I will take a drink and tip my hat to the engineers at NASA as we watch game time at 5:36pm Pacific Time!

Habit Driven Development

June 30, 2015


I have happened upon the importance of habits recently. I have always heard “good habits are important”, but I never really embraced that in a thoughtful way.

That changed when I had a health crisis. I realized that I needed to make a change to get healthy. I needed to create new habits with respect to nutrition, exercise, and holistic health (including mental health, sleep, you name it!).

I would often set myself lofty goals, and then if it didn’t look like I would reach them, everything would fizzle out. In contrast, if I can do something incremental on a daily basis it sticks, given enough repetitions for the habit to kick in.

Software Habits

A lot of the changes that we have seen in software development revolve around habits too.

Agile

I tend to dislike proprietary terminology or One True Way. There has been a backlash towards Agile ™ as many have seen it slip into dogma. As soon as you find that you have forgotten why you are doing something you are in trouble. The agile manifesto itself is a simple document that talks about values: favoring X over Y.

Some took this and created religion around it. There are fascinating spiritual questions, and long term values and learnings around how to live a good life. Certain folk managed to persuade people that their recipe is Truth whilst the thousands of other recipes are so wrong that you can end up in hell for believing them. Fortunately we don’t think that Scrum is an infallible document passed to man from God, so let’s keep trying new things and focusing on what works, what doesn’t, and why.

Atheists can also tend to pooh-pooh particular practices. Some are bizarre and even barbaric, but others make sense. Alain de Botton lays out this case well in his book, Religion for Atheists.

Let’s not make the same mistake and ignore the good things that have come from various methodologies out there. Let’s not avoid setting up our own habits because we are scared to “be as bad as them”. Checklists don’t stifle; they can save you from making mistakes and forgetting good knowledge, and they give your brain the time and space to noodle on the important problems at hand. Dogma occurs when you forget why you are doing something and refuse to change with new knowledge and learnings. Practices that do change upon reflection, including looking at first principles, are a great thing.

Continuous X

There are many tasks in development that we have managed to chunk up and allow us to do them more frequently. I am old enough to remember the bad days where people were scared to do a certain task because they couldn’t trust how it would go. At every release you would see people with crossed fingers, first for the release to get out well, and second to make sure that if a rollback was needed it actually worked. Today we release all the time and some are actually pulling off continuous deployments and ideally delivery.

We have seen the same thing elsewhere:

  • Writing and running tests
  • Creating and merging branches
  • Setting up infrastructure.

There is a huge pay off when you can trust your process. You can deliver higher quality product with less risk and a much improved pace over time to boot.

There are severe penalties for not catching things early. I remember the work at Palm where they found that: if you catch a bug on the same day it was introduced you can fix it in around an hour. If you catch that bug later it can take 24 times as long, and take a day. It is obvious why: your brain was right there so you don’t need to context shift, and the changes are few.

The core of a good process in my mind is simple:

  • What are the core values of the team
  • What practices map to those values
  • What habits will result in an improvement
  • Reflect and iterate.

A process that shows agility seeds the core values with the ability to change quickly because of the simple observation that those things that can’t evolve tend to die. This doesn’t mean that the process is fast. It may take effort to build agility into your software, but the bet is that it pays off and gives you the best chance of not getting stuck in the future.

The key to non-dogma is the retrospective. As long as you can revisit and try new things you can get to where you need. Just as with A/B testing, sometimes you can’t iterate to a better solution quickly enough. Iteration is great if your dart was pretty close to the mark but sometimes you should throw another dart.

What are your and your team’s habits? When was the last time you took a deep look at the why as well as the how?
