December 2024

How do you build products and platforms for developers in a world that contains probabilistic black boxes that surprise you with what they can and can’t do, and when they decide to show you?

From my own trial and error, I have found that most of my mistakes come from not understanding the two pieces: developers and AI systems.

Instead of merging them, the key is to understand what makes each different and to build with that in mind. Then the sum of the parts, bringing great UX to the party with smart LLMs, does the trick.

I think it is easy to anthropomorphize computers now that they seem to understand our language. Our written words. The images our eyes see. The sounds our ears hear. They have become the robots we have read about and watched on the big screen in Sci-Fi for ages!

Thus, if building a coding assistant, it should feel like another human that you are pairing with, right? With the same UX? No. We can do much better.

The Developer

First, let’s look at the developer, the human, and see how they operate, where they shine, and where they can use help.

Pat the developer:

Has a variable set of skills and knowledge when it comes to building software. They are proficient in some programming languages, databases, libraries, frameworks, platforms, and domains. Some of their knowledge has faded over time while other sets are fresh. It’s a unique mesh.
Only has a certain amount of time and energy to expend per day. They really don’t scale. Sometimes they feel sick, and other times they are in the zone and flowing.
Hates toil, can get bored, and prefers creative work.
Has some imposter syndrome.
Is forgetful, and makes random mistakes all the time.
Is able to deeply understand the context of their work environment, and the people around them.
Knows what winning is all about, and cares about the user and business problems that need to be solved.

This is just the tip of the iceberg, but you can already see how important it is for your solution to:

#1 Get the important context that is hidden in Pat’s brain out of their head and available to other team members and the AI system itself. There is gold locked up there.

Help Pat expand and elaborate with the system. For example, if you have a chat interface in your solution, end with questions for Pat to get more information, and teach Pat to keep iterating this way!
Make sure that Pat can tell you important things such as “this is a golden PR, trust this way of doing things”, or “this part of the codebase is legacy, please don’t give me more like this… instead treat this other part of the codebase as The New Way ™”, etc.
Secret side note: if this is done well, it also means that if Pat leaves the project or the company, more of the knowledge is left behind and available!

#2 Make sure that Pat is always unblocked, and in that flow state as often as possible.

Many AI researchers I have worked with see a failure and their instinct is: “we will fix that in the model”. They run off and try to steer the model to solve that particular problem, and for it to always come out with the perfect answer. While it’s great to keep improving on real tasks, and building those datasets, there will ALWAYS be issues. This is an endless game that reminds me of Google Search and the whack a mole world of “search quality”. The answer is to not fight for perfection, but to have a forgiving UX for the developer. If I am stuck on a task, I get angry, and feel like the system has totally failed me and my only hope is to start from scratch. If instead, there are threads for me to pull, and things for me to try, I am happy to fight to get to the solution!
I often think about the beauty of “the 10 blue links” with Google Search. If the best result is 3rd on the list, I don’t think of it as a failure at all… I am still very happy with Google. Contrast this with Google Assistant or Alexa… where there is one result. If it’s wrong, trust is eroded quickly. I spoke about this with Malte Ubl of Vercel, and how smart it was when the original v0 would show you multiple versions from a prompt so you could pick one. If 3 of the 4 were meh, but one was a solid starting point… great! Always have a next step for Pat.

#3 Take care of the toil so Pat can be doing the work that Pat can uniquely do best.

#4 Raise the level of abstraction: Let Pat talk to the system in a way that matches their skills, and allow translation. If Pat is an expert in Rust and the backend, build confidence that they can dive into parts of the monorepo that are built with TypeScript, because the guardrails are there and the details of syntax, etc. aren’t what is important here.

#5 Build trust with Pat. Show the sources, explain WHY the system is doing what it is doing, and let Pat jump around and learn more. Transparency is key. Let Pat change the context the AI has and re-run things so they can tinker and iterate to the best possible results.

The AI System

Now we have the AI system you are building. Broaden the view here and think of it as the overall computer system that happens to have AI components:

Think of this as somewhat infinitely scalable compute. You probably wouldn’t take an issue from your project tracker and farm it out to 6 developers on your team and then when they each send back a PR pick one you like to iterate from, but with AI you could decide to do that.
- Now imagine how the UX of a system can change. It can go off and come back to Pat with multiple options and Pat can happily curate and pick a favored one to iterate from!
Some developers complain that current LLMs are “only junior developers”. Let’s say this is the case… but LLMs are trained on so many domains of computing that they are junior developers who know EVERY programming language, library, framework, platform, etc. This is amazing. Oh, and they ain’t no junior developers to boot.
By default they have this incredibly broad knowledge, but it’s like they are showing up on their first day and they know nothing about your domain. Fix this by connecting them with all of the context they need!
AI has no feelings, and thus is happy to do toil. Do all the toil all the time.
AI has no ego, so won’t be a jerk to Pat and is a safe space with no judgement (unless you make the AI act like a jerk ofc! Don’t.)

With this acknowledgement, you can make sure that your solution:

#1 Eval driven development: First, make it work, then make it fast and affordable.

Once you prove something out, you can use synthetic data to fine-tune cheaper models for particular tasks. Oh, and everything is getting cheaper month by month. There are new models all the time, so build a platform that can make use of multiple ones and run them against each other. You will always be surprised at which models are best for particular tasks. Don’t bet on one, bet on evolution and enjoy the ride.
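To make "run them against each other" concrete, here is a minimal sketch of an eval harness. Everything in it is an assumption for illustration: the models are hypothetical stub functions standing in for real API clients, and scoring is a simple exact match, which a real eval suite would likely replace with something richer.

```python
from typing import Callable, Dict, List, Tuple

# A "model" is any callable from prompt to answer. In a real platform these
# would wrap API clients; the stubs below are hypothetical stand-ins.
Model = Callable[[str], str]


def run_evals(models: Dict[str, Model], cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return the fraction of eval cases each model answers exactly right."""
    scores = {}
    for name, model in models.items():
        passed = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
        scores[name] = passed / len(cases)
    return scores


def best_model(scores: Dict[str, float]) -> str:
    """Pick the model name with the highest pass rate."""
    return max(scores, key=scores.get)


# Hypothetical stub models: one knows both answers, one knows only the first.
def big_model(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


def small_model(prompt: str) -> str:
    return {"2+2": "4"}.get(prompt, "")


EVAL_CASES = [("2+2", "4"), ("capital of France", "Paris")]
scores = run_evals({"big": big_model, "small": small_model}, EVAL_CASES)
winner = best_model(scores)
```

Because models are just entries in a dictionary, trying next month's new model is a one-line change, and the same eval cases tell you whether the cheaper one is good enough for a given task.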

#2 Tools: Give this LLM “brain” tools to wield.

Don’t rely on the model to do deterministic things when it can just use tools. We are now seeing some of the SoTA LLMs do internal calculations to decide when to use tools vs. just solve the problem directly. Great. But think about what tools are most useful and put them in reach of the LLM. Do the dance of working out when your system should be the meta-cognition agent vs. when to let the LLM do its thing. It’s a fun dance to learn.
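As a rough sketch of putting deterministic tools within reach, assume the model is prompted to reply either with plain text or with a small JSON tool call. That JSON shape, the tool names, and the `handle_model_output` helper are all made up for this example; real tool-calling APIs define their own formats.

```python
import json
from datetime import date


def add(a: float, b: float) -> float:
    """Deterministic arithmetic the model should not have to guess at."""
    return a + b


def days_between(start: str, end: str) -> int:
    """Days from start to end, both ISO dates (YYYY-MM-DD)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


# Registry of tools the system is willing to run on the model's behalf.
TOOLS = {"add": add, "days_between": days_between}


def handle_model_output(output: str) -> str:
    """Run the tool if the output is a known JSON tool call, else pass text through."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return output  # plain text answer, nothing to execute
    if not isinstance(call, dict) or call.get("tool") not in TOOLS:
        return output  # valid JSON, but not a call to a tool we know
    result = TOOLS[call["tool"]](**call.get("args", {}))
    return str(result)
```

For instance, if the (hypothetical) model replies with `{"tool": "days_between", "args": {"start": "2024-12-01", "end": "2024-12-25"}}`, the system computes the answer itself instead of trusting the model to do date math.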

Noam Brown, who worked on reasoning tokens and the system in o1, has been talking about this for many years, including in a talk that discusses how neural nets without special pathways are vastly inferior. Computers really got good at chess (and then Go, etc.) when they added search and started playing themselves.

#3 Data: Use large LLMs to generate great synthetic data. Your system should be saving data to learn from and feed back into the system to improve the AI all the time. What your AI and Pat are doing is gold. Learn from it. You will be very surprised.
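One simple way to start saving data is to log every suggestion along with whether Pat accepted it. The sketch below assumes a JSON Lines format and hypothetical field names (`prompt`, `suggestion`, `accepted`); the right schema depends on your system.

```python
import io
import json
from typing import List, TextIO, Tuple


def log_interaction(sink: TextIO, prompt: str, suggestion: str, accepted: bool) -> None:
    """Append one interaction as a single JSON line."""
    record = {"prompt": prompt, "suggestion": suggestion, "accepted": accepted}
    sink.write(json.dumps(record) + "\n")


def accepted_examples(log_text: str) -> List[Tuple[str, str]]:
    """Extract (prompt, suggestion) pairs that Pat accepted, e.g. as training candidates."""
    records = (json.loads(line) for line in log_text.splitlines() if line.strip())
    return [(r["prompt"], r["suggestion"]) for r in records if r["accepted"]]


# An in-memory buffer stands in for a real log file in this example.
log = io.StringIO()
log_interaction(log, "rename this helper", "def load_user(): ...", accepted=True)
log_interaction(log, "add a retry", "while True: ...", accepted=False)
examples = accepted_examples(log.getvalue())
```

Rejected suggestions are worth keeping too: they are the raw material for new eval cases.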

#4 Smart Context: With large context windows and smart retrieval, you can make sure the models have the best possible information to work with to get something done. Think about all of the signals you can give them… build output, runtime errors, you name it. And if you don’t have enough space to give them all of these signals, what can you do? Run multiple parallel versions that have different signals passed in and let Pat choose the best results… or another AI judge!
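Here is a small sketch of the "parallel versions with different signals" idea. Both `generate_fix` and `judge` are hypothetical stubs: the first stands in for an LLM call and only reports which signals it saw, and the second stands in for Pat or an AI judge with a made-up preference for candidates that saw runtime errors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def generate_fix(task: str, signals: Dict[str, str]) -> str:
    """Hypothetical stand-in for an LLM call: reports which signals it was given."""
    return f"{task} using {', '.join(sorted(signals))}"


def judge(candidate: str) -> int:
    """Hypothetical judge: weights runtime errors above build output."""
    return 2 * ("runtime_errors" in candidate) + ("build_output" in candidate)


def run_variants(task: str, signal_sets: List[Dict[str, str]]) -> List[str]:
    """Generate one candidate per signal set in parallel, best-scoring first."""
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda s: generate_fix(task, s), signal_sets))
    return sorted(candidates, key=judge, reverse=True)


# Two context variants, each small enough to fit the (assumed) context budget.
variants = [
    {"build_output": "error TS2322 ...", "test_failures": "2 failed"},
    {"runtime_errors": "TypeError at login", "test_failures": "2 failed"},
]
ranked = run_variants("fix login bug", variants)
```

Returning the full ranked list rather than only the winner keeps the other candidates around, so Pat still has threads to pull if the top pick is a miss.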

And now we are seeing SoTA models looking to integrate external data via protocols such as Anthropic’s Model Context Protocol. It’s fun to see the experiments here.

#5 Cheap Experiments: you need to be able to run experiments all the time. I remember talking to one of my favorite AI researchers who worked with Dario from Anthropic back when they were at Google, and something stuck with me:

“It wasn’t that Dario had the best ideas, although he had plenty… he just ran 10 to 100 times as many experiments as anyone else. That’s when I knew he would do amazing things.”
If you hear yourself or your team saying “oh, it will be hard to try that” take a step back. Invest in a platform that makes it easy to try things.
I’ve always been humbled to see the difference between how I *think* something will turn out vs. how it collides with reality. With LLMs, it’s even harder to know. Try things and allow emergence to be your friend. Empirical wins the race.
Don’t be precious with your prompts either. Over time I have seen that models have gotten better at understanding plain language vs. magic spells and incantations. Let people play with the prompts and have that be great because your eval framework will allow for that nicely. Don’t gatekeep here.

So, here we are. We are building and iterating on a system that gets the best out of developers. The massive scale of compute, together with the new technology of transformers and the fluke of GPUs, gives us an epic opportunity to build amazing things.

Shall we?
