
Dion Almaer

Software, Development, Products


LLMs

ai-hints.json: how the ecosystem will help the bots

June 13, 2023

As I use LLMs to help me build software, I keep running into situations where there is a missing piece, a leverage point that, if injected, would dramatically raise the quality of what gets created: subject-expert turtles.

Triangle showing how app devs, experts, and bots come together

Much of this thinking formed while creating mock.shop and seeing what it takes to go from a demo to production, but let me explain via another recent experience: a web app framework migration.

Framework migration: switching between Next.js and Remix

Next and Remix merging

LLMs are helpful for tasks such as porting. I have used them for this often, especially working between Python and JavaScript on some recent AI projects. I wanted to explore taking a web application built with Next.js and porting it to Remix, or vice versa.

Out of the box, ChatGPT would get some of the high level changes correct, but the results were very surface level. For example, it might create actions and loaders, but imports would still have the form of @/components/foo and next/image.

Our LLM friend has a galaxy of information, but we don’t know what’s actually in there, and software keeps evolving, so whatever information is there is probably outdated to some degree. This is where we humans come in. We can use that juicy context window to share (there’s a small sketch of pulling these together after the list):

  • The latest information from documentation that maps to our versions. Results from querying embeddings of this content can be plucked into context.
  • Rules and reasoning for translations. What are the steps that someone who knows both frameworks would write down?
  • Quality examples of before and after. If you follow those steps, what are solid mappings where patterns can be learned?
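
Here is a minimal sketch of stitching those three ingredients into one prompt. The data shapes and the buildMigrationPrompt name are made up for illustration; nothing here is a real tool or API.

    // A hypothetical helper that packs docs, rules, and examples into one prompt.
    interface MigrationContext {
      docSnippets: string[];                          // pulled via embedding search over current docs
      rules: string[];                                // translation steps an expert would write down
      examples: { before: string; after: string }[];  // known-good before/after mappings
    }

    function buildMigrationPrompt(task: string, ctx: MigrationContext): string {
      return [
        "You are migrating a Next.js application to Remix.",
        "Documentation for the versions in use:",
        ...ctx.docSnippets,
        "Translation rules:",
        ...ctx.rules.map((rule, i) => `${i + 1}. ${rule}`),
        "Examples of correct conversions:",
        ...ctx.examples.map((ex) => `Before:\n${ex.before}\nAfter:\n${ex.after}`),
        `Task:\n${task}`,
      ].join("\n\n");
    }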

Depending on the quality of this work, you will see a massive upgrade in the results. They go from “some nice hints but wow so much is wrong” to “this is kinda usable out of the box!”

At the end of this process, “What are the steps that someone knowledgeable would write?” stuck with me. Someone else was going to go through the same migration, and it doesn’t make sense for them to have to build out all of the mappings. This is a waste of effort!

Time for the knowledgeable turtles, already!

I have some knowledge of Next.js and Remix, but I am hardly The Expert. What if true experts (core team, folks from the community, etc) were the ones to package the relevant information about their frameworks?

Gonna live stream at 4pm PT (in 2 hours) and migrate an older Next.js application over to the App Router.

Will be just coding and playing tunes (strictly bangers) if you wanna hang out. https://t.co/LGZDMJiYzw

— Lee Robinson (@leeerob) June 7, 2023
Lee does great streams like this!

Lee does a great job showing a conversion from one version of Next.js to another (to App Router land). This knowledge can be codified for anyone else doing the same migration.

Picture an app developer creating a new project and installing all of their dependencies, and this time each one of them comes with hints from the projects themselves. And it’s turtles all the way down, as each dependency comes with its own dependencies.

In this world, you are building with a whole ecosystem of expertise funneling information into the amazing reasoning engine that is the LLM.

What knowledge can we funnel?

A `.chat` file in every repo prompting AI assistant (e.g., Ghostwriter) to be most helpful in this project.

— Amjad Masad (@amasad) June 5, 2023
Others are talking about this

Each project has an ai-hints.json file, which acts as the router to correct information. It is a simple configuration that links out to information, or contains it inline, for the given project.

It contains items such as the following (a hypothetical sketch of the file appears after the list):

  • URL to the source of the library (e.g. GitHub URL)
  • URL to the home page of the library
  • Description for the library
  • URL to the issue tracker of the library. Given the variable quality in there, pinches of salt are included; entries can map to answers from trusted folks or highly voted responses
  • URL to forums (e.g. StackOverflow / tags)
  • URL to documentation site(s)
  • URL to high quality community content (e.g. great blogs, YouTube, etc)
    • Popular libraries often pull in examples and other projects that use them, and run those projects’ test suites as a great way to catch regressions that consumers will run into. We can follow this pattern to gather wisdom from customers, not just official content
  • Versioning scheme:
    • One current issue is that LLMs aren’t aware of the differences between versions and thus you sometimes get feedback that is tied to an old version, which is frustrating!
  • URL or direct inline prompts that can be used to generate great tests
  • URL, or inline docs, to prompts and reasoning
    • This can become a store of knowledge. E.g. it can be where conversion knowledge goes
  • URL to project manifests such as package.json in the Node/JS ecosystem, to start finding all of the turtles
  • Polymath services: URL(s) to polymath services that have knowledge of the project
  • Embedding stores: URL to a store, or a local placement of embeddings that can be used and aggregated
    • This way we can share embeddings versus recreating them time and time again
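
To make this concrete, here is a hypothetical sketch of the shape such a file could take, written as a TypeScript type plus a sample value (the file itself would just be JSON). Every field name is an assumption for illustration, not a spec.

    // A guess at what ai-hints.json could look like; none of this is standardized.
    interface AiHints {
      name: string;
      description: string;
      source: string;              // e.g. GitHub URL
      homepage: string;
      issues?: string;             // issue tracker, taken with a pinch of salt
      forums?: string[];           // e.g. StackOverflow tags
      docs: string[];
      communityContent?: string[]; // great blogs, YouTube, etc.
      versioning?: { scheme: string; currentMajor: string };
      testPrompts?: string[];      // URLs or inline prompts for generating tests
      reasoning?: string[];        // URLs or inline prompts, e.g. conversion knowledge
      manifests?: string[];        // e.g. package.json, to find the rest of the turtles
      polymath?: string[];         // Polymath service endpoints
      embeddings?: string[];       // shared embedding stores
    }

    const remixHints: AiHints = {
      name: "remix",
      description: "Full stack web framework focused on web standards",
      source: "https://github.com/remix-run/remix",
      homepage: "https://remix.run",
      docs: ["https://remix.run/docs"],
      versioning: { scheme: "semver", currentMajor: "1" },
    };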

Speaking your full language

There have been some moments where my AI pair has been a true partner. A pattern in most of the best moments has been how the back and forth can be so much more concrete. Often, if you are building something for your own application, you end up following a path of translation.

You want to do concrete thing X using library Y. This has often meant finding the documentation in various places, learning the abstract thing closest to X (which may not even be easy to find!), and then working out how to translate that information into something that works for the concrete task.

Now, you can *explain* the concrete task, explain that you want to do it with your set of tools, and the initial answers can speak in that language. You may not even know which library to use, and you can ask for thoughts and get implementations that follow those thoughts too!

Being able to aggregate the dependencies is huge.

One of the reasons I enjoy working on Polymath is its federated nature. If I am working on a project that uses Remix with Preact, I can write a query that asks for information from both the Remix Polymath and the Preact one.
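
To show the federated idea, here is a hedged sketch of fanning the same question out to a few knowledge endpoints and pooling the results. The request body, response shape, and queryAll name are assumptions, not Polymath’s actual API.

    // Ask several knowledge services the same question and pool the results.
    interface KnowledgeBit {
      text: string;
      source: string;
    }

    async function queryAll(endpoints: string[], query: string): Promise<KnowledgeBit[]> {
      const responses = await Promise.all(
        endpoints.map(async (endpoint) => {
          const res = await fetch(endpoint, {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ query }),
          });
          const data = (await res.json()) as { bits: KnowledgeBit[] };
          return data.bits;
        })
      );
      return responses.flat();
    }

    // e.g. queryAll([remixEndpoint, preactEndpoint], "How do I handle forms?")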

More knowledge? More context

Meet LTM-1: LLM with *5,000,000 prompt tokens*

That's ~500k lines of code or ~5k files, enough to fully cover most repositories.

LTM-1 is a prototype of a neural network architecture we designed for giant context windows. pic.twitter.com/neNIfTVipt

— Magic.dev (@magicailabs) June 6, 2023
Millions of tokens!

As we build out larger knowledge sets, we need new ways to feed our AI’s creativity. Fortunately, we are seeing various models get significantly larger capacity for prompt tokens, including updates today from OpenAI:

“gpt-3.5-turbo-16k offers 4 times the context length of gpt-3.5-turbo at twice the price: $0.003 per 1K input tokens and $0.004 per 1K output tokens. 16k context means the model can now support ~20 pages of text in a single request.”

OpenAI announcement on June 13th 2023

We are also getting smarter about how we chain reasoning together. Instead of firing off a single prompt in one shot, you can fire several, and ask a series of follow-up questions, to drastically improve quality.

For example (there’s a small sketch of this after the list):

  • Ask questions differently in parallel
    • Use different prompts
    • With different settings (e.g. multiple temperature values)
    • With different context
    • And even different models entirely
  • Using the output from above, ask for a critique
  • Feed the critique AND the options from above, and ask for a unified solution.
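
A rough sketch of that fan-out, critique, and merge flow, with a generic complete() function standing in for whatever model call you use (its signature here is an assumption, not any vendor’s API):

    type Complete = (prompt: string, opts?: { temperature?: number }) => Promise<string>;

    async function answerWithCritique(question: string, complete: Complete): Promise<string> {
      // 1. Ask in parallel with different settings and framings.
      const drafts = await Promise.all([
        complete(question, { temperature: 0.2 }),
        complete(question, { temperature: 0.8 }),
        complete(`Think step by step before answering.\n\n${question}`),
      ]);

      // 2. Ask for a critique of the candidate answers.
      const critique = await complete(
        `Question:\n${question}\n\nCandidates:\n${drafts.join("\n---\n")}\n\nCritique each candidate.`
      );

      // 3. Feed the candidates and the critique back, and ask for one unified answer.
      return complete(
        `Question:\n${question}\n\nCandidates:\n${drafts.join("\n---\n")}\n\nCritique:\n${critique}\n\nWrite a single improved answer.`
      );
    }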

Ecosystem scratches its own back

Trying out @sourcegraph's Batch Changes to sunset a GitLab CI configuration across bunch of repositories at once. This is trully magic! pic.twitter.com/aWtm72TlIA

— Gvntr 零 (@47px) June 8, 2023
Large refactoring that works

By coming together and curating this information, we not only scratch the backs of all the developers using our products, we also improve the health of the ecosystem.

If you have worked on a platform, you know that one of the most important things to do is set up incentives to keep the platform evolving and healthy.

This is hard to do, and often goes wrong. You have probably worked with tools where updates break things more often than not. What does that teach you to do? Lock in to a particular version that is working, and only do upgrades when you have the time to deal with it.

If instead, upgrades work well? Then you should be game to continuously keep up to date. Doing well would look like:

  • Clear understanding of what’s in the upgrade
  • Codemods that can run to help you update. We will soon be in a world where our nanobots will see an update, create a PR, and run all of our tests, and you will have a strong starting point, or maybe even more. I’m ready for my nanobots to be cleaning things up for me, handling updates, performance improvements, checking for security, for accessibility, and so on.
    • We will just need great UX so we don’t feel swamped with this work. I don’t want a poor open source maintainer to be slammed with PRs in a way that feels like spam. Categorization and the like on GitHub will be a big winner here 🙂

An entire revolution is happening as the world of developers and shared knowledge combines with the new world of LLMs.

We don’t want LLMs relying only on the overall corpus of code that is out there: when the surface area for some code is small, the corpus may not hold any great answers, and when the commons is huge, you can end up with the lowest common denominator, such as:

Copilot always knows exactly what I'm about to do. pic.twitter.com/iewWWbg0kq

— antony  (@antony) June 9, 2023
The commons sometimes catches div-itis and worse

When it comes to code, hallucination is the enemy, not just because it is very unhelpful, but also because of side effects such as security risks:

* People ask LLMs to write code
* LLMs recommend imports that don't actually exist
* Attackers work out what these imports' names are, and create & upload them with malicious payloads
* People using LLM-written code then auto-add malware themselves https://t.co/Va9w18RpWu

— LLM Security (@llm_sec) June 10, 2023
Humans write a lot of bad code too, so let’s be more vigilant for all?

It’s not just about the quality of the code and the ease of use. It’s also about the speed at which we can learn, adapt, and grow as developers. With a wealth of knowledge at our fingertips, we can quickly understand new technologies, make informed decisions about the tools we use, and stay ahead of the curve in an ever-evolving industry. This revolution in software development is empowering us to build better, more efficient, and more innovative applications than ever before.

I’m very much here for it, and Yet Another robots.txt^H^H^Hjson.

/fin

ps. In time, some of the expert turtles will become robot turtles 😉
