Programming

GenAI: Lessons working with LLMs

February 14, 2023

Creativity & Constraints, Foundations & Flywheels

The developer community is buzzing around the new world of LLMs. Roadmaps for the year are getting ripped up one month in, and there is a whole lot of tinkering… and I love the smell of tinkering.

At Shopify we shared a new Winter Edition, which packaged up 100+ features for merchants and developers. Some of the launches had a lil Shopify Magic in them, using LLMs to make life better for our users.

I had a lot of fun shipping something for developers that uses LLMs, so I thought I would write about a few things that I learned on the way to shipping it.

UI for mock.shop — The mock.shop homepage

What did we ship? mock.shop

We want to make it as easy as possible for developers to learn and explore commerce by playing. We wanted to remove as much friction as possible from exploring a commerce data model and building a custom frontend on top of it.

This is where mock.shop comes in. It sits in front of a Shopify store, but doesn’t require you to create one yourself. Just start playing with it and hitting it directly!

One thing we have heard from some developers is that they are new to GraphQL and/or new to the particulars of the commerce domain. We show example GraphQL queries, and code examples of how to work with them, but could we go even further?

Generate query with AI

What if you could just use your words and ask us to generate the GraphQL for you? That’s exactly what we did. And here’s what we learned…

Foundations & Flywheels

We used OpenAI for this work, and when working with LLMs you are working with a black box. While GPT-3 had some knowledge of GraphQL and Shopify, its knowledge was outdated and often wrong. Out of the box you are working with whatever the model has sucked up, and you can’t trust this data at all.

You need to do all you can to feed the black box information so that it can come up with the best results. Given the black box, you will need to experiment and keep poking it to see if you are making it better or worse.

Here are some of the foundational things that we did:

Feed it the best input

Gather all of the information that you think will nudge the model in the right direction. In our case we gathered the GraphQL schema (SDL) for the Shopify storefront APIs, and then a bunch of good examples. With these in hand, we would chunk them up and create OpenAI embeddings from them. You end up with a library of these embeddings, which are vectors that represent the chunks of text.
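As a rough sketch of the chunking step, assuming a simple character budget per chunk: the `chunkText` name and the budget below are illustrative choices of mine, not what mock.shop actually uses. The idea is to split on blank lines and pack paragraphs together until the budget is hit.

```javascript
// Hypothetical helper: split text on blank lines, then pack paragraphs
// into chunks that stay under a character budget.
// A single paragraph longer than the budget is kept whole as its own chunk.
function chunkText(text, maxChars = 200) {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length > maxChars && current) {
      // The budget would be exceeded: close the current chunk and start a new one.
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Toy schema text standing in for the real SDL.
const schemaSnippet = [
  "type Product {\n  id: ID!\n  title: String!\n}",
  "type Collection {\n  id: ID!\n  title: String!\n}",
].join("\n\n");

const chunks = chunkText(schemaSnippet, 60);
console.log(chunks);
```

Each chunk would then be sent off to get its embedding. Splitting on type boundaries keeps related fields together, which tends to matter more than the exact budget.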

With these embeddings we can take user queries (e.g. “Get me 7 of the most recent products”), get an embedding from that query, and then look for similar embeddings in the library that we have created. Those will contain snippets such as the schema for the products GraphQL section, and some of the good examples that work with products. We call this context, and we pass it to the OpenAI completions endpoint as part of a prompt.
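Here is a minimal sketch of that lookup, assuming cosine similarity as the measure of "similar". The tiny 3-dimensional vectors and the schema snippets are made up for illustration (real embeddings have many more dimensions), and `topMatches` is a hypothetical helper name.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the `k` library entries most similar to the query embedding.
function topMatches(queryEmbedding, library, k = 2) {
  return library
    .map((entry) => ({
      ...entry,
      score: cosineSimilarity(queryEmbedding, entry.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors standing in for real embeddings.
const library = [
  { text: "type Product { id: ID! title: String! }", embedding: [0.9, 0.1, 0.0] },
  { text: "type Cart { id: ID! checkoutUrl: URL! }", embedding: [0.0, 0.2, 0.9] },
  { text: "{ products(first: 3) { edges { node { title } } } }", embedding: [0.8, 0.3, 0.1] },
];

// Pretend this vector came back from embedding the user's question.
const queryEmbedding = [1.0, 0.2, 0.0];

const matches = topMatches(queryEmbedding, library);
const context = matches.map((match) => match.text).join("\n");
console.log(context);
```

With a small library a linear scan like this is plenty. A vector database only starts to earn its keep once the library gets large.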

Customize the prompt

You will want to play with prompts that result in the right kind of output for your use case. In our case we are looking for the black box to not just start completing with sentences, but rather give back valid GraphQL.

You end up with a prompt such as:

Answer the question as truthfully as possible using the provided context, and if don’t have the answer, say “I don’t know”.\nContext:\n${context}\n\nQuestion:\nWhat is a Shopify GraphQL query, formatted with tabs, for: ${query}\n\nAnswer:

You can see how the prompt is:

Politely asking for the answer to be truthful
Nudging for the answer to be tied to the given context (from the embeddings) vs. making it up from full cloth, and saying that it’s ok to say “I don’t know”!
Asking for a formatted GraphQL query
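To make the shape of that prompt concrete, here is a small sketch that assembles it from its two inputs. The `buildPrompt` name and the sample context are my own placeholders. The wording of the prompt is the one quoted above.

```javascript
// Build the completion prompt from the retrieved context and the user's request.
function buildPrompt(context, query) {
  return (
    "Answer the question as truthfully as possible using the provided context, " +
    "and if don’t have the answer, say “I don’t know”.\n" +
    `Context:\n${context}\n\n` +
    `Question:\nWhat is a Shopify GraphQL query, formatted with tabs, for: ${query}\n\n` +
    "Answer:"
  );
}

const prompt = buildPrompt(
  "type Product { id: ID! title: String! }", // placeholder context
  "Get me 7 of the most recent products"
);
console.log(prompt);
```

Ending on "Answer:" matters: a completion model continues the text, so you want the next thing it writes to be the answer itself.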

One other way that we try to stop the model from hallucinating is by setting the temperature to 0 when we make the completion call. The OpenAI API documentation describes the parameter like this:

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
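A sketch of what a request with the temperature pinned to 0 could look like. The model name, `max_tokens` value, and `buildCompletionRequest` helper are assumptions for illustration, not the settings mock.shop uses. The request is only built here, not sent.

```javascript
// Assumed values: the model name and max_tokens are placeholders.
function buildCompletionRequest(prompt, apiKey) {
  return {
    url: "https://api.openai.com/v1/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "text-davinci-003",
        prompt,
        temperature: 0, // favor the most likely tokens over variety
        max_tokens: 256,
      }),
    },
  };
}

const request = buildCompletionRequest("Question: ...\n\nAnswer:", "YOUR_API_KEY");
console.log(JSON.parse(request.options.body));
// To send it: const response = await fetch(request.url, request.options);
```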

It’s quite funny to see how we do everything to try to get the model to speak the truth with this type of use case!

Feedback and Flywheels

Now it’s time for the flywheels to kick in. You want to keep feeding the context with high quality examples, sometimes show what NOT to do, play with different prompts, and start getting feedback.

You will see lots of examples where users are asked for feedback. E.g. in support systems and documentation: did this help? is it accurate? To train the model as best as possible, you can look for ways to get this information from the experts (humans!) and feed it on back, as well as simply tracking what your users are asking for and how well you are acting on those needs!

Creativity & Constraints

We have the foundations in place, and the quality of data will improve through the flywheels. Now it’s time to get more constrained. We are doing all we can to nudge for truth, but you can’t trust these things, so what guardrails should you put in place?

We really want the GraphQL that we show to be valid, so… how about we do some validation?

We take the GraphQL that comes back and do a few things:

Tweak it, when possible, to use valid IDs and content for the dataset that we have in the mock.shop instance
Validate the GraphQL to make sure the syntax is correct
Run it against mock.shop, since we have real IDs, and show the results to the user!
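As an illustration of a guard step, here is a deliberately cheap pre-check that only catches truncated or mangled output by looking at bracket balance. It is not GraphQL validation: for that you would hand the text to a real parser (for example `parse` from the graphql-js package) and validate it against the schema. The `hasBalancedDelimiters` helper is hypothetical, and it does not handle `#` comments.

```javascript
// Cheap pre-check: are (), [] and {} balanced, ignoring anything inside string literals?
// This only catches truncated or mangled output; it is NOT full GraphQL validation.
function hasBalancedDelimiters(source) {
  const closers = { ")": "(", "]": "[", "}": "{" };
  const stack = [];
  let inString = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "(" || ch === "[" || ch === "{") stack.push(ch);
    else if (ch in closers) {
      // A closer must match the most recent opener.
      if (stack.pop() !== closers[ch]) return false;
    }
  }
  return stack.length === 0 && !inString;
}

const complete = "{ products(first: 7) { edges { node { id title } } } }";
const truncated = "{ products(first: 7) { edges { node { id title";
console.log(hasBalancedDelimiters(complete), hasBalancedDelimiters(truncated));
```

A check like this is worth running first because it is instant, and completions that hit a token limit tend to fail in exactly this way.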

You can’t assume anything, so you often will have to have a guard step once you get results.

ChatGPT vs. Stockfish

There was a lot of hubbub when someone pitted ChatGPT against Stockfish in a game of chess. Many used it as a way to laugh at ChatGPT. This thing is crazy! It made all kinds of invalid moves! No doy! You have to assume that and build systems to tame it… a chess engine wouldn’t allow invalid moves.

Defensive

You have to be incredibly defensive. You are poking a brain with electrodes. It comes out with amazing things, but you can’t trust everything that comes back. Remote calls to OpenAI are themselves flaky, and the service often goes down.

Not only will you be checking for timeouts and errors in results, but you should also consider a feature flag toggle. In the case of mock.shop, the tool is usable without any of the AI features. They are progressive enhancements to the product.

We can add checks to automatically turn it off if something really bad is happening with OpenAI. Marry both:

const openAIStatus = await fetch("https://status.openai.com/api/v2/status.json").then((response) => response.json());

and check the results for the type of incident:

openAIStatus.status.indicator === "major"
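Putting the defensive pieces together, a sketch might look like the following. The function names, the 2 second timeout, and the choice to also treat a "critical" indicator as a reason to switch off are my assumptions. The status endpoint is the one shown above, and Statuspage reports an indicator of "none", "minor", "major", or "critical".

```javascript
// Decide whether the status payload is bad enough to switch the AI features off.
function shouldDisableAI(statusPayload) {
  const indicator = statusPayload?.status?.indicator;
  return indicator === "major" || indicator === "critical";
}

// Fetch the status with a timeout; treat any failure as "unknown" rather than crashing.
async function fetchOpenAIStatus(timeoutMs = 2000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch("https://status.openai.com/api/v2/status.json", {
      signal: controller.signal,
    });
    if (!response.ok) return null;
    return await response.json();
  } catch {
    return null; // network error or timeout
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical flag check: a manual toggle married to the automatic one.
function aiFeaturesEnabled(manualFlag, statusPayload) {
  return manualFlag && !shouldDisableAI(statusPayload);
}

console.log(aiFeaturesEnabled(true, { status: { indicator: "major" } }));
console.log(aiFeaturesEnabled(true, { status: { indicator: "none" } }));
// In the app: aiFeaturesEnabled(flag, await fetchOpenAIStatus())
```

Note the design choice when the status itself can’t be fetched: this sketch leaves the feature on and relies on the per-request timeouts and error handling. You could just as reasonably fail closed.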

It’s incredibly fun, getting creative with how you can use the power of LLMs, which are getting better and faster all the time. The black box nature can be frustrating at times, but it’s worth it.

I hope you are having some fun tinkering!

There are so many helpful libraries out there. I have been working with some friends on Polymath to make it simple to import and create the libraries, as well as query it all.

Generative AI: It’s Time to Get Into First Gear

January 25, 2023

Don’t sit and wait, get tinkering!

We are almost at the end of the first month of 2023, and you are working on executing on the year’s strategy, but we are witnessing an explosion, hopefully a Cambrian one, in front of our eyes… Generative AI.

I wrote about how it can be a helpful tool for us with respect to documentation and beyond and we are seeing changes every week as we learn what works and what doesn’t.

We are seeing developers jump on this, playing with ideas such as commit bots, app generators, ways to generate backends, IDEs, and so much more.

First Gear? Why Now?

There is so much promise, people are already using these tools, so instead of sitting on it and being conservative, now is the time for us to jump in and get into first gear. There is always a fear of being too early into a hype cycle, but the reason I think the time is right is that you see people getting value today. I have been coding with tools like Copilot and ChatGPT and it’s helpful enough that I wouldn’t want to go back to the Before Times. Does it get everything right? No. Is it great for all of my development needs? No, it’s not as good as it should be.

Training all of my content, and being able to query it

What does it mean to get into first gear now?

Be thinking about use cases that you can start trying. I have been building things such as:
- Using embeddings to bring a chat/search interface to our docs and samples.
- Discord bots to start answering questions
- Super-codemods that help you upgrade, and generally help you build
Build small experiments that one or two people can execute on and start to validate
Build a core competence in the technology, so you can quickly go from ideas to experiments

In first gear you have the pedal down, and a driver is quickly accelerating. This is a technology that is changing fast, and is all about tinkering. Get tinkering.

When do we move to second gear?

You will learn so much through these experiments. What actually works, what doesn’t, and what needs more tuning and tweaking to be valuable. If things are going well and we see how efforts are impacting our key results, you can ramp up and shift more effort into this work. That would be a sign we are seeing something somewhat revolutionary.

But wait, isn’t this a fad? Are we being sheep?

Maybe it turns out that this isn’t as big of a sea change as many imagine. I have been very skeptical of recent webN hype in the past few years, and I don’t think that Gen AI is a silver bullet of any form. I am well aware that it hallucinates, and gives wacky answers at times. However, as mentioned above, I have already witnessed great value, and we have truly just started. I believe it can offer UX improvements for our developer community that are substantial. Sometimes you have to take a calculated risk. Worst case, you learn, and provide some much desired feature food along the way.

/fin

Can Syntactic Sugar be Nutritious?

December 11, 2017

When it comes to nutrition and food, there is nothing that we vilify more than sugar, as we find out how wrong we were about fat. As programmers, we often pooh-pooh syntactic sugar, but two interactions that I witnessed reminded me to look deeper.

The two stories were of Kotlin and ES6.

Once you kot, you can’t stop

Let’s start with Kotlin. There has long been excitement with the language, and when we announced our support at Google I/O there was a huge cheer, as this signaled that, not only will we not be doing anything that could harm Kotlin support, but that we would invest in making it great. Java isn’t going away, but everyone knows they can safely dig into Kotlin. I would go even further and say that you should at least be playing with it. Many top apps have been using it in production for some time, but if you aren’t quite ready for that, look for other areas to get your feet wet such as writing unit tests to get a feel.

Pat still wasn’t sold. Why bother with a new language? Functionally what can it do that can’t be done with Java? What about finding knowledgeable engineers? Or the fact that we don’t have as much documentation for it yet? At a high level this is logical, but then something happened. Pat tried it, and after a weekend of hacking with Kotlin realized how productive and fun it was. As often happens, the expectation bar rose and it meant that going back to the verbose Java code seemed… old-fashioned. Less code, fewer NullPointerExceptions, and new libraries got the juices flowing in a new way, and it didn’t hurt that Android Studio was a nice helper along the way.

I don’t need no stinking classes

Devyn saw ES4 come and go. After years of working with JavaScript, with The Good Parts on the office desk, the notion of prototypal inheritance was a huge feature, and there was no need for class syntax.

CoffeeScript came around, and although Devyn actually liked many of the features (arrow functions, the role of spaces, rest…, lexical scoping) they never seemed worth the cost.

Then we started to see changes in JavaScript itself via ES6/2015, and Devyn was still skeptical. Do I really want to use Babel in my flow and set things up for older browsers? Once again, a coworker pleaded with the TL to give it a go on a small project. Fast forward a couple of months and Devyn is one of the biggest proponents on the team. When asked why, a huge turning point was the road from callback city, to promise mountain, to async/await lake. Finally, it was all making sense.

// getJSON() is assumed to return a promise that resolves to a JSON string
const makeRequest = async () => {
  try {
    // this parse may fail
    const data = JSON.parse(await getJSON())
    console.log(data)
  } catch (err) {
    console.log(err)
  }
}

Say what you mean

All of this was wrapped up in our project’s mantra: “say what you mean”. —Alex Russell

With this cleanup, we are moving developers closer to the point where they can cleanly say what they mean, without scaffolding getting in the way. Gone is some of the verbosity, and we can Huffman-encode the things that we use all the time.

Don’t function make me function in my function chain
With custom elements we can get away from div div div divitis
Not having type information shoved in your face all the time gives your code space to breathe (Dion (a Person) went home (a House))
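To illustrate the first point with a toy example of my own (the data and names are made up), compare a chain written with the `function` keyword to the same chain with ES2015 arrow functions:

```javascript
const prices = [12, 40, 7, 25];

// Pre-ES2015: the `function` keyword repeats at every step of the chain.
const verbose = prices
  .filter(function (price) {
    return price > 10;
  })
  .map(function (price) {
    return price * 2;
  });

// ES2015 arrow functions say the same thing with less scaffolding.
const concise = prices.filter((price) => price > 10).map((price) => price * 2);

console.log(verbose, concise);
```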

It turns out that paper cuts really do matter. Speaking of types, one area that ES2015 has stopped short on is optional types, leaving room for TypeScript to become yet another example of something that many developers thought they didn’t want. Casey thought types only get in the way, and pattern matched “types” to “the way Java does types” (static, strong, and traditionally in your face), which happens far too often.

We have had myriad attempts at adding types in a variety of ways (beyond ES4). The Closure Compiler was one version, which has always been a phenomenal tool (to this day it normally beats the pants off of other tools when it comes to final code), but I always did a lil body shake when seeing context in comments:

/** @const */ var MY_BEER = 'stout';
/** @typedef {(string|number)} */

Pragmatic. But I admit to a preference for getting optional types into the language itself. Ideally even without type erasure!

One of the features in TypeScript that really tickles me is String Literal Types which allow you to get the benefit of enum without having to create enums.

String literal types allow you to specify the exact value a string must have. In practice string literal types combine nicely with union types, type guards, and type aliases. You can use these features together to get enum-like behavior with strings:

type Easing = "ease-in" | "ease-out" | "ease-in-out";

Language features can often get rid of the need for more verbose patterns. They let you work around a pattern, or change what is in fashion.

One fashion that comes to mind was when fluent APIs hit the scene hard with the growth of jQuery. I often like the way a chain looks, and chains are very natural in the functional world, but writing an API that has `return this` throughout feels a lil bit smelly to me at this point.
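For anyone who hasn’t written one, here is a tiny made-up example of the pattern in JavaScript. `ButtonBuilder` is hypothetical, and the thing to notice is the `return this` at the end of every method:

```javascript
// Hypothetical builder: every method has to end with `return this` to keep the chain going.
class ButtonBuilder {
  constructor() {
    this.label = "";
    this.classes = [];
  }
  text(label) {
    this.label = label;
    return this;
  }
  addClass(name) {
    this.classes.push(name);
    return this;
  }
  build() {
    return `<button class="${this.classes.join(" ")}">${this.label}</button>`;
  }
}

const button = new ButtonBuilder().text("Confirm").addClass("important").build();
console.log(button);
```

The chaining is a property of how the API author wrote each method, not something the language gives you for free.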

Dart fixes the itch without you needing to return this. Their cascade feature gives you the same flow and ability to ditch the object name shouting at you:

querySelector('#confirm') // Get an object.
  ..text = 'Confirm' // Use its members.
  ..classes.add('important')
  ..onClick.listen((e) => window.alert('Confirmed!'));

// and with nesting
final addressBook = (new AddressBookBuilder()
      ..name = 'kris'
      ..email = 'kris@example.com'
      ..phone = (new PhoneNumberBuilder()
            ..number = '123-456-7899'
            ..label = 'home')
          .build())
    .build();

Is this all nitpicky? Do you enjoy being on the flip side once you get there?

On the one hand, the hard part isn’t the syntax. If you understand the Android lifecycle thoroughly, you are good to go… picking Kotlin will be a (fun) breeze. On the other hand, your language is your expression, and you live with it for hours a day.

As I become more curmudgeon-y with age, I am trying to remember: at least give something a try before I come to a strongly held belief.

I am much happier to have modern ES6/TypeScript, Kotlin, and Dart 2.0 to play with in 2017, and I look forward to more improvements, and richer support, in 2018. Some sugar over the holidays is OK, isn’t it?