
Dion Almaer

Software, Development, Products

Development

GenAI: Lessons working with LLMs

February 14, 2023

Creativity & Constraints, Foundations & Flywheels

The developer community is buzzing around the new world of LLMs. Roadmaps for the year are getting ripped up one month in, and there is a whole lot of tinkering… and I love the smell of tinkering.

At Shopify we shared a new Winter Edition, which packaged up 100+ features for merchants and developers. Some of the launches had a lil Shopify Magic in them, using LLMs to make life better for our users.

I had a lot of fun shipping something for developers that uses LLMs, and I thought I would write about a few things that I learned on the way to shipping it.

UI for mock.shop
The mock.shop homepage

What did we ship? mock.shop

We want to make it as easy as possible for developers to learn and explore commerce by playing. We wanted to remove as much friction as possible from exploring a commerce data model and building a custom frontend to show it off.

This is where mock.shop comes in: it sits in front of a Shopify store, but doesn’t require you to create one yourself. Just start playing with it and hitting it directly!

One thing we have heard from some developers is that they are new to GraphQL and/or new to the particulars of the commerce domain. We show examples, with both the GraphQL and the code for working with it, but could we go even further?

Gil seeing mock.shop

Generate query with AI

What if you could just use your words and ask us to generate the GraphQL for you? That’s exactly what we did. And here’s what we learned…

Foundations & Flywheels

We used OpenAI for this work, and when working with LLMs you are working with a black box. While GPT-3 had some knowledge of GraphQL and Shopify, its knowledge was outdated and often wrong. Out of the box you are working with whatever the model has sucked up, and you can’t trust this data at all.

You need to do all you can to feed the black box information so that it can come up with the best results. Given the black box, you will need to experiment and keep poking it to see if you are making it better or worse.

Here are some of the foundational things that we did:

Feed it the best input

Gather all of the information that you think will nudge the model in the right direction. In our case we gathered the GraphQL schema (SDL) for the Shopify storefront APIs, and then a bunch of good examples. With these in hand, we chunk them up and create OpenAI embeddings from them. You end up with a library of these embeddings, which are vectors that represent the chunks of text.

With these embeddings we can take a user query (e.g. “Get me 7 of the most recent products”), get an embedding for that query, and then look for similar embeddings in the library we have created. Those will point to snippets such as the schema for the products section of the GraphQL API, and some of the good examples that work with products. We call this the context, and we pass it to the OpenAI completions endpoint as part of a prompt.
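
A minimal sketch of that lookup step, assuming the embeddings have already been computed and stored next to their text chunks. The `library` shape, the `topK` helper, and the three-dimensional toy vectors are illustrative assumptions (real embeddings have far more dimensions), and ranking by plain cosine similarity is one common choice rather than a description of what mock.shop does internally.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k library entries whose embeddings are closest to the query embedding.
function topK(queryEmbedding, library, k) {
  return library
    .map((entry) => ({ ...entry, score: cosineSimilarity(queryEmbedding, entry.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy library: hypothetical chunks with made-up 3-dimensional embeddings.
const library = [
  { text: "type Product { id: ID! title: String! }", embedding: [0.9, 0.1, 0.0] },
  { text: "type Cart { id: ID! lines: [CartLine!]! }", embedding: [0.0, 0.2, 0.9] },
  { text: "query { products(first: 5) { edges { node { title } } } }", embedding: [0.8, 0.3, 0.1] },
];

// Pretend this vector came back from the embeddings endpoint for the user's question.
const queryEmbedding = [1.0, 0.2, 0.0];

// Join the best-matching chunks into the context string that goes into the prompt.
const context = topK(queryEmbedding, library, 2)
  .map((entry) => entry.text)
  .join("\n");
console.log(context);
```

With a real library you would likely also cap the context by token count, so that the prompt stays within the model's limit.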

Customize the prompt

You will want to play with prompts that result in the right kind of output for your use case. In our case we are looking for the black box to not just start completing with sentences, but rather give back valid GraphQL.

You end up with a prompt such as:

Answer the question as truthfully as possible using the provided context, and if don’t have the answer, say “I don’t know”.\nContext:\n${context}\n\nQuestion:\nWhat is a Shopify GraphQL query, formatted with tabs, for: ${query}\n\nAnswer:

You can see how the prompt is:

  • Politely asking for the answer to be truthful
  • Nudging for the answer to be tied to the given context (from the embeddings) vs. making it up from whole cloth, and saying that it’s ok to say “I don’t know”!
  • Asking for a formatted GraphQL query
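
The template can be assembled with a small helper. This is a sketch only: `buildPrompt` is a hypothetical name, and the wording is copied as-is from the prompt quoted earlier.

```javascript
// Build the completion prompt from the retrieved context and the user's request.
// The line breaks match the \n escapes in the quoted template.
function buildPrompt(context, query) {
  return [
    'Answer the question as truthfully as possible using the provided context, and if don’t have the answer, say “I don’t know”.',
    "Context:",
    context,
    "",
    "Question:",
    `What is a Shopify GraphQL query, formatted with tabs, for: ${query}`,
    "",
    "Answer:",
  ].join("\n");
}

const examplePrompt = buildPrompt(
  "type Product { id: ID! title: String! }",
  "Get me 7 of the most recent products"
);
console.log(examplePrompt);
```

Ending the prompt with “Answer:” leaves the model to continue from that point, which is what nudges it to emit the query rather than more prose.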

One other way that we try to stop the model from hallucinating is by setting the temperature to 0 when we make the completion call. The OpenAI documentation describes the parameter as:

“What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.”

It’s quite funny to see how we do everything to try to get the model to speak the truth with this type of use case!
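
As a rough sketch, the request for such a call could be built like this. The helper name, the model name, and the `max_tokens` value are assumptions for illustration; only `temperature: 0` comes from the text above.

```javascript
// Build the fetch() options for a completions request with temperature 0.
function buildCompletionRequest(prompt, apiKey) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "text-davinci-003", // assumed model name
      prompt,
      temperature: 0, // favour the most likely tokens over random sampling
      max_tokens: 512, // assumed cap on the length of the generated query
    }),
  };
}

// Usage (not executed here, needs a real API key and network access):
// const res = await fetch(
//   "https://api.openai.com/v1/completions",
//   buildCompletionRequest(prompt, process.env.OPENAI_API_KEY)
// );
// const { choices } = await res.json();
// const generatedGraphQL = choices[0].text;
```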

Feedback and Flywheels

Now it’s time for the flywheels to kick in. You want to keep feeding the context with high quality examples, sometimes show what NOT to do, play with different prompts, and start getting feedback.

You will see lots of examples where users are asked for feedback. E.g. in support systems and documentation: did this help? is it accurate? To train the model as best as possible, you can look for ways to get this information from the experts (humans!) and feed it on back, as well as simply tracking what your users are asking for and how well you are acting on those needs!

Creativity & Constraints

We have the foundations in place, and the quality of data will improve through the flywheels. Now it’s time to get more constrained. We are doing all we can to nudge for truth, but you can’t trust these things, so what guardrails should you put in place?

We really want the GraphQL that we show to be valid, so… how about we do some validation?

We take the GraphQL that comes back and we can do a few things:

  • Tweak it, when possible, to use valid IDs and content for the dataset that we have in the mock.shop instance.
  • Validate the GraphQL to make sure the syntax is correct.
  • Run it against mock.shop, since we have real IDs, and show the results to the user!

You can’t assume anything, so you often will have to have a guard step once you get results.
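
Here is a sketch of what such a guard step might look like. It is deliberately crude: `hasBalancedDelimiters` only checks that brackets pair up outside double-quoted strings, as a cheap stand-in for real GraphQL validation with a proper parser, and both function names are hypothetical.

```javascript
// Cheap structural pre-check: are braces, parentheses and brackets balanced,
// ignoring anything inside double-quoted strings? This is NOT full GraphQL validation.
function hasBalancedDelimiters(source) {
  const closers = { "}": "{", ")": "(", "]": "[" };
  const stack = [];
  let inString = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "(" || ch === "[") stack.push(ch);
    else if (ch in closers && stack.pop() !== closers[ch]) return false;
  }
  return stack.length === 0 && !inString;
}

// Guard step: only pass the generated text on if every check succeeds.
function guardGeneratedQuery(generated, checks = [hasBalancedDelimiters]) {
  const query = generated.trim();
  // The prompt allows the model to decline; accept straight or curly apostrophes.
  if (query === "" || /^i don[’']t know/i.test(query)) {
    return { ok: false, reason: "no answer" };
  }
  for (const check of checks) {
    if (!check(query)) return { ok: false, reason: `failed ${check.name}` };
  }
  return { ok: true, query };
}

console.log(guardGeneratedQuery("{ products(first: 7) { edges { node { title } } } }"));
console.log(guardGeneratedQuery("{ products(first: 7) { edges { node { title } }"));
```

Because `checks` is a list, a stricter validator or a trial run against the API can be appended later without changing the callers.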

ChatGPT vs. Stockfish

There was a lot of hubbub when someone pitted ChatGPT against Stockfish in a game of chess. Many used it as a way to laugh at ChatGPT. This thing is crazy! It made all kinds of invalid moves! No doy! You have to assume that and build systems to tame it… a chess engine wouldn’t allow invalid moves.

Defensive

You have to be incredibly defensive. You are poking a brain with electrodes. It comes out with amazing things, but you can’t trust everything that comes back. Remote calls to OpenAI are themselves flaky, and the service often goes down.

Not only will you be checking for timeouts and errors in results, but you should consider a feature flag toggle. In the case of mock.shop, the tool is usable without any of the AI features. They are progressive enhancements to the product.

We can add checks to automatically turn it off if something really bad is happening with OpenAI. Marry the two, fetching OpenAI’s status page:

const openAIStatusRequest = fetch("https://status.openai.com/api/v2/status.json");

and check the results for the type of incident:

openAIStatus.status.indicator === "major"
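
Put together, the flag and the status check might look like the sketch below. The `flagEnabled`, `fetchImpl`, and `timeoutMs` parameters are hypothetical, treating “critical” the same as “major” is an assumption, and so is the choice to switch the AI features off whenever the status page itself cannot be reached.

```javascript
// Decide whether AI features should be switched off, given the parsed status.json payload.
function shouldDisableAI(openAIStatus) {
  const indicator = openAIStatus?.status?.indicator;
  return indicator === "major" || indicator === "critical";
}

// Combine a manual feature flag with the automatic status check.
// fetchImpl is injectable so the logic can be exercised without network access.
async function isAIEnabled({ flagEnabled, fetchImpl = fetch, timeoutMs = 2000 }) {
  if (!flagEnabled) return false;
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl("https://status.openai.com/api/v2/status.json", {
      signal: controller.signal,
    });
    if (!res.ok) return false;
    return !shouldDisableAI(await res.json());
  } catch {
    return false; // timeout or network error: fall back to the non-AI experience
  } finally {
    clearTimeout(timer);
  }
}

// Example with a stubbed fetch that reports a major incident.
const stubFetch = async () => ({
  ok: true,
  json: async () => ({ status: { indicator: "major" } }),
});
isAIEnabled({ flagEnabled: true, fetchImpl: stubFetch }).then((enabled) => {
  console.log("AI features enabled:", enabled);
});
```

Since the AI features are progressive enhancements, failing closed costs little: the page simply renders without the generate-query option.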

It’s incredibly fun, getting creative with how you can use the power of LLMs, which are getting better and faster all the time. The black box nature can be frustrating at times, but it’s worth it.

I hope you are having some fun tinkering!


https://polymath.almaer.com/

There are so many helpful libraries out there. I have been working with some friends on Polymath to make it simple to import and create the libraries, as well as query it all.

Generative AI: It’s Time to Get Into First Gear

January 25, 2023

Don’t sit and wait, get tinkering!

We are almost at the end of the first month of 2023, and you are working on executing on the year’s strategy, but we are witnessing an explosion, hopefully a Cambrian one, in front of our eyes… Generative AI.

I wrote about how it can be a helpful tool for us with respect to documentation and beyond and we are seeing changes every week as we learn what works and what doesn’t.

Gil making GraphQL more approachable!

We are seeing developers jump on this, playing with ideas such as commit bots, app generators, ways to generate backends, IDEs, and so much more.

First Gear? Why Now?

There is so much promise, people are already using these tools, so instead of sitting on it and being conservative, now is the time for us to jump in and get into first gear. There is always a fear of being too early into a hype cycle, but the reason I think the time is right is that you see people getting value today. I have been coding with tools like Copilot and ChatGPT and it’s helpful enough that I wouldn’t want to go back to the Before Times. Does it get everything right? No. Is it great for all of my development needs? No, it’s not as good as it should be.

Training all of my content, and being able to query it

What does it mean to get into first gear now?

  • Be thinking about use cases that you can start trying. I have been building things such as:
    • Using embeddings to bring a chat/search interface to our docs and samples.
    • Discord bots to start answering questions
    • Super-codemods that help you upgrade, and generally help you build
  • Build small experiments that one or two people can execute on and start to validate
  • Build a core competence in the technology, so you can quickly go from ideas to experiments

In first gear you have the pedal down, and a driver is quickly accelerating. This is a technology that is changing fast, and is all about tinkering. Get tinkering.

When do we move to second gear?

You will learn so much through these experiments. What actually works, what doesn’t, and what needs more tuning and tweaking to be valuable. If things are going well and we see how efforts are impacting our key results, you can ramp up and shift more effort into this work. That would be a sign we are seeing something somewhat revolutionary.

But wait, isn’t this a fad? Are we being sheep?

Maybe it turns out that this isn’t as big of a sea change as many imagine. I have been very skeptical of recent webN hype in the past few years, and I don’t think that Gen AI is a silver bullet of any form. I am well aware that it hallucinates, and gives wacky answers at times. However, as mentioned above, I have already witnessed great value, and we have truly just started. I believe it can offer UX improvements for our developer community that are substantial. Sometimes you have to take a calculated risk. Worst case, you learn, and provide some much desired feature food along the way.

/fin

Agency developers are underrated

April 21, 2022

You hear about the developer who created Wordle, or who went on to found a large company, or contributed an open source project to the commons. You don’t often hear about the agency developer, and they are both important and often on their own journeys.

The Value of Agency

Agencies, and consultants, are out there helping make businesses a reality. They deliver expertise when it doesn’t exist in house. They quickly expand the workforce and when done right, leave employees better equipped for growth.

By working at multiple companies in a domain, they can bring learnings, just as employees do when they change companies as they journey through their career.

I have found that high quality agencies are true experts who have bet their business on your platform, understand your competition, and know what your users really want. They build true empathy on what it takes to be successful.

If a platform company doesn’t have programs that include agencies as a tier one cohort they are probably doing it wrong. Ask yourself:

  • Am I training the developers at agencies to have a great understanding of what my platform or product offers? If they are asked, or are given freedom to choose, what solution to build on… would they choose you?
  • Are these developers external advocates in the community? Is there a community for them to show their chops, be rewarded for their knowledge, and celebrated?
  • Does the business team at agencies understand your offering and are you supporting them so they can be an extended sales force for you?
  • Do you have agencies not only servicing customers directly, but also through self-service opportunities (e.g. building apps / extensions / themes)?

At Shopify for example, our agencies are a vital part of our ecosystem, working with us on a joint mission to be merchant obsessed as a way to improve commerce for all. As I have dived into the ecosystem I am constantly finding agencies who deeply understand commerce and our platform, and are at the heart of delivering for our merchants to make their experiences unique and high quality.

We often talk about learning the tech, and the product, but learning commerce is an important key, and agencies have a lot of that knowledge. And once you understand the domain, competition, and environment, opportunities are unlocked.

The Entrepreneurial Path

Many of the solo or small team entrepreneurial developers that I have met came from a past life working at a merchant or at agencies. That was the training ground for their knowledge.

I have seen some common patterns when getting to know our developers, including one very strong one:

“I worked at an agency working on commerce sites for $YEARS. I started to notice that several of our clients were asking for $FEATURE, so I decided that I would build a Shopify app that delivers the feature and enables any merchant the ability to unlock it!”

— Pat

This takes so much risk away from your app development. For one, you can do work for clients directly to prove things out, and this gives you a direct line to a customer with a clear need (else they wouldn’t pay!). Then, by working with other merchants, you can learn what needs to be customizable, and when you are ready, an app version unlocks scale. It’s nice to get paid decent money from a merchant to do guaranteed work, and it’s nice to get money whenever someone installs your app.

This is yet another example of the power of de-risking app development with Shopify.

Thank you agencies, and those of you working at them. You are at the heart of it all.


Others in the series:

  • Tech writers are underrated
  • Project managers are underrated
  • QA engineers are underrated
