Preface
My notes on building with AI quickly ballooned, and I found myself with nearly enough material to write a short handbook. I thought it unwise to try fitting a handbook into a newsletter, so I’ve summarized my thoughts. Treat this post as a condensed version of a hypothetical AI Builder’s Handbook; contained within are evergreen mental models that will help you get the most mileage out of using AI.
For the curious: I encourage you to follow my footnotes for further reading on various topics.
What to expect from this handbook series
Part 1: General principles for building with AI
Future parts:
The reasoning abilities of LLMs
A hands-on demo of how to prototype in Google Sheets
And more if there’s interest
Goal of this handbook
I want to equip you with a mental model of how to build with AI. I believe it’s important for as many people to build with AI as possible. It’s a new foundational technology that will allow us to build more with less and squeeze more productivity out of the limited resources of this planet1.
Note that when I talk about building with AI in this post, I’m talking about building with LLMs. I will not be discussing in detail how to build with image diffusion models.
Building with AI is a new paradigm
Within the AI discourse, there’s a dissonance. On one side, there’s much hype about AI, with warnings of job displacement and headlines about how ChatGPT can pass business, law, and medical exams. On the other side, we have headlines stating that ChatGPT is dumber than you think, along with its limitations such as its inability to do math or its ability to bullshit. I think the hype and criticism are mostly a misunderstanding of what ChatGPT really is: an early demo of what’s possible using large language models.

The reality is that you must see past the noise surrounding AI that hyperbolizes it in either extreme. Once you do that, you’ll see that AI is a new magic that we can all wield. The recent advancements in AI can be compared to the introduction of the transistor. A transistor is the tiny switch, found inside chips, that acts as the logic center of electronics. Without transistors we wouldn’t have smartphones, the internet, or AI for that matter2. The past 70 years of tech have been built on the back of the mighty transistor, and I predict the next several decades of innovation will be built on top of LLMs.
What makes LLMs magical
Three things make modern LLMs magical. Here’s an overview of each attribute.
1/ Probabilistic not deterministic
What: A probabilistic system is like talking to a live person for customer support. A deterministic system is like pressing buttons and navigating the maze of a phone menu.
Why it is magical: Probabilistic systems can handle complexity by making predictions, while deterministic systems require builders to explicitly define every detail and users to navigate a potentially unintuitive interface. Like energy, complexity cannot be destroyed, only shifted3. Probabilistic systems shift the burden of complexity to AI, making the builder and user experience much smoother.
Implication for builders: LLMs give us the ability to build with fuzzy inputs and outputs. Building with LLMs gives us the ability to rapidly connect puzzle pieces that previously required precision.
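To make the fuzzy-inputs point concrete, here’s a minimal sketch of using an LLM to turn a messy, free-form customer email into structured fields, something a deterministic system would need a pile of brittle rules to handle. The field names are invented for illustration, and the complete() helper is a stand-in for whichever LLM API you actually call.

```python
# A minimal sketch: turning a fuzzy, free-form input into structured output.
# complete() is a placeholder for whatever LLM API you use (OpenAI, etc.).
import json

def complete(prompt: str) -> str:
    """Send the prompt to your LLM of choice and return the text completion."""
    raise NotImplementedError("Wire this up to your provider's API.")

def triage_support_email(email_body: str) -> dict:
    prompt = (
        "Read the customer email below and reply with JSON containing the keys "
        '"product", "sentiment" (positive/neutral/negative), and "summary".\n\n'
        f"Email:\n{email_body}\n\nJSON:"
    )
    return json.loads(complete(prompt))

# A deterministic system would need explicit rules for every possible phrasing;
# the LLM absorbs that complexity and hands back something structured.
```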
2/ Natural language
What: Like the name suggests, large language models use natural language which makes them compatible with human reasoning.
Why it is magical: Language is perhaps the biggest innovation for humans as it allowed us to store and communicate information at increasingly larger scales. We’re able to convey more complex ideas by building up a vocabulary of words and transport that information through time and space using speech, print, and now the internet4. With LLMs we now have technology that understands language in roughly the same way we do.
Implication for builders:
Andrej Karpathy says it best.
3/ General availability
What: Large language models are now widely available and accessible to a general audience, including individuals and small companies.
Why it is magical: Historically, most AI systems were only available to large companies and academics. This is because most AI models could only accomplish specific tasks, like sorting your social feeds, and large companies had the capital and the reason to invest heavily in these specialized models. Today LLMs represent an all-purpose AI that’s available to all.
Implication for builders: LLMs are available to everyone today for free through products like ChatGPT and Zapier, no extensive machine learning knowledge required. For developers, you can start using the most advanced models for a fraction of a penny through publicly available APIs.
How to Best Use AI
1/ Learn Prompting
“Garbage in, garbage out” is a popular saying in computer science. This is doubly true for working with LLMs. Your results will vastly improve if you give more context and clear instructions in your prompts. Think of your prompts as the lens you use to focus the entire knowledge base of the internet (up to 2021, of course, if you’re using GPT-3), books, and other text archives. In short, bad prompts lead to bland completions; good prompts lead to great completions.
Here’s the simple framework I like to use when prompting (a code sketch follows the list):
Task: What are we asking the LLM to do?
Context: What additional background info does it need? Think of answering the who, when, where, and why.
Who: What role does the LLM assume? Is it a leading expert in marine biology? Is it a creative writer?
When and Where: Where will the generated completion live? Ground the situation in time and space. Will the text completion be an email? Perhaps the completion is a command that will be passed along to another step in a larger LLM-powered system.
Why: What’s the motivation behind the task? What’s the higher level problem you’re trying to solve?
Examples: Relevant pairs of prompts and responses that demonstrate to the LLM how to accomplish its task. This also helps the LLM understand how you want the completion formatted.
Facts: Any details that you may need to include for the LLM to reference. This represents information or data that the LLM may need to cite when preparing a response.
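To show how these pieces fit together, here’s a small sketch that assembles a prompt from the framework above. Everything here is illustrative: the helper names, the example pair, and the way the sections are ordered are my assumptions, not a prescribed template.

```python
# A sketch of assembling a prompt from the Task / Context / Examples / Facts framework.
# Helper names and example content are illustrative, not a fixed template.

def build_prompt(task: str, context: str, examples: list[tuple[str, str]], facts: list[str]) -> str:
    example_block = "\n\n".join(f"Input: {inp}\nOutput: {out}" for inp, out in examples)
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"{context}\n\n"        # who / when / where / why
        f"Task: {task}\n\n"     # what we're asking the LLM to do
        f"Facts to reference:\n{fact_block}\n\n"
        f"Examples:\n{example_block}\n\n"
        "Output:"
    )

prompt = build_prompt(
    task="Write a two-sentence product update email about the new export feature.",
    context=(
        "You are a product marketer at a small SaaS company. "
        "The email goes to existing customers this Friday; the goal is feature adoption."
    ),
    examples=[
        ("Feature: dark mode",
         "Dark mode is here! Flip the toggle in Settings and give your eyes a break."),
    ],
    facts=["The export feature supports CSV and PDF.", "It lives under File > Export."],
)
```

Pass the resulting string to whatever LLM you’re using; the point is simply that each part of the framework becomes a labeled section the model can lean on.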
Another good way of providing the right information in your prompt, if you’re using something like ChatGPT, is to simply tell ChatGPT to ask you clarifying questions. If you start off with a vague direction, it can follow up with specific questions to fill in the gaps.
There’s a whole emerging field of prompt engineering, and in a future chapter of AI Builder's Handbook (Abridged), we’ll explore a concrete example of refining prompts in a workspace powered by Google Sheets.
Resources for Prompt Engineering
Resources below are ordered from beginner friendly to more advanced.
The following Twitter thread is a nice, illustrative way to understand best practices for prompts:


Learn Prompt Engineering: A great resource covering everything from the basics to advanced techniques.
LLM Company Documentation: These are official guides and documentation released by the companies that provide LLMs such as OpenAI (creator of ChatGPT).
Prompt Engineering Guide: This is a more academic list that links out to research papers.
2/ Build multi-step not single-step
The publication Every has a post that uses the analogy of flavored software to examine the phenomenon of thin wrappers among software products. There’s a similar phenomenon when it comes to AI. Until recently, the most popular demos and products using AI have been little more than thin wrappers around a single LLM step, such as copywriting.
I believe the most meaningful creations will come from building LLM products and services that mimic System 2 thinking. A quick refresher and introduction to the two systems of thinking:
System 1 operates automatically and quickly, with little or no effort and no sense of voluntary control.
System 2 allocates attention to the effortful mental activities that demand it, including complex computations. The operations of System 2 are often associated with the subjective experience of agency, choice, and concentration.
– Thinking, Fast and Slow by Daniel Kahneman
System 1 is the kind of thinking that LLMs do. Even with the best prompts, I fear that System 1 thinking is not good enough for the complex problems we want AI to handle. To realize the true promise of AI-powered products, we need to build toward System 2 thinking, and the secret to achieving it is to create multi-step workflows that chain multiple LLM thinking steps together.
Lastly, recall the transistor comparison I made earlier. In the 1950s, if you made a circuit that flashes a light using two transistors, you’d have an impressive tech demo. Fast forward to today, and simple circuits won’t dazzle anyone. The latest Apple M2 chip has 20 billion transistors.
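Before getting to tools, here’s a minimal sketch of what “multi-step” can look like in code: a first LLM call drafts an answer, and a second call critiques and revises it. The complete() helper is again a stand-in for whatever LLM API or library you use; real multi-step systems layer on memory, tool use, and error handling.

```python
# A minimal two-step chain: draft, then critique-and-revise.
# complete() is a placeholder for your LLM call of choice.

def complete(prompt: str) -> str:
    """Send the prompt to an LLM and return its completion."""
    raise NotImplementedError("Wire this up to your provider's API.")

def answer_with_reflection(question: str) -> str:
    # Step 1: a fast, System 1-style first pass.
    draft = complete(f"Answer the following question:\n{question}")

    # Step 2: a deliberate, System 2-style pass that checks the first.
    return complete(
        "Here is a question and a draft answer. Point out any mistakes or gaps, "
        "then write an improved final answer.\n\n"
        f"Question: {question}\n\nDraft answer: {draft}\n\nFinal answer:"
    )
```

Each step is still a System 1 prediction on its own, but composing the steps lets the overall workflow deliberate.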
Tools for building multi-step
This sentiment is rapidly catching on among the generative AI developer community. The leading LLM library LangChain is what developers are using to build this new wave of mind-blowing AI demos. Other tools and products that help builders chain LLMs include dust.tt and Scale.ai’s Spellbook.
Through my conversations with founders in the space, it seems many are also building their own libraries and architecture in house, as latency is still a big issue when building multi-step. I still think this space is nascent, as an operating system layer around LLMs has yet to be created. We’re just beginning to figure out the best practices for incorporating short- and long-term memory, agents, and other foundational optimizations needed to build useful LLM products. I highly recommend reading through the LangChain docs (even if you’re not technical) to get an understanding of the concepts needed to start building multi-step.
For the non-technical, tools like Zapier and even Google Sheets are enough to whip together some interesting multi-step workflows. Ultimately, you’ll get better results with the full customization afforded by coding, so depending on your needs it might be worth teaming up with a technical friend or rolling up your sleeves and learning to code. With time, no-code products dedicated to building with LLMs, like brancher.ai, will pop up. Keep an eye out for emerging platforms.
3/ Connect LLMs to the real world
The importance of connecting LLMs to the world dawned on me when I watched this early demo of Uminal from Subhash that was released before ChatGPT and the new Bing search.


With this demo, I realized connecting LLMs to the right tools and resources can resolve many of the common issues people have with using AI tools:
“It makes things up” → Give it the ability to browse the web and reference internal documentation.
“It can’t do math” → Give it a calculator and the ability to write and execute Python code.
“I still have to copy paste; this won’t scale” → Teach it to use specific APIs and connect it to your existing tools like your calendar to help you schedule events.
“It can only do text” → Connect your LLM with image models and more to make it multi-modal.
Integrations allow you to bring the superpowers of LLMs anywhere and everywhere while covering their blind spots.
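As a sketch of what connecting an LLM to tools can look like under the hood, here’s a bare-bones loop: the model is told which tools exist, decides whether it needs one, and the surrounding code executes it and feeds the result back. The tool names, the text protocol, and the complete() helper are all assumptions for illustration; libraries like LangChain package this pattern much more robustly.

```python
# A bare-bones tool-use loop: describe tools to the LLM, let it request one,
# run it, and feed the result back. Names and calling convention are illustrative.

def complete(prompt: str) -> str:
    """Send the prompt to an LLM and return its completion."""
    raise NotImplementedError("Wire this up to your provider's API.")

TOOLS = {
    "calculator": lambda expr: str(eval(expr)),              # toy example; never eval untrusted input in production
    "search": lambda query: "stub search results for: " + query,  # stand-in for a real web search call
}

def run_with_tools(question: str, max_steps: int = 5) -> str:
    transcript = (
        "You can use these tools by replying 'TOOL <name>: <input>'.\n"
        "Available tools: calculator (math expressions), search (web queries).\n"
        "When you know the final answer, reply 'ANSWER: <answer>'.\n\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        reply = complete(transcript)
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        if reply.startswith("TOOL"):
            header, tool_input = reply.split(":", 1)
            tool_name = header.split()[1]
            result = TOOLS[tool_name](tool_input.strip())
            transcript += f"{reply}\nResult: {result}\n"
    return "No answer within the step limit."
```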
A note about connecting LLMs for developers
I am personally excited about folks who are working on building a standard protocol for describing integrations to LLMs (if you’re working on this problem or know someone who is, do let me know). LLMs are already smart enough to read documentation and teach themselves to use an API, but a standard way to introduce tools and resources to LLMs would make everything easier.
Imagine a user manual or a readme specifically for LLMs. I believe defining a protocol in this space will get us closer to a package management system (e.g. npm, pip, maven) made for LLMs. This will unleash the power of the open source community and proliferate all the interoperable utilities, workflows, and tools needed to power AI products. A package management protocol will make LLMs even more composable and allow us to share our multi-step LLM workflows with each other. Perhaps unlocking all this may even lead to artificial general intelligence? More on that in a future post.
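To illustrate what a “readme for LLMs” might look like, here’s a hypothetical manifest describing a single tool in a way both machines and models could read. Every field name here is invented; no such standard exists yet, which is exactly the gap I’m describing.

```python
# A hypothetical manifest for describing a tool to an LLM.
# All field names are invented for illustration; no such standard exists yet.
calendar_tool_manifest = {
    "name": "calendar.create_event",
    "description": "Create a calendar event for the current user.",
    "when_to_use": "The user asks to schedule, book, or plan something at a specific time.",
    "inputs": {
        "title": "Short human-readable event title.",
        "start": "ISO 8601 start time, e.g. 2023-04-07T15:00:00Z.",
        "duration_minutes": "Length of the event as an integer.",
    },
    "returns": "A confirmation string containing the event link.",
    "example_call": {"title": "Coffee with Sam", "start": "2023-04-07T15:00:00Z", "duration_minutes": 30},
}
```

Prepend manifests like this to a prompt, or register them with a tool-use loop like the one sketched earlier, and the LLM has what it needs to discover and call the tool without bespoke glue code.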
Next time…
Phew, I’m nearing the email length limit and may have already surpassed the attention span limit. That’s my cue to wrap up.
In the next AI Builder’s Handbook post, I’ll explore some of the reasoning abilities LLMs have and provide a mental model that you can use when delegating thinking to LLMs. For now, I’ll leave you with this tweet from Linus, an AI researcher at Notion, as motivation for why making full use of an LLM’s thinking abilities matters.


For those concerned about climate change and our carbon footprints, it’s especially important, as we should aim to get the best mileage out of our pollution. I wrote a post exploring this idea on my personal blog.
Real Engineering has an eight-minute video on why transistors are such a big deal.
Complexity Has to Live Somewhere is a great article about complexity in software.
I recommend the book The Information: A History, a Theory, a Flood for those who are interested in learning more about the importance of language and information theory as a whole when it comes to human progress.