AI Products for Founders

LESSON 01

The AI Product Stack in Plain English

Most people confuse the model with the product. They're not the same thing - and confusing them is how you make the wrong decisions.

8 min read

When someone says "we're building an AI product," they usually mean one of three very different things. They might mean they're fine-tuning a model. They might mean they're building a workflow on top of an existing model. Or they might mean they're building a user-facing experience that happens to use AI somewhere underneath. These require completely different teams, budgets, and timelines - and the founder who doesn't know which one they're doing will run out of money before they figure it out.

The AI product stack has three layers. The model layer is the raw intelligence - GPT-4, Claude, Gemini, Llama. Most founders don't need to touch this layer. The model exists. It's already smarter than anything you'll build in the next two years with a small team. Competing at the model layer requires billions of dollars and hundreds of researchers. This is not your game unless you are specifically raising money to play it.

The orchestration layer is where most real product engineering happens. This is how you retrieve the right information to give the model context, how you chain multiple model calls together to complete a complex task, how you validate and route outputs, and how you handle failures gracefully. A model that can answer any question is useless without an orchestration layer that asks the right question, in the right way, at the right time, with the right context attached.
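The chain-validate-route loop described above can be sketched in a few lines. This is a toy, assuming a stubbed `call_model` in place of a real provider SDK; the names `validate` and `run_task` are illustrative, not a standard API.

```python
# Minimal orchestration sketch: chain two model calls, validate the
# intermediate output, and fail gracefully. call_model is a stub
# standing in for a real provider SDK call.

def call_model(prompt):
    """Stub: returns a structured reply so the pipeline is runnable."""
    return f"OK|{prompt[:40]}"

def validate(output):
    """Only well-formed outputs ('OK|...') may flow downstream."""
    return output.startswith("OK|")

def run_task(user_request):
    # Step 1: ask the model to plan, and check the plan before using it.
    plan = call_model(f"Break this into steps: {user_request}")
    if not validate(plan):
        return "Something went wrong - please try again."  # graceful failure
    # Step 2: chain a second call that consumes the first call's output.
    result = call_model(f"Execute this plan: {plan}")
    return result if validate(result) else "Something went wrong - please try again."

print(run_task("summarize last week's support tickets"))
```

The point is the shape, not the stub: every call's output is checked before it becomes another call's input, and every failure path ends in a deliberate user-facing message rather than a raw error.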

The experience layer is what users actually touch: the interface, the response latency, the tone of the outputs, the way errors are communicated, the onboarding that teaches users what the product can and can't do. Almost every AI product built today underinvests in this layer. Founders obsess over model quality and ignore the fact that a slightly worse model with a dramatically better experience will win the market almost every time.

Model quality and product quality are not the same thing, and they don't always correlate. The differentiator is almost never the model. It's everything around the model.

TERMS

RETRIEVAL-AUGMENTED GENERATION (RAG)

Instead of asking the model to remember everything (it can't), you retrieve relevant documents or data at query time and inject them into the model's context window. The model reasons over fresh, specific information rather than relying on what it learned during training. This is how you build AI products that know about your users, your data, and your business.
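A toy version of that retrieve-and-inject step, assuming an in-memory document list and crude word-overlap scoring; production systems use embeddings and a vector database, but the shape is the same.

```python
# Toy RAG: score documents against the query, keep the best few, and
# inject them into the prompt at query time. Word overlap stands in
# for real embedding similarity.

def top_k(query, documents, k=2):
    """Score documents by word overlap with the query and keep the best k."""
    terms = set(query.lower().split())
    scored = sorted(documents, key=lambda d: -len(terms & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, documents):
    """Inject retrieved text into the context window at query time."""
    context = "\n".join(top_k(query, documents))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Acme plan limits: 10 seats on Starter.",
    "Billing is monthly in USD.",
    "Support hours are 9 to 5 UTC.",
]
prompt = build_prompt("How many seats does the Starter plan include?", docs)
```

Notice that the model never needs to "know" your plan limits; the orchestration layer hands them over fresh on every call.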

CONTEXT WINDOW

The amount of text a model can "see" at once during a single inference call. GPT-4 Turbo's context window is 128,000 tokens - roughly 100,000 words. Claude's is 200,000. Everything outside the context window doesn't exist for that call. Orchestration design is largely about deciding what deserves to be in the context window and what doesn't.
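A back-of-envelope way to budget a context window, assuming the common heuristic of roughly 4 characters per token for English text; a real product would use the provider's tokenizer. The 128,000-token figure is the GPT-4 Turbo window mentioned above, and the 4,000-token reserve is an illustrative allowance for the model's answer.

```python
# Crude context-window budgeting. estimate_tokens is a heuristic,
# not a real tokenizer; swap in the provider's tokenizer for accuracy.

def estimate_tokens(text):
    return max(1, len(text) // 4)  # ~4 chars/token heuristic for English

def fits(texts, window=128_000, reserve=4_000):
    """Check whether candidate context fits, reserving room for the answer."""
    return sum(estimate_tokens(t) for t in texts) <= window - reserve

chunks = ["..." * 1000] * 10  # ten chunks of ~750 tokens each
print(fits(chunks))  # ~7,500 tokens fit comfortably
```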

INFERENCE

The act of running a prompt through a model to get an output. Every time a user asks your product a question, that's one inference call. Inference costs money - fractions of a cent to dollars depending on model and length. At scale, inference cost is your primary variable cost. Design your product with this in mind from day one.
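The arithmetic is worth doing on day one. A sketch with illustrative per-token prices - plug in your provider's actual rates, since these numbers are placeholders:

```python
# Illustrative unit economics. Prices are hypothetical $/1M tokens;
# providers bill input and output tokens at different rates.

def call_cost(input_tokens, output_tokens,
              price_in=10.0, price_out=30.0):  # placeholder rates per 1M tokens
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# One typical call: 2,000 tokens of prompt+context in, 500 tokens out.
per_call = call_cost(2_000, 500)            # $0.035 under these rates
monthly = per_call * 50 * 1_000             # 1,000 users x 50 calls/month
print(round(per_call, 4), round(monthly, 2))
```

At these placeholder rates, a modest 1,000-user product already carries a four-figure monthly inference bill - which is why context size and call count are product decisions, not just engineering details.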

PROMPT ENGINEERING

The craft of structuring inputs to a model to reliably produce better outputs. Not magic, not a permanent skill moat - but genuinely important in the short term. A well-engineered prompt can reduce hallucinations, enforce output format, set tone, and constrain scope. Treat it as a product decision, not an engineering detail.
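As a sketch, a prompt treated as a product surface might pin down role, tone, output format, and scope explicitly. The wording and structure here are illustrative, not a provider API:

```python
# A prompt as a product decision: role, tone, format, and scope are
# all specified rather than left to the model's defaults.

def build_support_prompt(question, policy_excerpt):
    return (
        "You are a support assistant for a billing product.\n"          # role
        "Tone: concise and friendly. Answer in at most three sentences.\n"  # tone + format
        "If the answer is not in the policy below, reply exactly: "
        "\"I don't know - let me connect you with a human.\"\n\n"       # scope constraint
        f"Policy:\n{policy_excerpt}\n\n"
        f"Customer question: {question}\n"
    )

prompt = build_support_prompt("Can I get a refund?", "Refunds within 30 days.")
```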

HALLUCINATION

When a model generates confident, fluent, wrong information. Not a bug that will be fixed - a structural property of how language models work. Products must be designed with hallucination in mind: either the failure mode is low-stakes and acceptable, or there's a verification layer before outputs reach users. The worst AI products assume the model is always right.
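A minimal verification layer might look like this: a crude groundedness check (here, every number in the answer must appear in the retrieved context) that blocks unverifiable outputs before users see them. The check itself is a placeholder for whatever verification your domain actually requires.

```python
import re

# Crude verification layer: block answers whose numeric claims are not
# grounded in the retrieved context. A real check would be richer
# (citations, entailment models, human review for high stakes).

def grounded(answer, context):
    """Every number in the answer must appear verbatim in the context."""
    return all(num in context for num in re.findall(r"\d+", answer))

def safe_answer(model_answer, context):
    if grounded(model_answer, context):
        return model_answer
    return "I can't verify that - escalating to a human."  # deliberate fallback

ctx = "Refunds are available within 30 days of purchase."
print(safe_answer("You have 30 days to request a refund.", ctx))  # passes
print(safe_answer("You have 90 days to request a refund.", ctx))  # blocked
```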

BEFORE YOUR NEXT MEETING

"What layer of the AI stack is our product actually competing at - model, orchestration, or experience?"

"If our model provider raised prices by 10x tomorrow, what would we do? Is our moat in the model or around it?"

"Where in our product does a hallucination cause the most damage? What's our current mitigation for that specific failure?"

"Are we relying on RAG, fine-tuning, or prompt engineering to reach quality - and do we know which one our problem actually requires?"


LESSON 01 OF 03

Serin LLC · Founder Curriculum Platform · All rights reserved