This chapter is for someone who has never shipped an AI product. By the end of the six sections you will know what every layer of a real production AI system does, what every term means, and how to build one. The patterns are vendor-neutral. Where a working code example needs a concrete tool, the implementations use Respan, but the patterns work the same with any other infrastructure.
A 60-second glossary
Before anything else, the terms that show up throughout:
- LLM (Large Language Model): a model trained on huge amounts of text that takes input text and produces output text. GPT-4o, Claude Sonnet 4.5, and Gemini 2.5 are all LLMs.
- Prompt: the input you send to the LLM. Usually a system message (instructions) and a user message (the actual question).
- Completion / response: the text the LLM gives back.
- Token: a chunk of text, usually a short word or part of a word. Providers charge by tokens in and tokens out.
- Context window: the maximum tokens an LLM can read in one call.
- Hallucination: when the LLM confidently says something that is not true.
- RAG (Retrieval-Augmented Generation): fetch your own documents first, then put them into the prompt so the LLM answers based on real source material.
- Tool / function call: when the LLM decides it needs to run code (look up a database, call an API) and the surrounding application runs it.
- Agent: an AI system that decides its own next action in a loop. A workflow is the simpler cousin where steps are fixed.
- Trace / span: a record of what the system actually did. A trace covers one full request; spans are the steps inside it.
- Evaluator (eval): a way to score the quality of an LLM output, automatically.
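Several of these terms become concrete in a single request/response pair. Here is a minimal sketch of what one call looks like on the wire, assuming an OpenAI-compatible `chat.completions` endpoint; the payload and response shapes are illustrative, and no live API is called:

```python
# The prompt: a system message (instructions) plus a user message (the question).
request = {
    "model": "gpt-4o",   # which LLM to use
    "messages": [
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
    ],
    "max_tokens": 256,   # cap on output tokens; you pay per token, in and out
}

# A typical completion/response body from the provider (values illustrative).
response = {
    "choices": [
        {"message": {"role": "assistant",
                     "content": "Go to Settings > Security > Reset password."}}
    ],
    "usage": {"prompt_tokens": 31, "completion_tokens": 12, "total_tokens": 43},
}

completion = response["choices"][0]["message"]["content"]
tokens_billed = response["usage"]["total_tokens"]
print(completion)      # the text the LLM gives back
print(tokens_billed)   # what the call cost, in tokens
```

Everything in the chapters ahead, from prompt versioning to tracing, is built around this one request/response shape.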
The six sections
Each section builds on the previous one. You can read in order or jump to the skill you need.
- Your first LLM call: what `chat.completions` actually does, and why one call is not a product.
- Calling LLMs in production: route every call through one URL so you get logging, fallbacks, caching, and per-customer cost caps without rewriting your app.
- Designing and managing prompts: prompts are artifacts, not strings in code. Versions, environments, A/B, rollback.
- Workflows and tracing: how to compose multiple LLM calls into one system, and how to actually see what happened inside.
- Measuring quality with evals: how to know if a change made things better, before customers find out.
- Agents and tool use: when one call is not enough, when a workflow is enough, and when you need a real agent.
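As a preview of how the workflow and tracing sections fit together, here is a minimal sketch: a fixed two-step workflow with a stubbed `call_llm` standing in for a real API call, where each step is recorded as a span inside one trace. The function names and trace fields are illustrative, not a specific vendor's API:

```python
import time
import uuid

def call_llm(prompt: str) -> str:
    # Stub standing in for a real chat.completions call.
    return f"[model output for: {prompt[:30]}...]"

def run_workflow(question: str) -> tuple[str, dict]:
    # One trace covers the full request; spans are the steps inside it.
    trace = {"trace_id": str(uuid.uuid4()), "spans": []}

    def traced_step(name: str, prompt: str) -> str:
        start = time.time()
        output = call_llm(prompt)
        trace["spans"].append({
            "name": name,
            "duration_ms": round((time.time() - start) * 1000, 2),
            "prompt": prompt,
            "output": output,
        })
        return output

    # Fixed steps make this a workflow, not an agent:
    # the code, not the model, decides what happens next.
    summary = traced_step("summarize", f"Summarize: {question}")
    answer = traced_step("answer", f"Using this summary, answer: {summary}")
    return answer, trace

answer, trace = run_workflow("Why is my invoice higher this month?")
print(len(trace["spans"]))  # 2 spans inside one trace
```

Swap the stub for a real LLM call and the same structure gives you a record of every prompt, output, and latency inside the request, which is exactly what the tracing section builds on.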
After Chapter 1
Once you understand these six skills, Chapter 2: Choose the Right Stack covers the tools you might layer on top (RAG, agent frameworks, fine-tuning, memory) and when each is actually worth adding. Chapter 3: Build a Customer Support Agent is the worked example that uses every layer in this chapter on one real product.
Every layer in this chapter is available on Respan's free tier. Read the docs for the full reference.
