Chain of thought (CoT) is a prompting technique that encourages large language models to break down complex problems into intermediate reasoning steps before arriving at a final answer. By making the model show its work, chain of thought significantly improves accuracy on tasks that require logic, math, or multi-step reasoning.
Chain of thought prompting was popularized by a 2022 Google Research paper (Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models") that demonstrated dramatic improvements in LLM performance on reasoning tasks. The key insight is simple: when a model is asked to explain its reasoning step by step, it performs much better than when asked to produce only a final answer. This works because the intermediate steps act as a form of scratchpad computation, allowing the model to build up to the correct answer incrementally.
There are several variants of chain of thought prompting. Zero-shot CoT simply appends a phrase like "Let's think step by step" to the prompt, which often triggers the model to produce a reasoning chain without any examples. Few-shot CoT provides worked examples of problems solved with explicit reasoning steps, showing the model the expected format. More advanced techniques like Tree of Thoughts (ToT) explore multiple candidate reasoning paths, evaluate them, and pursue the most promising one.
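To make the two basic variants concrete, here is a minimal sketch. The trigger phrase is the standard zero-shot CoT wording; the helper names and the worked example are illustrative assumptions, not taken from the original papers.

```python
# Sketch of zero-shot vs. few-shot CoT prompt construction.

def zero_shot_cot(question: str) -> str:
    # Appending the trigger phrase is the entire technique.
    return f"{question}\nLet's think step by step."

FEW_SHOT_EXAMPLE = (
    "Q: A pack holds 6 pens. How many pens are in 4 packs?\n"
    "A: Each pack holds 6 pens. 4 packs x 6 pens = 24 pens. "
    "The answer is 24.\n"
)

def few_shot_cot(question: str) -> str:
    # A worked example shows the model the expected reasoning format.
    return f"{FEW_SHOT_EXAMPLE}\nQ: {question}\nA:"

if __name__ == "__main__":
    q = "A store has 3 shelves with 8 books each, and receives 15 more. How many books total?"
    print(zero_shot_cot(q))
    print()
    print(few_shot_cot(q))
```

Either string can then be sent to any completion endpoint; the only difference between the variants is whether the reasoning format is demonstrated or merely requested.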
Chain of thought has become so effective that many modern models are now trained to reason this way by default. Models like OpenAI's o1 and o3 series use internal chain of thought reasoning before producing their response, spending extra computation on thinking tokens that are not always visible to the user. This represents a shift in the field toward inference-time compute scaling, where models improve by thinking longer rather than just being larger.
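For example, the OpenAI Python SDK exposes this hidden reasoning spend through the usage object returned for o-series models. The `completion_tokens_details.reasoning_tokens` field shipped alongside o1, but treat the exact names as assumptions to verify against the current SDK documentation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="o1",  # any o-series reasoning model
    messages=[{"role": "user", "content": "How many primes are below 30?"}],
)

# Reasoning models bill hidden "thinking" tokens as completion tokens;
# the usage object breaks them out separately.
details = resp.usage.completion_tokens_details
print("reasoning tokens:", details.reasoning_tokens)
print("total completion tokens:", resp.usage.completion_tokens)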
Despite its power, chain of thought has limitations. It increases output token count and therefore latency and cost. The reasoning steps can sometimes be unfaithful, meaning the model produces plausible-looking reasoning that does not actually reflect its decision process. And for simple tasks, CoT can actually hurt performance by overcomplicating straightforward questions.
In practice, a chain-of-thought interaction unfolds in a few stages:

1. Prompting: The model receives a prompt that explicitly asks it to think step by step, includes examples of step-by-step reasoning, or is structured to naturally elicit intermediate reasoning before the final answer.
2. Reasoning generation: Instead of jumping to the answer, the model produces a series of intermediate thoughts, calculations, or logical deductions. Each step builds on the previous ones, forming a chain of reasoning that progresses toward the solution.
3. Answer synthesis: After working through the chain, the model condenses its intermediate steps into a final answer. Because the reasoning is explicit, logic errors are often caught and corrected within the chain rather than propagating to the conclusion.
4. Self-verification (optional): Advanced implementations add a step in which the model reviews its own reasoning chain for errors, inconsistencies, or missed considerations, and revises its answer if needed. This self-reflection further improves accuracy (a sketch follows this list).
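Putting the stages together, here is a minimal sketch of a two-pass CoT loop with a self-verification step. Here `llm` stands in for any prompt-in, text-out callable (a thin wrapper around whatever client you use); it is a placeholder, not a real library API.

```python
# Minimal two-pass CoT flow with a self-verification step.

def cot_answer(llm, question: str) -> str:
    # Pass 1: elicit an explicit reasoning chain plus a final answer.
    chain = llm(
        f"{question}\n"
        "Think step by step, then state the final answer on a line "
        "starting with 'Answer:'."
    )
    # Pass 2: have the model review its own chain and revise if needed.
    return llm(
        "Review the reasoning below for arithmetic or logical errors. "
        "If it is correct, restate the final answer; otherwise, correct it.\n\n"
        + chain
    )
```

The verification pass roughly doubles the token spend per question, which is exactly the latency and cost trade-off discussed above.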
Without CoT, a model might get a question like 'A store has 3 shelves with 8 books each, and receives 15 more. How many books total?' wrong. With CoT, the model writes: '3 shelves x 8 books = 24 books. 24 + 15 new books = 39 books total.' Making the calculation explicit prevents arithmetic slips.
A lawyer asks an LLM whether a contract clause is enforceable. With chain of thought, the model first identifies the relevant legal principles, then analyzes each element of the clause against those principles, considers potential counterarguments, and finally provides a well-reasoned conclusion.
A developer pastes a buggy function and asks the model to find the error. Using CoT, the model traces through the code line by line, tracking variable states and identifying where the logic diverges from the expected behavior, rather than guessing the bug from pattern matching alone.
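A minimal sketch of such a debugging prompt follows; the buggy function and the prompt wording are invented for illustration.

```python
# Sketch of a CoT debugging prompt: asks the model to trace execution
# before naming the bug.

BUGGY_CODE = '''
def average(xs):
    total = 0
    for x in xs:
        total += x
    return total / len(xs) - 1  # bug: subtracts 1 from the mean
'''

DEBUG_PROMPT = (
    "Find the bug in this function. Trace through it line by line with the "
    "input [2, 4, 6], tracking each variable's value, then name the bug.\n"
    + BUGGY_CODE
)

print(DEBUG_PROMPT)
```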
Chain of thought transforms LLMs from pattern-matching systems into more capable reasoning engines. It enables models to tackle problems that were previously beyond their abilities, including complex math, multi-step logic, and nuanced analysis. For practitioners, understanding CoT is essential for getting the best results from any LLM, as the way you prompt directly impacts the quality of reasoning.
Respan helps you track and analyze chain of thought reasoning in production. Monitor thinking token usage and costs, compare reasoning quality across model versions, and identify prompts where CoT improves or degrades performance to optimize your prompt engineering strategy.
Try Respan free