Chain of thought (CoT) is a prompting technique that encourages large language models to break down complex problems into intermediate reasoning steps before arriving at a final answer. By making the model show its work, chain of thought significantly improves accuracy on tasks that require logic, math, or multi-step reasoning.
Chain of thought prompting was popularized by a 2022 Google Research paper (Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models") that demonstrated dramatic improvements in LLM performance on reasoning tasks. The key insight is simple: when a model is asked to explain its reasoning step by step, it performs much better than when asked to produce only a final answer. This works because the intermediate steps act as a form of scratchpad computation, allowing the model to build up to the correct answer incrementally.
There are several variants of chain of thought prompting. Zero-shot CoT simply appends a phrase like "Let's think step by step" to the prompt, which often triggers the model to produce a reasoning chain without any examples. Few-shot CoT provides worked examples of problems solved with explicit reasoning steps, showing the model the expected format. More advanced techniques like Tree of Thoughts (ToT) explore multiple candidate reasoning paths, evaluate them, and pursue the most promising one.
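To make the two basic variants concrete, here is a minimal sketch. The trigger phrase is the standard zero-shot CoT wording; the helper names and the worked example are illustrative assumptions, not taken from the original papers.

```python
# Sketch of zero-shot vs. few-shot CoT prompt construction.

def zero_shot_cot(question: str) -> str:
    # Appending the trigger phrase is the entire technique.
    return f"{question}\nLet's think step by step."

FEW_SHOT_EXAMPLE = (
    "Q: A pack holds 6 pens. How many pens are in 4 packs?\n"
    "A: Each pack holds 6 pens. 4 packs x 6 pens = 24 pens. "
    "The answer is 24.\n"
)

def few_shot_cot(question: str) -> str:
    # A worked example shows the model the expected reasoning format.
    return f"{FEW_SHOT_EXAMPLE}\nQ: {question}\nA:"

if __name__ == "__main__":
    q = "A store has 3 shelves with 8 books each, and receives 15 more. How many books total?"
    print(zero_shot_cot(q))
    print()
    print(few_shot_cot(q))
```

Either string can then be sent to any completion endpoint; the only difference between the variants is whether the reasoning format is demonstrated or merely requested.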
Chain of thought has become so effective that many modern models are now trained to reason this way by default. Models like OpenAI's o1 and o3 series use internal chain of thought reasoning before producing their response, spending extra computation on thinking tokens that are not always visible to the user. This represents a shift in the field toward inference-time compute scaling, where models improve by thinking longer rather than just being larger.
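For example, the OpenAI Python SDK exposes this hidden reasoning spend through the usage object returned for o-series models. The `completion_tokens_details.reasoning_tokens` field shipped alongside o1, but treat the exact names as assumptions to verify against the current SDK documentation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="o1",  # any o-series reasoning model
    messages=[{"role": "user", "content": "How many primes are below 30?"}],
)

# Reasoning models bill hidden "thinking" tokens as completion tokens;
# the usage object breaks them out separately.
details = resp.usage.completion_tokens_details
print("reasoning tokens:", details.reasoning_tokens)
print("total completion tokens:", resp.usage.completion_tokens)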
Despite its power, chain of thought has limitations. It increases output token count and therefore latency and cost. The reasoning steps can sometimes be unfaithful, meaning the model produces plausible-looking reasoning that does not actually reflect its decision process. And for simple tasks, CoT can actually hurt performance by overcomplicating straightforward questions.
In practice, a chain-of-thought interaction unfolds in a few stages:

1. Prompting: The model receives a prompt that explicitly asks it to think step by step, includes examples of step-by-step reasoning, or is structured to naturally elicit intermediate reasoning before the final answer.
2. Reasoning generation: Instead of jumping to the answer, the model produces a series of intermediate thoughts, calculations, or logical deductions. Each step builds on the previous ones, forming a chain of reasoning that progresses toward the solution.
3. Answer synthesis: After working through the chain, the model condenses its intermediate steps into a final answer. Because the reasoning is explicit, logic errors are often caught and corrected within the chain rather than propagating to the conclusion.
4. Self-verification (optional): Advanced implementations add a step in which the model reviews its own reasoning chain for errors, inconsistencies, or missed considerations, and revises its answer if needed. This self-reflection further improves accuracy (a sketch follows this list).
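Putting the stages together, here is a minimal sketch of a two-pass CoT loop with a self-verification step. Here `llm` stands in for any prompt-in, text-out callable (a thin wrapper around whatever client you use); it is a placeholder, not a real library API.

```python
# Minimal two-pass CoT flow with a self-verification step.

def cot_answer(llm, question: str) -> str:
    # Pass 1: elicit an explicit reasoning chain plus a final answer.
    chain = llm(
        f"{question}\n"
        "Think step by step, then state the final answer on a line "
        "starting with 'Answer:'."
    )
    # Pass 2: have the model review its own chain and revise if needed.
    return llm(
        "Review the reasoning below for arithmetic or logical errors. "
        "If it is correct, restate the final answer; otherwise, correct it.\n\n"
        + chain
    )
```

The verification pass roughly doubles the token spend per question, which is exactly the latency and cost trade-off discussed above.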
Without CoT, a model might get a question like 'A store has 3 shelves with 8 books each, and receives 15 more. How many books total?' wrong. With CoT, the model writes: '3 shelves x 8 books = 24 books. 24 + 15 new books = 39 books total.' Making the calculation explicit prevents arithmetic slips.
A lawyer asks an LLM whether a contract clause is enforceable. With chain of thought, the model first identifies the relevant legal principles, then analyzes each element of the clause against those principles, considers potential counterarguments, and finally provides a well-reasoned conclusion.
A developer pastes a buggy function and asks the model to find the error. Using CoT, the model traces through the code line by line, tracking variable states and identifying where the logic diverges from the expected behavior, rather than guessing the bug from pattern matching alone.
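A minimal sketch of such a debugging prompt follows; the buggy function and the prompt wording are invented for illustration.

```python
# Sketch of a CoT debugging prompt: asks the model to trace execution
# before naming the bug.

BUGGY_CODE = '''
def average(xs):
    total = 0
    for x in xs:
        total += x
    return total / len(xs) - 1  # bug: subtracts 1 from the mean
'''

DEBUG_PROMPT = (
    "Find the bug in this function. Trace through it line by line with the "
    "input [2, 4, 6], tracking each variable's value, then name the bug.\n"
    + BUGGY_CODE
)

print(DEBUG_PROMPT)
```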
Chain of thought transforms LLMs from pattern-matching systems into more capable reasoning engines. It enables models to tackle problems that were previously beyond their abilities, including complex math, multi-step logic, and nuanced analysis. For practitioners, understanding CoT is essential for getting the best results from any LLM, as the way you prompt directly impacts the quality of reasoning.
Respan helps you track and analyze chain of thought reasoning in production. Monitor thinking token usage and costs, compare reasoning quality across model versions, and identify prompts where CoT improves or degrades performance to optimize your prompt engineering strategy.
Try Respan free