Chunking is the process of splitting large documents or text into smaller, semantically meaningful segments for storage, retrieval, and processing by AI systems. It is a critical step in retrieval-augmented generation (RAG) pipelines and vector search systems, where the quality of chunks directly impacts the relevance and accuracy of results.
Large language models have finite context windows, and vector databases work best with focused, coherent pieces of text. Chunking bridges the gap between large source documents (which may be thousands of pages) and the smaller text segments that AI systems can effectively process and retrieve. The goal is to create chunks that are small enough to be specific and retrievable, but large enough to contain sufficient context to be useful.
The simplest approach to chunking splits text at fixed character or token counts. While easy to implement, this naive method often breaks sentences, separates related paragraphs, and creates chunks that lack semantic coherence. More sophisticated strategies include splitting by sentences, paragraphs, or document sections; using recursive splitting that tries multiple separators in order; and semantic chunking that uses embeddings to identify natural topic boundaries.
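Recursive splitting can be sketched in a few lines. This is an illustrative implementation, not any particular library's: it tries coarse separators (paragraph breaks) before fine ones (spaces), and falls back to a hard character split only when no separator helps.

```python
def recursive_split(text, max_len=500, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_len characters,
    preferring coarse separators (paragraphs) over fine ones (spaces)."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= max_len:
                    current = candidate  # keep accumulating into this chunk
                else:
                    if current:
                        chunks.append(current)
                    if len(part) > max_len:
                        # A single part is still too long: recurse so the
                        # finer separators get a chance to break it up.
                        chunks.extend(recursive_split(part, max_len, separators))
                        current = ""
                    else:
                        current = part
            if current:
                chunks.append(current)
            return chunks
    # No separator occurs in the text: hard-split at max_len.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Because each chunk is built from the coarsest separator that fits, paragraphs and sentences stay intact whenever possible, which is exactly what the naive fixed-size approach fails to do.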
Chunk size is one of the most impactful parameters in a RAG system. Smaller chunks (100-200 tokens) are more precise for retrieval but may lack context. Larger chunks (500-1000 tokens) provide more context but may include irrelevant information that dilutes the embedding and reduces retrieval accuracy. Many systems use chunk overlap, where adjacent chunks share some text, to prevent information from being lost at chunk boundaries.
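Overlap is straightforward to implement with a sliding window. The sketch below uses whitespace-separated words as a stand-in for tokens; a production system would count tokens with the embedding model's own tokenizer instead.

```python
def chunk_with_overlap(text, chunk_size=200, overlap=50):
    """Split text into fixed-size windows of whitespace 'tokens',
    where each window shares `overlap` tokens with its neighbor."""
    assert 0 <= overlap < chunk_size
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last window already reached the end of the text
    return chunks
```

With `chunk_size=200` and `overlap=50`, a sentence that straddles a boundary appears in full in at least one of the two adjacent chunks, so it can still be retrieved intact.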
Advanced chunking strategies adapt to the structure of the source document. Code files can be chunked by functions or classes. Legal documents can be chunked by sections and subsections. Tables and structured data require special handling. The best chunking strategy depends on the type of content, the retrieval use case, and the downstream model's capabilities.
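For Python source, the standard-library `ast` module makes function- and class-level chunking simple. The sketch below keeps module-level imports with every chunk (detected with a deliberately simple line-prefix heuristic) so each chunk remains independently readable:

```python
import ast

def chunk_python_source(source):
    """Chunk Python source into one chunk per top-level function or class,
    prepending module-level imports so each chunk stands on its own."""
    tree = ast.parse(source)
    lines = source.splitlines()
    # Simple heuristic: treat lines starting with import/from as the preamble.
    preamble = "\n".join(l for l in lines if l.startswith(("import ", "from ")))
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive.
            body = "\n".join(lines[node.lineno - 1:node.end_lineno])
            chunks.append(preamble + "\n\n" + body if preamble else body)
    return chunks
```

The same idea generalizes: a tree-sitter or language-server parse can supply the boundaries for languages without a built-in parser.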
Before chunking, a well-designed pipeline first examines the document to identify its structure: headings, paragraphs, lists, code blocks, tables, and other elements. This structural information guides where to place chunk boundaries for maximum semantic coherence.
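For Markdown, this structural pass can be as simple as a heading scan. The sketch below (an illustrative helper, not a library API) returns each heading paired with the body text beneath it, ready to become section-aligned chunks:

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)", re.MULTILINE)

def sections(markdown_text):
    """Return (heading, body) pairs for each Markdown heading,
    where body is the text up to the next heading."""
    matches = list(HEADING.finditer(markdown_text))
    result = []
    for i, m in enumerate(matches):
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown_text)
        result.append((m.group(2).strip(), markdown_text[start:end].strip()))
    return result
```

Richer formats (HTML, PDF, DOCX) need real parsers, but the output is the same: a map of the document's structure that the splitter can respect.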
The chosen strategy (fixed-size, sentence-based, recursive, or semantic) splits the document into segments. Parameters like chunk size, overlap percentage, and minimum chunk size are configured based on the use case.
Each chunk is annotated with metadata such as the source document, page number, section heading, and position within the document. This metadata enables more precise retrieval and helps provide context when chunks are retrieved.
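A chunk record can be modeled as a small dataclass. The field names below are illustrative; real systems attach whatever metadata their retrieval layer can filter on:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str      # the chunk content itself
    source: str    # originating document (e.g. a filename or URL)
    section: str   # nearest section heading, for display and filtering
    position: int  # chunk index within the document, preserves ordering

def annotate(texts, source, section):
    """Wrap raw text chunks in records carrying retrieval metadata."""
    return [Chunk(text=t, source=source, section=section, position=i)
            for i, t in enumerate(texts)]
```

At query time, `position` lets the system fetch neighboring chunks for extra context, and `source`/`section` let the answer cite where the text came from.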
Each chunk is converted into a vector embedding and stored in a vector database along with the original text and metadata. The embeddings enable semantic search, allowing retrieval of chunks based on meaning rather than just keyword matching.
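The storage-and-search step can be sketched with an in-memory stand-in for a vector database. Here `embed_fn` is any text-to-vector function you supply (in practice, a call to an embedding model); cosine similarity ranks stored chunks against the query:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class VectorStore:
    """Minimal in-memory stand-in for a vector database."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # any text -> list[float] embedding function
        self.rows = []            # (embedding, text, metadata) triples

    def add(self, text, metadata=None):
        self.rows.append((self.embed_fn(text), text, metadata or {}))

    def search(self, query, k=3):
        q = self.embed_fn(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]
```

A real deployment swaps in an embedding model and a proper vector database with approximate-nearest-neighbor indexing, but the contract is the same: store (embedding, text, metadata), rank by similarity at query time.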
A software company chunks their product documentation by section headings, creating one chunk per documentation section. They add 50-token overlap between chunks and include the section title as metadata, enabling their RAG system to retrieve precise answers to user questions about specific features.
A legal tech startup chunks contracts by clauses and sub-clauses rather than by arbitrary character counts. Each chunk preserves the clause number and hierarchy as metadata, allowing lawyers to ask questions and receive answers that reference specific contract provisions with their locations.
A developer tool chunks source code files by functions and classes, preserving import statements and docstrings with each chunk. When a developer searches for how to implement a feature, the system retrieves complete, usable code examples rather than broken fragments.
Chunking quality is often the single biggest factor determining the success or failure of a RAG system. Poor chunking leads to irrelevant retrieval, incomplete answers, and hallucinated responses. Getting chunking right means the right information reaches the model at the right time, directly improving the accuracy and usefulness of AI-powered applications.
Respan helps you evaluate and optimize your chunking strategy by tracking retrieval relevance, monitoring which chunks are actually used by the model, and identifying cases where poor chunking leads to incomplete or incorrect answers. Compare different chunk sizes and strategies with real production data.
Try Respan free