Advanced Chunking Strategies

Chunking decides what the retriever can find.

Bad chunks make good models look bad.

Chunking goals

Good chunks are small enough to retrieve precisely and large enough to carry the context needed to answer.

Strategy	Best for
fixed token chunks	simple baseline
recursive chunks	markdown/docs with headings
semantic chunks	topic-based boundaries
parent document retrieval	small search chunk, larger answer context
sentence window retrieval	exact sentence plus nearby context
table-aware chunking	forms, CSVs, financial docs
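
As a rough illustration of the first row, here is a minimal sketch of a fixed token baseline. It makes two assumptions that the notes do not state: whitespace-separated words stand in for a real tokenizer, and neighbouring chunks share a small overlap so a sentence cut at a boundary still appears whole in one chunk. The function name and default sizes are hypothetical.

```python
def fixed_token_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size tokens.

    Consecutive chunks share `overlap` tokens. Whitespace splitting is an
    assumption standing in for a model-specific tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once the window has reached the end of the text,
        # so no trailing chunk made only of overlap tokens is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    for chunk in fixed_token_chunks("a b c d e f g h i j", chunk_size=4, overlap=1):
        print(chunk)
```

This baseline ignores headings, sentences, and tables, which is why the other strategies in the table exist.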

Store metadata with every chunk. Without metadata, citations and debugging are weak.
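
A minimal sketch of what keeping metadata next to each chunk might look like. The field names here (doc_id, chunk_index, char_start, char_end) are assumptions chosen for illustration, not a list taken from these notes; the point is only that each chunk can be traced back to its source for citation and debugging.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    doc_id: str        # hypothetical field: which document the chunk came from
    chunk_index: int   # hypothetical field: position of the chunk in the document
    char_start: int    # hypothetical field: start offset in the source text
    char_end: int      # hypothetical field: end offset (exclusive)


def chunk_with_metadata(doc_id: str, text: str, size: int = 500) -> list[Chunk]:
    """Cut text into fixed-size character chunks and record where each came from."""
    chunks = []
    for index, start in enumerate(range(0, len(text), size)):
        end = min(start + size, len(text))
        chunks.append(Chunk(text[start:end], doc_id, index, start, end))
    return chunks


def cite(chunk: Chunk) -> str:
    """Build a human-readable citation from the stored metadata."""
    return f"{chunk.doc_id}#chunk{chunk.chunk_index} [chars {chunk.char_start}-{chunk.char_end}]"


if __name__ == "__main__":
    for c in chunk_with_metadata("handbook.txt", "abcdefghij", size=4):
        print(cite(c), repr(c.text))
```

With offsets stored, a bad answer can be traced to the exact span that was retrieved.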

Test chunking with real questions. For each question, measure whether the expected evidence appears in the top-k results.
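
One way to make this measurable is a hit rate: the fraction of test questions for which at least one expected chunk shows up in the top k. The sketch below is illustrative only. The keyword-overlap retriever is a toy stand-in for whatever retriever is actually in use, and all names (keyword_retrieve, hit_rate_at_k, the chunk ids) are hypothetical.

```python
import re
from typing import Callable


def _terms(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real query/document analysis."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_retrieve(question: str, chunks: dict[str, str], k: int = 3) -> list[str]:
    """Toy retriever: rank chunk ids by number of words shared with the question."""
    q = _terms(question)
    ranked = sorted(chunks, key=lambda cid: len(q & _terms(chunks[cid])), reverse=True)
    return ranked[:k]


def hit_rate_at_k(
    test_cases: list[tuple[str, set[str]]],
    chunks: dict[str, str],
    k: int = 3,
    retrieve: Callable[[str, dict[str, str], int], list[str]] = keyword_retrieve,
) -> float:
    """Fraction of questions with at least one expected chunk id in the top k."""
    if not test_cases:
        return 0.0
    hits = sum(
        1 for question, expected in test_cases
        if expected & set(retrieve(question, chunks, k))
    )
    return hits / len(test_cases)


if __name__ == "__main__":
    chunks = {
        "c1": "Refunds are issued within 30 days.",
        "c2": "Shipping takes five business days.",
    }
    cases = [("How long do refunds take?", {"c1"}), ("Is shipping fast?", {"c2"})]
    print(hit_rate_at_k(cases, chunks, k=1))
```

Running the same question set against two chunking strategies gives a direct comparison instead of a guess.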

Q1: Why not use one giant chunk per document?

It retrieves too broadly and adds noise to the model context.

Q2: Why use parent document retrieval?

To search precisely but answer with enough surrounding context.
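
The idea behind the Q2 answer can be sketched as follows: search over small child chunks, then hand the model the larger parent each match belongs to. This is a simplified illustration, not a specific library's implementation. Sentence splitting by regex and keyword-overlap scoring are assumptions standing in for a real splitter and a vector search, and the function names are hypothetical.

```python
import re


def _terms(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def build_child_index(parents: dict[str, str]) -> list[tuple[str, str]]:
    """Split each parent into sentence-sized children, keeping the parent id."""
    children = []
    for parent_id, text in parents.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                children.append((parent_id, sentence))
    return children


def retrieve_parents(
    question: str,
    parents: dict[str, str],
    children: list[tuple[str, str]],
    k: int = 2,
) -> list[str]:
    """Rank children by word overlap, then return up to k distinct parent texts."""
    q = _terms(question)
    ranked = sorted(children, key=lambda child: len(q & _terms(child[1])), reverse=True)
    seen: set[str] = set()
    results = []
    for parent_id, _ in ranked:
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])  # answer context is the full parent
        if len(results) == k:
            break
    return results


if __name__ == "__main__":
    parents = {
        "billing": "Invoices are sent monthly. Refunds are issued within 30 days.",
        "shipping": "Orders ship in two days. Tracking numbers arrive by email.",
    }
    children = build_child_index(parents)
    print(retrieve_parents("When are refunds issued?", parents, children, k=1))
```

The match is made on the single refund sentence, but the returned context also includes the neighbouring invoice sentence from the same parent.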