If you build a retrieval system you have to choose: lexical (BM25, TF-IDF) or semantic (dense embeddings, vector search). For a long time people argued one would replace the other. In practice, neither does. The reason is that real user queries have both a lexical and a semantic shape, and a single signal captures only one of them.

What BM25 Is Good At

BM25 (Best Matching 25) scores a document D against a query Q using term frequency, inverse document frequency, and document length normalization:

score(D, Q) = Σ_i IDF(qi) · (f(qi, D) · (k1 + 1)) /
              (f(qi, D) + k1 · (1 - b + b · |D| / avgdl))

It is fast (a single pass over an inverted index), interpretable (you can see which terms matched), and unbeatable on queries that share exact vocabulary with the corpus.
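To make the formula concrete, here is a minimal Python sketch of BM25 scoring. It is an illustration under stated assumptions, not the ReLU.chat implementation: the whitespace tokenizer, the smoothed IDF variant, the defaults k1 = 1.2 and b = 0.75, and the function name bm25_scores are all choices made for this example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with BM25.

    Assumptions for illustration: lowercase whitespace tokenization,
    a smoothed IDF, and commonly used default values for k1 and b.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for q in query.lower().split():
            if q not in tf:
                continue  # a term absent from the document contributes nothing
            # Smoothed IDF; the "+ 1.0" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            # Term frequency saturated by k1 and normalized by document length.
            denom = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[q] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "the shapley value splits a payoff among players",
    "a nash equilibrium is a stable strategy profile",
    "second price auctions reward truthful bidding",
]
scores = bm25_scores("shapley value", docs)
typo_scores = bm25_scores("equlibrium", docs)
```

With this toy corpus, only the first document gets a non-zero score for "shapley value", and the misspelled "equlibrium" matches nothing because it is a different token from "equilibrium".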

A user asking "What is the Shapley value?" needs the document that literally contains "Shapley value." BM25 finds it instantly. Dense retrieval might also find it — but only because the embedding happened to learn that phrase's neighborhood. BM25 is a contract: it will not fail on exact terms.

What Dense Retrieval Is Good At

Dense retrieval embeds both the query and the documents into the same vector space, then retrieves by cosine similarity. A query like "how do I split profits fairly" should match a document about "Shapley value" even though no words overlap. That is exactly the kind of paraphrase BM25 cannot solve.

Dense retrieval also handles synonyms, morphological variants, and the noise of natural language. If a user types "NASH equlibrium" (typo), a good embedding model still finds "Nash equilibrium." BM25 treats that as a totally different token.
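The retrieval step itself is simple once the vectors exist. The sketch below ranks documents by cosine similarity using hand-picked 3-dimensional vectors; these numbers are hypothetical stand-ins for what an embedding model would produce, and no real model is called.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, chosen by hand for illustration only.
doc_vectors = {
    "shapley value": [0.9, 0.1, 0.0],
    "nash equilibrium": [0.1, 0.9, 0.1],
    "second-price auction": [0.0, 0.2, 0.9],
}
# Stands in for the embedding of "how do I split profits fairly".
query_vector = [0.8, 0.2, 0.1]

# Rank document names from most to least similar to the query.
ranked = sorted(
    doc_vectors,
    key=lambda name: cosine(query_vector, doc_vectors[name]),
    reverse=True,
)
```

Here the query shares no words with "shapley value", yet that document ranks first because its vector points in nearly the same direction as the query vector.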

Where Each One Fails

BM25 fails when:

Dense retrieval fails when:

In a chatbot knowledge base, the third case is constant. The user is often searching for a name, a theorem, or a specific tool, and a 22 MB MiniLM that was trained on web text has no idea what "Shapley value" is as a named entity, even if it can paraphrase it.

Fusing the Two

ReLU.chat runs both signals in parallel and fuses them into a single score per document. The signal layer produces 25 features, but the two main ones are:

The fusion is a weighted sum, with weights tuned via the policy network (see our RL post):
final = w_bm25 · normalize(bm25_score) + w_dense · normalize(dense_score)

Where w_bm25 + w_dense ≈ 1 and the policy decides per-query how to split them. For a query that looks lexical ("What is the formula for X?"), the policy weights BM25 higher. For a paraphrase ("how do I split fairly"), it weights dense higher.
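The weighted sum can be sketched in a few lines of Python. Two things here are assumptions rather than details from the system: normalize is taken to be min-max scaling over the candidate set, and the weight is passed in by hand instead of coming from the policy network. The names min_max and fuse are hypothetical.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; an assumed choice for normalize()."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # no spread, so no signal to keep
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(bm25, dense, w_bm25):
    """Weighted sum of normalized BM25 and dense scores, per document."""
    w_dense = 1.0 - w_bm25  # enforces w_bm25 + w_dense = 1
    bm25_n = min_max(bm25)
    dense_n = min_max(dense)
    return [w_bm25 * b + w_dense * d for b, d in zip(bm25_n, dense_n)]

# Made-up raw scores for three candidate documents.
bm25_raw = [7.2, 0.0, 1.3]
dense_raw = [0.41, 0.78, 0.35]

lexical = fuse(bm25_raw, dense_raw, w_bm25=0.8)     # lexical-looking query
paraphrase = fuse(bm25_raw, dense_raw, w_bm25=0.2)  # paraphrase-like query
```

With these made-up scores, the lexical weighting puts the first document on top and the paraphrase weighting puts the second on top, which is the per-query behavior described above.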

A Concrete Example

Query: "auction second price"

Query: "how do I split profits fairly among players"

What This Costs

Almost nothing. BM25 is a single inverted-index scan — microseconds. The embedding is the expensive part, and we already compute it for dense retrieval. Adding BM25 adds <1 ms to retrieval.

The bookkeeping is the real cost: you need an inverted index, IDF precomputation, length normalization, and a serialization format. We precompute the BM25 stats at build time and ship them as a JSON file (~30 KB for our knowledge base).
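As a rough picture of that bookkeeping, the sketch below precomputes BM25 statistics and round-trips them through JSON. The field names (n_docs, avgdl, doc_lengths, idf, postings), the whitespace tokenizer, and the smoothed IDF are assumptions for this example; the actual file layout is not described here.

```python
import json
import math
from collections import Counter

def build_bm25_stats(docs):
    """Precompute what BM25 needs at query time (hypothetical layout)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Inverted index: term -> list of [doc_id, term_frequency] pairs.
    postings = {}
    for doc_id, tokens in enumerate(tokenized):
        for term, tf in Counter(tokens).items():
            postings.setdefault(term, []).append([doc_id, tf])

    # Smoothed IDF per term; len(plist) is the term's document frequency.
    idf = {
        term: math.log((n_docs - len(plist) + 0.5) / (len(plist) + 0.5) + 1.0)
        for term, plist in postings.items()
    }
    return {
        "n_docs": n_docs,
        "avgdl": sum(len(t) for t in tokenized) / n_docs,
        "doc_lengths": [len(t) for t in tokenized],
        "idf": idf,
        "postings": postings,
    }

docs = [
    "the shapley value splits a payoff",
    "a second price auction",
]
payload = json.dumps(build_bm25_stats(docs), sort_keys=True)  # build time
restored = json.loads(payload)                                # load time
```

Storing postings as lists of pairs rather than nested dictionaries avoids JSON turning integer document ids into string keys.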

When to Use What

Use BM25 only when:

Use dense only when:

Use both when:

ReLU.chat uses both. It is the right default.