BM25 Ranking Function: Definition, Formula, and Usage

BM25 (Best Matching 25) is a ranking function that search engines use to estimate the relevance of a document to a search query. Also known as Okapi BM25, it belongs to the family of [TF-IDF-like retrieval functions] (Wikipedia) used in modern document retrieval.

For SEO practitioners and marketers, BM25 shows how keywords can influence rankings in engines that use it: scores depend on how often a term appears in a document, how long the document is, and how rare the term is across the indexed collection.

What is BM25?

BM25 is a "bag-of-words" retrieval function. This means it ranks documents based on the query terms appearing in them, without considering the proximity of those words to each other.
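To make the bag-of-words idea concrete, the sketch below reduces two sentences to term counts. The whitespace tokenizer is an assumption for illustration only; real search engines apply their own analyzers.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count terms (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


doc_a = "dog bites man"
doc_b = "man bites dog"

# Same terms with the same counts, so a bag-of-words model cannot tell them apart.
print(bag_of_words(doc_a) == bag_of_words(doc_b))  # True
```

The two sentences mean different things, but a pure bag-of-words scorer sees them as identical.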

The name includes "Okapi" because it was first implemented in the [Okapi information retrieval system at London's City University in the 1980s and 1990s] (Wikipedia). It is built on a probabilistic retrieval framework developed by Stephen E. Robertson and Karen Spärck Jones. Today, [BM25 is the default similarity ranking algorithm in Elasticsearch] (Elastic), making it a cornerstone of modern search technology.

Why BM25 matters

BM25 solves several problems found in older ranking models like TF-IDF.

Prevents keyword stuffing. It uses "term frequency saturation" to ensure that repeating a keyword 100 times does not give 100 times the ranking power.
Normalizes document length. It prevents long, rambling documents from outranking concise, relevant pages simply because they contain more words.
Prioritizes rare terms. It recognizes that a match for a rare word (like "axolotl") is more important than a match for a common word (like "the").
Improves RAG workflows. Variants of the algorithm help improve recall for [short texts or chunked documents used in retrieval-augmented generation] (LangChain).

How BM25 works

The algorithm calculates a score for each document based on three main components.
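For reference, the standard Okapi BM25 scoring function combines these components as shown here. $f(q_i, D)$ is the number of times query term $q_i$ occurs in document $D$, $|D|$ is the length of the document in words, $\text{avgdl}$ is the average document length in the collection, and $\text{IDF}(q_i)$ is the inverse document frequency of the term. $k_1$ and $b$ are tuning parameters.

$$
\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}
$$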

1. Inverse Document Frequency (IDF)

IDF penalizes common terms and rewards rare ones. If a word appears in almost every document in a collection (like "and" or "to"), its IDF score is very low. If a word is rare, it acts as a higher multiplier for the final score.
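As a rough sketch, the function below computes IDF using the smoothed form $\ln\left(1 + \frac{N - n + 0.5}{n + 0.5}\right)$, where $N$ is the number of documents and $n$ is the number of documents containing the term. This is one common variant (the form used by Lucene); other implementations differ in detail. The collection size and term counts are made-up numbers for illustration.

```python
import math


def idf(num_docs: int, docs_with_term: int) -> float:
    """Smoothed BM25 IDF: always positive, and larger for rarer terms."""
    return math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


N = 10_000  # hypothetical collection size

common = idf(N, 9_500)  # a word found in almost every document
rare = idf(N, 12)       # a word found in only a handful of documents

print(f"common term IDF: {common:.3f}")
print(f"rare term IDF:   {rare:.3f}")
```

The rare term's multiplier comes out far larger than the common term's, which is why a match on a distinctive word moves the final score much more.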

2. Term Frequency (TF) and k1

This measures how many times a query term appears in a specific document. However, BM25 applies a "saturation" limit using a variable called $k_1$.

In the absence of advanced optimization, [standard values for $k_1$ are usually chosen between 1.2 and 2.0] (Wikipedia). This parameter controls how quickly the "reward" for repeating a keyword diminishes. With a lower $k_1$, the score saturates faster.
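The saturation effect can be sketched with the term-frequency factor on its own. The snippet below leaves out length normalization (equivalent to setting $b$ to 0), so the factor is $\frac{tf \cdot (k_1 + 1)}{tf + k_1}$. The function name and the sample counts are illustrative choices.

```python
def tf_component(tf: int, k1: float) -> float:
    """BM25 term-frequency factor with length normalization left out (as if b = 0)."""
    return tf * (k1 + 1) / (tf + k1)


for k1 in (0.5, 1.2, 2.0):
    values = [round(tf_component(tf, k1), 2) for tf in (1, 2, 5, 20, 100)]
    # The factor can never exceed k1 + 1, however large tf becomes.
    print(f"k1={k1}: {values}  (ceiling: {k1 + 1})")
```

Each row climbs toward its ceiling of $k_1 + 1$ without reaching it, and the rows with smaller $k_1$ get close to that ceiling after fewer repetitions.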

3. Document Length and b

The algorithm compares the length of a document to the average length of all documents in the collection. A variable called $b$ controls how much this length ratio affects the score.

If $b$ is 1, the length is fully normalized; if $b$ is 0, the length is ignored. The [standard value for $b$ is 0.75] (Wikipedia) in most implementations, including Elasticsearch.
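To see how $b$ changes the outcome, the sketch below scores the same term count in a short and a long document. The document lengths and the average length are hypothetical values chosen for illustration.

```python
def tf_with_length(tf: int, doc_len: int, avg_len: float,
                   k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 term-frequency factor including document length normalization."""
    norm = 1 - b + b * (doc_len / avg_len)
    return tf * (k1 + 1) / (tf + k1 * norm)


avg_len = 500  # hypothetical average document length, in words

for b in (0.0, 0.75, 1.0):
    short = tf_with_length(tf=3, doc_len=250, avg_len=avg_len, b=b)
    long_ = tf_with_length(tf=3, doc_len=2000, avg_len=avg_len, b=b)
    print(f"b={b}: short doc {short:.3f}, long doc {long_:.3f}")
```

With $b$ at 0 the two documents receive the same factor. As $b$ rises toward 1, the long document's factor drops and the short document's factor rises.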

Variations

Two commonly used variants address different needs:

Variation	Description	Best Use Case
BM25F	A version that accounts for document structure.	When you need to weigh anchor text or headers differently than body text.
BM25+	Addresses a deficiency where long documents that match a term can be scored about the same as short documents that don't contain it at all.	Short passages or chunks used in AI (RAG) systems.

[BM25+ adds a delta parameter with a default value of 1.0] (Wikipedia) to ensure that matched terms always contribute a positive score, regardless of document length.
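A minimal sketch of the difference, assuming the usual formulation in which BM25+ adds $\delta$ to the term-frequency factor of every term that occurs in the document. The function names and the document lengths are illustrative.

```python
def bm25_tf(tf: int, doc_len: int, avg_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Standard BM25 term-frequency factor."""
    norm = 1 - b + b * (doc_len / avg_len)
    return tf * (k1 + 1) / (tf + k1 * norm)


def bm25_plus_tf(tf: int, doc_len: int, avg_len: float,
                 k1: float = 1.2, b: float = 0.75, delta: float = 1.0) -> float:
    """BM25+ factor: delta is added only for terms that occur in the document."""
    if tf == 0:
        return 0.0
    return bm25_tf(tf, doc_len, avg_len, k1, b) + delta


avg_len = 500        # hypothetical average length, in words
very_long = 200_000  # hypothetical extremely long document

plain = bm25_tf(1, very_long, avg_len)
plus = bm25_plus_tf(1, very_long, avg_len)

print(f"BM25  factor for one match in a very long document: {plain:.4f}")
print(f"BM25+ factor for the same match:                    {plus:.4f}")
```

In standard BM25 the factor for a single match in a very long document shrinks toward zero, which is what a non-matching document gets. With BM25+ the factor stays at or above $\delta$, so a match always outscores a non-match for that term.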

Best practices

Focus on term relevance over density. Because of $k_1$ saturation, repeating a keyword beyond its natural usage provides diminishing returns.

Use BM25+ for chunked content. If you are building a retriever for a chatbot or a passage search, explicitly enable the BM25Plus variant to reduce bias against short text snippets.

Tune the b parameter for your corpus. If your content consists of very long-form guides that are comprehensive, you may need to decrease $b$ to ensure they aren't unfairly penalized for their length.
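As one possible illustration, the snippet below builds an index definition that lowers $b$ for a text field, assuming Elasticsearch's custom similarity settings. The similarity name "long_form_bm25", the field name "body", and the value 0.5 are placeholders rather than recommendations; the right value depends on testing against your own corpus.

```python
import json

# Hypothetical names: "long_form_bm25" and the "body" field are placeholders.
index_body = {
    "settings": {
        "index": {
            "similarity": {
                "long_form_bm25": {
                    "type": "BM25",
                    "k1": 1.2,
                    "b": 0.5,  # below the 0.75 default, so length is penalized less
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "body": {"type": "text", "similarity": "long_form_bm25"}
        }
    },
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```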

Common mistakes

Mistake: Assuming more mentions always equal a meaningfully higher score. Fix: Understand that the $k_1$ variable creates an asymptote. Each extra mention adds less than the one before, and the contribution of a term approaches a ceiling that it never exceeds, no matter how many times you use the keyword.

Mistake: Ignoring average document length. Fix: Remember that BM25 measures length relative to the average document in the indexed collection. Content that is significantly longer than that average will have its score depressed unless it contains a higher concentration of relevant terms.

Mistake: Using standard BM25 for very short documents. Fix: Use [BM25Plus with a delta of 0.5 to 1.0] (LangChain) to improve recall for short passages.

Examples

Example scenario (Saturation): A user searches for "marketing." Document A mentions "marketing" 2 times. Document B mentions "marketing" 20 times. With a standard $k_1$ of 1.2, Document B will score higher than Document A, but it will not score 10 times higher. The impact of those extra 18 mentions is heavily muffled.
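The arithmetic behind this scenario, assuming both documents are exactly average length (so the length normalization factor equals 1) and $k_1 = 1.2$, gives the following term-frequency factors for Document A and Document B:

$$
\frac{2 \cdot (1.2 + 1)}{2 + 1.2} = \frac{4.4}{3.2} = 1.375
\qquad
\frac{20 \cdot (1.2 + 1)}{20 + 1.2} = \frac{44}{21.2} \approx 2.075
$$

Under these assumptions Document B's factor is only about 1.5 times Document A's, despite ten times as many mentions.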

Example scenario (Length Normalization): A 300-page document mentions a name once. A single tweet mentions that same name once. BM25 will score the tweet higher because the name represents a larger portion of the total content, making it more likely to be the primary topic.

BM25 vs. TF-IDF

Feature	TF-IDF	BM25
Term Frequency	Increases linearly in its basic form (more mentions = higher score).	Saturates (diminishing returns).
Doc Length	Usually ignores length or requires manual normalization.	Built-in normalization using the $b$ parameter.
Goal	Simple term weighting.	Modern, probabilistic relevance ranking.
Risk	Highly vulnerable to keyword stuffing.	Resilient against keyword stuffing.

Rule of thumb: BM25 is almost always superior for search ranking because it mimics how humans perceive relevance, recognizing that the first few mentions of a word are more important than the 50th.

FAQ

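What does the "BM" in BM25 stand for? BM stands for "Best Matching." The 25 identifies the formula within the series of Best Match weighting schemes developed for the Okapi system, and it is often described as the 25th iteration. Earlier members of the series include BM11 and BM15, which correspond to BM25 with $b$ set to 1 and 0.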

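How does BM25 handle synonyms? Standard BM25 is a bag-of-words model and does not inherently understand synonyms. In implementations like Elasticsearch, synonyms are handled at analysis time, for example with a synonym token filter in the analyzer. The "discount_overlaps" flag on the BM25 similarity does not match synonyms; it controls whether overlapping tokens (such as synonyms injected at the same position) are ignored when document length is computed.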

When should I use BM25 instead of Vector Search? BM25 is excellent for "keyword" or "exact" matches. While vector search is better for semantic meaning, BM25 is often more precise for specific names, product codes, or technical terms.

What is the default value for b in Elasticsearch? [By default, $b$ has a value of 0.75 in Elasticsearch] (Elastic).

Does BM25 consider word order? No. It is a bag-of-words model, meaning it looks at the frequency of terms appearing in a document but ignores the sequence or proximity of those words.