

WTF is BM25? The Intuition Behind the Algorithm
❓ Why Should You Give a Toss About BM25?
BM25 is an unsung hero powering heaps of stuff you use every day – think search bars, recommendation systems, or even those fancy AI chatbots that somehow dig up the right info from a mountain of text (we're looking at you, Retrieval-Augmented Generation).
If you’re building anything that needs to sift through data – like a search tool for your blog, a customer support bot, or a system to find the juiciest research papers – BM25 is part of your secret sauce. It’s fast, it’s clever, and it’s been battle-tested for decades, quietly outranking simpler methods while the modern AI models somehow get all the Instagram likes. Understanding it means you can make sense of why some results pop to the top and others sink, whether you’re coding it yourself or just trying to impress your boss with some “I know how this works” swagger.
Plus, in a world drowning in info, knowing how to rank what matters is important. BM25 isn’t just maths – it’s the art of cutting through the noise. So, whether you’re a dev, a data geek, or just someone who hates wading through irrelevant Google results, stick around. This algo’s got lessons for us all – and it might just save your sanity when the AI Apocalypse hits and we’re all searching for “how to reboot the robot overlords.”
We will build the intuition for how BM25 works, step by step, and unravel this:
$$ \text{Score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{\text{TF}(q_i, D) \cdot (k_1 + 1)}{\text{TF}(q_i, D) + k_1 \cdot (1 - b + b \cdot \frac{|D|}{\text{avgdl}})} $$
into this:
$Score = Sum\ for\ Each\ Term(\ Adjusted\ Count\ of\ Term \times Rarity\ of\ Term\ )$
🚀 Ground Control to Major Tom
Imagine you're searching the Web for "Mars exploration" among this tiny library of 5 books (we'll use the term "document" interchangeably with "book"). How on Earth (or Mars) does a search engine perform keyword search to know which documents are most relevant to your search?
📱Mobile Readers Note: If you're reading this on mobile, and any of the tables or formulas overflow the page, they will scroll if you swipe left on them like you're ducking a rando on Tinder!
Here are our books:
Book | Title | Length (pages) | "Mars" count | "exploration" count |
---|---|---|---|---|
A | Mars Exploration Guide | 50 | 8 | 6 |
B | Space Encyclopedia | 200 | 10 | 4 |
C | Planetary Science | 100 | 0 | 0 |
D | Exploration Techniques | 75 | 0 | 12 |
E | Solar System | 150 | 0 | 0 |
Left to your own considerable cleverness, how would you devise an algorithm to execute this search?
🔢 Step 1: Tonight We're Counting Words Like It's 1995 (Term Frequency)
If we were to solve this problem in the simplest possible way, we'd just count how many times each search term appears in a document (book). This is how early attempts at search worked, by the way, and isn't too dissimilar to how search capabilities on the Web may have worked in 1995.
(Yes, I know the lyrics are "1999" ... Google was doing more sophisticated things than this by then ... just stay with me in 1995.)
Let's score each document as follows:
$Score = \text{Count of "Mars"} + \text{Count of "exploration"}$
So, the book with the highest score is the highest ranked, and so on.
Using this naive approach:
- Book A: 8 + 6 = 14 points
- Book B: 10 + 4 = 14 points
- Book C: 0 + 0 = 0 points
- Book D: 0 + 12 = 12 points
- Book E: 0 + 0 = 0 points
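If you fancy seeing that in code, here's a minimal Python sketch of the naive counter (the per-book counts are simply hard-coded from the table above):

```python
# A minimal sketch of the naive "just count the words" scorer.
# Per-book term counts are hard-coded from the table above.
books = {
    "A": {"Mars": 8, "exploration": 6},
    "B": {"Mars": 10, "exploration": 4},
    "C": {},
    "D": {"exploration": 12},
    "E": {},
}

def naive_score(term_counts, query_terms):
    # Relevance = the raw counts of each query term, added up.
    return sum(term_counts.get(term, 0) for term in query_terms)

for book, counts in books.items():
    print(book, naive_score(counts, ["Mars", "exploration"]))
# A and B both land on 14 -- which is exactly the tie we grumble about next.
```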
Good enough? Nay, I say!
🔍 The Problem: Shortcomings of Simple Term Counting
Books A and B tie with 14 points – but, mate, surely the "Mars Exploration Guide" (Book A) is more focused on our search than some general "Space Encyclopedia" (Book B)! This approach just won't do. Let's fix it!
💡 Key Insight: Simple word counting treats every word equally, which doesn't quite work well.
❗🟰 Step 2: Not All Words Are Created Equal (Inverse Document Frequency)
In our simple counting approach, we treated each occurrence of "Mars" and "exploration" as equally valuable ... and it kind of worked, until it didn't. "Mars Exploration Guide" and "Space Encyclopedia" tied, despite one clearly being more about our search query. This happened because we treated all words equally. But in searches, some words are better at targeting relevant results. "Mars" is a laser beam. "Exploration" is a flashlight. "The"? A fog machine.
This suggests that the less frequent -- i.e. the rarer -- a word is, the more useful it is as a search term. That makes intuitive sense, too. Imagine flipping through a phone book (no, really -- people actually did this at a distant point in human history) looking for someone's phone number. If your search is for a fellow named "Smith", you're going to have a much harder time than if you're looking for a professionally good-looking fellow called "Zoolander"! The lesson? The rarer a search term is, the faster it narrows or filters your search.
How about we weight the counts of terms by their "rarity"? Let’s adjust our scoring:
$Score = (\text{Count of Term}) \times (\text{Rarity of Term}) + (\text{Count of Term}) \times (\text{Rarity of Term})$
This immediately fixes a big part of our problem: words that appear everywhere ("exploration") get less credit than words that are rare ("Mars"), even if they appear just as often in a document.
💡 Key Insight: The rarer a term is in the whole collection, the more useful it is for identifying relevant documents.
So how do we calculate that rarity score?
✨ Simply Rare: It's Hardly There
Again, barely taxing our considerable intellect, we could calculate the rarity score simply by taking 1 divided by the number of documents the term appears in. Yeah?
$$ Rarity\ Score = \frac{1}{\text{# of documents the term appears in}} $$
We could, then, simply weight the counts of each term by the rarity of each term. Tracking?
$Score = \text{Count of "Mars"} \times \text{Rarity of "Mars"} + \text{Count of "exploration"} \times \text{Rarity of "exploration"}$
So, then, we get the following scores for "Mars exploration" across each of the books in our library:
Book | Title | Mars Count | Exploration Count | Mars Rarity | Exploration Rarity | Score |
---|---|---|---|---|---|---|
A | Mars Exploration Guide | 8 | 6 | 0.5 | 0.333 | 6.0 |
B | Space Encyclopedia | 10 | 4 | 0.5 | 0.333 | 6.333 |
C | Planetary Science | 0 | 0 | 0.5 | 0.333 | 0.0 |
D | Exploration Techniques | 0 | 12 | 0.5 | 0.333 | 4.0 |
E | Solar System | 0 | 0 | 0.5 | 0.333 | 0.0 |
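In code, that naive rarity weighting is a one-liner on top of the counter above. Here's a small sketch, with the number of documents containing each term hard-coded from our five-book library:

```python
# Sketch: weight raw counts by a naive rarity of 1 / (# of books containing the term).
# "Mars" appears in 2 of our 5 books, "exploration" in 3.
docs_containing = {"Mars": 2, "exploration": 3}

def naive_rarity(term):
    return 1 / docs_containing[term]

def rarity_weighted_score(term_counts, query_terms):
    return sum(term_counts.get(t, 0) * naive_rarity(t) for t in query_terms)

query = ["Mars", "exploration"]
print(round(rarity_weighted_score({"Mars": 8, "exploration": 6}, query), 3))   # 6.0   (Book A)
print(round(rarity_weighted_score({"Mars": 10, "exploration": 4}, query), 3))  # 6.333 (Book B)
print(round(rarity_weighted_score({"exploration": 12}, query), 3))             # 4.0   (Book D)
```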
Whilst this, again, kind of works, it's got some fairly significant shortcomings.
🔍 The Problem: Shortcomings of the Simplified Rarity
Overvalues Rare Terms: Simplified rarity over-rewards extremely rare words. A word in 1 document gets a rarity score of 1, a word in 2 documents gets 0.5, and a word in 10 documents gets 0.10. But is a term that appears in 2 documents really only half as useful as one that appears in 1? And is one that appears in 10 documents really 10× less useful?
Drops Off Too Quickly: As the number of documents containing a term grows, simplified rarity shrinks rapidly. In large collections, the simplified rarity of frequent (common) terms becomes vanishingly small and effectively meaningless.
This simplified calculation of rarity also misses a key fact about what rarity means in terms of filtering power for search terms.
💪 Rarity = Power: Why Uncommon Words Matter More
This is the core idea behind filtering power (fancily called "discriminative power") – the ability of a term to eliminate irrelevant documents. The more irrelevant documents it can eliminate in a single step, the more power the term has.
Let’s take an example:
- A word in 50% of documents gives you 1 "unit" of filtering power. (Think about asking 1 yes/no question that perfectly eliminates half of the documents in the collection.)
- A word in 25% gives you 2 units. (Think of asking 2 yes/no questions that, each, perfectly eliminate half of the remaining documents.)
- A word in 12.5% gives you 3. (3 yes/no questions ...)
- A word in 1% gives you 6.6 units. (Woah!)
These units "filtering power" are like asking the # of units of yes/no questions all at once in order to reduce the size of the search area. How the hell did we know that the last one was 6.6 units?
Each unit of filtering power halves your remaining search space. And this halving pattern is captured perfectly by our old mate from maths: 💥the logarithm.💥 This means that a word that's 10× rarer (i.e. less frequent) isn't 10× more useful -- it's more like 2× more useful. Why? Because filtering power is logarithmic!
$$Term\ Rarity\ \text{or}\ Term\ Filtering\ Power = \log_2\left(\frac{\text{Total # of Docs}}{\text{# of Docs Containing the Term}}\right)$$
This gives us a clean measure of a term's power to shrink the search haystack. So, we can calculate the filtering power of a word that appears in only 1% of the document collection as follows:
$$Term\ Filtering\ Power = \log_2\left(\frac{1}{0.01}\right) = 6.6$$
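If you want to poke at the halving intuition yourself, here's a quick sketch that just runs the document fractions from the bullets above through that logarithm:

```python
import math

# Filtering power, in "halvings", for a term appearing in a given fraction
# of the collection: log2(total docs / docs containing the term).
for fraction in [0.5, 0.25, 0.125, 0.01]:
    power = math.log2(1 / fraction)
    print(f"term in {fraction:.1%} of docs -> {power:.1f} units of filtering power")
# 50.0% -> 1.0, 25.0% -> 2.0, 12.5% -> 3.0, 1.0% -> 6.6
```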
BM25 calls this, more formally, "Inverse Document Frequency (IDF)". One way to think about IDF that you may find helpful is as follows:
$$ \text{Document Frequency} = \frac{\text{# of Docs Containing the Term}}{\text{Total # of Docs}} $$
Soooo ...
$$ \text{Inverse of Document Frequency} = \frac{\text{Total # of Docs}}{\text{# of Docs Containing the Term}} $$
Sooooooooo ... dressing this up in BM25-specific evening attire ...
$$Inverse\ Document\ Frequency\ (IDF) = \log\left(\frac{\text{Total # of Docs}}{\text{# of Docs Containing the Term}}\right)$$
Bam!
💡 Key Insight: The less frequently a word appears in the document collection, the more filtering (discriminative) power it has – and that power grows logarithmically.
What a veritable buffet of intellectual satisfaction this is proving!
Let’s apply it to our lil' library:
- "Mars" appears in 2 of 5 books → IDF = log₂(5 / 2) ≈ 1.32
- "Exploration" appears in 3 of 5 books → IDF = log₂(5 / 3) ≈ 0.74
We can rewrite our intuitive formula with some $10 words, so that:
$\text{Score} = (\text{Count of Term}_1) \times (\text{Rarity of Term}_1) + (\text{Count of Term}_2) \times (\text{Rarity of Term}_2)$
Becomes:
$Score = TF("Mars") \times IDF("Mars") + TF("exploration") \times IDF("exploration")$
🧮 It's Time for the Calculator!
(Only the real ones from Chicago get that ...)
Let's calc it down now:
Book | Title | Count of "Mars" | Rarity of "Mars" | Count of "exploration" | Rarity of "exploration" | Score |
---|---|---|---|---|---|---|
A | Mars Exploration Guide | 8 | 1.32 | 6 | 0.74 | 15.00 |
B | Space Encyclopedia | 10 | 1.32 | 4 | 0.74 | 16.17 |
C | Planetary Science | 0 | 1.32 | 0 | 0.74 | 0.00 |
D | Exploration Techniques | 0 | 1.32 | 12 | 0.74 | 8.84 |
E | Solar System | 0 | 1.32 | 0 | 0.74 | 0.00 |
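For the code-inclined, here's a small sketch of this Count × Rarity (TF × IDF) scoring, with the counts and document frequencies hard-coded from our library:

```python
import math

# Sketch of Count-times-Rarity (TF x IDF) scoring over our 5-book library.
TOTAL_DOCS = 5
docs_containing = {"Mars": 2, "exploration": 3}

def idf(term):
    # log2(total docs / docs containing the term), as derived above.
    return math.log2(TOTAL_DOCS / docs_containing[term])

def tf_idf_score(term_counts, query_terms):
    return sum(term_counts.get(t, 0) * idf(t) for t in query_terms)

query = ["Mars", "exploration"]
print(round(tf_idf_score({"Mars": 8, "exploration": 6}, query), 2))   # 15.0  (Book A)
print(round(tf_idf_score({"Mars": 10, "exploration": 4}, query), 2))  # 16.17 (Book B)
print(round(tf_idf_score({"exploration": 12}, query), 2))             # 8.84  (Book D)
```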
🔍 The Problem: Say Less, Baby!
Our scoring is still not quite right. Book B edges out Book A because it says “Mars” more, even though it’s not focused on it. (Like so many "AI influencers" on social media ...) And Book D scores surprisingly high despite not mentioning “Mars” at all!
So even with IDF (weighting by the rarity of the term), we’re missing something crucial – document focus. Or maybe… document length?
Let’s fix that next!
🤵 Sound Smart at Dinner Parties
When "filtering power" puts on a tuxedo, it's known as "discriminative power"
🤦♂️ Step 3: The "Enough, Already – We Get It!" Approach (Diminishing Returns)
At the end of Step 2, we fixed a huge flaw: we now value rarer words more (thanks to the homie IDF). But two problems remain:
- Book B still scores higher than Book A, despite A being more focused on Mars exploration.
- Book D scores surprisingly well despite never mentioning "Mars" at all.
Recall that we've got the following expressions of BM25 thus far. In simple terms:
$Score = (Count\ of\ Term) \times (Rarity\ of\ Term) + (Count\ of\ Term) \times (Rarity\ of\ Term)$
Fine, here's the more formal version:
$Score = TF("Mars") \times IDF("Mars") + TF("exploration") \times IDF("exploration")$
Let’s dig into problem #1 first. We need to find a way to adjust how we calculate "Count of Term" -- i.e. Term Frequency (TF) -- to address this problem.
❗ The 10th “Mars” ≠ The 1st “Mars”
Book B mentions "Mars" 10 times. Book A mentions it 8 times. Our current scoring says more mentions = better score. But think about it:
- First mention of "Mars": "Okay, this is about Mars – right on!"
- Second mention: "Yup, still on about Mars. Cool."
- Tenth mention: "Enough, already – we GET IT! You're about Mars!"
💡 Key Insight: After a few mentions, additional occurrences of a term provide less and less new information about relevance.
This is what's called diminishing returns. How can we reflect the diminishing returns of additional term frequency?
✨ Keeping It Simple: The “One-And-Done” Approach
Let’s say we score a term as 1 if it shows up at all, and 0 if it doesn’t. Simple!
- If the term appears at least once → score = 1
- If not → score = 0
This implies that our simple Score formula becomes:
$Score = (1\ if\ Term\ found,\ else\ 0) \times (Rarity\ of\ Term) + (1\ if\ Term\ found,\ else\ 0) \times (Rarity\ of\ Term)$
Calc it out, now!
Book | Title | "Mars" Present | "Exploration" Present | Rarity of "Mars" | Rarity of "exploration" | Score |
---|---|---|---|---|---|---|
A | Mars Exploration Guide | 1 | 1 | 1.32 | 0.74 | 2.06 |
B | Space Encyclopedia | 1 | 1 | 1.32 | 0.74 | 2.06 |
C | Planetary Science | 0 | 0 | 1.32 | 0.74 | 0.00 |
D | Exploration Techniques | 0 | 1 | 1.32 | 0.74 | 0.74 |
E | Solar System | 0 | 0 | 1.32 | 0.74 | 0.00 |
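If code helps, the one-and-done scorer is only a few lines; the IDF values here are the ones we computed earlier:

```python
# Sketch of the "one-and-done" scorer: a term contributes its rarity (IDF)
# once if it appears at all, and nothing otherwise.
idf = {"Mars": 1.32, "exploration": 0.74}

def one_and_done_score(term_counts, query_terms):
    return sum(idf[t] for t in query_terms if term_counts.get(t, 0) > 0)

query = ["Mars", "exploration"]
print(round(one_and_done_score({"Mars": 8, "exploration": 6}, query), 2))   # 2.06 (Book A)
print(round(one_and_done_score({"Mars": 10, "exploration": 4}, query), 2))  # 2.06 (Book B) -- tied again
```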
🔍 The Problem: Too Hamfisted
This flattens everything too much – Book A and B tie again, and there's no reward for mentioning a term more than once.
We need something better: a curve that rises at first mention, but levels off with more and more mentions. This is consistent with the key insight above.
🤓 BM25's Solution: Smoothing Term Frequency with Saturation
What we want is to revise our Score like this:
$Score = (\mathbf{Smoothed}\ Count\ of\ Term) \times (Rarity\ of\ Term) + (\mathbf{Smoothed}\ Count\ of\ Term) \times (Rarity\ of\ Term)$
BM25 uses a clever formula to model diminishing returns. It smooths the impact of each extra mention, producing what it calls "Saturated Term Frequency":
$$ Saturated\ TF = \frac{Term\ Frequency \times (k + 1)}{Term\ Frequency + k} $$
Where k controls how fast the saturation kicks in. (Think of it like a knob for “how quickly we get tired of hearing the word.”) Oh, you're a smooth operator, BM25!
Let’s set k = 1.2, which is common in practice. We'll compute "Saturated" Term Frequency for Book A (8 mentions) and Book B (10 mentions):
- Book A: (8 × 2.2) / (8 + 1.2) = 17.6 / 9.2 ≈ 1.91
- Book B: (10 × 2.2) / (10 + 1.2) = 22 / 11.2 ≈ 1.96
Only a tiny bump in score for Book B, even with 2 more mentions. Exactly what we want!
Now we recalculate our score as follows:
$Score = Saturated\ TF(term) \times IDF(term) + Saturated\ TF(term) \times IDF(term)$
🧮 It's Time for the Calculator!
Book | Title | "Mars" Count | "Exploration" Count | Saturated TF ("Mars") | Saturated TF ("exploration") | Score |
---|---|---|---|---|---|---|
A | Mars Exploration Guide | 8 | 6 | 1.91 | 1.83 | 3.88 |
B | Space Encyclopedia | 10 | 4 | 1.96 | 1.69 | 3.84 |
C | Planetary Science | 0 | 0 | 0.00 | 0.00 | 0.00 |
D | Exploration Techniques | 0 | 12 | 0.00 | 2.00 | 1.47 |
E | Solar System | 0 | 0 | 0.00 | 0.00 | 0.00 |
Victory! Book A now edges out Book B – just like it should. The repeated mentions in Book B still count, but no longer dominate.
Book D also ranks lower than both, which makes sense – it never even mentions Mars. Now we’re cooking!
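Here's a tiny sketch of the saturation curve on its own, so you can watch the diminishing returns kick in with k = 1.2:

```python
# Sketch of BM25-style term-frequency saturation with k = 1.2.
def saturated_tf(tf, k=1.2):
    # Rises quickly for the first few mentions, then flattens towards k + 1.
    return (tf * (k + 1)) / (tf + k)

for tf in [1, 2, 8, 10, 100]:
    print(tf, round(saturated_tf(tf), 2))
# 1 -> 1.0, 2 -> 1.38, 8 -> 1.91, 10 -> 1.96, 100 -> 2.17 (never exceeds 2.2)
```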
🍸 Sound Smart and Be Sexy at Cocktail Parties:
"BM25 models diminishing returns using a saturation function."
📏 Step 4: Size Really Does Matter -- But Not How You Think (Document Length Normalization)
So, we’ve handled word rarity. We’ve handled diminishing returns. But there’s still a problem lurking in our scoring.
Imagine we add Book F:
Book | Title | Length (pages) | "Mars" count | "exploration" count |
---|---|---|---|---|
F | The Everything Book | 1,000 | 15 | 12 |
This book is long. Like, Tolstoy long. It has 15 mentions of “Mars”! 12 mentions of “exploration”! No big deal -- we'll just sort it out in smooooothing, right?
Slow your roll.
A long document might mention "Mars" a bunch, even if it’s not the focus. A shorter document with fewer mentions might still be way more relevant if it’s concentrated on the topic.
💡 Key Insight: Longer documents mention everything more – not because they’re more relevant, but because they’re longer.
That means we need to adjust our scores based on document length.
🤔 The Intuition: Proportional Relevance
Take an exaggerated version of Book A versus the newly-added Book F (which we shall immediately drop and never consider again):
- A 1-page document that mentions “Mars” 5 times
- A 1,000-page book that mentions “Mars” 15 times
Which is more “about” Mars? The short one, obviously. It’s basically banging on about Mars every few lines.
So we want to reward concentration of terms, not just raw frequency. How could we do this?
✨ Simple, Yo!
We could just take our current scoring function, and adjust the frequency by the length of the document, right? That checks out.
Recall our simply worded score function:
$Score = (Smoothed\ Count\ of\ Term) \times (Rarity\ of\ Term) + (Smoothed\ Count\ of\ Term) \times (Rarity\ of\ Term)$
Which, when it graduates from university and gets an MBA from Wharton, becomes:
$Score = Saturated\ TF(term) \times IDF(term) + Saturated\ TF(term) \times IDF(term)$
What if we just adjusted our Smoothed Count by the length of the document thusly:
$$ Score = \frac{(Smoothed\ Count\ of\ Term)}{Document\ Length} \times (Rarity\ of\ Term) + \frac{(Smoothed\ Count\ of\ Term)}{Document\ Length} \times (Rarity\ of\ Term) $$
That would do just nicely! Let's run those calcs:
Book | Smoothed "Mars" Count | Smoothed "Exploration" Count | Document Length | Rarity of "Mars" | Rarity of "exploration" | Score |
---|---|---|---|---|---|---|
A | 1.9130 | 1.8333 | 50 | 1.32 | 0.74 | 0.0776 |
B | 1.9643 | 1.6923 | 200 | 1.32 | 0.74 | 0.0192 |
C | 0.0000 | 0.0000 | 100 | 1.32 | 0.74 | 0.0000 |
D | 0.0000 | 2.0000 | 75 | 1.32 | 0.74 | 0.0197 |
E | 0.0000 | 0.0000 | 150 | 1.32 | 0.74 | 0.0000 |
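As a quick sanity check, here's that divide-by-length scoring in a few lines of Python, with the smoothed counts and IDF values hard-coded from the tables above:

```python
# Sketch of the naive length-division fix: (saturated count / document length) x rarity.
idf = {"Mars": 1.32, "exploration": 0.74}

def length_divided_score(sat_counts, doc_length):
    return sum(count / doc_length * idf[term] for term, count in sat_counts.items())

print(round(length_divided_score({"Mars": 1.913, "exploration": 1.833}, 50), 4))   # ~0.0776 (Book A)
print(round(length_divided_score({"Mars": 1.964, "exploration": 1.692}, 200), 4))  # ~0.0192 (Book B)
```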
🔍 The Problem: Where Did It Go?
Our simple, adjusted score gives short documents an edge (good!) and prevents long ones from coasting on raw term count (also good!). This works (-ish) but notice that our score is now reeeaaaalllyyy small. Look, in particular, at Books A and B.
- Book A is short and focused – but its score is tiny.
- Book B is long – and it scores basically nothing, even though it’s somewhat relevant to our search.
💡 Key Problem: We’re dividing by the raw length of the document, and it’s punishing longer documents a little too aggressively.
Long documents aren't necessarily bad – they just need a fair chance to prove their focus. And short documents shouldn't get an automatic VIP pass just because they’re compact.
⚖️📏 Size Does Matter, But It's Relative
Okay, dividing by raw length was a start, but it’s a bit too harsh–short docs get VIP passes, long ones get squashed. The real trick? Make it relative to the average document length, so everyone gets a fair shake.
Here’s how BM25 rolls: instead of just slapping a length penalty on after smoothing, we bake it right into the count adjustment. Think of it as tweaking how much repetition matters based on how chatty the document is compared to the norm. We call this the "Length Tweak":
$$
Length\ Tweak = Base\ Penalty + (Length\ Factor \times \frac{Document\ Length}{Average\ Document\ Length})
$$
- Base Penalty = a small floor, 0.3 in our case. It's our smoothing knob k scaled down: 1.2 × 0.25 (that 0.25 comes from a "length knob" we'll meet properly in the formal formula, where it appears as 1 - b with b = 0.75).
- Length Factor = a dial, 0.9 here, that decides how much length should nudge things. It's k × 0.75 (i.e. k × b), but don't sweat the maths yet.
See how the Length Tweak calibrates each document against the average document length? We then use it to scale the term count up or down, based on the length of the document the term was found in. So, instead of:
$$ Score = \frac{(Smoothed\ Count)}{Length} \times Rarity $$
We mix it all together like this:
$$ Adjusted\ Count = \frac{(Count\ of\ Term \times Boost)}{(Count\ of\ Term + Length\ Tweak)} $$
For now, Boost = 2.2 – it's a little kick to Count of Term which we'll explain soon.
Note that the Length Tweak sits in the denominator, so the Adjusted Count is scaled up if the document is relatively short -- the Length Tweak dips below 1.2, shrinking the denominator -- and scaled down if the document is relatively long -- the Length Tweak climbs above 1.2, inflating the denominator.
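Here's the Adjusted Count as a small Python sketch, assuming k = 1.2, a length knob of 0.75, and the 115-page average length of Books A-E (an illustration of the formula above, not a full search engine):

```python
# Sketch of the length-aware Adjusted Count.
# Length Tweak = k*(1 - b) + k*b*(doc_length / avg_length); Boost = k + 1.
K = 1.2        # saturation knob from Step 3
B = 0.75       # length knob: how strongly document length matters
AVG_LEN = 115  # average length of Books A-E, in pages

def adjusted_count(tf, doc_length, k=K, b=B, avg_len=AVG_LEN):
    length_tweak = k * ((1 - b) + b * (doc_length / avg_len))
    return (tf * (k + 1)) / (tf + length_tweak)

# "Mars" in Book A (8 mentions, 50 pages) vs Book B (10 mentions, 200 pages):
print(round(adjusted_count(8, 50), 2))    # ~2.03 -- short, focused doc gets a lift
print(round(adjusted_count(10, 200), 2))  # ~1.85 -- long doc gets nudged down
```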
Then, what we're left with, now, is our simplified Score of:
$Score = (Adjusted\ Count\ of\ Term) \times (Rarity\ of\ Term) + (Adjusted\ Count\ of\ Term) \times (Rarity\ of\ Term)$
💡 Key Insight: Long docs don’t get slammed just for being long – they only take a hit if they’re longer than average and still blathering.
See? Size does matter, but it’s all relative! 😜 We'll bring it all together in the next section.
💡 The BM25 Reveal: Why This Works 💡
Taking it all in, then, from Steps 1 - 4, BM25 ranks documents based on search terms by calculating:
$Score = \sum_{each\ Term} [ Adjusted\ Count\ of\ Term \times Rarity\ of\ Term ]$
Where:
Adjusted Count of Term =
$$
\frac{Term\ Frequency \times (k + 1)}{Term\ Frequency + k \times ((1 - b) + b \times \frac{Document\ Length}{Average\ Document\ Length})}
$$
This takes the term's count and gives it a little boost – called Boost and often set to 2.2 – then tones it down based on how often it repeats and how long the document is compared to the average. Why 2.2 for the Boost? Back in Step 3, we smoothed counts with a knob called k – set to 1.2 – to chill out extra mentions. The Boost is just k + 1, or 1.2 + 1 = 2.2, like the high-five between Maverick and Goose on the runway, before we calm things down with diminishing returns (saturation) and length normalization, all in one go.
Rarity of Term = Inverse Document Frequency =
$$
\log\left(\frac{Total\ Docs}{Docs\ containing\ the\ term}\right)
$$
This measures how rare the term is across all documents–rarer terms get a bigger score because they’re more special.
OR, said in words that Mum will understand over Sunday dinner, for each search term in the query:
- Count how often it pops up in the document (but don’t go overboard – we give it a boost then smooth it out so a few mentions are enough).
- Multiply by how rare it is in the whole pile of books (rare words are gold).
- Tweak it based on the document’s length (short and snappy gets a lift, long and wordy gets a gentle nudge down).
- Add up the scores for all the terms you’re searching for.
🍷 Summarize BM25 at Gala Affairs
“Each term gets credit for how often it appears, how rare it is, and how focused the document is on it – with soft penalties for verbosity and overuse.”
This addresses the problems we saw along the way:
- Raw counts aren't enough – a term appearing more often doesn't always mean the document is more relevant. (See: Book B beating Book A early on)
- Some words matter more than others – rare terms provide sharper clues about relevance. (IDF to the rescue)
- Extra mentions don’t add linear value – the 10th “Mars” isn’t as informative as the 1st. (Saturation fixes that)
- Longer docs talk more – but that doesn’t mean they’re better matches. (Normalization keeps verbosity in check)
✅ BM25 Ranking of "Mars exploration" for our Baby Library
It's the last time for The Calculator! 😢 Here's how the books in our wee library rank (using k = 1.2, b = 0.75, and the average length of 115 pages across Books A–E):
Book | Title | Length | Adjusted Count (Mars) | Adjusted Count (exploration) | BM25 Score |
---|---|---|---|---|---|
A | Mars Exploration Guide | 50 | 2.03 | 1.97 | 4.13 |
B | Space Encyclopedia | 200 | 1.85 | 1.50 | 3.56 |
C | Planetary Science | 100 | 0.00 | 0.00 | 0.00 |
D | Exploration Techniques | 75 | 0.00 | 2.05 | 1.51 |
E | Solar System | 150 | 0.00 | 0.00 | 0.00 |
So, all's well that ends well. Book A -- Mars Exploration Guide -- ranked highest in our search and provided us with all of the knowhow we need to put boots on the ground on Mars when Elon gets us there.
BM25 FTW! And best of all? It’s fast. It’s robust. And it works – in RAG pipelines, search engines, recommendation systems ... you name it!
🟰 Mathing All the BM25 Maths
Here's BM25 formally expressed, so that your mates don't bully you at your next gathering.
$$ \text{Score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{\text{TF}(q_i, D) \cdot (k_1 + 1)}{\text{TF}(q_i, D) + k_1 \cdot (1 - b + b \cdot \frac{|D|}{\text{avgdl}})} $$
Where:
- $D$ is the document,
- $Q$ is the query with terms $q_i$.
- $\text{TF}(q_i, D)$ is the term frequency of query term $q_i$ in document $D$.
- $k_1$ is a parameter controlling term frequency saturation (commonly 1.2).
- $b$ is a parameter controlling length normalization (commonly 0.75).
- $|D|$ is the document length, $\text{avgdl}$ is the average document length.
- $\text{IDF}(q_i) = \log\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)$ or a simpler variant like $\log\left(\frac{N}{n(q_i)}\right)$, where $N$ is total documents and $n(q_i)$ is documents containing $q_i$.
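And if you'd like the whole thing in one runnable sketch, here's a compact toy BM25 over our little library. It uses the simpler log₂(N / n) IDF variant so the numbers line up with the walkthrough; treat it as an illustration, not production code:

```python
import math

# Toy BM25 over our 5-book library (k1 = 1.2, b = 0.75, log2-based IDF).
K1, B = 1.2, 0.75

library = {
    "A": {"length": 50,  "tf": {"Mars": 8,  "exploration": 6}},
    "B": {"length": 200, "tf": {"Mars": 10, "exploration": 4}},
    "C": {"length": 100, "tf": {}},
    "D": {"length": 75,  "tf": {"exploration": 12}},
    "E": {"length": 150, "tf": {}},
}
N = len(library)
avgdl = sum(doc["length"] for doc in library.values()) / N  # 115 pages

def idf(term):
    n = sum(1 for doc in library.values() if doc["tf"].get(term, 0) > 0)
    return math.log2(N / n) if n else 0.0

def bm25(doc, query_terms, k1=K1, b=B):
    score = 0.0
    for term in query_terms:
        tf = doc["tf"].get(term, 0)
        norm = k1 * ((1 - b) + b * doc["length"] / avgdl)
        score += idf(term) * (tf * (k1 + 1)) / (tf + norm)
    return score

query = ["Mars", "exploration"]
for name, doc in sorted(library.items(), key=lambda kv: -bm25(kv[1], query)):
    print(name, round(bm25(doc, query), 2))
# A ~4.13, B ~3.56, D ~1.51, C and E 0.0
```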
Be gone, maths bullies!