Are spam filters getting better in 2026?

Filters are getting marginally better at catching mass mechanical attacks but losing ground against well-crafted spam and cold outreach. The net effectiveness has plateaued because the attacker side has access to the same AI tools as the defender side, and the attackers iterate faster against any new heuristic.

Why has filter improvement plateaued?

Content-based filters are in a structural arms race. Better defenders produce better attackers because the attacker side is paid to study the filter and route around it. Each new heuristic raises the bar slightly, but the attacker iterates and the bar comes back down. The structure of the race does not change with more compute or larger models.

Will AI eventually solve the spam problem?

AI will keep changing the composition of unwanted mail. AI defenders will catch some attacks AI attackers produce. AI attackers will iterate around AI defenders. The arms race continues. The thing AI cannot do is change the underlying economic property: that reaching an additional inbox costs the sender approximately zero. That is the layer where filters cannot operate.

What is the structural alternative to better content filters?

Filters that operate on something other than content. Identity (is this sender on the recipient's guest list?) and cost (did the sender pay a small cover charge?) are structural properties the attacker cannot model their way out of. Identity-and-cost filters are deterministic: the inputs are the recipient's preferences and the sender's payment, not the content of the message.

How does this affect what users should do?

Run native filters as the first layer because they catch the obvious attacks well. Add structural filters that do not depend on content classification because they catch what content filters cannot. Stop expecting any single filter to solve the entire problem; the layered stack is what works.

Why Email Spam Filters Aren't Getting Better

Spam filter effectiveness has plateaued despite growing AI capability. Here is the structural reason and what actually closes the gap.

The spam filter you use today is not meaningfully better than the one you used three years ago. The improvement curve has flattened. The volume of unwanted mail keeps growing. The user experience of the inbox keeps degrading. Most explanations for this involve attacker creativity or filter design choices. The deeper explanation is structural.

This post is an honest account of why email filters are not improving the way users hope, and what actually closes the gap.

The Plateau

The major email providers have invested heavily in spam filtering. Google reports that Gmail blocks roughly 99.9% of mass spam. Microsoft Defender for Office 365 has an equivalently strong story. Apple’s Mail and Yahoo’s filters do similar work. The numbers from any major provider in 2026 are roughly the same as the numbers from 2022. The improvement curve is flat at the top of the distribution.

The user experience is also flat or degrading. Inboxes feel more cluttered with cold outreach, recruiter pitches, AI-generated solicitation, and a long tail of unwanted-but-not-technically-spam mail. The mass attacks are caught; the survivors are the harder cases.

The combination is the plateau. Filters are doing roughly the same volume of work they were doing three years ago. The work the filters cannot do has grown. The user is the one who feels the gap.

Why Better Models Do Not Help

The intuitive answer to “why are filters not improving” is “they need better AI models.” That answer is wrong, and understanding why is the key insight.

Content-based filters score incoming mail against patterns. The patterns include sender features, message structure, language patterns, link patterns, and attachment signatures. Modern filters use machine learning models trained on enormous corpora of known-spam and known-not-spam.

The attacker’s job is to defeat the score. Modern attackers have access to the same AI tools the defenders have. They can generate prose that does not match known spam patterns. They can vary phrasing to avoid pattern lock-in. They can register fresh domains, configure clean authentication, and generate authentic-looking messages at industrial scale.

When the defender deploys a better model, the attacker iterates against it. The attacker has economic motivation (BEC payouts in the six figures, ransomware payouts higher) to study the deployed model and route around it. The defender has the same motivation in reverse, but the defender’s release cadence is slower than the attacker’s adaptation cadence.

The structure of the race favors the attacker for a specific reason. The defender’s model is exposed as soon as it is deployed: even when nothing is published, its behavior can be reverse-engineered through query traffic. The attacker studies the deployed model and adjusts. Each round of adjustment requires another round of defender response. The cycle has run for two decades, and the surviving attacks at any given time are the ones that have already adapted to the latest defender state.

This is why throwing more compute at the problem does not change the slope. Better models produce better attackers. The arms race continues, with the defender perpetually reacting to last quarter’s attacker output.

The Asymmetry

There is also an asymmetry that compounds the problem.

The defender has to be right in aggregate: the false positive rate (real mail flagged as spam) and the false negative rate (spam reaching the inbox) both need to stay low across billions of messages. The defender has a hard constraint on false positives because users get angry when real mail goes to spam.

The attacker has to be right occasionally. A campaign sending 100,000 messages succeeds if even a few thousand reach inboxes. The attacker can fail more than 95% of the time and still profit.
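The arithmetic behind that asymmetry can be sketched in a few lines of Python. The campaign size and the "few thousand delivered" figure come from the example above; the response rate, payout, and campaign cost are hypothetical numbers chosen only for illustration, not measurements.

```python
def campaign_profit(sent, delivery_rate, response_rate, payout, fixed_cost):
    """Expected profit of one campaign under the given (hypothetical) rates."""
    delivered = sent * delivery_rate
    responses = delivered * response_rate
    return responses * payout - fixed_cost


def break_even_delivery_rate(sent, response_rate, payout, fixed_cost):
    """Smallest delivery rate at which the campaign stops losing money."""
    return fixed_cost / (sent * response_rate * payout)


# 100,000 messages, 3% reach an inbox (the filter stops the other 97%).
# Assumed for illustration: 1 in 1,000 delivered messages converts,
# $500 per conversion, $200 total to run the campaign.
profit = campaign_profit(100_000, 0.03, 0.001, 500.0, 200.0)
floor = break_even_delivery_rate(100_000, 0.001, 500.0, 200.0)
print(f"profit: ${profit:,.0f}, break-even delivery rate: {floor:.2%}")
```

Under these assumed numbers the campaign stays profitable with 97% of its messages blocked, and the filter would have to push delivery below the break-even rate before the sender had a reason to stop.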

The economic asymmetry favors the attacker as long as the defense is content-based. The defender cannot afford to be wrong; the attacker only needs to be partially right. As filters get more aggressive, false positive rates climb, real mail goes to spam, users complain, and the filter has to be tuned back. The system has a natural equilibrium where the false positive rate is at the maximum users will tolerate, and the false negative rate is at whatever level the attacker can achieve given that constraint.

This equilibrium is why filters are not getting better in user-perceived ways. The system is at the operating point that the underlying tradeoff produces. Pushing for better detection produces more false positives. Pulling back from false positives produces more spam in the inbox. The frontier has not moved much in years because the tradeoff has not changed.
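A toy sketch of that tradeoff, assuming a classifier that assigns each message a spam score between 0 and 1. The scores below are invented so that the legitimate and spam distributions overlap, which is the situation the paragraph describes; they are not taken from any real filter.

```python
# Hypothetical classifier scores in [0, 1]; higher means "more spam-like".
# These values are made up for illustration, not measured from any filter.
LEGIT_SCORES = [0.05, 0.10, 0.20, 0.35, 0.55, 0.70]
SPAM_SCORES = [0.30, 0.45, 0.60, 0.80, 0.90, 0.95]


def error_rates(threshold):
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    A message is flagged as spam when its score is >= threshold.
    """
    false_pos = sum(s >= threshold for s in LEGIT_SCORES) / len(LEGIT_SCORES)
    false_neg = sum(s < threshold for s in SPAM_SCORES) / len(SPAM_SCORES)
    return false_pos, false_neg


for threshold in (0.25, 0.50, 0.75):
    fpr, fnr = error_rates(threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

Because the two score lists overlap, no threshold drives both error rates to zero: lowering it flags more real mail, raising it lets more spam through.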

What Has Changed

The composition of unwanted mail has shifted, even if the filter effectiveness has not.

In 2018, unwanted mail was dominated by mass mechanical spam: Nigerian princes, malware attachments, credential-harvesting links from disposable domains. The filters caught most of it.

In 2026, the unwanted mail that reaches the inbox is dominated by content-clean unsolicited mail: cold outreach, AI-generated solicitation, recruiter pitches, marketing from companies the user did not consciously sign up for. The filters cannot reliably catch this category, by design, because the content looks like legitimate mail.

The shift is qualitative as much as quantitative. The total volume of unwanted mail has grown. The cleanly-formatted unwanted mail has grown faster than the mass mechanical mail. The filters are catching the same proportion of the messages they were designed to catch, while the surviving mail looks more and more like legitimate business correspondence.

The shift means even an improving filter would have a smaller effect on user experience than it used to. The work the filter does is on the part of the distribution that was already mostly handled. The work the user actually feels (the cold outreach piling up in the inbox) is in the part of the distribution the filter cannot address.

The Layer That Is Missing

The honest read on the plateau is that content-based filters have approximately reached the limit of what their mechanism can do. Better models do not change the structure. The arms race continues. The user experience degrades because the volume in the unfilterable category keeps growing.

What is missing is a different mechanism. Not a better content filter. A filter that operates on something other than content.

Identity-and-cost filters are the structural alternative. The mechanism: check whether the sender is on the recipient’s guest list. If yes, deliver. If no, ask for a small cover charge or hold for review. The decision does not depend on content analysis. It does not iterate against attacker prose. It does not have a false positive rate the way content classifiers do.

Why this works where content filters do not: the attacker cannot model their way around an identity check. They are either a known sender or they are not. They cannot adapt their way around a cover charge. They either pay or they do not. The mechanism does not have an arms race because the inputs (recipient’s guest list, sender’s payment) are not features the attacker can manipulate by changing the message.
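The decision rule described above can be sketched as a small function. This is a minimal illustration, not any product's implementation: the `Message` type, the `structural_filter` name, and the example addresses are hypothetical, and the sketch assumes the sender address has already been authenticated upstream.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    DELIVER = "deliver"
    HOLD = "hold"


@dataclass(frozen=True)
class Message:
    sender: str
    body: str            # carried along, but never inspected by the filter
    paid_cover_charge: bool = False


def structural_filter(message, guest_list):
    """Decide from identity and payment only; message content is ignored.

    Assumes guest_list holds lowercase addresses.
    """
    if message.sender.lower() in guest_list:
        return Decision.DELIVER      # known sender
    if message.paid_cover_charge:
        return Decision.DELIVER      # unknown sender who paid
    return Decision.HOLD             # unknown and unpaid: hold for review


guest_list = {"alice@example.com"}
known = Message("alice@example.com", "Lunch on Friday?")
stranger = Message("sales@example.net", "Quick question about your roadmap")
paid = Message("sales@example.net", "Quick question about your roadmap", True)

for m in (known, stranger, paid):
    print(m.sender, "->", structural_filter(m, guest_list).value)
```

The function never reads `body`, so rewording the message cannot change the outcome. Only joining the guest list or paying does.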

We covered this structural-vs-content distinction in detail in “why we don’t use AI to fight AI phishing” and “what is economic email filtering.”

The Realistic Defense Architecture

For users, the implication is that a single filter (no matter how sophisticated) is not going to keep getting better. The arms race has settled into an equilibrium that the user experiences as the volume problem.

The realistic architecture is layered, with content filters as the first pass and identity-and-cost filters as the second:

Content layer. Native Gmail or Outlook filtering, plus enterprise content scanners (Defender, Proofpoint, Mimecast) where the organization has them. This layer catches the bottom 80% by volume: mass mechanical attacks, malware, known-bad domains, formulaic deception.

Structural layer. A cover charge on unknown senders that does not depend on content analysis. This layer catches the top 20% by volume: well-crafted spam, AI-generated solicitation, cold outreach, and the long tail of unwanted-but-clean mail. The mechanism is economic, not predictive.

The two layers cover most of the surface area between them. Neither alone is sufficient. The content layer cannot improve much beyond where it is now, for structural reasons. The structural layer is where the new gains are available.
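How the two passes compose can be sketched as follows. The `spam_score` argument stands in for whatever verdict the native content filter produces, and the 0.9 threshold, the function name, and the example addresses are assumptions made for illustration.

```python
def layered_verdict(sender, spam_score, paid_cover_charge, guest_list,
                    spam_threshold=0.9):
    """Route one message through the content layer, then the structural layer.

    spam_score stands in for whatever the native filter reports; the 0.9
    threshold is an arbitrary illustrative value.
    """
    # Content layer (first pass): drop what the classifier is confident about.
    if spam_score >= spam_threshold:
        return "spam"
    # Structural layer (second pass): identity and cost only, no content.
    if sender in guest_list or paid_cover_charge:
        return "inbox"
    return "held"


guest_list = {"alice@example.com"}
examples = [
    ("bulk@example.org", 0.97, False),   # mass mechanical spam
    ("sales@example.net", 0.10, False),  # clean-looking cold outreach
    ("sales@example.net", 0.10, True),   # same sender, cover charge paid
    ("alice@example.com", 0.05, False),  # known contact
]
for sender, score, paid in examples:
    print(sender, "->", layered_verdict(sender, score, paid, guest_list))
```

The clean-looking cold outreach passes the content layer with a low score and is stopped only by the second pass, which is the gap the structural layer is meant to cover.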

What This Means for Users

If you are running a modern email account and the inbox feels worse than it used to despite the filter doing its job, this is the explanation. The filter is doing the same volume of work it always did. The category of mail that the filter cannot address is the part that has grown. Expecting the filter to improve enough to fix the volume problem is expecting something the mechanism cannot deliver.

The structural layer is the thing that addresses the gap. Email paywalls are the consumer-scale version. Rythm implements the layer for Gmail and Outlook at $1.65 per month. The cover charge changes the economic properties of mass-volume reach, which is the layer the content filter cannot operate on.

For more on the related topics, see “the hidden cost of 30 minutes per day on email triage” and “why am I getting so much spam.”

The Bottom Line

Spam filters are not improving because the mechanism they use has approximately reached its limit. Better models do not break the arms race; they just shift the equilibrium. The volume of unwanted mail keeps growing because the senders driving the volume operate at the cost-structure layer, which content filters cannot affect.

The structural alternative exists. It is recent, it is not part of the standard 2010s email defense playbook, and it works on a different layer of the problem. Users who have been frustrated by filter plateaus for years are usually surprised at how different the inbox feels with a structural layer added. The frustration was real. It just was not addressable through the mechanism the user was expecting to fix it.
