It shouldn't be surprising that the global volume of spam is on the rise. Some estimates say that 90 percent of all mail is junk, and recently we've been bombarded with reports like, "Spam Volume Increases 35 percent in November." What is being done, and why is spam such a hard problem?
The volume of spam is definitely increasing; just look at your own mailbox for evidence. Some reports claim that 100 billion junk e-mail messages are sent on busy days. Spam is big business, and laws passed last year made spam legal, but subject to some regulation. The battle continues, and the mass-mailing marketers, as they call themselves, aren't showing any signs of fatigue.
Bayesian filtering is one method that mail scanning software such as SpamAssassin, and even mail servers themselves, implement in an effort to distinguish spam from ham (legitimate) messages. Bayesian filtering relies on a database trained to remember the words found in both spam and ham. For each word in the message it's scanning, the algorithm compares how likely that word is to appear in spam with how likely it is to appear in mail overall.
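To make the idea concrete, here is a minimal sketch of a word-based Bayesian classifier. It is a toy written for illustration, not SpamAssassin's actual implementation: the class name, the whitespace tokenizer, and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
from collections import Counter


class ToyBayesFilter:
    """Illustrative word-based spam filter (hypothetical, not SpamAssassin's code)."""

    def __init__(self):
        # Number of trained messages of each kind that contained a given word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message known to be 'spam' or 'ham'."""
        self.message_counts[label] += 1
        # Count each word once per message, however often it repeats.
        self.word_counts[label].update(set(text.lower().split()))

    def spam_probability(self, text):
        """Return an estimate between 0 and 1 that the message is spam."""
        total = sum(self.message_counts.values())
        known = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        log_score = {}
        for label in ("spam", "ham"):
            seen = self.message_counts[label]
            # Start from how common this kind of mail is overall.
            log_score[label] = math.log((seen + 1) / (total + 2))
            for word in set(text.lower().split()):
                if word not in known:
                    continue  # words never seen in training carry no evidence
                # Add-one smoothing keeps unseen combinations from scoring zero.
                count = self.word_counts[label][word]
                log_score[label] += math.log((count + 1) / (seen + 2))
        # Turn the two log scores back into a probability.
        top = max(log_score.values())
        spam = math.exp(log_score["spam"] - top)
        ham = math.exp(log_score["ham"] - top)
        return spam / (spam + ham)
```

Trained on a handful of messages of each kind, `spam_probability` leans toward 1 for text that shares words with the spam examples and toward 0 for text that resembles the ham. A production filter would use a far better tokenizer and a much larger training set.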
Bayes filters are limited, and as with every other technology, the spammers have found a way to work around them. According to research and the authors of SpamAssassin, training on more than roughly 5,000 messages yields diminishing returns. Regardless of that limit, the effectiveness of Bayesian analysis is questionable, since spammers normally include random words with most e-mail they send. Sometimes you're lucky enough to get a well-written poem. Unfortunately, that's not the type of mail Bayesian filtering can identify as spam. Keyword searching, however, is useful, right? Chances are good that you'll never want e-mail with the word Viagra in it, so that can just be blocked. Of course the spammers are smart enough to work around that.
At first they just started sending e-mail with horrible misspellings, using symbols and numbers that look similar to the original letters. When filters started identifying and blocking those messages, highly motivated spammers put on their thinking caps. Spam is a wonderful driving factor of invention, both on the good and bad sides.
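A keyword filter can catch the simpler look-alike tricks by normalizing the text before it searches. The following is a rough sketch of that idea; the substitution table, the blocked-word list, and the function name are all hypothetical, and real filters use far larger and more careful rules.

```python
import re

# Hypothetical table mapping look-alike symbols and digits back to letters.
LOOKALIKES = str.maketrans({
    "@": "a", "4": "a",
    "1": "i", "!": "i", "|": "i",
    "0": "o", "3": "e",
    "$": "s", "5": "s",
    "7": "t",
})

# Hypothetical list of words the mail administrator never wants to see.
BLOCKED_WORDS = {"viagra"}


def contains_blocked_word(text):
    """Return True if a blocked word survives obvious character swaps."""
    normalized = text.lower().translate(LOOKALIKES)
    # Drop everything that is not a letter, so "v.i.a.g.r.a" collapses too.
    collapsed = re.sub(r"[^a-z]", "", normalized)
    return any(word in collapsed for word in BLOCKED_WORDS)
```

Because the check removes spaces as well as punctuation, it can match a blocked word that spans two innocent words, so a sketch like this trades some false positives for coverage. It also does nothing against swaps that are missing from the table, which is exactly the gap spammers keep exploiting.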
Somebody had the idea to send the "bad" content in images. At the time, filters couldn't identify any of these messages, but that quickly improved. The technology already existed: Optical Character Recognition (OCR). If you want to scan a paper document, OCR software enables you to convert the image to editable text. It actually converts the picture of a letter into a letter, and OCR software has steadily improved over the years. So we started running OCR programs on the images in spam, and started blocking effectively again.
The evildoers very quickly started getting even more creative, as was expected. It turns out that you can put all kinds of light-colored lines and squiggles over text, and still leave it easy to read by humans. OCR doesn't fare too well against random lines through letters though. It's clear that we aren't very good at blocking spam.
So who is to blame for the increases in spam? The business of spam has also fueled another facet of organized cybercrime: botnets. Spam is frequently referred to as an Internet security risk, not because it can clog up mail servers, but because the number one use for compromised machines is to send spam.
One of the most effective mechanisms for blocking spam is the real-time blacklist. These lists are updated constantly, and contain the IP addresses where spam originates. Spammers must keep moving, sending messages from many different IP addresses. What better way to do this than through the use of compromised Windows machines? Malware authors have solved a very large-scale systems management problem. If you think about it, they are managing thousands and thousands of machines all over the world, more than most IT organizations.
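Most real-time blacklists are queried over DNS: the mail server reverses the octets of the connecting IP address, appends the list's zone name, and looks the result up. An answer means the address is listed; a "no such name" reply means it is not. Here is a minimal sketch of that lookup. The zone `dnsbl.example.org` is a placeholder rather than a real list, and the function names are made up for this example.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="dnsbl.example.org"):
    """Return True if the blacklist zone has a record for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No answer. This sketch treats every lookup failure as "not listed",
        # which also hides timeouts and resolver errors.
        return False
    return True
```

A mail server would call something like `is_listed` on the connecting address before accepting a message, which is why spammers need a steady supply of fresh addresses.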
IRC botnets are bought and sold on the open market, and the owners can easily tell thousands of machines to begin sending spam all at once. Again, it's pretty darn clever if you think about the logistics of it all.
We're left with a world full of spam, and the effectiveness of spam detection is the same as it was five years ago.
But spammers aren't the only ones raking in the millions. IronPort, Sophos, and many other software vendors are capitalizing on the need for better spam scanning. Their black-box solutions are sometimes better than open source alternatives, sometimes not. The real advantage is that updates are automatic and frequent; for example, Sophos's PureMessage product updates itself every five minutes.
People are experimenting with DomainKeys, SPF, and similar technologies that seek to verify the senders of e-mail, but the fact remains: they make e-mail cumbersome. People need to be able to forward mail automatically, but SPF breaks that. DomainKeys cryptographically signs a message, stating that it really was "From:" a certain domain and that the message hasn't changed since it left. There are gotchas with DomainKeys, but it's the most likely solution. Unfortunately, widespread adoption is very unlikely to happen, and if it does, it'll take years for all the mail server software on the Internet to get updated.
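The forwarding problem is easy to see in a simplified model. With SPF, a domain publishes the networks allowed to send its mail, and the receiving server checks the connecting IP address against that list. The sketch below hard-codes a made-up policy instead of reading it from DNS; the domain, the addresses, and the function name are illustrative assumptions, and real SPF has more possible results than pass or fail.

```python
import ipaddress

# Hypothetical policy: networks allowed to send mail for each domain.
# A real receiver would fetch this from the domain's DNS records.
SPF_POLICY = {
    "example.com": [ipaddress.ip_network("192.0.2.0/24")],
}


def spf_pass(envelope_domain, connecting_ip):
    """Return True if the connecting IP is authorized for the sender's domain."""
    address = ipaddress.ip_address(connecting_ip)
    allowed = SPF_POLICY.get(envelope_domain, [])
    return any(address in network for network in allowed)


# Mail sent directly from example.com's own server is accepted.
direct = spf_pass("example.com", "192.0.2.10")

# The same message, forwarded unchanged by a third-party server, arrives
# from the forwarder's address and no longer matches the policy.
forwarded = spf_pass("example.com", "198.51.100.7")
```

Here `direct` is true and `forwarded` is false, even though both are the same legitimate message. The sender's domain never authorized the forwarder, so automatic forwarding looks identical to forgery.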
So what do we see for the future of spam? Even if spam is made illegal, it will still exist. We can't prosecute people in other countries, and we certainly can't do anything about compromised Windows machines. The most likely course for spam is that it will stay the same as it is currently. That is, a game of catch-up for the anti-spam software writers.