Unfortunately, spam has become one of the most pervasive problems of the Internet age. It seems that as spam filters improve, spam evolves along with them to slip past the latest blacklists and checks. Like many people, I found spam had become a real problem, and I tried a number of different filters to no avail. Many of the “top” filters either let too much spam through or, worse yet, classified real mail (“ham”) as spam, where it can easily be missed. Recently, however, I’ve begun using a spam filter called Bogofilter, which I’ve found to be a big step above the other filters out there.
Bayesian Spam Filtering
Bogofilter uses a statistical technique called Bayesian filtering, popularized for use with spam by Paul Graham’s A Plan for Spam. The Bayesian method looks at the text of mail known to be spam and mail known to be ham. For each word found, it calculates the probability that an email containing that particular word is spam. For example, words like “Viagra” and “lottery” would likely have high probabilities.
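To make the per-word idea concrete, here is a minimal Python sketch. It is a simplified illustration in the spirit of Graham's essay, not Bogofilter's actual calculation; the function names, the whitespace tokenizer, and the tiny sample corpus are all assumptions made up for this example.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting; real filters tokenize far more carefully.
    return text.lower().split()


def word_spam_probabilities(spam_msgs, ham_msgs):
    """Estimate, for each word, the probability that a message containing it is spam.

    Both lists must be non-empty. Each word is counted once per message.
    """
    spam_counts = Counter()
    ham_counts = Counter()
    for msg in spam_msgs:
        spam_counts.update(set(tokenize(msg)))
    for msg in ham_msgs:
        ham_counts.update(set(tokenize(msg)))

    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        # Fraction of spam (and of ham) messages that contain the word.
        spam_freq = spam_counts[word] / len(spam_msgs)
        ham_freq = ham_counts[word] / len(ham_msgs)
        probs[word] = spam_freq / (spam_freq + ham_freq)
    return probs


# Hypothetical training mail, invented for the example.
spam = ["buy viagra now", "win the lottery now"]
ham = ["lunch meeting now", "see you at lunch"]
probs = word_spam_probabilities(spam, ham)
for word in ("viagra", "now", "lunch"):
    print(word, round(probs[word], 2))
```

A word seen only in spam scores 1.0, a word seen only in ham scores 0.0, and a word that appears in both lands somewhere in between. A real filter then combines the scores of many words to judge a whole message.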
Bayesian filtering has the advantage that it can “learn” over time. The database of tokens is updated every time a new email is processed, improving accuracy. For example, a word like “Nigeria” might typically suggest a spam message, but if a user were actually from Nigeria, and therefore received many legitimate emails containing the term, Bogofilter would learn from this and avoid false positives.
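The “Nigeria” case can be sketched in a few lines of Python. This is only an illustration of incremental learning under simplifying assumptions: the `TokenDatabase` class and its methods are hypothetical names, and Bogofilter's real word lists and scoring are more sophisticated.

```python
from collections import Counter


class TokenDatabase:
    """Hypothetical in-memory token store that is updated one message at a time."""

    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()
        self.spam_total = 0
        self.ham_total = 0

    def learn(self, text, is_spam):
        # Count each word once per message.
        words = set(text.lower().split())
        if is_spam:
            self.spam_words.update(words)
            self.spam_total += 1
        else:
            self.ham_words.update(words)
            self.ham_total += 1

    def spam_probability(self, word):
        word = word.lower()
        spam_freq = self.spam_words[word] / self.spam_total if self.spam_total else 0.0
        ham_freq = self.ham_words[word] / self.ham_total if self.ham_total else 0.0
        if spam_freq + ham_freq == 0:
            return 0.5  # Assumption: treat a never-seen word as neutral.
        return spam_freq / (spam_freq + ham_freq)


db = TokenDatabase()
db.learn("urgent transfer from nigeria", is_spam=True)
db.learn("meeting notes attached", is_spam=False)
before = db.spam_probability("nigeria")

# The user starts receiving legitimate mail that mentions the word.
for text in ("flight to nigeria booked", "family news from nigeria", "nigeria trip photos"):
    db.learn(text, is_spam=False)
after = db.spam_probability("nigeria")

print(before, after)
```

After the three legitimate messages are learned, the score for “nigeria” drops from certain spam toward neutral, which is the behaviour described above.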
Bogofilter in Use
I’ve found Bogofilter to be the best spam filter among all I’ve tried. It has relatively few misses (and these decrease further over time as it learns) and, better yet, extremely few false positives (ham mistakenly labeled as spam): only 2 or 3 among the tens of thousands of emails I’ve received.
It’s also very easy to install and set up: simply download it and run `./configure ; make ; make install'. Bogofilter works with your Mail Delivery Agent (MDA), which pipes each email through bogofilter as it is delivered. Bogofilter adds a header called “X-Bogosity” to the email, containing its classification as spam or ham. Your mail client or the MDA itself can then place the message in the appropriate folder.
For example, I use Procmail as my MDA, and applying Bogofilter only required adding a few lines to the top of my .procmailrc file:
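# Filter each message through bogofilter: -u updates the word lists,
# -e exits with 0 for both spam and ham, -p passes the message through with the X-Bogosity header added
:0fw
| bogofilter -u -e -p
# If bogofilter failed, exit with code 75 (temporary failure) so the mail is retried later
:0e
{ EXITCODE=75 HOST }
# Deliver messages tagged as spam to the spam-bogofilter mailbox
:0:
* ^X-Bogosity: Spam, tests=bogofilter
spam-bogofilter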
The first two recipes pipe each email through bogofilter before any other mail filtering is done, handing the message back for a later retry if bogofilter fails. The third then places the emails that Bogofilter determines are spam into my spam folder.
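If a script rather than Procmail does the sorting, the same decision can be made by reading the header directly. The following Python sketch assumes the header looks like the one matched above (`X-Bogosity: Spam, tests=bogofilter`); the function names and the `inbox` folder name are hypothetical.

```python
from email import message_from_string


def bogosity(raw_message):
    """Return the classification word from the X-Bogosity header, or None if absent."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity")
    if header is None:
        return None
    # The classification is the first comma-separated field of the header.
    return header.split(",")[0].strip()


def target_folder(raw_message, spam_folder="spam-bogofilter", inbox="inbox"):
    # Assumption: anything not explicitly tagged "Spam" is delivered normally.
    return spam_folder if bogosity(raw_message) == "Spam" else inbox


tagged = (
    "From: someone@example.com\n"
    "Subject: You won\n"
    "X-Bogosity: Spam, tests=bogofilter\n"
    "\n"
    "Claim your prize.\n"
)
untagged = "From: friend@example.com\nSubject: Lunch\n\nSee you at noon.\n"

print(target_folder(tagged))
print(target_folder(untagged))
```

Treating a missing header as ordinary mail keeps the sketch on the safe side: a message that never passed through the filter is not silently filed as spam.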
In addition, Bogofilter’s learning methods are very simple to apply: just pipe the email through `bogofilter -s' if it should be learned as spam or `bogofilter -n' if it should be learned as ham. For a mail client like mutt, I only had to add 4 lines to the mutt config file:
macro index S "|bogofilter -s\ns=junkmail" "Learn as spam and save to junk"
macro pager S "|bogofilter -s\ns=junkmail" "Learn as spam and save to junk"
macro index H "|bogofilter -n\ns=" "Learn as ham and save"
macro pager H "|bogofilter -n\ns=" "Learn as ham and save"
This means when viewing my emails I only have to press ‘S’ or ‘H’ and let Bogofilter do the rest.
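The same training step can be driven from a script for mail clients without macros. This is a hedged sketch: it relies only on the `bogofilter -s` and `bogofilter -n` invocations shown above reading the message on standard input, while the `train` function and its `run` parameter are names invented for the example.

```python
import subprocess


def train_command(is_spam):
    # -s registers the message as spam, -n registers it as ham.
    return ["bogofilter", "-s" if is_spam else "-n"]


def train(raw_message, is_spam, run=subprocess.run):
    """Pipe one raw message (bytes) to bogofilter for training.

    `run` is injectable so the function can be exercised without bogofilter installed.
    """
    return run(train_command(is_spam), input=raw_message, check=True)


# Example use (requires bogofilter on the PATH):
# with open("message.eml", "rb") as f:
#     train(f.read(), is_spam=True)
```

With `check=True`, a non-zero exit status from bogofilter raises an exception instead of failing silently.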
In my opinion, Bogofilter’s combination of simple installation, ease of use, and good results makes it one of the best choices for spam filtering out there today.



