Spam Filter (SpamAssassin)

About SpamAssassin

 

SpamAssassin is a mail filter which attempts to identify spam using a variety of mechanisms including text analysis, Bayesian filtering, DNS block lists and collaborative filtering databases. SpamAssassin is a project of the Apache Software Foundation (ASF) and is subject to the Apache license.

 

Exchange Server Toolbox uses a Windows adaptation of SpamAssassin developed by JAM Software: SpamAssassin in a Box. The adaptation installs the spam filter as an independent Windows service so that the anti-spam functionality of Exchange Server Toolbox can use it.

 

You can find more information on SpamAssassin at http://www.spamassassin.org/.

 

What makes SpamAssassin effective?

SpamAssassin has a modular structure. Each module examines a mail message and returns a score. The sum of the scores from all modules is the "spam score": the higher this score, the more likely it is that the mail is spam. The modules rely on different evaluation methods. The most important methods are listed here:

Tests of the mail header, such as queries against the servers through which the mail was allegedly relayed, to find out whether these servers really exist.

Static tests: usually lexical checks of the mail header and body, up to the detection of complete stock phrases.

Analysis of the character sets used in a mail in relation to the locales in use.

Queries against RBLs (Realtime Blackhole Lists): IP addresses of known spammers are published on these servers. SpamAssassin compares the addresses found in incoming mail with those listed on the RBL servers.

Use of checksum-based, distributed filtering networks (Razor): hash values of known spam mails are stored on a central server, which SpamAssassin queries in order to identify incoming spam mail.

Queries against URL blacklists: checks whether URLs within mail messages point to websites already known to be advertised in spam messages.

Automatic whitelisting: sender mail addresses are put on the whitelist if the total score of their mail did not reach a certain value.

Use of a Bayes filter: a filter that evaluates the message using statistical algorithms.
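
The RBL query from the list above is conventionally carried out through DNS: the octets of the IPv4 address are reversed and placed in front of the zone name of the list, and the address counts as listed if that name resolves. The following Python sketch illustrates this convention only. The zone dnsbl.example.org is a placeholder and not a real list, and the code is not taken from SpamAssassin.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a block list about an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the block list publishes an entry for the address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # The name does not resolve: not listed, or the lookup itself failed.
        return False


# Placeholder zone, for illustration only.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Only dnsbl_query_name is exercised here. is_listed performs a real DNS lookup and therefore needs a zone that actually exists.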

 

Some of these methods, for example the whitelist, produce a negative score. Desired mail messages therefore often obtain a negative total score.
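
The summing of module scores can be pictured with a few lines of Python. The rule names and score values below are invented for illustration and do not correspond to real SpamAssassin rules.

```python
# Hypothetical rule hits for one message: rule name -> score.
rule_hits = {
    "HEADER_FORGED_RELAY": 1.8,
    "BODY_STOCK_PHRASE": 2.4,
    "URL_ON_BLACKLIST": 3.1,
    "SENDER_ON_WHITELIST": -6.0,  # whitelisting contributes a negative score
}


def spam_score(hits: dict) -> float:
    """Total score of a message: the sum of the scores of all rules that hit."""
    return round(sum(hits.values()), 2)


without_whitelist = {k: v for k, v in rule_hits.items() if k != "SENDER_ON_WHITELIST"}

print("with whitelist hit:   ", spam_score(rule_hits))
print("without whitelist hit:", spam_score(without_whitelist))
```

A single strongly negative rule, such as the whitelist entry here, can pull an otherwise suspicious message well below the threshold.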

All methods, the so-called "rules", are stored in text files belonging to SpamAssassin. This makes it rather simple to add your own rule collections in additional files. The SpamAssassin service updates the rule files periodically to keep the filter as effective as possible.

Ultimately the user decides when a mail is classified as spam: in the spam rules of Exchange Server Toolbox you can set the threshold value from which a mail is classified as spam. The result of the examination is written to the header of the mail.
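
To read the result back from a message, a script can parse the header that the filter added. The sketch below assumes the X-Spam-Status header in the form that SpamAssassin commonly writes (score=... required=...). Whether Exchange Server Toolbox uses exactly this header and format is an assumption, as is treating a score equal to the threshold as spam. The sample message is invented.

```python
import re
from email import message_from_string

# Invented sample message; the header format is an assumption (see above).
RAW = """\
From: sender@example.com
To: user@example.org
Subject: Hello
X-Spam-Status: Yes, score=7.3 required=5.0 tests=EXAMPLE_RULE

Body text.
"""

STATUS_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)")


def read_spam_status(raw_message: str):
    """Return (score, threshold, is_spam), or None if no usable header exists."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Spam-Status")
    if header is None:
        return None
    match = STATUS_RE.search(header)
    if match is None:
        return None
    score, required = float(match.group(1)), float(match.group(2))
    return score, required, score >= required


print(read_spam_status(RAW))
```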

 

 

What is a Bayes filter and why should I care (or not)?

A Bayes filter is a statistical filter that identifies spam by learning patterns common in spam mail. A Bayes filter has to be trained properly to successfully tell spam from non-spam. To train a statistical filter you must provide a sample of messages that are definitely spam and a sample of messages that are definitely not spam. The filter breaks these messages down into tokens and rates every token pattern by how strongly it indicates spam or non-spam.

The more spam-typical patterns occur in a mail, the greater the likelihood that it is spam. Likewise, the more patterns typical of non-spam occur in a mail, the greater the likelihood that it is not spam.

Training the filter with spam mail only fails to achieve the training goal. To be of any use, the filter must learn not only the unwanted patterns but also the wanted patterns in a message.
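
The following Python sketch shows the principle with a very small naive Bayes classifier. It is a simplified illustration and not the algorithm SpamAssassin actually implements. The class name, the tokenizer and the training texts are invented. Note that the classifier refuses to rate a message until it has seen both spam and non-spam.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Break a message down into lower-case word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayes:
    """Minimal naive Bayes sketch with Laplace smoothing (illustration only)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.tokens[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        if not all(self.messages.values()):
            raise ValueError("train with both spam and non-spam samples first")
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"]))
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            n_tokens = sum(self.tokens[label].values())
            score = math.log(self.messages[label] / total)  # class prior
            for tok in tokenize(text):
                # Add-one smoothing keeps unseen tokens from zeroing the product.
                score += math.log((self.tokens[label][tok] + 1) / (n_tokens + vocab))
            log_score[label] = score
        # Convert the two log scores into a probability for "spam".
        return 1 / (1 + math.exp(log_score["ham"] - log_score["spam"]))


bayes = TinyBayes()
bayes.train("cheap pills buy now", "spam")
bayes.train("buy cheap watches now", "spam")
bayes.train("meeting agenda for monday", "ham")
bayes.train("please review the project report", "ham")

print(bayes.spam_probability("buy cheap pills"))
print(bayes.spam_probability("project meeting monday"))
```

With only four training messages the numbers are not meaningful in themselves. The point is that the rating depends on how often each token was seen in each of the two samples.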

Under the right conditions a Bayes filter can be a valuable means of spam identification, especially if your mail volume is very high and you have a lot of collected training material (i.e. messages that are spam but were not recognized as spam by the rules). In this case a Bayes filter can decrease system load during spam identification.

The smaller your mail volume, the less beneficial a Bayes filter will be, simply because of a lack of training material. We recommend using the pre-trained Bayes filter setting provided by Exchange Server Toolbox in the beginning.

 

 

What can I do in order to increase the efficiency of the filter?  

Decreasing the threshold:

A simple method for raising the spam filter ratio is to change the spam threshold in the settings of Exchange Server Toolbox. However, this does not actually improve the effectiveness of the spam filter. Instead, it causes SpamAssassin to mark mails with a lower (or higher, as the case may be) spam score as spam. Use this setting carefully, as it may result in desired mails being marked as spam.
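
The trade-off can be made concrete with a small Python sketch. The scores and labels below are invented, and the comparison assumes that a mail counts as spam when its score is at or above the threshold.

```python
# Hypothetical messages: (spam score, whether the mail really is spam).
scored = [
    (12.4, True), (7.9, True), (4.6, True), (2.1, True),
    (3.8, False), (1.0, False), (-1.5, False), (-3.2, False),
]


def evaluate(threshold: float, messages: list) -> tuple:
    """Return (spam caught, desired mails wrongly marked) for a threshold."""
    caught = sum(1 for score, is_spam in messages if is_spam and score >= threshold)
    false_positives = sum(
        1 for score, is_spam in messages if not is_spam and score >= threshold
    )
    return caught, false_positives


for threshold in (5.0, 3.0):
    caught, wrong = evaluate(threshold, scored)
    print(f"threshold {threshold}: {caught} spam caught, {wrong} desired mails marked")
```

In this invented sample, lowering the threshold from 5.0 to 3.0 catches one more spam mail but also marks one desired mail as spam.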

Training the Bayes filter:

You can train the Bayes filter manually with mail that SpamAssassin did not recognize. To do this, collect spam and non-spam mail in MIME format, saved as text files, and select these files using the "Train Spam" and/or "Train no spam" buttons. Since the Bayes filter with activated "autolearn" function already learns all mail messages with a score over 120 as spam and below 0 as non-spam, manual training is particularly important for unrecognized spam. The Bayes filter works with words, stock phrases and structures which occur frequently in spam / non-spam mail. Therefore, it may learn nearly as much from recognized spam as from unrecognized spam (and/or non-spam).
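
If the training material is collected by a script, Python's standard email package can write messages in MIME format. The sketch below is one possible way to do this. The folder name, the file name and the .txt extension are assumptions, and the sample message is invented. The sketch does not call any Exchange Server Toolbox function.

```python
import tempfile
from email import message_from_bytes
from email.message import EmailMessage
from pathlib import Path


def save_as_mime(msg: EmailMessage, folder: Path, name: str) -> Path:
    """Write a message in MIME format to <folder>/<name>.txt and return the path."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.txt"
    path.write_bytes(msg.as_bytes())
    return path


# Invented sample of a spam mail that the filter did not recognize.
sample = EmailMessage()
sample["From"] = "offers@example.com"
sample["To"] = "user@example.org"
sample["Subject"] = "Limited offer"
sample.set_content("Buy now and save.")

# A temporary directory stands in for the real collection folder.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_as_mime(sample, Path(tmp) / "spam", "missed-spam-001")
    restored = message_from_bytes(saved.read_bytes())
    restored_subject = restored["Subject"]

print(saved.name, restored_subject)
```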

Adding rules:

SpamAssassin uses a variety of files containing anti-spam rules (most of them based on regular expressions). These files are located in the 'share' and 'etc' folders of the SpamAssassin installation directory and are used automatically when the spam filter function of Exchange Server Toolbox is activated. Experienced users can greatly improve the effectiveness of the spam filter by creating their own anti-spam rules. Please note that any modifications to the 'share' folder will be lost, as this folder is periodically replaced by the automatic rules update function of the spam filter. Use the 'etc' folder for your own rules instead. More information on how to create your own anti-spam rules can be found here: https://cwiki.apache.org/confluence/display/SPAMASSASSIN/WritingRules.
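
As a starting point, the following Python sketch generates the text of a simple body rule using the body, describe and score directives of SpamAssassin rule files. The rule name, pattern, score and description are invented examples. The file in the 'etc' folder that should receive the text is not specified here, and the resulting rule should be checked against the page linked above before use.

```python
import re


def render_body_rule(name: str, pattern: str, score: float, description: str) -> str:
    """Return rule-file text for one case-insensitive body rule."""
    if not re.fullmatch(r"[A-Z][A-Z0-9_]*", name):
        raise ValueError("use upper-case letters, digits and underscores for rule names")
    if "/" in pattern:
        raise ValueError("escape or avoid '/' in the pattern; it delimits the regex")
    # Catches obvious typos only: SpamAssassin uses Perl regular expressions,
    # which differ from Python's in some details.
    re.compile(pattern)
    return (
        f"body {name} /{pattern}/i\n"
        f"describe {name} {description}\n"
        f"score {name} {score}\n"
    )


# Invented example rule.
rule_text = render_body_rule(
    "LOCAL_DEMO_PHRASE", r"\bcheap watches\b", 1.5, "Mentions cheap watches"
)
print(rule_text)
```

Keeping generated rules in a separate file in the 'etc' folder leaves the periodically replaced 'share' folder untouched.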