Walter,
Assuming you have the first question figured out, the Bayesian filter initially learns from the existing, older-style filters. Using the MAPS RBL filters, along with keyword lists and possibly initially rejecting emails from servers without reverse DNS entries, will already cause SpamFilter to detect a LOT of spam. The statistical filter learns from this traffic, and is "primed" with an initial database as your traffic builds up. Once a valid "statistical sample" of a few thousand emails has been processed, the filter will be able to detect spam by itself (even though we recommend leaving the other filters in place for better accuracy).
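To make the "priming" idea concrete, here is a rough sketch in Python. It is not SpamFilter's actual code: the class name, the naive Bayes formula with add-one smoothing, the 0.5 cutoff and the min_sample value are all assumptions chosen for illustration. It only shows the general flow, where the older filters supply the verdicts the statistical filter trains on, and the statistical filter starts voting once enough emails have gone through.

```python
import math
import re
from collections import Counter


class PrimedBayesFilter:
    """Toy naive Bayes filter trained on verdicts from older-style filters.

    Hypothetical illustration only, not SpamFilter's implementation.
    """

    def __init__(self, min_sample=2000):
        # min_sample is an assumed stand-in for "a few thousand emails".
        self.min_sample = min_sample
        self.token_counts = {True: Counter(), False: Counter()}
        self.message_counts = {True: 0, False: 0}

    @staticmethod
    def tokenize(text):
        # Distinct lowercase alphanumeric words in the message.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def learn(self, text, is_spam):
        # Count each distinct token once per message.
        self.token_counts[is_spam].update(self.tokenize(text))
        self.message_counts[is_spam] += 1

    @property
    def ready(self):
        # True once enough emails have been seen to trust the statistics.
        return sum(self.message_counts.values()) >= self.min_sample

    def spam_probability(self, text):
        # Naive Bayes in the log domain with add-one smoothing.
        total = sum(self.message_counts.values())
        log_score = {}
        for label in (True, False):
            n = self.message_counts[label]
            score = math.log((n + 1) / (total + 2))
            for token in self.tokenize(text):
                score += math.log((self.token_counts[label][token] + 1) / (n + 2))
            log_score[label] = score
        diff = log_score[False] - log_score[True]
        if diff > 700:  # avoid overflow in math.exp
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))

    def process(self, text, legacy_says_spam):
        # The statistical filter only votes once it has been primed.
        statistical_says_spam = self.ready and self.spam_probability(text) > 0.5
        # The older filters' verdict is the training label.
        self.learn(text, legacy_says_spam)
        # The older filters stay in place alongside the statistical one.
        return legacy_says_spam or statistical_says_spam
```

Note that process() keeps honoring the older filters' verdict even after the statistical filter is ready, which mirrors the recommendation to leave them enabled.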
You most likely cannot add existing emails, for two reasons: (1) some email clients, like MS Outlook, completely change the email source of the original message, and DNA fingerprinting requires an exact match of the source, and (2) feeding in only bad emails, and not the good ones that you were receiving every now and then in the midst of the bad ones, causes all the statistical formulas to be "thrown off" and would thus yield very inaccurate results.
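As a small illustration of point (1), the sketch below treats a fingerprint as a hash of the raw message source. This is an assumption made for the example: SHA-256 is only a stand-in, the fingerprint() helper and both sample messages are made up, and SpamFilter's real fingerprinting method is not shown here. The point is simply that any exact-match scheme stops matching once a client has rewritten the headers.

```python
import hashlib


def fingerprint(raw_source: bytes) -> str:
    # Hypothetical exact-match fingerprint: a digest of the raw source bytes.
    return hashlib.sha256(raw_source).hexdigest()


# The message as it arrived over SMTP (made-up sample).
original = (
    b"Received: from mail.example.com\r\n"
    b"Subject: Cheap pills\r\n"
    b"\r\n"
    b"Buy now\r\n"
)

# The same message after a mail client re-saved it with rewritten headers.
reexported = (
    b"Subject: Cheap pills\r\n"
    b"X-Exported-By: some-client\r\n"
    b"\r\n"
    b"Buy now\r\n"
)

same_source_matches = fingerprint(original) == fingerprint(original)
rewritten_matches = fingerprint(original) == fingerprint(reexported)
print(same_source_matches, rewritten_matches)
```

The body text is identical in both copies, yet the fingerprints differ because the source as a whole is no longer byte-for-byte the same.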
Roberto F. LogSat Software