Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed to public posts

From: Linus Torvalds
Date: Sat Jan 24 2004 - 16:14:27 EST




On Sat, 24 Jan 2004, Kevin O'Connor wrote:
>
> A good Bayesian spam filter isn't nearly as susceptible to random words as
> some people think. Words that are likely to be spam (along with words that
> are frequently "ham") are given _exponentially_ more weight than other
> words.

Especially if the "random words" in the spam end up being weighted by real
frequency, you just _cannot_ use single-word Bayes filters on it. Or if
you do, you'll eventually have those words either become neutral, or (worst
of all cases) you'll have real mail marked as spam after having
aggressively trained the filter on the spams.

It might not be that big a deal, especially if you have a fairly narrow
scope of emails in your ham-list, but people who get mail from varied
sources _will_ get screwed by this, one way or the other.
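
As a rough illustration of the point above, here is a minimal sketch of a
single-word Bayes scorer. Everything in it is an assumption made for the
example: the toy messages, the Laplace smoothing, and the function names
(train, word_log_odds, spam_score) are hypothetical and do not come from any
particular filter. It pads spam with ordinary words that also occur in ham,
then retrains on the padded spam.

```python
import math
from collections import Counter

def train(spam_docs, ham_docs):
    """Count, per word, how many spam and ham messages contain it."""
    spam_counts, ham_counts = Counter(), Counter()
    for doc in spam_docs:
        spam_counts.update(set(doc.split()))
    for doc in ham_docs:
        ham_counts.update(set(doc.split()))
    return spam_counts, ham_counts, len(spam_docs), len(ham_docs)

def word_log_odds(word, model):
    """Log of P(word | spam) / P(word | ham), with Laplace smoothing."""
    spam_counts, ham_counts, n_spam, n_ham = model
    p_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_ham = (ham_counts[word] + 1) / (n_ham + 2)
    return math.log(p_spam / p_ham)

def spam_score(message, model):
    """Sum of per-word log-odds: positive leans spam, negative leans ham."""
    return sum(word_log_odds(w, model) for w in set(message.split()))

# Toy corpora (hypothetical data, for illustration only).
ham = [
    "kernel patch review scheduler",
    "meeting notes about the scheduler patch",
    "please review the driver patch",
]
spam = ["cheap pills online", "cheap loans online now"]

# Padding drawn from words that are common in real mail.
padding = "patch review scheduler"
padded_spam = [s + " " + padding for s in spam]

model_before = train(spam, ham)
model_after = train(spam + padded_spam, ham)  # aggressive retraining

ham_msg = "please review the scheduler patch"

# Effect 1: padding drags a spam's score toward neutral.
plain_score = spam_score(spam[0], model_before)
padded_score = spam_score(padded_spam[0], model_before)

# Effect 2: after retraining on padded spam, real mail scores spammier.
ham_before = spam_score(ham_msg, model_before)
ham_after = spam_score(ham_msg, model_after)

print(f"spam, plain vs padded: {plain_score:.2f} vs {padded_score:.2f}")
print(f"ham, before vs after retraining: {ham_before:.2f} vs {ham_after:.2f}")
```

With a narrow ham vocabulary the shift may stay small. The more varied the
legitimate mail, the more of its words an attacker can mix in, and the more
of them retraining pushes toward neutral or toward spam.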

Of course, the spam filters will catch on to other things. I find that the
DNS lookups take care of most of it, to the point where the other rules
don't even much matter.
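
The message does not say which DNS lookups are meant. Assuming it refers to
DNS-based blocklists (DNSBLs), a lookup can be sketched as follows. The zone
name dnsbl.example.org is a placeholder, not a real list, and the function
names are hypothetical. By the usual DNSBL convention, the sender's IPv4
octets are reversed and prepended to the list's zone, and any A record in
the answer means the address is listed.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets, then the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone):
    """Return True if the blocklist zone answers for this address.

    A failed resolution is treated as "not listed". Note that this also
    covers network errors, so a real filter would want to tell them apart.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

# Placeholder zone and a documentation-range address (assumptions).
print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

A check like this looks only at the connecting host, never at the message
body, so padding the text with random words does nothing to it.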

Linus