Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed to public posts

From: David Ford
Date: Thu Jan 22 2004 - 13:50:02 EST


I've been amusing myself once or twice a week by studying some of these emails. Because they use common words, just like your email below, the Bayesian score comes out far too low (SA grants it a negative point value).

The problem is that "properly trained" is too fluid a target. It'd be far more achievable if I only talked geek, or only automotive, or only medical. However, my "vocabulary" is far too varied to train a Bayesian filter that the use of medical terms, computer terms, or any given topic is taboo.

It cuts the gray area far too close to the middle of the road and thus makes marking the email as probable spam useless. All I'm doing now is wasting CPU, because in the end I'm dealing with the spam myself.

Yes, I did see this. I'm not so spiteful and actively pay attention to my queue when having this type of correspondence.

David

David Lang wrote:

On Thu, 22 Jan 2004, David Ford wrote:


Considering that Bayesian filters are useless against the new spam that
is proliferating these days, that's laughable. Spam now comes with a
good 5-10K of random dictionary words.


so we need to extend the Bayesian filters to deal with multi-word combos.
how much legit mail has those dictionary words in it? properly trained,
their presence should help identify the spam.
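
As a rough sketch of what "multi-word combos" could look like, here is a toy naive Bayes scorer that can be switched from single words to adjacent word pairs. This is only an illustration under assumptions: it is not how SA or any real filter is implemented, the names (tokens, NaiveBayes, spam_log_odds) are made up for the example, and it uses plain add-one smoothing.

```python
import math
from collections import Counter


def tokens(text, n=1):
    """Split text into lowercase word n-grams (n=1: words, n=2: word pairs)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


class NaiveBayes:
    """Toy spam/ham scorer; hypothetical, for illustration only."""

    def __init__(self, n=1):
        self.n = n  # size of the word combos used as features
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokens(text, self.n))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        """Return log(P(spam)/P(ham)); positive leans spam, negative leans ham.

        Assumes at least one message of each label has been trained.
        """
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        score = math.log(self.docs["spam"] / self.docs["ham"])
        for tok in tokens(text, self.n):
            # Add-one smoothing so unseen tokens do not zero out the product.
            p_spam = (self.counts["spam"][tok] + 1) / (totals["spam"] + vocab)
            p_ham = (self.counts["ham"][tok] + 1) / (totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score
```

With n=2 the filter only counts a pair as evidence when both words appear together in training mail, so a block of random dictionary words mostly produces pairs it has never seen. Whether that is enough against real spam is an open question; the sketch only shows the mechanism.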

not that you will ever see this (other than through the list) as I won't
respond to your confirmation message.

David Lang

