[TOOL] matchreply: make procmail aware of threads and dups

From: Tejun Heo
Date: Thu Sep 03 2009 - 04:55:49 EST


Hello,

I've been reading the rather unfortunate LAK stuff on this week's LWN
and it seems like other people are having similar problems with
duplicated messages from different mailing lists. Here's a program
I've been using for a couple of years now to solve the issue. It
builds an index of Maildir folders, can tell whether a message is a
duplicate of, or belongs to a thread in, those folders, and is very
easy to use with procmail.
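
For readers who want a picture of the idea, here is a minimal Python
sketch of duplicate and thread matching. It is not matchreply's code;
it assumes matching is done on the Message-ID, In-Reply-To and
References headers, and every function name in it is made up for
illustration.

```python
import email
import re

# Message identifiers look like <local@domain> inside header values.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def ids_in(msg, *headers):
    """Collect every <id> token found in the named headers of msg."""
    found = set()
    for name in headers:
        for value in msg.get_all(name, []):
            found.update(MSGID_RE.findall(value))
    return found


def build_index(raw_messages):
    """Index a folder: the set of Message-IDs of the mails stored in it."""
    index = set()
    for raw in raw_messages:
        index |= ids_in(email.message_from_string(raw), "Message-ID")
    return index


def is_duplicate(raw, index):
    """Assumed rule: a mail is a dup if its own Message-ID is indexed."""
    msg = email.message_from_string(raw)
    return bool(ids_in(msg, "Message-ID") & index)


def in_thread(raw, index):
    """Assumed rule: a mail joins a thread if it cites an indexed mail."""
    msg = email.message_from_string(raw)
    return bool(ids_in(msg, "In-Reply-To", "References") & index)
```

A real tool would read the raw messages from the cur/ and new/
directories of a Maildir folder instead of taking them as strings.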

With proper procmailrc rules, this allows me not to worry about dups
(there's a race window and some might escape from time to time, but no
biggie) and to easily follow the threads I'm interested in by simply
moving the thread to one of the folders I keep a closer eye on,
regardless of the actual target delivery addresses.
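
The overall decision can be sketched in Python as follows. This is
only an illustration under assumptions: the folder order mirrors the
.procmailrc below, the header-based matching is a guess at what
matchreply does, and choose_folder is a hypothetical name.

```python
import email
import re

MSGID_RE = re.compile(r"<[^<>\s]+>")


def header_ids(msg, *names):
    """Collect every <id> token found in the named headers of msg."""
    ids = set()
    for name in names:
        for value in msg.get_all(name, []):
            ids.update(MSGID_RE.findall(value))
    return ids


def choose_folder(raw, watched):
    """Pick a folder for raw.

    watched is a list of (folder_name, set_of_message_ids) pairs,
    most closely watched folder first.
    """
    msg = email.message_from_string(raw)
    own = header_ids(msg, "Message-ID")
    parents = header_ids(msg, "In-Reply-To", "References")
    # Pass 1: a copy already present in any watched folder is a dup.
    for _name, index in watched:
        if own & index:
            return "Duplicate"
    # Pass 2: otherwise follow the thread into the first folder holding it.
    for name, index in watched:
        if parents & index:
            return name
    return None  # no match: fall through to the remaining rules
```

Moving a thread into a watched folder changes that folder's index, so
later replies pick the new folder without any rule being edited.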

The attached tarball contains a README, but it's outdated and doesn't
explain the -d option, which matches for duplicates.

Here's my .procmailrc with some explanations. It grew over the years
and I'm sure it can be made much prettier somehow, but it should show
how matchreply can be used.
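
The recipes below come in pairs: a ":0 Wc" recipe pipes a copy of the
mail to a program and waits for its exit code, and the following
":0 a" recipe delivers only if that program succeeded. A filter that
fits this contract could look like the Python sketch below. It is a
hypothetical stand-in, not matchreply itself, and it assumes that
exit status 0 means "matched".

```python
import email
import io


def filter_status(stream, known_ids):
    """Return a shell-style exit status for the mail read from stream.

    0 means the mail's Message-ID is already known (a match),
    1 means it is not, so a following ':0 a' recipe is skipped.
    """
    msg = email.message_from_file(stream)
    msgid = (msg.get("Message-ID") or "").strip()
    return 0 if msgid and msgid in known_ids else 1


# Demo with an in-memory mail; known_ids is a made-up index.
demo = io.StringIO("Message-ID: <seen@example.org>\n\nbody\n")
print(filter_status(demo, {"<seen@example.org>"}))  # prints 0
```

A wrapper script would pass sys.stdin as the stream and hand the
result to sys.exit(), which is what procmail then checks.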

PATH=/usr/local/bin:/usr/bin/:/bin
MAILDIR=$HOME/Maildir
DEFAULT=$HOME/Maildir/
LOGFILE=$MAILDIR/log
VERBOSE=yes
MREPLY="$HOME/bin/matchreply -l $HOME/Maildir/matchreply.log -v"

#
# matchreply duplicates
#
:0 Wc
| $MREPLY -d -i waiting $MAILDIR/.Work-waiting/cur $MAILDIR/.Work-waiting/new

:0 a
.Duplicate/

:0 Wc
| $MREPLY -d -i interesting $MAILDIR/.Interesting/cur $MAILDIR/.Interesting/new

:0 a
.Duplicate/

:0 Wc
| $MREPLY -d -i inbox $MAILDIR/cur $MAILDIR/new

:0 a
.Duplicate/

#### The above rules match duplicates against the folders I keep a
#### close eye on. All dups are collected in the Duplicate folder,
#### which I sometimes open when I'm suspicious whether things are
#### working as expected. It hasn't failed me yet.

#
# deliver by threads
#
:0 Wc
| $MREPLY -i waiting $MAILDIR/.Work-waiting/cur $MAILDIR/.Work-waiting/new

:0 a
.Work-waiting/

:0 Wc
| $MREPLY -i interesting $MAILDIR/.Interesting/cur $MAILDIR/.Interesting/new

:0 a
.Interesting/

:0 Wc
| $MREPLY -i inbox $MAILDIR/cur $MAILDIR/new

:0 a
$DEFAULT

#### And the above rules look for a matching thread so that a reply
#### always ends up with its thread regardless of the actual delivery
#### address. So, when I spot some interesting thread in lkml, I
#### simply move the thread to one of the above folders and any
#### further messages on that thread will be delivered to that folder
#### instead of the unholily large lkml folder.

#
# filter spam
#
:0fw: spamassassin.lock
* < 256000
| spamassassin

:0:
* ^X-Spam-Status: Yes
.Spam/

#### And only after matching for threads in the folders I'm interested
#### in do I run it through the spam filter. So, if you're a spammer
#### and are specifically targeting me, you can send the spam as a
#### reply to the threads I was involved in. Thank you very much.

#
# Mailing list rules
#
:0
* ^(X-Mailing-List:.*(linux-.*|git)@vger\.kernel\.org|^List-Id:.*\.suse\.de|.*List-Id:.*suse\.de|.*List-Id:.*linuxdriverproject\.org)
{
#### For all the messages sent to mailing lists

#
# Deliver mails addressed to me into INBOX
#

:0
* ^(To|Cc):.*(htejun@gmail\.com|teheo@novell\.com|teheo@suse\.de|tj@kernel\.org)
$DEFAULT

#### As dups have been filtered already, anything with me on the to/cc
#### list can be delivered to my INBOX.

#
# Don't let cross-posted messages of a thread end up in
# different folders.
#

:0 Wc
| $MREPLY -d -i lide $MAILDIR/.Linux-ide/cur $MAILDIR/.Linux-ide/new

:0 a
.Duplicate/

:0 Wc
| $MREPLY -d -i lscsi $MAILDIR/.Linux-scsi/cur $MAILDIR/.Linux-scsi/new

:0 a
.Duplicate/

:0 Wc
| $MREPLY -i lide $MAILDIR/.Linux-ide/cur $MAILDIR/.Linux-ide/new

:0 a
.Linux-ide/

:0 Wc
| $MREPLY -i lscsi $MAILDIR/.Linux-scsi/cur $MAILDIR/.Linux-scsi/new

:0 a
.Linux-scsi/

#### And for the rest, dups are checked and all threads are grouped
#### together depending on where they currently are. I pay more
#### attention to linux-ide, so any cross-posted threads will end up
#### there instead of other mailing lists.

#
# Okay, deliver according to address
#

:0
* ^(X-Mailing-List|To|Cc):.*linux-ide@vger\.kernel\.org
.Linux-ide/

:0
* ^(X-Mailing-List|To|Cc):.*linux-scsi@vger\.kernel\.org
.Linux-scsi/

:0
* ^(X-Mailing-List|To|Cc):.*linux-kernel@vger\.kernel\.org
.LKML/

:0
$DEFAULT
}
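
To dry-run the three address rules at the end of the block without
touching real mail, something like the following Python sketch can
help. The patterns are copied from the rules above; the function name
is made up, and the sketch assumes the headers are already unfolded
(one header per line). procmail matches case-insensitively by default,
hence re.IGNORECASE.

```python
import re

# (pattern, folder) pairs copied from the address rules, in file order.
RULES = [
    (r"^(X-Mailing-List|To|Cc):.*linux-ide@vger\.kernel\.org", ".Linux-ide/"),
    (r"^(X-Mailing-List|To|Cc):.*linux-scsi@vger\.kernel\.org", ".Linux-scsi/"),
    (r"^(X-Mailing-List|To|Cc):.*linux-kernel@vger\.kernel\.org", ".LKML/"),
]


def deliver_by_address(headers, default="$DEFAULT"):
    """Return the folder of the first rule whose pattern matches headers."""
    for pattern, folder in RULES:
        if re.search(pattern, headers, re.MULTILINE | re.IGNORECASE):
            return folder
    return default
```

Because the first matching rule wins, a mail cross-posted to
linux-kernel and linux-ide lands in .Linux-ide/ here.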

Hope it's helpful for somebody.

Thanks.

--
tejun

Attachment: matchreplay.tar.gz
Description: GNU Zip compressed data