Bug in IP accounting's output (2.0.30): a nibble is missing

Marc (marcl@magic.metawire.com)
Mon, 26 May 1997 18:43:29 -0700 (PDT)


Hi,

I use IP accounting on a busy net with my interface in promiscuous mode, and
I currently use 66 rules.
They basically look all like that:
porky:~# /sbin/ipfwadm -A -l | more
IP accounting rules
pkts bytes dir prot source destination ports
39696 14M i/o all anywhere anywhere n/a
0 0 i/o all 204.80.98.0/24 anywhere n/a
0 0 i/o all anywhere 204.80.98.0/24 n/a
527 335K i/o all 204.80.99.0/24 anywhere n/a
39 2184 i/o all anywhere 204.80.99.0/24 n/a
(...)
158 6360 i/o tcp macaws95.metawire.com anywhere nntp -> any
421 226K i/o tcp anywhere macaws95.metawire.com any -> nntp
100 17758 i/o udp macaws95.metawire.com anywhere domain -> any
100 8336 i/o udp anywhere macaws95.metawire.com any -> domain
(...)

I currently dump the whole thing to logs every hour (because 32 bit counters
wrap way too fast) with the following commands:

# IP accouting flush
58 * * * * root (echo "|--||--| `date`"; ipfwadm -Alx; echo; echo) >>
/var/log/accounting/ip_accounting
59 * * * * root (echo "|--||--| `date`"; /var/local/scr/dumpproc; echo;
echo) >> /var/log/accounting/ip_accounting.dumpproc

I used to have problems with the ipfwadm dump (where an IP would be mangled
during certain hours of the day (more or less the same hours everyday)).
I contacted Jos Vos (ipfwadm's author), and he told me that it could not
possibly be his fine, bug free [tm], piece of software :-D (just kidding
here, he simply hinted that the bug pattern was most likely caused by
the kernel output).
(I was having those problems when I was launching "ipfwadm -Alxz" instead of
"ipfwadm -Alx").

I wrote, after his advise, a simple perl script that opened the proc entry
in read write mode and made a dump of it.

---
#!/usr/local/bin/perl
open(LOG, "+< /proc/net/ip_acct") || die "Can't open proc file";
@log=<LOG>;
print (join("\n",@log));
---

This allowed me to see that the data returned by the proc fs was correct as long as it was only opened in r/o mode. As soon as it is opened in r/w mode, some random problems occur: a nibble gets sometimes munged (typically it is during daytime, when there is more traffic on the network).

|--||--| Sat May 24 12:59:00 PDT 1997 IP accounting rules 00000000/00000000->00000000/00000000 - 00000000 0 0 0 3120419 1067278732 0 0 0 0 0 0 0 0 0 0 AFF X00 CC506200/FFFFFF00->00000000/00000000 - 00000000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 AFF X00 00000000/00000000->CC506200/FFFFFF00 - 00000000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 AFF X00 CC506300/FFFFFF00->00000000/00000000 - 00000000 0 0 0 56236 44767652 0 0 0 0 0 0 0 0 0 0 AFF X00 00000000/00000000->CC506300/FFFFFF00 - 00000000 0 0 0 688 38528 0 0 0 0 0 0 0 0 0 0 AFF X00 CC507600/FFFFFF00->00000000/00000000 - 00000000 0 0 0 414 411617 0 0 0 0 0 0 0 0 0 0 AFF X00 00000000/00000000->CC507600/FFFFFF00 - 00000000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 AFF X00 CC507A00/FFFFFF00->00000000/00000000 - 00000000 0 0 0 101387 7120451 0 0 0 0 0 0 0 0 0 0 AFF X00 00000000/00000000->CC507A00/FFFFFF00 - 00000000 0 0 0 199400 124428939 0 0 0 0 0 0 0 0 0 0 AFF X00 C507B00/FFFFFF00->00000000/00000000 - 00000000 0 0 0 83349 5416768 0 0 0 0 0 0 0 0 0 0 AFF X00 ^^^^^^^ CC507B00

The ipfwadm dump (that doesn't zero counters) is correct: |--||--| Sat May 24 12:58:00 PDT 1997 IP accounting rules pkts bytes dir prot source destination ports 3075933 1052127918 i/o all anywhere anywhere n/a 0 0 i/o all 204.80.98.0/24 anywhere n/a 0 0 i/o all anywhere 204.80.98.0/24 n/a 55399 43983820 i/o all 204.80.99.0/24 anywhere n/a 685 38360 i/o all anywhere 204.80.99.0/24 n/a 414 411617 i/o all 204.80.118.0/24 anywhere n/a 0 0 i/o all anywhere 204.80.118.0/24 n/a 99947 7041561 i/o all 204.80.122.0/24 anywhere n/a 196321 122105233 i/o all anywhere 204.80.122.0/24 n/a 82100 5341565 i/o all 204.80.123.0/24 anywhere n/a ^^^^^^^^^^^^^^^

I looked at ip_fw.c to see if I could find the bug, but didn't see anything wrong offhand (it has to be in the "while(i!=NULL)" loop line 1187 though)

I now have the following problem: This is a production machine doing important logging (and now that I just dump the logs first, and then reset the counters, I don't have any problems), so I can't just play with it, try a bunch of other settings, reboot the machine every half hour and so on. Also, since I don't have much kernel programming experience, finding the bug will take me a while (I can't reproduce in on machine sitting on other nets) and my boss will get unhappy at me if I reboot the machine too often.

If someone more knowledgeable than me can have a look, please let me know. If no one can see what's wrong (because you can't reproduce the bug), and no one tells me that he/she'll have a shot at it, I'll try myself as soon as I can get back to working half time (namely 8h/day) :-)

Email me, if you need any more info, or you'd like me to try something.

Marc