Re: kernel BUG at fs/buffer.c:3205 (stable 3.5.3)

From: Alexander Holler
Date: Thu Sep 27 2012 - 14:02:32 EST


Am 27.09.2012 18:20, schrieb Alexander Holler:
Am 27.09.2012 17:46, schrieb Alexander Holler:
Hello,

Am 27.09.2012 17:12, schrieb Jan Kara:
Just some thoughts about your oops:
The assertion which fails is:
BUG_ON(!list_empty(&bh->b_assoc_buffers));

Now b_assoc_buffers isn't used very much. In particular ext4 which you
seem
to be using doesn't use this list at all (except when mounted in
nojournal
mode but that doesn't seem to be your case). That would point rather
strongly at a memory corruption issue.

So if you can reproduce the oops, it might be interesting to print
bh->b_assoc_buffers.next and &bh->b_assoc_buffers.next if the list is
found
to be non-empty.

Hmm, a loose pointer would explain it all too. Especially the cases when
I just have seen wrong content in the archive without having any oops. I
try to reproduce it with

pr_info("AHO: %p %p\n", bh->b_assoc_buffers.next,
&bh->b_assoc_buffers.next);

after the BUG_ON().

Thanks for the hint. I wasn't already that far to know that
b_assoc_buffers isn't used that much.

Hmm, that doesn't look very practicable because b_assoc_buffers seems to
be used a lot here. ;)
Maybe I should have mentioned that I'm mounting the source filesystem
(root with ext4) with
nodelalloc(rw,noatime,nodelalloc,errors=remount-ro,data=ordered), and to
backup it, I'm using a bind-mount (mount -o bind / /foo) as source.

But the debug output starts very early on boot, where no bind-mount is
used:

---------------------
Sep 27 18:03:23 krabat udevd[1254]: invalid rule
'/etc/udev/rules.d/80-aho.rules:26'
Sep 27 18:03:23 krabat kernel: [ 4.562670] usb usb8: New USB device
found, idVendor=1d6b, idProduct=0001
Sep 27 18:03:23 krabat kernel: [ 4.562671] usb usb8: New USB device
strings: Mfr=3, Product=2, SerialNumber=1
Sep 27 18:03:23 krabat systemd-uaccess[1363]: Failed to apply ACL on
/dev/kvm: Operation not supported
Sep 27 18:03:23 krabat kernel: [ 4.562673] usb usb8: Product: UHCI
Host Controller
Sep 27 18:03:23 krabat kernel: [ 4.562674] usb usb8: Manufacturer:
Linux 3.5.4-00009-gfa43f23-dirty uhci_hcd
Sep 27 18:03:23 krabat kernel: [ 4.562676] usb usb8: SerialNumber:
0000:00:1d.0
Sep 27 18:03:23 krabat systemd-uaccess[1716]: Failed to apply ACL on
/dev/kvm: Operation not supported
Sep 27 18:03:23 krabat kernel: [ 4.563285] hub 8-0:1.0: USB hub found
Sep 27 18:03:23 krabat kernel: [ 4.563288] hub 8-0:1.0: 2 ports detected
Sep 27 18:03:23 krabat systemd-uaccess[2324]: Failed to apply ACL on
/dev/snd/timer: Operation not supported
Sep 27 18:03:23 krabat kernel: [ 4.563316] AHO: ffff880212e4b048
ffff880212e4b048
Sep 27 18:03:23 krabat kernel: [ 4.563318] AHO: ffff880212e4b0b0
ffff880212e4b0b0
Sep 27 18:03:23 krabat kernel: [ 4.563319] AHO: ffff880212e4b118
ffff880212e4b118
---------------------

And afterwards I see tons of those messages, so it doesn't look usable.
Anyway, I retry to repdroduce the problem without that debug line, just
to see if still can reproduce the problem with F17 as userspace (and
kernel 3.5.4 instead of 3.5.3).

After 2 successful tries in sequence, the third failed (sorry, LANG=de):

---------------------------------------------------------------
[root@krabat bind]# tar cp . | mbuffer | bzip2smp >/mnt/usb3/Krabat.Fedora17.sdb2.27.09.12.tar.bz2
in @ 33.1 MiB/s, out @ 38.3 MiB/s, 888 MiB total, buffer 95% fulltar: ./tmp/.X11-unix/X0: Socket ignoriert
in @ 0.0 KiB/s, out @ 20.8 MiB/s, 24.9 GiB total, buffer 22% full
summary: 24.9 GiByte in 19 min 53.0 sec - average of 21.4 MiB/s
[root@krabat bind]# tar djf /mnt/usb3/Krabat.Fedora17.sdb2.27.09.12.tar.bz2
./var/log/messages: Änderungszeit ist unterschiedlich
./var/log/messages: Größe ist unterschiedlich
./var/tmp/kdecache-aholler/icon-cache.kcache: Änderungszeit ist unterschiedlich
./var/tmp/kdecache-aholler/icon-cache.kcache: Unterschiedliche Inhalte
./var/tmp/kdecache-aholler/plasma_theme_oxygen.kcache: Änderungszeit ist unterschiedlich
./var/tmp/kdecache-aholler/plasma_theme_oxygen.kcache: Unterschiedliche Inhalte
./var/lib/chrony/drift: Änderungszeit ist unterschiedlich
./var/lib/chrony/drift: Unterschiedliche Inhalte
./home/aholler/.kde/share/apps/konqueror/autosave/_1.77: Änderungszeit ist unterschiedlich
./home/aholler/.kde/share/apps/konqueror/autosave/_1.77: Größe ist unterschiedlich
./home/aholler/.kde/share/apps/konqueror/konq_history: Änderungszeit ist unterschiedlich
./home/aholler/.kde/share/apps/konqueror/konq_history: Größe ist unterschiedlich
./home/aholler/.kde/share/apps/kcookiejar/cookies: Änderungszeit ist unterschiedlich
./home/aholler/thinstation_src-2.0beta2.tar.bz2: Unterschiedliche Inhalte

bzip2: Data integrity error when decompressing.
Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Unerwartetes Dateiende im Archiv.
tar: Child returned status 2
tar: Error is not recoverable: exiting now
[root@krabat bind]#
---------------------------------------------------------------

This time without any oops, dmesg just shows some

---------------------------------------------------------------
[ 111.087356] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
[ 672.868948] CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 672.868949] CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 672.869970] CPU0: Core temperature/speed normal
[ 672.869971] CPU4: Core temperature/speed normal
[ 688.285419] CPU6: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 688.285421] CPU2: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 688.286442] CPU2: Core temperature/speed normal
[ 688.286443] CPU6: Core temperature/speed normal
[ 698.822614] CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 698.822615] CPU7: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 698.824674] CPU3: Core temperature/speed normal
[ 698.824675] CPU7: Core temperature/speed normal
[ 706.979633] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 706.979635] CPU5: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 706.980648] CPU1: Core temperature/speed normal
[ 706.980649] CPU5: Core temperature/speed normal
[ 899.540485] [Hardware Error]: Machine check events logged
---------------------------------------------------------------

Nothing else. Kernel is 3.5.4 userland now F17.

Regards.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/