Re: Kernel bug: Bad page state: related to generic symlink code and mmap

From: Anton Altaparmakov
Date: Fri Aug 19 2005 - 10:44:42 EST


On Fri, 2005-08-19 at 15:20 +0100, Al Viro wrote:
> On Fri, Aug 19, 2005 at 12:14:48PM +0100, Anton Altaparmakov wrote:
> > There is a bug somewhere in 2.6.11.4 and I can't figure out where it is.
> > I assume it is present in older and newer kernels, too as the related
> > code hasn't changed much AFAICS and googling for "Bad page state"
> > returns rather a lot of hits relating to both older (up to 2.5.70!) and
> > newer kernels...
> >
> > Note: PLEASE do not stop reading because you read ncpfs below as I am
> > pretty sure it is not ncpfs related! And looking at google a lot of
> > people have reported such similar problems since 2.5.70 or so and they
> > were all told to go away as they have bad ram. That is impossible
> > because this happens on well over 600 workstations and several servers
> > 100% reproducible. Many different types of hardware, different makes,
> > different ages, all running smp kernels even if single cpu. You can't
> > tell me they all have bad ram. Windows works fine and Linux works fine
> > except for that one specific problem which is 100% reproducible...
> >
> > The bug only appears, but it appears 100% reproducibly when a cross
> > volume symlink on ncpfs is accessed using nautilus under gnome. I.e.
> > double click on a cross volume symlink on ncpfs in nautilus and the
> > machine locks up solid.
>
> Ugh... Could you at least tell what does nautilus attempt to do at that
> point? Something that wouldn't show up with simple ls -l <symlink> or
> cat <symlink> >/dev/null, judging by the above, but what?

One important thing I forgot to add is that the symlink must point to a
directory, not a file. If it points to a file, all works fine.

I tried stracing nautilus to answer your question. This time, for the
first time, I got a BUG() triggered in fs/namei.c instead of a "Bad page
state" report. The arrow below marks the spot:

void page_put_link(struct dentry *dentry, struct nameidata *nd)
{
        if (!IS_ERR(nd_get_link(nd))) {
                struct page *page;
                page = find_get_page(dentry->d_inode->i_mapping, 0);
                if (!page)
---->                   BUG();
                kunmap(page);
                page_cache_release(page);
                page_cache_release(page);
        }
}

Here is the BUG output:

kernel BUG at fs/namei.c:2329!
invalid operand: 0000 [#1]
SMP
Modules linked in: autofs4 nls_cp850 nls_utf8 ncpfs subfs fuse snd
soundcore speedstep_lib freq_table thermal processor fan button battery
ac nvram ipt_REJECT iptable_filter ip_tables af_packet edd joydev evdev
st video1394 ohci1394 raw1394 ieee1394 capability sg sr_mod dm_mod e1000
uhci_hcd usbcore i2c_i801 i2c_core hw_random parport_pc lp parport
reiserfs ide_cd cdrom ide_disk aic7xxx aic79xx via82cxxx ata_piix libata
piix ide_core sd_mod scsi_mod
CPU: 3
EIP: 0060:[<c0174e3e>] Tainted: G U VLI
EFLAGS: 00010246 (2.6.11.4-21.8-smp)
EIP is at page_put_link+0x3e/0x50
eax: 00000000 ebx: 00000000 ecx: d9c75654 edx: 00000000
esi: 00000000 edi: d9d30000 ebp: ffac2000 esp: d9d31eac
ds: 007b es: 007b ss: 0068
Process nautilus (pid: 21910, threadinfo=d9d30000 task=f5caf550)
Stack: d9c75654 c0171659 00000000 00004000 c2025060 00000001 d9d31f1c dff3ec80
       d9c75654 001d981f 00000003 d9eb2014 00000000 d9d30000 d9d31f1c 00000000
       d9eb2000 c0171e96 d9eb2000 d9d31f1c 00000001 d9eb2000 c017211f 08542878
Call Trace:
[<c0171659>] link_path_walk+0x929/0xe80
[<c0171e96>] path_lookup+0xa6/0x1c0
[<c017211f>] __user_walk+0x2f/0x70
[<c016c6fd>] vfs_stat+0x1d/0x60
[<c016ce5f>] sys_stat64+0xf/0x30
[<c0109051>] do_syscall_trace+0xb1/0x175
[<c01040d7>] syscall_call+0x7/0xb
Code: 31 d2 8b 80 a8 00 00 00 e8 80 e6 fc ff 85 c0 89 c3 74 18 89 d8 e8
93 5e fa ff 89 d8 e8 2c 80 fd ff 89 d8 5b e9 24 80 fd ff 5b c3 <0f> 0b
19 09 76 bb 32 c0 eb de 90 8d b4 26 00 00 00 00 83 ec 28

The compressed strace output is attached. It was tracing the main
nautilus process (strace -f -F -o /strace.log -p 21910) which is where
the bug was hit as well. FWIW I attached strace to the process just
before double-clicking on the symlink in the nautilus window...

Looking at the bottom of the strace.log file, it seems that two possibly
concurrent stat64() calls on the symlink were running, one of which did
not complete. (Note that this was done on a dual-cpu machine with
hyperthreading enabled, i.e. it appears to have four cpus.)
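
In case it helps to exercise the same path without nautilus, below is a
minimal user-space sketch. It assumes (unconfirmed) that the trigger is
several stat() calls racing on the same symlink, as the strace output
suggests. The names stat_job, stat_worker, run_concurrent_stats and the
STAT_RACE_MAIN macro are made up for illustration, the path has to be
supplied by whoever runs it, and it has not been verified to trigger the
bug.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Per-thread work item (hypothetical helper, not from the kernel tree). */
struct stat_job {
        const char *path;       /* path to stat(); stat() follows symlinks */
        int iterations;         /* how many times this thread calls stat() */
        int ok;                 /* number of calls that returned 0 */
};

/* Thread body: call stat() on the same path over and over. */
static void *stat_worker(void *arg)
{
        struct stat_job *job = arg;
        struct stat st;
        int i;

        for (i = 0; i < job->iterations; i++)
                if (stat(job->path, &st) == 0)
                        job->ok++;
        return NULL;
}

/*
 * Start nthreads threads that all stat() the same path concurrently.
 * Returns the total number of successful stat() calls, or -1 on a bad
 * argument or allocation failure.
 */
int run_concurrent_stats(const char *path, int nthreads, int iterations)
{
        pthread_t *tids;
        struct stat_job *jobs;
        int i, started = 0, total = 0;

        if (!path || nthreads <= 0 || iterations < 0)
                return -1;

        tids = calloc(nthreads, sizeof(*tids));
        jobs = calloc(nthreads, sizeof(*jobs));
        if (!tids || !jobs) {
                free(tids);
                free(jobs);
                return -1;
        }

        for (i = 0; i < nthreads; i++) {
                jobs[i].path = path;
                jobs[i].iterations = iterations;
                jobs[i].ok = 0;
                if (pthread_create(&tids[i], NULL, stat_worker, &jobs[i]) != 0)
                        break;
                started++;
        }

        /* Only join the threads that were actually created. */
        for (i = 0; i < started; i++) {
                pthread_join(tids[i], NULL);
                total += jobs[i].ok;
        }

        free(tids);
        free(jobs);
        return total;
}

#ifdef STAT_RACE_MAIN
/* Optional driver: ./stat_race <path-to-symlink> */
int main(int argc, char **argv)
{
        if (argc != 2) {
                fprintf(stderr, "usage: %s <path>\n", argv[0]);
                return 1;
        }
        printf("%d successful stat() calls\n",
               run_concurrent_stats(argv[1], 4, 100000));
        return 0;
}
#endif
```

Built with something like "cc -DSTAT_RACE_MAIN -pthread stat_race.c"
(the file name is arbitrary) and pointed at a cross volume symlink to a
directory, this would show whether plain concurrent stat() calls are
enough, or whether nautilus does something more around them.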

Does this help?

Let me know if I can provide any other details... (I can provide an ssh
connection to a machine for testing purposes if required.)

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

Attachment: strace.log.bz2
Description: application/bzip