Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

From: Miquel van Smoorenburg
Date: Fri Jan 30 2004 - 11:02:29 EST


On Thu, 29 Jan 2004 07:25:21, Andrew Morton wrote:
> "Miquel van Smoorenburg" <miquels@xxxxxxxxxx> wrote:
> >
> > I have a Linux 2.6.2-rc2 NFS file server and another similar
> > box as client. Kernel is compiled for SMP (hyperthreading).
> >
> > In a few seconds, the server locks up. It spins in
> > generic_fillattr(), apparently in the i_size_read() inline function.
> > Server responds to pings and sysrq, but nothing else.
> > Here's the sysrq output:
> >
> > Pid: 531, comm: nfsd
> > EIP: 0060:[<c01577c2>] CPU: 0
> > EIP is at generic_fillattr+0x82/0xa0
> > EFLAGS: 00000246 Not tainted
> > EAX: 00000000 EBX: 07ae7200 ECX: f650a0a0 EDX: 000113eb
> > ESI: 00000000 EDI: f66cfed4 EBP: f66e5804 DS: 007b ES: 007b
> > CR0: 8005003b CR2: 40019000 CR3: 37245000 CR4: 000006d0
> > Call Trace:
> > [<f8ab1f6b>] linvfs_getattr+0x2b/0x34 [xfs]
> > [<c0157805>] vfs_getattr+0x25/0x84
> > [<c01c19a3>] encode_post_op_attr+0x53/0x238
> > [<c01c1e27>] encode_wcc_data+0x29f/0x2a8
> > [<c01c4521>] nfs3svc_encode_commitres+0x19/0x5c
> > [<c01b709d>] nfsd_dispatch+0x14d/0x1a3
> > [<c02fb79b>] svc_process+0x3b3/0x640
> > [<c01b6ddc>] nfsd+0x1e4/0x358
> > [<c01b6bf8>] nfsd+0x0/0x358
> > [<c0107251>] kernel_thread_helper+0x5/0xc
> >
>
> Is the EIP _always_ inside generic_fillattr()?
>
> If so then yes, your analysis looks right. I'd say that the inode has been
> corrupted and the seqcount counter has assumed a non-even value. That
> will cause i_size_read() to lock up.
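
To illustrate the failure mode described above, here is a minimal single-threaded userspace sketch of a sequence-counter read loop. It is not the kernel's code: the names seq_inode, size_write() and size_read_bounded() are hypothetical, and the retry limit exists only so the example terminates. The assumption is that a reader retries for as long as the counter is odd or changes during the read, so a counter stuck at an odd value is never accepted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an inode size guarded by a sequence counter.
 * Single-threaded sketch: real SMP code also needs memory barriers. */
struct seq_inode {
    unsigned seq;   /* even: stable, odd: a write is in progress */
    uint64_t size;
};

/* Writer: make the counter odd, update the value, make it even again. */
static void size_write(struct seq_inode *in, uint64_t new_size)
{
    in->seq++;              /* now odd: readers must retry */
    in->size = new_size;
    in->seq++;              /* even again: value is stable */
}

/* Reader with a retry limit (an addition for this sketch only).
 * Returns true and stores the size on success; returns false if
 * no consistent snapshot was seen within max_tries attempts. */
static bool size_read_bounded(const struct seq_inode *in, uint64_t *out,
                              unsigned max_tries)
{
    for (unsigned i = 0; i < max_tries; i++) {
        unsigned start = in->seq;
        if (start & 1u)
            continue;       /* odd counter: treat as write in progress */
        uint64_t snapshot = in->size;
        if (in->seq == start) {
            *out = snapshot;
            return true;    /* counter unchanged: snapshot is consistent */
        }
    }
    return false;           /* an unbounded reader would still be spinning */
}
```

After a normal size_write() the counter is even and the read succeeds on the first try. If the counter is forced to an odd value (modelling corruption, or a writer that never finishes), size_read_bounded() fails no matter how many tries it is given, which is the case where a reader without a limit would spin forever.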

Yes, but why does it lock up the whole machine? I'd say just that one
process should go into 'R', but other things should still get scheduled,
right? Or is some very common lock being held? The machine responds to
pings, so interrupts are working.

In fact, as this is a hyperthreaded machine, why don't I see anything
running on the "other virtual cpu"? I only see nfsd spinning on CPU#0.

Could it be a CPU problem: the other CPU-thread is in i_size_write(), but
that CPU-thread is simply never run? To test this, I rebooted the exact
same 2.6.2-rc2 kernel with "maxcpus=1", and it doesn't lock up.

(Still weird that -mm doesn't lock up)

Mike.