Re: bug tracking question (VFS crash)

Alexander Viro (viro@math.psu.edu)
Sun, 25 Apr 1999 07:21:17 -0400 (EDT)


On Sun, 25 Apr 1999, Manfred Spraul wrote:

> Alexander Viro wrote:
> > It's nothing but a gut feeling, but I suspect that we either have fput()
> > called outside of big lock (and thus SMP races) or net/socket.c playing
> > rough. I'll look at it ~6 hours later - right now I'm going down ;-/

D'oh. I've found one place where we *definitely* have a bug that could
generate such situation, but it's on Alpha - osf_getdentries() calls
fget() and fput() outside of big lock. There are several other suspicious
places - I'm tracing the suckers right now.

> I've attached an email I sent to Alan last night.
> It contains the .config file, and one intresting ksymoops.
Where? I see ksymoops, but no .config. Could you send it?

> I think that we can exclude net/socket.c: one crash occured in
> sys_lseek(),
> and another in remove_shared_vm_struct().

Not ironclad - see how it could happen:

process A opens a file. It gets struct file foo.
after a race we've lost one on foo.f_count and freed foo (i.e.
got a dangling pointer in file table of A).
A doesn't do anything (yet).
process B opens *another* file and gets the same foo.
foo.f_count is 1 now, foo.f_dentry points to reasonable place.
A closes its file descriptor. It sees nothing wrong - foo looks sane.
B now has dangling pointer on hands. The rest is obvious.

Moral: we can't assume anything basing on a place where the shit had
finally hit the fan - actual problem might happen way before with
completely different file and different process. Nasty...

Please, send .config. I'll continue tracing, indeed. One bug is already
caught, but since the box in question is Intel it's not our sucker.
If you can find any place where ->f_count is changed outside of big lock -
you've found a bug. Neither fget() nor fput() are SMP-safe and I'm
half-tempted to add
if (current->lock_depth<0)
*NULL=0;
to both fput() and fget() and see if we'll see an OOPS (and look at the
call stack, indeed).

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/