Re: [maybe fixed.. i hope i hope i hope] Re: snipe hunt

From: Mike Galbraith (mikeg@weiden.de)
Date: Fri May 05 2000 - 16:33:27 EST

Next message: Gary E. Miller: "Re: IDE Controllers"
Previous message: Jan Rekorajski: "Re: oops in 2.3.99pre6 w/ ATM 0.77"
In reply to: Manfred Spraul: "Re: [maybe fixed.. i hope i hope i hope] Re: snipe hunt"
Next in thread: Alexander Viro: "Re: [maybe fixed.. i hope i hope i hope] Re: snipe hunt"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, 5 May 2000, Manfred Spraul wrote:

> Mike Galbraith wrote:
> >
> >
> > Minus lock_kernel(), that's what I tried (UP box).
> > <Andrea>
> > I think that's not the correct fix though because there's probably still a
> > window for a race if you happen to increase the mmap_sem when it was just
> > zero (too late).
> > </Andrea>
> > Only relevant on SMP?
>
> lock_kernel() closes that race:
> no one calls
> current->mm=new_value;
> mmdrop(old_value);
> without holding lock_kernel().
>
> >
> > The troubles I'm seeing begin with a reference to a freed task_struct
> > and things go down hill rapidly from there.
> >
>
> freed or TASK_ZOMBIE?
> TASK_ZOMBIE should be ok, and get_task_struct() should prevent the
> kernel from freeing the structure.

Freed. &exit_sem is in free space per slab allocator.
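Manfred's argument above is that every writer does "current->mm = new_value; mmdrop(old_value);" while holding lock_kernel(). A reader that also holds the lock can therefore never load the pointer and then bump a count that has meanwhile hit zero, which is the window Andrea is worried about. Below is a minimal userspace sketch of that invariant. It assumes a pthread mutex standing in for the big kernel lock and a hypothetical struct fake_mm with a plain int refcount. None of these names are kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mm_struct; only the refcount matters here. */
struct fake_mm {
    int count;
};

/* Stand-in for lock_kernel()/unlock_kernel(). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for current->mm. */
static struct fake_mm *current_mm;

struct fake_mm *fake_mm_alloc(void)
{
    struct fake_mm *mm = malloc(sizeof(*mm));
    if (mm != NULL)
        mm->count = 1;
    return mm;
}

/* Caller must hold big_lock; frees the mm when the last reference goes. */
static void fake_mmdrop(struct fake_mm *mm)
{
    assert(mm->count > 0);          /* dropping a dead mm is a bug */
    if (--mm->count == 0)
        free(mm);
}

/* Writer: "current->mm = new_value; mmdrop(old_value);" as one critical
 * section. The new mm's initial reference is handed over to current_mm. */
void switch_mm_locked(struct fake_mm *new_mm)
{
    pthread_mutex_lock(&big_lock);
    struct fake_mm *old = current_mm;
    current_mm = new_mm;
    if (old != NULL)
        fake_mmdrop(old);
    pthread_mutex_unlock(&big_lock);
}

/* Reader: load and increment under the same lock, so the count it bumps
 * can never already have dropped to zero. */
struct fake_mm *get_current_mm_locked(void)
{
    pthread_mutex_lock(&big_lock);
    struct fake_mm *mm = current_mm;
    if (mm != NULL)
        mm->count++;
    pthread_mutex_unlock(&big_lock);
    return mm;
}

/* Release a reference taken with get_current_mm_locked(). */
void put_mm_locked(struct fake_mm *mm)
{
    pthread_mutex_lock(&big_lock);
    fake_mmdrop(mm);
    pthread_mutex_unlock(&big_lock);
}
```

The key point is the pairing. The pointer swap and its drop sit under the same lock as the load and its increment. If either side runs without the lock, a reader on another CPU could load the old pointer just before the writer drops its count to zero and then increment freed memory.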

> Could you add checkpoints to get_task_struct/free_task_struct to check
> whether we call get_task_struct for an already freed task?

Will do.
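Here is one way the checkpoints Manfred asks for could look, sketched in userspace. Each task-like object carries a debug magic word. The word is set when the object is handed out and overwritten when the object is freed, so a later "get" on a freed object is reported instead of silently bumping a count. The names (struct fake_task, the magic values, the counters) are invented for illustration. This is not the actual 2.3.99 get_task_struct()/free_task_struct() code.

```c
#include <assert.h>
#include <stdio.h>

#define TASK_LIVE_MAGIC  0x7a5c0001u  /* hypothetical "in use" marker */
#define TASK_FREED_MAGIC 0x6b6b6b6bu  /* same byte pattern as the 6b poison */

/* Hypothetical task-like object with a debug header. */
struct fake_task {
    unsigned int debug_magic;
    int usage;
};

/* Counters so a test can see what the checkpoints caught. */
int bad_get_count;
int bad_free_count;

void fake_task_init(struct fake_task *t)
{
    t->debug_magic = TASK_LIVE_MAGIC;
    t->usage = 1;
}

/* Checkpoint in the "get" path: refuse to take a reference on a freed object. */
void checked_get_task_struct(struct fake_task *t)
{
    assert(t != NULL);
    if (t->debug_magic != TASK_LIVE_MAGIC) {
        fprintf(stderr, "get_task_struct on dead task %p (magic %#x)\n",
                (void *)t, t->debug_magic);
        bad_get_count++;
        return;
    }
    t->usage++;
}

/* Checkpoint in the "free" path: catch double frees, then mark the object
 * dead. The memory is deliberately kept (quarantined) rather than handed
 * back, so later checks can still read the magic. */
void checked_free_task_struct(struct fake_task *t)
{
    assert(t != NULL);
    if (t->debug_magic != TASK_LIVE_MAGIC) {
        fprintf(stderr, "free_task_struct on dead task %p\n", (void *)t);
        bad_free_count++;
        return;
    }
    t->debug_magic = TASK_FREED_MAGIC;
}
```

Quarantining only helps while the freed object stays untouched. Once the allocator reuses the memory, the magic is overwritten and the check goes blind.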

> Btw, which test are you using?

Brute force mod to memleak: track all addresses instead of only the
first address in an allocated chunk. (memleak is a 1/32 scale model,
so there's an allocation map entry for every 32 bytes of RAM; I currently
make sure to use them all.) It does its bookkeeping under the allocator's
locks, so it should be (um, no, had better be) accurate. I use no locking
for the actual check (don't dare); I just take a quick peek at the allocation
map. It can't see a problem if someone reallocated the space before
the check runs, so it's kind of a shotgun approach.. some luck required.
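Below is a rough userspace model of the 1/32-scale map described above. It keeps one byte per 32-byte granule, marks every granule a chunk covers on allocation (not just the first), and clears them on free. The arena size, base address and function names are hypothetical. The sketch also omits locking: in Mike's version the updates happen under the allocator's locks, while the check is an unlocked peek.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE_SHIFT 5               /* 1/32 scale: one entry per 32 bytes */
#define ARENA_SIZE    (64u * 1024u)   /* hypothetical size of the tracked region */

static unsigned char alloc_map[ARENA_SIZE >> GRANULE_SHIFT];
static uintptr_t arena_base;

void map_init(uintptr_t base)
{
    arena_base = base;
    memset(alloc_map, 0, sizeof(alloc_map));
}

/* Mark every granule touched by [addr, addr + len), not just the first one. */
static void map_set(uintptr_t addr, size_t len, unsigned char value)
{
    if (len == 0)
        return;
    assert(addr >= arena_base && addr - arena_base + len <= ARENA_SIZE);
    size_t first = (addr - arena_base) >> GRANULE_SHIFT;
    size_t last = (addr - arena_base + len - 1) >> GRANULE_SHIFT;
    for (size_t i = first; i <= last; i++)
        alloc_map[i] = value;
}

/* Bookkeeping hooks, meant to be called from the allocator's alloc/free paths. */
void map_note_alloc(uintptr_t addr, size_t len) { map_set(addr, len, 1); }
void map_note_free(uintptr_t addr, size_t len)  { map_set(addr, len, 0); }

/* The "quick peek": 1 if allocated, 0 if free, -1 if outside the arena. */
int map_is_allocated(uintptr_t addr)
{
    if (addr < arena_base || addr - arena_base >= ARENA_SIZE)
        return -1;
    return alloc_map[(addr - arena_base) >> GRANULE_SHIFT];
}
```

At 1/32 scale, two objects smaller than 32 bytes can share an entry, so freeing one clears the other's mark. The model implicitly assumes chunks start on 32-byte boundaries.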

I had to try this because slab poisoning was blowing my box completely
out of the water (scribbling on semaphores, spinlocks.. pretty scary).

> On my dual P-II, I found a bug with
>
> CPU1: execlp(argv[0],argv[0],NULL); in a tight loop
> CPU2: $ cd /proc/pid; while true;do cat cmdline > /dev/null;done.
>
> As soon as I start the cat on cpu2, the execlp on cpu1 fails. No oops,
> no kernel message, errno=2 (ENOENT).
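For anyone trying to reproduce this, here is a C sketch of the two halves of Manfred's test. The CPU1 side is his execlp(argv[0], argv[0], NULL), wrapped so the failure path reports errno. The CPU2 side reads /proc/<pid>/cmdline repeatedly, as the shell loop does. The function names are hypothetical, pinning each half to a particular CPU is left out, and this is not Manfred's actual program.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* CPU1 side: replace this process image with argv0. On success execlp()
 * never returns, and the new image is expected to call this again, giving
 * the tight exec loop. On failure it returns errno (Manfred saw 2, ENOENT). */
int exec_self(const char *argv0)
{
    assert(argv0 != NULL);
    execlp(argv0, argv0, (char *)NULL);
    return errno;
}

/* CPU2 side: equivalent of
 *   while true; do cat /proc/<pid>/cmdline > /dev/null; done
 * bounded to `iterations` passes. Returns how many passes read to EOF. */
long read_cmdline_loop(pid_t pid, long iterations)
{
    char path[64];
    char buf[4096];
    long ok = 0;

    snprintf(path, sizeof(path), "/proc/%ld/cmdline", (long)pid);
    for (long i = 0; i < iterations; i++) {
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;              /* no such process, or not readable */
        while (fread(buf, 1, sizeof(buf), f) > 0)
            ;                      /* discard the data, like > /dev/null */
        if (!ferror(f))
            ok++;
        fclose(f);
    }
    return ok;
}
```

exec keeps the process ID, so the pid handed to read_cmdline_loop() stays valid across the whole exec loop. A main() for the CPU1 program would only need to call exec_self(argv[0]) and print the returned errno.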

Maybe I shouldn't say anything, but before I shut up...
I'm seeing some gfp poisoning when the semaphore deadlock detector fires,
so maybe there's another such beastie lurking somewhere (I hope not). In
particular, __wake_up() finds bad wq magic and sends me into kdb. Looking at
it shows 6b6b6b6b (gfp poison signature). I saw the same in a couple of other
oopsen, but that was a few days ago.. one of them was a kswapd oops.
I plan on beating the living snot out of my box for the next few days to
see if it pops up anywhere else. (I hope to have a mind-numbingly boring
time doing so.)
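The 6b6b6b6b signature is what you get when freed memory has been filled with the byte 0x6b and a 32-bit field (here, the wait queue's magic) is later read back from it. Below is a minimal userspace sketch of both sides: poisoning on free, and a magic check like the one __wake_up() trips over. The struct, magic value and helpers are hypothetical. This is not the kernel's wait-queue debugging code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define POISON_BYTE   0x6b
#define POISON_WORD   0x6b6b6b6bu
#define FAKE_WQ_MAGIC 0x12345678u   /* hypothetical "valid wait queue" marker */

struct fake_wait_queue {
    uint32_t magic;
    int waiters;
};

void fake_wq_init(struct fake_wait_queue *wq)
{
    wq->magic = FAKE_WQ_MAGIC;
    wq->waiters = 0;
}

/* What a poisoning allocator does on free: scribble the pattern over the object. */
void poison_on_free(void *obj, size_t len)
{
    memset(obj, POISON_BYTE, len);
}

/* Check done before touching the queue. Returns 0 if OK, 1 if the magic
 * is the poison pattern (use after free), 2 for any other bad magic. */
int fake_wq_check(const struct fake_wait_queue *wq)
{
    assert(wq != NULL);
    if (wq->magic == FAKE_WQ_MAGIC)
        return 0;
    if (wq->magic == POISON_WORD) {
        fprintf(stderr, "wait queue %p: freed memory (magic %#x)\n",
                (const void *)wq, (unsigned int)wq->magic);
        return 1;
    }
    return 2;
}
```

Every byte of a poisoned object is overwritten, so any 32-bit field read from it shows 6b6b6b6b. That would explain why the same signature can surface in unrelated places, such as a wait queue or a kswapd oops.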

-Mike

This archive was generated by hypermail 2b29 : Sun May 07 2000 - 21:00:18 EST