Clear enough. However, the change in question still is only an ordering
change, and one that should not make any difference what-so-ever (it
re-orders the setting of "current->state" only across things that don't
actually _care_ about the state). So we have it pinpointed to a (very
small) function, but I still don't like the fact that we haven't
pinpointed a cause..
The fact that this happens on an AMD K5 makes me wonder - does anybody
have an errata sheet for the AMD chips? There might be out-of-order
issues or something else strange going on that is K5-specific.
It's not that I don't want to apply your patch - the patch is no worse
than the current code. But I _do_ want to know why it makes any
difference for you, when it so patently shouldn't make any difference at
all..
>BTW, I checked the hung processes with kdebug. I did find that the
>buffer they where hanging on was:
>1. *NOT* locked anymore. The buffer state was 0x9, the locked mask is 0x2
> I believe.
Actually, it's bit #2, ie 0x4. But you're right: 0x9 is certainly not
locked (it's "Uptodate+Touched", which is pretty much normal).
>2. We were still in its wait queue (first on the list), but we were not the
> only one. If I recall correctly there was at least one other entry,
> if not more (I did not know yet at that point that the wait_queues
> where circular).
Maybe the bug is that something marked the buffers as not locked without
waking anything up? Then your change in ordering might make a
difference, if the buffer has been touched multiple times.
>Hmmm... Just thinking aloud here, so bear with me...
>Looking at wake_up_process() I see that it indeed writes over p->state
>in any case. What's p->next_run? When is that set?
>Could it be that p->next_run is set and about to be cleared without
>adding the task to the runqueue (in a different part of the kernel)?
No, if we forgot to add the process to the run-queue, it would still
have been marked as TASK_RUNNABLE - even though it would never have been
actually run. And you said that the stuck processes are always stuck in
disk wait according to "ps"... So wake_up_process() was never called at
all.
>One more thing... What if schedule() is being invoked with
>current->state == TASK_RUNNING ?
This is normal, and happens all the time. In particular, it happens
after any process has used up its time-slice..
>What if someone changes current->state while schedule() is running
>(that *should* not be possible, considering that most of it runs
>with interrupts disabled)?
Nothing but the task itself can ever set "state" to anything but
runnable (ie you can only wake a task up from another task, you can
never try to force it to sleep).
>>I wonder whether gcc might be changing the ordering here. That would
>>explain why your patch would make a difference..
>
>Possibly, but not in the parts where I patched. Maybe there are parts
>in sched.c which are being reordered?
The scheduling is safe - I've looked at that code after the compiler has
munged it often enough, and it's been looked at by lots of people.
>>In fact, the more I think about it, the more your patch makes sense
>>considering what gcc might do to the sources. Do you by any chance use
>>egcs or a recent snapshot of gcc?
>
>Sorry, no. gcc version 2.7.2.1 (from a recent Debian distribution).
Ok, 2.7.2.1 definitely doesn't do those kinds of re-ordering
optimizations (it would require memory alias analysis for correct code
generation, something that 2.7.x won't do). Hmm.. That leaves me with
two possibilities:
- something clears the locked state without waking people up. Do you
use "md" or anything else that plays around with buffers?
- really strange K5 bug
Can anybody else come up with any ideas?
Linus