Re: {interruptible_,}sleep_on question

Leonard N. Zubkoff (lnz@dandelion.com)
Mon, 1 Nov 1999 15:30:35 -0800


Date: Mon, 01 Nov 1999 23:46:29 +0100
From: Manfred Spraul <manfreds@colorfullife.com>

It must drop the lock, and that's the problem: I'm quite sure that the
current code could lock up: (The window is tiny, but nevertheless it
should be closed.)

CPU1: in DAC960_ProcessRequest(), owns the io_request_lock.
CPU2: in DAC960_InterruptHandler

AcquireControllerLock: spins because the io_request_lock is blocked.

DAC960_AllocateCommand(): returns NULL, no free Command struct
spin_unlock(&io_request_lock);
<the local NMI interrupt occurs and enlarges the window between
spin_unlock() and sleep_on()>
DAC960_InteruptHandler can continue.
calls DAC960_ProcessCompletedCommand().
<at the end of this function:>
DAC960_DeallocateCommand();
wake_up(&Controller->CommandWaitQueue);
<the wait queue is empty, nothing to do>
<the local NMI returns>
sleep_on()
<adds this thread to the wait queue>
<sets current->state=TASK_UNINTERRUPTIBLE>
<sleeps>

As you'll notice, the wakeup from CPU2 is lost, the free command
structure is not used. [At least this is what I understand from
the DAC driver, please correct me if I'm wrong.]

Your analysis seems correct to me (and I assume this means that similar code in
scsi.c may have the same problem). In the case of the DAC960 driver, the
probability of this code causing an actual problem is extremely low, because
when allocation of a Command structure fails the controller is full of active
commands, and so typically 128 command completions would have to occur during
that window before sleep_on is called.

I will fix this in the next driver release. Thanks for noticing the problem
and reporting it in detail.

Leonard

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/