This sounds like a plausible explanation of what is happening.
I was unable to test with the IKD patch, which reduced my
attempts to fixing this to guessing, mainly.
> In the path called by down(), I can see the run_task_queue() call,
> which is called by schedule() before the release_kernel_lock() call.
> This call, for example, can call unplug_device() for another IDE
> interface which has queued requests, and there we would do an __sti()
> in the ide_do_request() function.. Does moving the run_task_queue()
> call after the release_kernel_lock() call makes any difference with
> the original ide_do_drive_cmd()?
No, it still deadlocks when releasing the kernel lock before
run_task_queue(). I found out the hard way, again!
> BTW, our attempt to minimize the race potential between down() and
> up() is probably no longer needed with current kernels, as the semaphores
> are SMP and interrupt safe. The important part is to install the
> semaphore in the request before grabbing the io_request_lock and adding
> it to the queue, and we should safely be able to remove the save_flags(),
> cli() (both global and local) and restore_flags() around the call to
> down(), and let the standard kernel up()/down() functions handle the
> race between them.
This fixed the problem beatifully! I stressed ide_do_drive_cmd()
vigorously (4 drives active, ripping audio and playing video
CD's) for about an hour without a hitch. This still doesn't
explain why ide_do_drive_cmd_wait() works well too - I haven't
looked at the code generated, though. So could we just loose
the save_flags/cli/restore_flags in ide_do_drive_cmd()?
> On another point, the use of:
>
> struct request *cur_rq = drive->queue;
>
> in the rewritten function below outside of the io_request_lock()
> part is dangerous, as an interrupt could occur just between the
> above line and the spin_lock_irqsave(&io_request_lock, flags); line
> and change cur_rq from under us.
Good point. The function was just a desperate measure to attempt
to get _some_ debugging info out of this. Not even a hooked up
serial console could help.
-- * Jens Axboe <axboe@image.dk> * Linux CD-ROM Maintainer * http://www.kernel.dk- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/