Re: Hang in wait_on_inode with SMP 2.1.87

Steve Hsieh (steveh@eecs.umich.edu)
Mon, 23 Feb 1998 04:36:47 -0500 (EST)


Hi Linus,

The patch below, which you suggested I try, doesn't make any difference.
I am using 2.1.88 now, and with that additional line a disk-intensive
cp still hangs.

root@ord:/# cp -a /usr /mnt/u1 &
<cp starts, but then gets stuck after copying some files...>
root@ord:/# ps -aux | grep cp
root 209 5.8 0.3 1296 812 p0 D 04:26 0:05 cp -a /usr /mnt/u1
root 237 0.0 0.1 860 364 p0 S 04:28 0:00 grep cp
root@ord:/# kill 209
root@ord:/# ps -aux | grep cp
root 209 5.5 0.3 1296 812 p0 D 04:26 0:05 cp -a /usr /mnt/u1
root 239 0.0 0.1 860 364 p0 S 04:28 0:00 grep cp
root@ord:/# kill -9 209
root@ord:/# ps -aux | grep cp
root 209 5.1 0.3 1296 812 p0 D 04:26 0:05 cp -a /usr /mnt/u1
root 241 0.0 0.1 860 364 p0 S 04:28 0:00 grep cp
root@ord:/# ps -axlw | grep cp
100 0 209 179 0 0 1296 812 wait_on_pa D p0 0:05 cp -a /usr /mnt/u1

The only hint that appears in the log file is:

Feb 23 04:27:20 ord kernel: scsi0: CMDCMPLT without command for SCB 2, QOUTCNT 0, QINCNT 0, SCB flags 0x0, cmd 0x0

Is there anything else you, or anyone else out there, would like me to
try to help debug this problem?

Thanks,
Steve

On 23 Feb 1998, Linus Torvalds wrote:

> In article <Pine.LNX.3.96.980222142802.13384A-100000@kanga.eecs.umich.edu>,
> Steve Hsieh <steveh@eecs.umich.edu> wrote:
> >
> >I think I have a similar problem, I believe starting around 2.1.8x.
> >If there's heavy disk activity, whatever process is involved gets
> >stuck, and I can't kill it. Unlike Carsten, though, it is repeatable
> >-- if I do a 'cp -a /usr /mnt' where a different drive partition is
> >mounted in /mnt, cp will hang.
>
> Could you try two things:
> - upgrade to 2.1.88 (unless you already have)
> - test this "strange" patch to __wait_on_inode():
>
> static void __wait_on_inode(struct inode * inode)
> {
>         struct wait_queue wait = { current, NULL };
>
>         add_wait_queue(&inode->i_wait, &wait);
> repeat:
>         current->state = TASK_UNINTERRUPTIBLE;
> +       __asm__ __volatile__("cpuid": : :"ax", "bx", "cx", "dx", "memory");
>         if (inode->i_state & I_LOCK) {
>                 schedule();
>                 goto repeat;
>         }
>         remove_wait_queue(&inode->i_wait, &wait);
>         current->state = TASK_RUNNING;
> }
>
>
> (The above is just a pseudo-patch, but as it's only one line you should
> get the idea).
>
> I haven't tested it myself, and for all I know it may be completely
> bogus, but this is something that came up with the same function wrt
> the buffer cache, where another ordering change made a difference to
> some people. The only reason I can think of is a serialization thing,
> and while I don't actually believe in it, it is certainly worth testing
> if this is reasonably easily repeatable for some people.
>
> (Intel documents "cpuid" as being a serializing instruction, so it will
> force the CPU to not re-order anything around that particular place. I
> currently cannot see how this could make a difference, but I'm not
> completely infallible and the above is easy enough to test ;)
>
> Linus
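
For anyone who wants to play with the idea outside the kernel, here is a
rough user-space sketch of the same loop shape: publish the intent to
sleep, force serialization, and only then test the lock bit. It is not
kernel code and it does not reproduce the hang. The names fake_inode,
serialize_cpu, wait_on_fake_inode and unlock_on_sleep are made up for
illustration, the sleep callback merely stands in for schedule(), and the
non-x86 fence fallback is an assumption. Unlike the one-line pseudo-patch,
the registers are listed as outputs rather than clobbers, since cpuid
reads eax/ecx and overwrites all four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define I_LOCK 1u   /* same bit name as in the quoted function */

/* Hypothetical stand-in for struct inode; only the state word is modelled. */
struct fake_inode {
    volatile unsigned int i_state;
};

/* Hypothetical helper: execute a serializing instruction on x86,
 * or fall back to a full memory fence elsewhere (an assumption). */
static inline void serialize_cpu(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t eax = 0, ebx, ecx = 0, edx;   /* leaf 0 is always valid */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Loop with the same shape as __wait_on_inode(): serialize, then test
 * the lock bit, and "sleep" while it is set. Returns the sleep count. */
static int wait_on_fake_inode(struct fake_inode *inode,
                              void (*sleep_fn)(struct fake_inode *))
{
    int sleeps = 0;

    assert(inode != NULL && sleep_fn != NULL);
    for (;;) {
        /* kernel version: current->state = TASK_UNINTERRUPTIBLE; */
        serialize_cpu();
        if (!(inode->i_state & I_LOCK))
            break;
        sleep_fn(inode);   /* stands in for schedule() */
        sleeps++;
    }
    return sleeps;
}

/* Test double: pretend the lock holder released the inode while we slept. */
static void unlock_on_sleep(struct fake_inode *inode)
{
    inode->i_state &= ~I_LOCK;
}
```

Calling wait_on_fake_inode(&ino, unlock_on_sleep) on a locked fake_inode
goes around the loop once and returns; on an unlocked one it returns
without sleeping at all.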
