2.6.1 IO lockup on SMP systems

From: Sergey S. Kostyliov
Date: Sat Jan 31 2004 - 11:59:38 EST


Hello all,

I have experienced lockups on three of my servers running 2.6.1. It doesn't
look like a deadlock: the boxes are still pingable, and all TCP ports that
were in the listen state before a lockup remain in the listen state, but I
can't get any data from these ports. According to sar(1), the systems were not
overloaded right before the lockup. There are also no log entries in any of
the user service logs for almost 10 hours after the lockup.
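
The symptom described above (a port that still accepts connections but never
returns data) can be told apart from a closed port with a small probe. The
following is only a minimal sketch, not something that was run on the affected
boxes. The function name probe() and its return labels are made up for
illustration, and it assumes the service normally sends a greeting banner on
connect (SMTP, SSH, FTP and similar); a service that waits for the client to
speak first, such as HTTP, would look "silent" even when healthy.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a TCP port by connecting and waiting for one byte.

    Returns one of (labels are hypothetical, chosen for this sketch):
      "refused"     - nothing is listening on the port
      "unreachable" - connect failed for another reason (timeout, no route)
      "silent"      - connect succeeded but no data arrived within timeout
      "closed"      - peer closed or reset the connection without sending data
      "responding"  - at least one byte was received
    """
    try:
        # The kernel completes the TCP handshake for a listening socket
        # even if the owning process never gets to call accept().
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"

    with sock:
        try:
            data = sock.recv(1)
        except socket.timeout:
            return "silent"
        except OSError:
            return "closed"

    return "responding" if data else "closed"


if __name__ == "__main__":
    import sys

    # Usage: python probe.py HOST PORT [PORT ...]
    host = sys.argv[1]
    for arg in sys.argv[2:]:
        print(arg, probe(host, int(arg)))
```

A box in the state described here would be expected to report "silent" for
ports that are still in the listen state, since the handshake is handled by
the kernel while the stuck user-space process never answers.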

So I think this is an IO lockup. On the other hand, it doesn't look like a bug
in a particular controller driver, because the controllers are different on
each box. Finally, it doesn't look like a bug in a particular IO scheduler,
because two of the boxes were run with "deadline" and one with "as". Of
course, all these assumptions are valid only if all the lockups I have seen
have the same nature.

All three boxes are SMP. Unfortunately, all are remote and aren't attached
to a serial console yet (this is planned for the next couple of weeks).

1) ope
01:02.1 RAID bus controller: Mylex Corporation: Unknown device 0050 (rev 02)
elevator=deadline
.config: http://sysadminday.org.ru/2.6.1-io_lockup/ope/.config
lspci: http://sysadminday.org.ru/2.6.1-io_lockup/ope/lspci
lspci -vvn: http://sysadminday.org.ru/2.6.1-io_lockup/ope/lspci_-vvn

2) white
02:04.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 02)
elevator=deadline
.config: http://sysadminday.org.ru/2.6.1-io_lockup/white/.config
lspci: http://sysadminday.org.ru/2.6.1-io_lockup/white/lspci
lspci -vvn: http://sysadminday.org.ru/2.6.1-io_lockup/white/lspci_-vvn

3) tiny
02:00.0 Unknown mass storage controller: Compaq Computer Corporation Smart-2/P RAID Controller (rev 03)
03:00.0 Unknown mass storage controller: Compaq Computer Corporation Smart-2/P RAID Controller (rev 03)
elevator=as
.config: http://sysadminday.org.ru/2.6.1-io_lockup/tiny/.config
lspci: http://sysadminday.org.ru/2.6.1-io_lockup/tiny/lspci
lspci -vvn: http://sysadminday.org.ru/2.6.1-io_lockup/tiny/lspci_-vvn

Any hints will be appreciated.

--
Best regards,
Sergey S. Kostyliov <rathamahata@xxxxxxx>
Public PGP key: http://sysadminday.org.ru/rathamahata.asc
