2.1.99 SMP w/md - utterly broken

F Harvell (fharvell@fts.net)
Fri, 01 May 1998 16:07:26 -0300


From reading the list, it appeared that the 2.1.99 pre-patches and
final release were solving many of the IRQ/SCSI disk problems. I
wasn't having any luck. I didn't say anything because I had recently
upgraded my CPUs from dual PPro 150 to dual PPro 200 and my problems
had gotten worse, so I couldn't rule out a hardware problem.

Well, I finally got tired of the constant hung processes, crashes,
and unreliability and decided to move my md 0 striped partitions back
to standard partitions. I haven't had a problem since.

In hindsight, while I'm not particularly kernel savvy, I can guess
from what I've read on the list that all of the sti(), cli(),
save_flags(), and restore_flags() calls in the md.c code are not
particularly SMP friendly. (Actually, I was having a bad time running
a UP kernel as well.) I would highly recommend _not_ using md with
any of the IO-APIC-enabled kernels (2.1.85+) on an SMP system until
md is brought into the bright new spin_lock()'ed future.

Also, a few observations:

a) the problems tend to show up (using ps ax) as processes stuck in
the "D" state.

b) the update process can get stuck, which leads to a death spiral
in which, eventually, every process gets stuck and nothing can be
done on the box.

c) CPU speed does make a difference, i.e., the problems show up much
sooner on PPro 200s than on PPro 150s.

d) when running with md, I constantly saw processes going into and
out of the "D" state; without md, it happens only rarely. I watched
for this with: while true; do ps axlw | grep D; sleep 1; done

e) when md was running, the whole system was substantially slower
than without it. I would guess that this is related to (d) as there
were probably missed interrupts, etc.
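
The grep in (d) matches any line containing a capital D, including
the ps header and command names. A minimal sketch of a stricter check
follows, assuming a ps that accepts -o with the pid, stat, and comm
keywords (procps and BSD ps do; the ps shipped on a 2.1.x box may
differ). The function names dstate_filter and dstate_snapshot are made
up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.

# Read "PID STAT COMMAND" lines on stdin and print only those whose
# state field starts with "D" (uninterruptible sleep).
dstate_filter() {
    awk '$2 ~ /^D/ { print }'
}

# Take one snapshot of all processes and keep the D-state ones.
# Assumes ps supports -e and the pid/stat/comm output keywords;
# the trailing "=" suppresses the header line.
dstate_snapshot() {
    ps -e -o pid= -o stat= -o comm= | dstate_filter
}
```

Running "while true; do dstate_snapshot; sleep 1; done" gives the same
once-a-second view as the loop in (d), without the false matches.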

If anyone else is already looking into the md problems, please let
me know.

-- 
Mr. F Harvell                          Phone:407 696-4340
FTS International, Systems Division    Phone:407 399-0342 (cell)
3498 Buffam Place                        Fax:407 696-4244
Casselberry, FL 32707                 mailto:fharvell@fts.net
