Chris Webb <chris@xxxxxxxxxxxx> writes:
Mark Lord <liml@xxxxxx> writes:
Speaking of which..
Chris: I wonder if the errors will also vanish in your situation
by disabling the onboard write-caches in the drives ?
Eg. hdparm -W0 /dev/sd?
Hi Mark. I've got a test machine on its way at the moment, so I'll make sure
I check this one out on it too.
Our test machine is still being built, but we had an opportunity to try this on
a couple of the live machines when their RAID arrays failed over the weekend.
We still got timeouts, but (predictably!) they're not on flushes any more:
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm
all the way through the night....
I also have these in the log, but they are immediately after turning off the
write caching in all drives, so may be a red herring with data still being
written out.
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.00: cmd c8/00:08:00:20:80/00:00:00:00:00/e0 tag 0 dm
On another machine, I saw this with write caching turned off:
ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
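
For anyone wanting to check the "not on flushes any more" observation against
their own logs, here is a minimal sketch. It assumes the first byte after
"cmd" in these libata lines is the ATA command opcode, and it uses a small
hand-picked subset of the ATA opcode table. The names classify, ATA_OPCODES
and FLUSH_OPCODES are made up for illustration and are not part of any
existing tool. The sample lines are copied from the excerpts above as they
appear, truncation included.

```python
import re

# Small, hand-picked subset of ATA command opcodes (not a complete table).
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
}
FLUSH_OPCODES = {0xE7, 0xEA}

# Assumed line shape: "ataN.NN: cmd XX/..." where XX is the opcode in hex.
CMD_RE = re.compile(r"^(ata\d+\.\d+): cmd ([0-9a-fA-F]{2})/")


def classify(line):
    """Return (port, command name, is_flush) for a 'cmd' line, else None."""
    m = CMD_RE.match(line.strip())
    if m is None:
        return None
    opcode = int(m.group(2), 16)
    name = ATA_OPCODES.get(opcode, "unknown (0x%02x)" % opcode)
    return m.group(1), name, opcode in FLUSH_OPCODES


# Sample lines taken from the log excerpts in this message.
LOG_LINES = [
    "ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6",
    "ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm",
    "ata2.00: cmd c8/00:08:00:20:80/00:00:00:00:00/e0 tag 0 dm",
    "ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out",
]

for line in LOG_LINES:
    result = classify(line)
    if result is not None:
        port, name, is_flush = result
        print("%s: %s%s" % (port, name, " (flush)" if is_flush else ""))
```

On the three failed commands quoted here this reports a WRITE DMA EXT, a
READ DMA and a WRITE FPDMA QUEUED, i.e. ordinary reads and writes rather
than cache flushes.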