Re: DAC960 and latest kernels

From: Dave Olien
Date: Wed Nov 17 2004 - 16:33:05 EST



I'll look into this. I'll see if I can reproduce any of these
problems locally and get back to you.

Dave

On Wed, Nov 17, 2004 at 11:20:18AM -0800, ccantwel@xxxxxxx wrote:
> It seems that I have found a very serious bug relating to the DAC960
> driver, it is present in 2.4.27 and 2.6.8.1 but I haven't tried any
> development kernels newer than that. This message is meant to inform
> the people who may be able to figure out what the problem is and
> repair the bug. If anyone wants to respond to me or has further
> information on this problem, please cc me directly, as I am not a
> member of this mailing list.
>
> There is a bug in latest released kernels for both 2.4 and 2.6
> trees that seems to relate to the DAC960 driver. I have tried
> 2.4.27 and 2.6.8.1 and both have the following problem. After
> replacing and testing every piece of hardware I've determined
> the problem is in the kernel and not the machine. Also, going
> back to kernel 2.4.17 makes everything work perfectly.
>
> When copying a large number of files over the network using nc
> and tar and also stress testing on the receiving side with
> frequent du commands where the data is being written to (not
> required for the problem but it seems to make it more likely
> to happen) the machine randomly hangs. Sometimes it crashes
> completely with an Unable to handle kernel paging request at
> virtual address, with various processes, and sometimes it just
> completely halts and responds to nothing even though it seems
> to sort of be running (answers on sockets but transfers no data,
> no more communication of any kind on other open sockets, entering
> a login name and enter on the console also hangs before the
> password prompt.
>
> Upon rebooting the filesystem that was being written to is corrupt.
> When this file system is XFS I can still inspect it. If it is
> reiser the machine will often hang trying to run fsck.reiserfs.
>
> In addition, after enough network tar copying I can stop the process,
> and even if it has not hung the machine often the filesystem has
> become corrupt. Since this happens with both xfs and reiser and
> there are paging problems too I believe it has to relate to the
> dac960 driver itself.
>
> I've also tried swapping DAC960 cards, and that doesn't matter,
> they both work perfectly in 2.4.17 and the problems occur in
> both cases in 2.4.27 and 2.6.8.1. I'm planning to try some
> more kernels in between to narrow down when the problem was
> introduced but I'm not a kernel developer and that is about the
> most I will be able to do.
>
> I think the DAC960 driver isn't used very often and also without this
> specific type of stress testing it is hard to realize the problem is
> occuring before the filesystem is irreperably corrupt weeks down the
> road, which is why this problem may have not been found yet. Without
> stress testing using tar the machine can go for weeks and act like it
> is fine.
>
> Again, if anyone has any information, please cc me when you reply as
> I am not a member of this list.
>
> Thank you
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/