Gerard Roudier (groudier@club-internet.fr):
> Thanks for the information.
No, no, I have to thank you for spending your time.
> The phase change from COMMAND to STATUS is acceptable if the target
> detects wrong data in a CDB. Hopefully, if nothing more severe occurred
> we can expect the target to return a CHECK CONDITION.
How do I know whether the target (= the drive?) returns a CHECK CONDITION?
Does the driver indicate this in some way?
> But the phase change from COMMAND to MESSAGE IN is not acceptable,
> because there is no relevant message a target can send in this situation.
> Anyway, the driver allows the target to behave as it wants.
Do I understand you correctly that the drive must be nuts when it changes from
COMMAND to MESSAGE IN (assuming there is no transmission problem)?
> On Wed, 5 May 1999, Rainer Clasen wrote:
> > The driver was loaded as a module.
That doesn't matter; compiling it into the kernel didn't help. With the
53c7,8xx module (CONFIG...FAST=y, CONFIG...DISCONNECT=y) I had data
corruption, too.
> > > You can think as long as you want but your computer (as mine) is just an
> > > assembly of pieces of crap and nothing better. :-)
> >
> > Yep - until I tested this with FreeBSD on the *same* system I definitely
> > thought I had hardware that is (too) defective. Maybe FreeBSD *does* special
> > workarounds, but why can't we, too?
>
> In order to work around something, we need some clue on the real problem.
> Btw, if I tell you that there is no special work-around in the FreeBSD
> ncr driver that may address this problem, do you believe me?
Yes, I do. Computers are strange, aren't they?
> But if FreeBSD has black-listed some firmwares, this may appear as a
> work-around.
... or FreeBSD's access pattern on UFS simply happens not to trigger the problem.
> People often omit to indicate:
[...]
> - If the system and devices are correctly cooled.
*g* yes, they are.
> - Etc ...
How about collecting a list of things to include in SCSI bug reports? I'd
volunteer.
> You must start from some hypothesis and then try to change things that may
> confirm your hypothesis.
That is what I did at the beginning - until I ran out of ideas.
> For example, if you suspect a firmware problem you may disable features,
> one at a time, starting from the ones that are known to be often bogusly
> handled. In my opinion, the right order is:
>
> 1 - Disable Write Caching
Tried this now, didn't help.
> 2 - Reduce the tag depth
> 3 - Disable tags
tags:0? - done, didn't work.
> 4 - Reduce sync transfer speed
> 5 - Disable sync transfer
you mean sync:255? done.
> 6 - Disable Wide transfer
Not applicable for me, since it's a narrow controller.
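
To keep track of which combination was actually tested, the one-at-a-time
approach could be written down as a small test plan. The following is only a
sketch: it uses just the two driver settings named above (tags:0 and
sync:255), and it assumes they are passed as a comma-separated
"ncr53c8xx=" boot string. Write caching is a drive setting rather than a
driver option, so it is left out. The function names are made up for
illustration.

```python
# Driver settings taken from this thread. The "ncr53c8xx=" prefix and the
# comma-joined form are an assumption about the boot-option syntax.
STEPS = [
    ("disable tagged queueing", "tags:0"),
    ("disable sync transfer", "sync:255"),
]


def one_at_a_time(steps):
    """Yield (description, boot_string) with exactly one feature changed per run."""
    for description, option in steps:
        yield description, "ncr53c8xx=" + option


def cumulative(steps):
    """Yield boot strings that keep every earlier change and add one more."""
    applied = []
    for description, option in steps:
        applied.append(option)
        yield description, "ncr53c8xx=" + ",".join(applied)


if __name__ == "__main__":
    for description, boot in one_at_a_time(STEPS):
        print(f"{description:28s} {boot}")
```

Running each string from one_at_a_time() first shows whether a single feature
is to blame; cumulative() covers the case where only a combination helps.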
> If you suspect a PCI problem, then you may:
>
> 1 - Disable PCI cache based transactions (for the chip)
> 2 - Decrease burst length
> 3 - Disable burst
How do I do this? In the BIOS? Are these options of the sym/ncr53c8xx driver?
Is safe:y enough?
> 4 - Remove any patch or option that claim to optimize the chipset.
no patches applied, pristine sources.
> 5 - Avoid using graphics or IDE during your testing
There is no IDE attached; IDE is turned off in the BIOS. Graphics? Hmm,
usually X was running, but not very busy. Hold on a moment, I'm testing
without it. No, that doesn't help.
> > On Sat, 3 Apr 1999 19:20:15 +0200 Gerard Roudier wrote:
> > > - PCI BUS problem due to broken chip-set or misconfiguration.
> > > - Memory problem that affects DMA from the PCI BUS.
> >
> > In two different boards which don't show any problems otherwise? One of those
> > boxen usually runs 24/7 and is quite busy. Yes, I know, it's still
> > possible, but isn't it rather improbable?
>
> The information sent to the list seems to confirm this hypothesis, but
> at the time I wrote the above this information had not been made
> available, as you certainly know.
*uhhm*, sorry.
> > > - Kernel or driver bug outside the ncr driver that makes some garbaged
> > > command go to the SCSI device.
> >
> > Possible. But why isn't the aic7xx driver vulnerable? I doubt I'm capable
> > of finding the guilty part in this case.
>
> I am also not in a position to find it. It is often hard to find a problem
> we can reproduce and I have never got a single weird PHASE CHANGE problem
> on any of my systems since I started with ncr chips (1995).
To rule out vfat/ext2 doing something evil, I wrote to the device directly
(several runs of dd if=/dev/zero of=/dev/sd?5). I didn't get any
"phase change", but I suppose this proves nothing.
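
A dd from /dev/zero only writes and never checks what ends up on the disk, so
it cannot show corruption by itself. A write-then-verify pass might say more.
The following is only a rough sketch, not something I have run against the
failing drive: the function names and the block size are made up, every block
gets its own pattern so that a misplaced block would also be noticed, and the
target is DESTROYED, so it must be a scratch file or scratch partition.

```python
import hashlib
import os

BLOCK = 64 * 1024  # bytes per block; an arbitrary choice for this sketch


def pattern_block(index, size=BLOCK):
    """Return a deterministic pattern that is different for every block index."""
    seed = hashlib.sha256(index.to_bytes(8, "big")).digest()
    reps = size // len(seed) + 1
    return (seed * reps)[:size]


def write_pattern(path, blocks):
    """Overwrite `path` with `blocks` pattern blocks and push them to the device."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(i))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, blocks):
    """Return the indices of blocks whose contents differ from what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK) != pattern_block(i):
                bad.append(i)
    return bad


# Example (destructive!):
#   write_pattern("/path/to/scratch", 1024)
#   print(verify_pattern("/path/to/scratch", 1024))
```

One caveat: a read-back right after the write may be answered from the buffer
cache and never touch the disk, so the amount written should be larger than
RAM, or the verify pass should be run after a reboot.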
> If a firmware update exists, it should be tried, in my opinion.
Taking a short look at IBM's site I found none. I'll try harder next time.
> Disabling SCSI features, one at a time, as I mentioned above should also
> be interesting.
BTW, are there any debugging options that may help in this situation? The
driver seems very quiet.
Thanks for your support
Rainer
-- KeyID=58341901 fingerprint=A5 57 04 B3 69 88 A1 FB 78 1D B5 64 E0 BF 72 EB