Re: 2.0.34pre6

Doug Ledford (dledford@dialnet.net)
Sat, 04 Apr 1998 21:53:52 -0600


Alan Cox wrote:
>
> > What about Doug's aic7xxx patches? Last I heard 2.0.x aic7xxx driver was
> > "unfit for news server load". Not that I've had any troubles with it
> > recently mind you. Or are there "political" problems with aic7xxx-5.0.x in
> > the mainstream kernel?
>
> This has two patches Doug sent me that fix problems. There are no
> 'political' problems on versions. Doug makes the final decision on what
> goes in not me.

Well, there are still political issues, just not in Alan's mind. Evidently,
as far as Alan is concerned, I'm now maintaining that driver. However, the
official aic7xxx maintainer is still Daniel Eischen. If I'm maintaining the
driver in 2.0.33 and up, it's because I somewhat usurped that role from Dan
Eischen, so there are political issues, at least in my own conscience.

Now, as to my decision not to ship the 5.0.x drivers to Alan: it was a
difficult decision to make, but I finally decided against it based on these
issues:

1. The two patches I did send Alan should fix all the bugs I know of in the
2.0.33 aic7xxx driver. Specifically, they fix the RedHat install disks
causing the machine to reboot when you access the CD-ROM. They also fix a
problem in the reset code where I forgot two restore_flags() calls before
leaving the reset handler, which would lock up the machine if it happened to
enter the DELAY or FAIL actions in the reset handler (this is rare, but I saw
a few posts describing something of this nature). With these fixed, that
driver should be *very* stable. Furthermore, it has had the benefit of a
large amount of testing.

2. The 5.0.x drivers are still in a state of flux as I work out the few
remaining bugs. The 5.0.7 driver is pretty stable, but the reset code is
relatively untested in that version and has a few known bugs. My current
working set for 5.0.11 has all of that fixed, plus a few changes that should
solve some other problems, but it isn't even released yet because I'm still
tracking down why the system tends to lock up under heavy load on 2.1.9x SMP
systems (although I think this may not be driver related and is instead
kernel related, possibly in scsi.c). On 2.0.33, the current 5.0.11 sources
appear to be as stable as anything else and fix the known problems in 5.0.7.
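For anyone wondering what the reset handler bug in item 1 looks like, here is
a minimal userspace sketch of the pattern. It is not the actual aic7xxx code:
save_flags(), cli() and restore_flags() are the 2.0-era kernel interrupt
macros, but here they are hypothetical stand-ins backed by a plain boolean,
and the function and enum names are made up for illustration. The buggy
version returns from the DELAY and FAIL paths with interrupts still disabled;
the fixed version restores the saved flags before every return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel's save_flags(), cli()
 * and restore_flags(). "Interrupts enabled" is just a global boolean here
 * so the control flow can be exercised outside the kernel. */
static bool irqs_enabled = true;

#define save_flags(f)    ((f) = irqs_enabled)
#define cli()            (irqs_enabled = false)
#define restore_flags(f) (irqs_enabled = (f))

/* Hypothetical reset outcomes; the names are illustrative only. */
enum reset_action { RESET_SUCCESS, RESET_DELAY, RESET_FAIL };

/* Buggy shape: the DELAY and FAIL paths leave with interrupts still
 * disabled, which is how the machine can end up locked. */
static enum reset_action reset_buggy(enum reset_action outcome)
{
    unsigned long flags;

    save_flags(flags);
    cli();
    if (outcome == RESET_DELAY)
        return RESET_DELAY;      /* BUG: flags never restored */
    if (outcome == RESET_FAIL)
        return RESET_FAIL;       /* BUG: flags never restored */
    restore_flags(flags);
    return RESET_SUCCESS;
}

/* Fixed shape: the two missing restore_flags() calls are added, so
 * every exit path puts the saved interrupt state back. */
static enum reset_action reset_fixed(enum reset_action outcome)
{
    unsigned long flags;

    save_flags(flags);
    cli();
    if (outcome == RESET_DELAY) {
        restore_flags(flags);
        return RESET_DELAY;
    }
    if (outcome == RESET_FAIL) {
        restore_flags(flags);
        return RESET_FAIL;
    }
    restore_flags(flags);
    return RESET_SUCCESS;
}
```

Calling reset_buggy(RESET_DELAY) leaves irqs_enabled false afterwards, while
reset_fixed() leaves it true on all three paths.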

I know of several people who very much wanted the 5.0.x code in the 2.0.34
kernel for various reasons, so I've got a plan, one that should make them
happy and at the same time let me feel comfortable. I didn't want to ship
this to Alan because it would add an extra 100K or so to the linux tarball
(and about 100K to the patch itself). Instead, I'm going to add the 5.0.x
drivers to the 2.0.34 kernel as a patch that basically creates two separate
modules: one with the standard 4.1.1 driver that's already in the kernel,
and one with a 5.0.x driver. Both can be compiled as modules, although only
one can be loaded at a time, and only one can be built into the kernel
itself. By doing this, I'm hoping people will be able to choose between the
"tried and true" 4.1.1 code that's in the kernel now (with my fixes) and the
more experimental 5.0.x code.

In the meantime, the 5.0.11 version of the driver is now released (I uploaded
it while writing this email). I'll hold off on the interrupt changes until
5.0.12, after I've had a chance to test the 2.1.92 code more thoroughly. This
version should prove to be as reliable as the 5.0.7 version, or more so. I
would specifically ask anyone running any of the 5.0.x aic7xxx versions to
please update to 5.0.11 and let me know if anything goes wrong. Also let me
know if nothing does. I want to build a rough tally of x votes for "It works"
and y votes for "It didn't work for me". It would be especially helpful if
anyone who has had problems with the driver in the past would try this
version, because I want to know whether it solves the problems some people
had with 5.0.8 and up, as well as the problems some people have with the
4.1.1 driver in 2.0.33.

I would also like people to stress test this version fairly hard. That
means, if your drives can handle it, turn on tagged queueing (see the
drivers/scsi/README.aic7xxx file for information regarding tagged queueing
in versions above 5.0.5), set a reasonable depth, and then run continuous
bonnie tests for a couple of hours and see if anything bad happens. On
2.1.9x, that load locks the system up, and I want to make *SURE* it is only
a 2.1.9x problem and not a 2.0.33 problem as well. NOTE: a SCSI bus reset
during a bonnie run that the driver recovers from is not something bad; a
SCSI bus reset that hangs the system is. However, since the reset code is
fairly untested in the 5.0.11 driver, I would like to hear about all SCSI
bus resets so I can get some idea of whether the driver is recovering
properly.

In any case, if there are no strong objections from anyone, that's my plan :)

-- 

Doug Ledford <dledford@dialnet.net> Opinions expressed are my own, but they should be everybody's.
