Re: Kernel Panic: aic7xxx_free

Doug Ledford (dledford@dialnet.net)
Wed, 28 May 1997 15:41:26 -0500


--------
>
> on linux-kernel-digest@vger.rutgers.edu Paul Serice wrote:
>
>
> > Linux is crashing all the time now. I think maybe it's partly my
> > hardware's fault. Can someone confirm or otherwise enlighten me?
>
> Yes, I'll try to :-)
> It is your faulting hardware.

Actually, not this time. I had to dig out my older aic7xxx sources to look
this one up, and it actually isn't a hardware fault.

>
> > What happens is that during disk activity, the computer more
> > or less locks up. If you're lucky enough to have the computer crash
> > while you're at a text console, you can see the error messages
> > repeatedly scroll past, but you'll have an extremely hard time getting
> > the computer to respond to anything, including telnet.
> >
> > The error is as follows:
> >
> > Couldn't Get A Free Page .....
> > Kernel Panic: aic7xxx_free (aic7xxx_free) Couldn't find a free SCB.
>
> I experienced the same problem with 2.0.30.
> 2.0.30 brings a lot more i/o performance than version < 2.0.30
> AFAIK due to enhancements I've seen in the buffer management.

The enhancement in the buffer management that you speak of has been
resulting in a *lot* more "Couldn't get a free page..." errors than it used
to. When these happen at the wrong time....well....you get what we have
here. In this case, all of the currently allocated SCBs in the aic7xxx
driver were in use, the mid level SCSI code queued out another command,
aic7xxx_queue() attempted to allocate another SCB to handle the request, the
allocation failed with a "Couldn't find a free SCB" message, and the driver
panicked. It currently doesn't know what to do if it can't get memory for an
SCB. We've never had to really worry about this in the past simply because
we didn't get "Couldn't get a free page..." errors, but I guess now we do
need to worry about it. FWIW, the driver wouldn't experience this problem
if immediately after bootup you ran some script that caused the scsi
controller to get beat on real hard, real quick. Each SCB is allocated only
once. If you have 3 drives with a tagged queue depth of 16 each, then you
can have at most 3 x 16 = 48 outstanding commands on the SCSI bus, and
hence need only 48 SCBs. Once those 48 SCBs are allocated, this problem
won't ever occur.

I may end up talking to Dan Eischen about some sort of pre-allocation scheme
that would run at about the time select_queue_depth() is called to get around
this problem, but it would go into the current driver, not into this older
version in the 2.0.x kernel tree.
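
As a rough illustration of what such a pre-allocation scheme could look
like, here is a minimal user-space sketch. It is not the aic7xxx code:
struct scb, struct scb_pool, and all of the function names are hypothetical,
and calloc() merely stands in for whatever kernel allocator a real driver
would use. The idea is to size the pool from the queue depths once, allocate
everything up front, and never allocate again on the command path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver's per-command control block. */
struct scb {
    struct scb *next;    /* link in the free list */
    int         in_use;  /* nonzero while a command owns this SCB */
};

/* Hypothetical pool that owns every SCB the controller can ever need. */
struct scb_pool {
    struct scb *storage;    /* single up-front allocation */
    struct scb *free_list;  /* SCBs not currently owned by a command */
};

/* Worst case for outstanding commands: the sum of the queue depths. */
size_t scb_needed(const unsigned *queue_depth, size_t ndevices)
{
    size_t total = 0;
    for (size_t i = 0; i < ndevices; i++)
        total += queue_depth[i];
    return total;
}

/* Allocate all SCBs at setup time; returns 0 on success, -1 on failure. */
int scb_pool_init(struct scb_pool *pool, size_t count)
{
    pool->free_list = NULL;
    pool->storage = count ? calloc(count, sizeof(*pool->storage)) : NULL;
    if (pool->storage == NULL)
        return -1;
    for (size_t i = 0; i < count; i++) {
        pool->storage[i].next = pool->free_list;
        pool->free_list = &pool->storage[i];
    }
    return 0;
}

/* Take an SCB; NULL means every SCB is busy, so the caller should defer. */
struct scb *scb_get(struct scb_pool *pool)
{
    struct scb *scb = pool->free_list;
    if (scb != NULL) {
        pool->free_list = scb->next;
        scb->next = NULL;
        scb->in_use = 1;
    }
    return scb;
}

/* Return an SCB to the pool once its command has completed. */
void scb_put(struct scb_pool *pool, struct scb *scb)
{
    assert(scb->in_use);
    scb->in_use = 0;
    scb->next = pool->free_list;
    pool->free_list = scb;
}

/* Release the single up-front allocation. */
void scb_pool_destroy(struct scb_pool *pool)
{
    free(pool->storage);
    pool->storage = NULL;
    pool->free_list = NULL;
}
```

With the three drives from the example above, scb_needed() gives 48. If
scb_pool_init() succeeds for that count, scb_get() never has to ask for
memory again, and a NULL return from it means that all SCBs are busy, not
that the system is out of pages. If the up-front allocation fails, it fails
at setup time, where it can be reported without a panic.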

>
> [snip]
>
> > Will all of these problems disappear if I just dump my Adaptec
> > controller and get a BusLogic controller?
>
> Noooooooooooo, I don't think so!!!!!!!!!!!!!!
>
> Only if other than Adaptec lets the problem disappear being
> an i/o bottleneck ;-)

Actually, the BusLogic driver allocates 32 SCB equivalents at a time, and
I'm not exactly sure what it does if that allocation fails at a point in
time when it's needed :)

-- 
*****************************************************************************
* Doug Ledford                      *   Unix, Novell, Dos, Windows 3.x,     *
* dledford@dialnet.net    873-DIAL  *     WfW, Windows 95 & NT Technician   *
*   PPP access $14.95/month         *****************************************
*   Springfield, MO and surrounding * Usenet news, e-mail and shell account.*
*   communities.  Sign-up online at * Web page creation and hosting, other  *
*   873-9000 V.34                   * services available, call for info.    *
*****************************************************************************