Re: Booting from Qlogic qla2300 fibre channel card

From: Michael Clark (michael@metaparadigm.com)
Date: Wed Apr 16 2003 - 10:32:21 EST


Hi,

On 04/16/03 14:56, Lincoln Dale wrote:
> Hi,
>
> At 08:18 AM 16/04/2003 +0200, Jurjen Oskam wrote:
>
>> At work, we are looking to deploy several Linux boxes on our SAN. The
>> machines will be IBM eServer xSeries 345 with Qlogic qla2340 Fibre Channel
>> cards, and no internal disks.
>>
>> The storage array is an EMC Symmetrix model 8530. EMC created a document
>> where they explain how to make such a configuration work. When they mention
>> booting from a Symmetrix-provided volume, they mention the following:
>>
>> "If Linux loses connectivity long enough, the disks disappear from the
>> system. [...] For [this reason], EMC recommends that you do not boot a
>> Linux host from the EMC storage array."
>
>
> in general, all OSes get rather upset if disks disappear under them.
> particularly if those disks contain swap -- exactly how is the machine
> meant to recover from that?
>
> some recommendations:
> - run with Matthew Jacob's "feral" driver rather than QLogic's driver;
>   it has much better error recovery

That is certainly a matter of opinion. When I tried the feral driver a
month ago, unplugging the fibre (and getting loop down) made the SCSI
layer start spewing I/O errors, and the files copied during this test
(on ext3) had invalid checksums. The QLogic driver, however, handled the
same test fine (surviving multiple fibre unplugs while copying a
multi-gigabyte file). Admittedly, the QLogic driver has its fair share of
recovery problems, such as an abort function that tries to re-init the
hardware but always fails.
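
For anyone wanting to repeat this kind of unplug test, here is a minimal
sketch of the copy-and-checksum step in Python. It is not the script I
used; the function names and the choice of SHA-256 are assumptions made
for illustration only.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, fsync dst, and compare checksums.

    Returns True if the copy matches the source. An OSError raised
    during the copy (for example EIO passed up from the SCSI layer)
    is deliberately left to propagate to the caller.
    """
    expected = sha256_of(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())
    return sha256_of(dst) == expected
```

Run copy_and_verify() on a multi-gigabyte file and pull the fibre while
it is copying. Note that reading the destination back straight away may
be served from the page cache, so remounting the filesystem before
re-checking the checksum gives a stronger result.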

I'm currently looking for alternatives to QLogic HBAs after a year of
not being able to find a stable driver combo (one that can stay up
for more than a few weeks). Does anyone out there have experience
with the LSI HBAs and Fusion MPT drivers, or perhaps Emulex?

We get the following with the latest 6.1 QLogic driver and our 2300s about
every two weeks (we are about to file a bug report with QLogic).

Apr 2 10:54:13 prodapp3 kernel: qla2x00: Status Entry invalid handle.
Apr 2 10:54:13 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr 2 10:54:13 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr 2 10:54:13 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr 2 10:54:13 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr 2 10:54:13 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr 2 10:54:13 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr 2 10:54:14 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr 2 10:54:14 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr 2 10:54:15 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr 2 10:54:15 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr 2 10:54:16 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr 2 10:54:16 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr 2 10:54:17 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr 2 10:54:17 prodapp3 kernel: qla2x00(2): ISP error recovery failed - board disabled
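
To spot this pattern across several machines, a small log scan along
these lines might help. It is only a sketch: the regular expressions are
written against the messages quoted above, the summarize() name is
hypothetical, and treating the number in parentheses as the adapter
instance is my assumption.

```python
import re

# Patterns taken from the log excerpt above; other driver versions may
# word these messages differently.
ABORT_FAILED = re.compile(r"qla2x00_abort_isp\((\d+)\): \*+ FAILED \*+")
BOARD_DISABLED = re.compile(
    r"qla2x00\((\d+)\): ISP error recovery failed - board disabled"
)


def summarize(lines):
    """Count failed ISP aborts per instance and list disabled instances."""
    failed = {}
    disabled = set()
    for line in lines:
        m = ABORT_FAILED.search(line)
        if m:
            inst = int(m.group(1))
            failed[inst] = failed.get(inst, 0) + 1
            continue
        m = BOARD_DISABLED.search(line)
        if m:
            disabled.add(int(m.group(1)))
    return failed, sorted(disabled)
```

Feed it the lines of the syslog file (wherever your distribution keeps
kernel messages) and alert when the second return value is non-empty.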

> - you may want to increase the delay of SCSI_TIMEOUT in
> drivers/scsi/scsi.h
>
> in my lab here, i do a ton of work on Fibre Channel & iSCSI.
> the best setup i've found is that i end up using ramfs as my root and
> having lots of things in there. sure, it burns a bit of ram, but i can
> be sure if i'm doing anything that could impact the i/o path, it's on
> less system-critical stuff. since it's a lab and the things running on
> the hosts aren't RAM hogs, i don't have swap either. you probably
> can't get away with that, so i'd recommend doing some extensive testing
> pulling cables out and seeing what happens and tuning timers to cope
> with it accordingly.
>
>> When making an online configuration change on the Symmetrix (such as
>> remapping volumes), it is possible for the attached hosts to experience
>> a temporary error while accessing a storage array volume. For example,
>
>
> are you sure this tech note will still apply with the DMX?
> i'd imagine that there are still bin file changes that can cause this
> kind of thing, but it's something i believe EMC was addressing with the DMX.
>
>> when changing the Symmetrix configuration, it is not uncommon for the
>> RS/6000s (also attached to the SAN) to log one or two temporary
>> SCSI-errors. They don't cause any problems at all, the AIX volume manager
>> never notices a problem.
>
>
> on RS/6000s, the rules were somewhat different. the HBAs that IBM had
> for RS6Ks typically only tried to issue FLOGIs once every 30 seconds -
> so you would be more likely to see timeout errors if you impacted the
> flow of traffic temporarily.
>
>
> cheers,
>
> lincoln.
>



