Re: Major qla2xxx regression on sparc64

From: Andrew Vasquez
Date: Mon Apr 16 2007 - 12:37:36 EST


On Mon, 16 Apr 2007, David Miller wrote:

> Sparc64 systems which have an on-board qla2xxx chip (such as
> SunBlade-1000 and SunBlade-2000, there are probably some other systems
> like this too) do not have any NVRAM information present, in fact the
> NVRAM is basically all 0's from what I can tell.
>
> This always worked just fine since the code would previously just use
> a bunch of defaults when an inconsistent NVRAM was detected.
>
> But the changeset below at the end of this email broke this and now
> I'm seeing bug reports from sparc64 users and I was just able to
> reproduce the problem myself just today as well. I verified that
> reverting the patch below gets things working again.
>
> Emanuele, you can feed the patch below to "patch -p1 -R" to get that
> working again so we can move on to the other sparc64 bug we're looking
> into :-)

I sent Emanuele the attached patch during the weekend...

> The failure mode isn't nice, it actually ends up crashing with an OOPS
> in qla2xxx_init_host_attr() because ha->node_name is NULL, it's
> supposed to be initialized by functions like qla2x00_nvram_config()

No, it's not very nice...

> Can we revert the patch below or do something similar to get things
> working again on sparc64?
>
> The most important thing which qla2x00_nvram_config() seems to want to
> get is the WWN port_name and node_name. These are provided in the OFW
> device tree so we could pluck them out of there with something like:
>
> #ifdef CONFIG_SPARC
> #include <asm/prom.h>
> #include <asm/pbm.h>
> #endif
>
> ...
>
> #ifdef CONFIG_SPARC
> struct pcidev_cookie *pcp = pdev->sysdata;
> u8 *port_name, *node_name;
>
> port_name = of_get_property(pcp->prom_node, "port-wwn", NULL);
> node_name = of_get_property(pcp->prom_node, "node-wwn", NULL);
> #endif
>
> Those will hold a pointer to the property values or NULL if the
> property does not exist. This is private data, so you should make
> copies of them into your local data structure and not use references
> to them.
>
> I don't see any OFW properties present that could be used to fill in
> the rest of the NVRAM parameters, so we'd need to use the defaults
> that the code before the change was using.

I'd be more inclined to do something like the above, rather than:

> But even if that fails, I think the fallback code should be put back,
> since it obviously was used by at least one system and it's probable
> that there are some other applications of using this qla2xxx chip that
> will have an empty NVRAM too.

Then they should really get their NVRAM corrected, if in fact their
NVRAMs are cleared.

> I can understand the apprehension in using a fixed port_name[] value,
> since it could conflict with other FC controllers on the mesh, but if
> that is so important just choose some random value that is a valid FC
> ID or use some characteristic ID that can be used to compose part of
> the port WWN in order to give it at least some uniqueness.

Look, there's a fine balance here that we must strike -- the solution
that you're proposing implies that there's some 'random' bit-space
within the IEEE NAA formats into which one can safely encode a value
without stomping on any valid OUI.
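
To make the bit-space concern concrete, here is a rough sketch of how
the NAA nibble and the IEEE OUI sit inside an 8-byte WWN for the common
IEEE formats (NAA 1, 2 and 5). This is illustration only, not driver
code: wwn_naa() and wwn_oui() are hypothetical helper names that do not
exist in qla2xxx, and other NAA formats are simply rejected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the qla2xxx driver. */

/* The NAA field is the top four bits of the first WWN byte. */
static unsigned int wwn_naa(const uint8_t wwn[8])
{
	return wwn[0] >> 4;
}

/*
 * Extract the 24-bit IEEE OUI from a WWN.
 * Returns 0 on success, -1 for NAA formats this sketch does not handle.
 */
static int wwn_oui(const uint8_t wwn[8], uint32_t *oui)
{
	assert(oui);

	switch (wwn_naa(wwn)) {
	case 1:	/* IEEE 48-bit: 12 zero bits, then a 48-bit IEEE address */
	case 2:	/* IEEE extended: 12 vendor bits, then a 48-bit IEEE address */
		/* The OUI is the first three bytes of the 48-bit address. */
		*oui = (uint32_t)wwn[2] << 16 |
		       (uint32_t)wwn[3] << 8 |
		       (uint32_t)wwn[4];
		return 0;
	case 5:	/* IEEE registered: the OUI directly follows the NAA nibble */
		*oui = (uint32_t)(wwn[0] & 0x0f) << 20 |
		       (uint32_t)wwn[1] << 12 |
		       (uint32_t)wwn[2] << 4 |
		       (uint32_t)(wwn[3] >> 4);
		return 0;
	default:
		return -1;
	}
}
```

In each of these formats the 24 OUI bits identify an IEEE-assigned
company, so a made-up value in that field either collides with a real
assignee or lands in space that may be assigned later. An all-zero
NVRAM yields NAA 0, which the sketch rejects.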