Re: [PATCH v1 1/6] staging: qlge: Initialize devlink health dump framework for the dlge driver

From: Coiby Xu
Date: Fri Oct 16 2020 - 19:16:55 EST


On Thu, Oct 15, 2020 at 08:06:06PM +0900, Benjamin Poirier wrote:
On 2020-10-15 11:37 +0800, Coiby Xu wrote:
On Tue, Oct 13, 2020 at 09:37:04AM +0900, Benjamin Poirier wrote:
> On 2020-10-12 19:24 +0800, Coiby Xu wrote:
> [...]
> > > I think, but didn't check in depth, that in those drivers, the devlink
> > > device is tied to the PCI device and can exist independently of the
> > > netdev, at least in principle.
> > >
> > You are right. Take drivers/net/ethernet/mellanox/mlxsw as an example:
> > devlink reload would first unregister_netdev and then
> > register_netdev, but struct devlink stays put. But I have yet to
> > understand when unregister/register_netdev is needed.
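To make the lifetime split described above concrete, here is a minimal userspace model in C. It is only a sketch under stated assumptions: `model_devlink`, `model_netdev` and the `model_*` functions are hypothetical stand-ins, not the kernel's `struct devlink`, `struct net_device` or the mlxsw reload callbacks. A reload unregisters and re-registers the netdev side, while the devlink side and any state attached to it (here a hypothetical `health_reports` counter) survive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-netdev state; NOT the kernel's
 * struct net_device. */
struct model_netdev {
	bool registered;
	int ifindex; /* a fresh index is handed out on every registration */
};

/* Hypothetical stand-in for the devlink instance owned by the PCI device;
 * NOT the kernel's struct devlink. It outlives the netdev across reloads. */
struct model_devlink {
	struct model_netdev netdev;
	int next_ifindex;
	int reload_count;
	int health_reports; /* example of state that must survive a reload */
};

static void model_register_netdev(struct model_devlink *dl)
{
	assert(!dl->netdev.registered);
	dl->netdev.ifindex = dl->next_ifindex++;
	dl->netdev.registered = true;
}

static void model_unregister_netdev(struct model_devlink *dl)
{
	assert(dl->netdev.registered);
	dl->netdev.registered = false;
}

/* Reload: unregister the netdev, re-initialize, register a new netdev.
 * The devlink object and its counters are never torn down. */
static void model_devlink_reload(struct model_devlink *dl)
{
	model_unregister_netdev(dl);
	/* A real driver would reset and re-initialize the adapter here. */
	model_register_netdev(dl);
	dl->reload_count++;
}
```

For example, starting from `struct model_devlink dl = { .next_ifindex = 1 };`, registering once and then calling `model_devlink_reload(&dl)` leaves the netdev registered with ifindex 2 and `reload_count` at 1, while `health_reports` keeps whatever value it had before the reload.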
>
> Maybe it can be useful for manual recovery if the hardware or driver
> gets into an erroneous state. I've used `modprobe -r qlge && modprobe
> qlge` for the same purpose in the past.

Thank you for providing this use case!
>
> > Do we need to
> > add "devlink reload" for qlge?
>
> Not for this patchset. That would be a new feature.

To implement this feature, it seems I need to understand how qlge works
under the hood. For example, what's the difference between
qlge_soft_reset_mpi_risc and qlge_hard_reset_mpi_risc? Or should we use
a brute-force approach, i.e. do the tasks in qlge_remove and then redo
the tasks in qlge_probe?

I don't know. Like I've said before, I'd recommend testing on actual
hardware. I don't have access to it anymore.

Yeah, as I'm changing more code, it's more and more important to test
it on actual hardware. Have you heard of anyone installing a qle8142 in
a Raspberry Pi, which has a PCIe bus?

Is there a hardware reference manual for the qlge device?

I've never gotten access to one.

My experience of wrestling with an AMD GPIO chip [1] shows that dealing
with a device without a reference manual is a bit annoying. I had to
treat it like a black box and try different kinds of input to see what
would happen.

Btw, resetting the device seems to be a kind of panacea. For example,
according to the spec of my touchpad (the Synaptics RMI4 Specification),
it even supports spontaneous reset. devlink health [2] also has the
so-called auto-recovery. So resetting is common practice. I wonder if
there are common guidelines for resetting that also apply to the
qlge8*** devices.

The only noteworthy thing from Qlogic that I know of is the firmware
update:
http://driverdownloads.qlogic.com/QLogicDriverDownloads_UI/SearchByProduct.aspx?ProductCategory=322&Product=1104&Os=190

It did fix some weird behavior when I applied it, so I'd recommend doing
the same if you get an adapter.

Thank you for sharing the info!


[1] https://www.spinics.net/lists/linux-gpio/msg53901.html
[2] https://www.kernel.org/doc/html/latest/networking/devlink/devlink-health.html

--
Best regards,
Coiby