Re: [PATCH] mwifiex: pcie: add enable_device_dump module parameter

From: Tsuchiya Yuto
Date: Thu Nov 26 2020 - 13:31:15 EST


On Fri, 2020-11-20 at 12:53 -0800, Brian Norris wrote:
> (Sorry if anything's a bit slow here. I don't really have time to
> write out full proposals myself.)

Don’t worry, I (and other owners) am already glad to hear from upstream
devs, including you :)

(and I'm also sorry for the late reply from me. It was difficult to find
spare time these days.)

> On Fri, Oct 30, 2020 at 3:30 AM Tsuchiya Yuto <kitakar@xxxxxxxxx> wrote:
> > Let me know if splitting this patch like this works. 1) The first patch
> > is to add this module parameter but don't change the default behavior.
>
> That *could* be OK with me, although it's already been said that there
> are many people who dislike extra module parameters. I also don't see
> why this needs to be a module parameter at all. If you do #2 right,
> you don't really need this, as there are already several standard ways
> of doing this (e.g., via Kconfig, or via nl80211 on a per-device
> basis).
>
> > 2) The second patch is to change the parameter value depending on the
> > DMI matching or something so that it doesn't break the existing users.
>
> Point 2 sounds good, and this is the key point. Note that you can do
> point 2 without making it a module parameter. Just keep a flag in the
> driver-private structures.

I chose to also provide the module parameter because I thought it would
be a little bit complicated for users to re-enable this dump feature if
I chose not to provide this parameter.

If I don't provide the parameter but still want to allow users to
re-enable this dump feature, we can provide a way to change the bit flags
of quirks (via a new debugfs entry or another module parameter). That
said, is there a way to easily change the quirk flags only for this dump
feature?

For example, assume that the following three quirks are enabled by default:

/* quirks */
#define QUIRK_FW_RST_D3COLD BIT(0)
#define QUIRK_NO_DEVICE_DUMP BIT(1)
#define QUIRK_FOO BIT(2)

.driver_data = (void *)(QUIRK_FW_RST_D3COLD |
QUIRK_NO_DEVICE_DUMP |
QUIRK_FOO),

card->quirks = (uintptr_t)dmi_id->driver_data;

/* and assume that card->quirks can be changed by users by a file named
* "quirks" under debugfs.
*/

So, the card->quirks will be "7" in decimal by default. Then, if users
want to re-enable the dump feature, as far as I know, users need to know
the default value "7", then need to know that device_dump is controlled
by bit 1. Finally, users can set the new quirk flags "5" in decimal (101
in binary).

echo 5 > /sys/kernel/debug/mwifiex/mlan0/quirks

I'm glad if there is another nice way to control only the device_dump
quirk flag, if we only provide a way to change quirk flags.

But at the same time I also think that if someone dare to want to
re-enable this feature, maybe the person won't feel it's complicated haha.
So, I'll drop the device_dump module parameter and switch to use the quirk
framework, adding a way to change the quirk flags value by users.

That said, we may even drop disabling the dump. Take a look at my comment
I wrote below.

> > But what I want to say here as well is that, if the firmware can be fixed,
> > we don't need a patch like this.
>
> Sure. That's also where we don't necessarily need more ways to control
> this from user space (e.g., module parameters), but just better
> detection of currently broken systems (in the driver). And if firmware
> ever gets fixed, we can undo the "broken device" detection.

There are two types of firmware crashes we observes:
1) cmd_timedout (other than ps_mode-related)
2) Firmware wakeup failed (ps_mode-related)

The #2 is observed when we enabled ps_mode. The #1 is observed for the
other causes. And hopefully, verdre (in Cc) found a "fix" [1] for the
#1 fw crash. We are trying the fix now.

The pull req (still WIP) basically addresses fw crashing by using
"non-posted" register writes and uninterruptible wait queue for PCI
operations in the kernel driver side.

We still don't have any ideas to "fix" the #2 fw crash, but now we don't
see the #1 crash anymore so far.

I'd like to see where the attempt goes before I start working on this
patch here again.

Thank you, everyone.

[1] https://github.com/linux-surface/kernel/pull/70
[WIP] Properly fix wifi firmware crashes by jonas2515 · Pull Request #70 · linux-surface/kernel

> Brian