Re: [PATCH 2/3] mwifiex: support sysfs initiated device coredump

From: Arend van Spriel
Date: Fri Feb 23 2018 - 05:39:44 EST


+ Johannes (for devcoredump documentation question).

On 2/22/2018 8:35 PM, Brian Norris wrote:
Hi Arend,

On Thu, Feb 22, 2018 at 01:17:56PM +0100, Arend van Spriel wrote:
On 2/21/2018 11:59 PM, Brian Norris wrote:
On Wed, Feb 21, 2018 at 11:50:19AM +0100, Arend van Spriel wrote:
Since commit 3c47d19ff4dc ("drivers: base: add coredump driver ops")
it is possible to initiate a device coredump from user-space. This
patch adds support for it adding the .coredump() driver callback.
As there is no longer a need to initiate it through debugfs remove
that code.

Signed-off-by: Arend van Spriel <arend.vanspriel@xxxxxxxxxxxx>
---
drivers/net/wireless/marvell/mwifiex/debugfs.c | 31 +-------------------------
drivers/net/wireless/marvell/mwifiex/pcie.c | 19 ++++++++++++++--
drivers/net/wireless/marvell/mwifiex/sdio.c | 13 +++++++++++
drivers/net/wireless/marvell/mwifiex/usb.c | 14 ++++++++++++
4 files changed, 45 insertions(+), 32 deletions(-)

The documentation doesn't really say [1], but is the coredump supposed
to happen synchronously? Because the mwifiex implementation is
asynchronous, whereas it looks like the brcmfmac one is synchronous.

Well, that depends on the eye of the beholder I guess. From user-space
perspective it is asynchronous regardless. A write access to the coredump
sysfs file eventually results in a uevent when the devcoredump entry is
created, ie. after driver has made a dev_coredump API call. Whether the
driver does that synchronously or asynchronously is irrelevant as far as
user-space is concerned.

Is it really? The driver infrastructure seems to guarantee that the
entirety of a driver's ->coredump() will complete before returning from
the write. So it might be reasonable for some user to assume (based on
implementation details, e.g., of brcmfmac) that the devcoredump will be
ready by the time the write() syscall returns, absent documentation that
says otherwise. But then, that's not how mwifiex works right now, so
they might be surprised if they switch drivers.

Ok. I already agreed that the uevent behavior should be documented. The error handling you are bringing up below was something I realized as well. I am not familiar with mwifiex to determine what it can say about the coredump succeeding before scheduling the work.

Anyway, *I'm* already personally used to these dumps being asynchronous,
and writing tooling to listen for the uevent instead. But that doesn't
mean everyone will be.

Also, due to the differences in async/sync, mwifiex doesn't really
provide you much chance for error handling, because most errors would be
asynchronous. So brcmfmac's "coredump" has more chance for user programs
to error-check than mwifiex's (due to the asynchronous nature) [1].

BTW, I push on this mostly because this is migrating from a debugfs
feature (that is easy to hand-wave off as not really providing a
consistent/stable API, etc., etc.) to a documented sysfs feature. If it
were left to rot in debugfs, I probably wouldn't be as bothered ;)

I appreciate it. The documentation is not in the stable ABI folder yet so I welcome any and all improvements. Might learn a thing or two from it.

Brian

[1] In fact, the ABI documentation really just describes kernel
internals, rather than documenting any user-facing details, from what I
can tell.

You are right. Clearly I did not reach the end my learning curve here. I
assumed referring to the existing dev_coredump facility was sufficient, but
maybe it is worth a patch to be more explicit and mention the uevent
behavior. Also dev_coredump facility may be disabled upon which the trigger
will have no effect in sysfs. In the kernel the data passed by the driver is
simply freed by dev_coredump facility.

Is there any other documentation for the coredump feature? I don't
really see much.

Any other than the code itself you mean? I am not sure. Maybe Johannes knows.

Brian

[1] Oh wait, but I see that while ->coredump() has an integer return
code...the caller ignores it:

static ssize_t coredump_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
device_lock(dev);
if (dev->driver->coredump)
dev->driver->coredump(dev);
device_unlock(dev);

return count;
}
static DEVICE_ATTR_WO(coredump);

Is that a bug or a feature?

Yeah. Let's call it a bug. Just not sure what to go for. Return the error or change coredump callback to void return type.

Regards,
Arend