Re: [PATCH v5 1/5] sysfs: Add attribute info for /sys/devices/.../coredump_disabled

From: Arend van Spriel
Date: Sat Aug 13 2022 - 05:39:26 EST


On August 12, 2022 8:09:59 AM Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

On Thu, Aug 11, 2022 at 04:21:54PM -0700, Manish Mandlik wrote:
On Wed, Aug 10, 2022 at 9:21 AM Greg Kroah-Hartman <
gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

On Wed, Aug 10, 2022 at 06:03:37PM +0200, Johannes Berg wrote:
On Wed, 2022-08-10 at 09:00 -0700, Manish Mandlik wrote:
This patch adds the specification for
/sys/devices/.../coredump_disabled
attribute which allows the userspace to enable/disable devcoredump for
a
particular device and drivers can use it to enable/disable
functionality
accordingly. It is available when the CONFIG_DEV_COREDUMP is enabled
and
driver has implemented the .coredump() callback.

It would be nice to say _why_? What problem does this solve? You could
just create the dump and discard it, instead, for example?

Agreed, I do not understand the need for this at all.

The existing /sys/class/devcoredump/disabled (devcd) switch has two
limitations - it disables dev_coredump for everyone who's using it;

Which is good and is the design of the thing.

and
drivers don't have visibility if devcd is disabled or not, so, the
dev_coredump API simply lets drivers collect the coredump from a device but
then later discards it if devcd is disabled.

Why would a driver care?

Now that there are more subsystems using the base dev_coredump API, having
a granular control will make it easier to selectively disable dev_coredump
only for a particular device. For ChromeOS, this is useful to allow drivers
to develop coredump functionality and deploy it without affecting other
drivers with stable devcoredump implementations (example, we've had some
devcoredumps that take 12s to run and we only want to enable it on test
builds because it has lots of PII). The drivers can use this flag to
refrain from collecting or triggering coredump when undesirable.

This feels odd. You have various out-of-tree drivers that take too long
when they crash to make a dump and it causes some unknown issue
elsewhere?

If you have drivers taking 12s for coredump you could/should consider doing it asynchronous, eg. schedule a worker for it. The coredump callback has void return type so it would be fairly easy.

Regards,
Arend