Re: [RFC 1/2] devlink: add simple fw crash helpers

From: Jakub Kicinski
Date: Mon May 25 2020 - 16:57:51 EST


On Fri, 22 May 2020 22:46:07 +0200 Johannes Berg wrote:
> > The irony is you have a problem with a networking device and all the
> > devices your initial set touched are networking. Two of the drivers
> > you touched either have or will soon have devlink health reporters
> > implemented.
>
> Like I said above, do you think it'd be feasible to make a devcoredump
> out of devlink health reports? And can the report be in a way that we
> control the file format, or are there limits? I guess I should read the
> code to find out, but I figure you probably just know. But feel free to
> tell me to read it :)
>
> The reason I'm asking is that it's starting to sound like we really
> ought to be implementing devlink, but we've got a bunch of
> infrastructure that uses the devcoredump, and it'll take time
> (significantly so) to change all that...

In the devlink world, pure FW core dumps are exposed by devlink regions:
an API that allows reading device memory, registers, etc., and also
creating dumps of memory regions when things go wrong. It should be
a fairly straightforward migration.
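
To make the region idea concrete, here is a toy userspace model of the
two things described above: reading a window of device memory, and
freezing a copy of it when something goes wrong. This is only a sketch;
struct toy_region and the toy_region_*() helpers are hypothetical names
and are not the kernel's devlink region API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_SNAPSHOTS 4

/* Hypothetical stand-in for a device memory window; not a kernel type. */
struct toy_region {
	const uint8_t *live;			/* "live" device memory */
	size_t size;				/* region size in bytes */
	uint8_t *snap[TOY_MAX_SNAPSHOTS];	/* frozen copies */
	unsigned int nsnaps;			/* snapshots taken so far */
};

/* Read from live memory, clamped to the region. Returns bytes copied. */
static size_t toy_region_read(const struct toy_region *r, size_t off,
			      uint8_t *buf, size_t len)
{
	assert(r && buf);
	if (off >= r->size)
		return 0;
	if (len > r->size - off)
		len = r->size - off;
	memcpy(buf, r->live + off, len);
	return len;
}

/* Freeze a copy of the region. Returns a snapshot id, or -1 on failure. */
static int toy_region_snapshot(struct toy_region *r)
{
	uint8_t *copy;

	assert(r);
	if (r->nsnaps == TOY_MAX_SNAPSHOTS)
		return -1;
	copy = malloc(r->size);
	if (!copy)
		return -1;
	memcpy(copy, r->live, r->size);
	r->snap[r->nsnaps] = copy;
	return (int)r->nsnaps++;
}
```

A snapshot keeps the contents as they were at capture time, so later
changes to the live memory do not alter it.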

Devlink health is more targeted: the dump is supposed to contain only
relevant information, selected and formatted by the driver. When a
device misbehaves, the driver reads the relevant registers and FW state
and creates a formatted state dump. I believe each element of the dump
must fit into a netlink message (but there may be multiple elements, see
devlink_fmsg_prepare_skb()).
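
The size constraint can be illustrated with a toy userspace model: a
dump is a sequence of elements, each element has to fit into a single
message on its own, and a new message is started whenever the next
element would not fit into the current one. This is a sketch under
those assumptions only; TOY_MSG_MAX, struct toy_dump and toy_dump_put()
are hypothetical, and the real limit and packing logic live in the
devlink fmsg code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-message payload limit; the real one differs. */
#define TOY_MSG_MAX 32

struct toy_dump {
	size_t used;		/* bytes used in the current message */
	unsigned int nmsgs;	/* messages started so far */
};

/*
 * Account for one element of elem_len bytes.
 * Returns 0 on success, -1 if the element cannot fit in any message.
 */
static int toy_dump_put(struct toy_dump *d, size_t elem_len)
{
	assert(d);
	if (elem_len > TOY_MSG_MAX)
		return -1;
	/* Start a new message if there is none yet or this one is full. */
	if (d->nmsgs == 0 || d->used + elem_len > TOY_MSG_MAX) {
		d->nmsgs++;
		d->used = 0;
	}
	d->used += elem_len;
	return 0;
}
```

In this model the total dump size is unbounded, since elements simply
spill into further messages; only a single oversized element is
rejected.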

We should be able to convert dl-region dumps and dl-health dumps into
devcoredumps, but since health reporters are supposed to be more
targeted, there are usually several of them per device.

Conversely, devcoredumps can be trivially exposed as dl-region dumps,
but I believe dl-health would require driver changes to format the
information appropriately.