Re: [PATCH] firmware: wake all waiters

From: Luis R. Rodriguez
Date: Tue Jun 27 2017 - 12:39:55 EST


On Mon, Jun 26, 2017 at 07:10:09PM -0700, Jakub Kicinski wrote:
> On Mon, 26 Jun 2017 23:20:36 +0200, Luis R. Rodriguez wrote:
> > > In that case we will make them all use the same struct firmware_buf.
> > > When wake up happens make sure it's propagated to all of them.
> > >
> > > Signed-off-by: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
> >
> > There's a slew of bugs lurking here though!
> >
> > As noted the reported Intel driver issues still need other fixes, one was the
> > fw_state_done() on the direct filesystem lookup mechanism [1], and that may be
> > a regression since direct filesystem loading was added, and even secondary
> > requests would seem to just wait forever (MAX_SCHEDULE_TIMEOUT); the combination
> > of both fixes should fix your reported issue.
> >
> > Do you intend on submitting those changes as well ? There's still *other* bugs
> > with this feature though... Knowing if you will follow up with further fixes
> > will be appreciated.
>
> No, I don't have any more fixes in my tree right now :)

Ok I can take on the other bits.

> What I'm
> looking towards implementing is actually an ability for NICs to load
> default FW but then enable users to load different FW on their request.

request_firmware_direct() loads optional firmware, but it is a sync call. We
don't currently have a similar API for async. We would have gotten this with
the driver data API I wrote, but I am now waiting for Greg to advise on how to
implement this. It seems you actually need more than that though, comments below.

> The problem is that advanced NICs are quite programmable [1] and
> depending on use case one may want to load different firmware files.

Right, so in the 802.11 world some devices might use different firmware for
different modes of operation, STA, AP, Mesh, but this is all very protocol
specific, so userspace could tickle the kernel about a mode.

Do your use cases have protocol definitions which can be exposed in userspace?
Or are these just fw variants with different bells and whistles? How many
different use cases are we talking about?

> It's slightly close to the FPGA use case, only with FPGA people don't
> expect much plug and play, and with NICs the default mode after boot
> must be "look as much as a standard NIC as possible". Then loading
> "advanced"/hand crafted firmware can turn more interesting features on.

Makes sense.

> The FW loading we have now in drivers/net/ethernet/netronome/nfp is
> requesting default FW and returning -EPROBE_DEFER if not found.

Oh I see -- right now nfp_nsp_init() is the path that will call the firmware
load via request_firmware() on nfp_net_fw_find(), and if this fails to find
firmware it still returns 0, and then nfp_net_pci_probe() does the
-EPROBE_DEFER handling.
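
A rough user-space sketch of the pattern described above, for illustration
only. This is not the nfp driver code: every mock_* name, the file name
"default.nffw" and the local EPROBE_DEFER define are assumptions made so the
example builds outside the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* EPROBE_DEFER is kernel-internal; defined here only so the sketch builds
 * in user space. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for struct firmware. */
struct mock_firmware {
	const char *name;
};

/* Hypothetical stand-in for request_firmware(): the "filesystem" holds at
 * most one file, named by 'available' (NULL means rootfs has no firmware). */
int mock_request_firmware(const struct mock_firmware **fw,
			  const char *name, const char *available)
{
	static struct mock_firmware found;

	if (!available || strcmp(name, available) != 0) {
		*fw = NULL;
		return -ENOENT;
	}
	found.name = name;
	*fw = &found;
	return 0;
}

/* Init path: a failed lookup is swallowed and 0 is returned anyway. */
int mock_fw_init(const struct mock_firmware **fw, const char *available)
{
	(void)mock_request_firmware(fw, "default.nffw", available);
	return 0;
}

/* Probe path: turns "no firmware yet" into a deferred probe. */
int mock_pci_probe(const char *available)
{
	const struct mock_firmware *fw = NULL;
	int err = mock_fw_init(&fw, available);

	if (err)
		return err;
	if (!fw)
		return -EPROBE_DEFER;
	return 0;
}
```

With this mock, mock_pci_probe(NULL) defers, while
mock_pci_probe("default.nffw") succeeds. The probe is retried with no way of
knowing whether the file will ever show up.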

Ugh. This is super hacky. I realize -EPROBE_DEFER is used for these hacks, but
folks should stop doing this, especially for this use case, given we thought
about it and I believe we have a solution now.

Tom Gundersen and Daniel Wagner worked on a userspace solution to help with
this. It works with two simple modes: best-effort and final-mode. The idea is
the firmwared daemon will be kicked into final-mode once userspace knows the
real rootfs is ready, and this in turn can be used to signal a final
notification that the optional or required firmware is *definitely* not there.
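
A minimal sketch of how the two modes could differ when a file is missing.
This is an assumption about the behaviour described above, not firmwared's
actual code; the fwd_* names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical daemon modes, following the description above. */
enum fwd_mode {
	FWD_BEST_EFFORT,	/* real rootfs may not be mounted yet */
	FWD_FINAL		/* userspace says the real rootfs is ready */
};

/* Hypothetical answers the daemon could give to a firmware request. */
enum fwd_answer {
	FWD_LOADED,		/* file found and handed to the kernel */
	FWD_RETRY_LATER,	/* not found yet, but may still appear */
	FWD_NOT_FOUND		/* definitely not there, fail the request */
};

/* A missing file is only a definitive failure once in final-mode. */
enum fwd_answer fwd_handle_request(enum fwd_mode mode, bool file_present)
{
	if (file_present)
		return FWD_LOADED;
	return mode == FWD_FINAL ? FWD_NOT_FOUND : FWD_RETRY_LATER;
}
```

The point of the split is that the "definitely not there" answer comes from
userspace, which knows when the real rootfs is up, rather than from the driver
guessing through repeated probe deferrals.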

Arend was going to start toying with it, so it would be good to wait for his
feedback.

> Now I
> need to find a way to allow users to "push" whatever advanced FW they
> have into the NIC after/during boot.

Be careful how you do this: if you use something like sysfs I think you'll
have to support it in the driver forever, otherwise you will break some
userspace. However if you use debugfs I think it's understood that's a loose API.

I'd recommend instead to first see if you can get a mapping of the modes as
specific knobs / tunables through the networking stack, if so then those can
be used as triggers. If not, consider the *features* that are exposed by
the different firmwares and consider their need as triggers for a reload.
How many other devices do the same as yours? In what modes?

> Current firmware subsystem doesn't seem to cater to this use case too
> well.

It's a matter of asking and talking. I've provided references of things to
try to address the hacky -EPROBE_DEFER. It does however require a userspace
daemon, so it does require use of the uevent fallback mechanism.

> I have to look at the FPGA-related code.

Not sure how that would help. Is it huge firmware?

> The three main
> problems to solve are:
> - how to stay bound and retry the direct default FW load until rootfs
> is mounted (equivalent to when -EPROBE_DEFER would give up);

I've thrown a bone for that.

> - how to expose permanent FW loading sysfs interface which won't
> disappear after the first -1/1 is written to .../loading;

The lib/test_firmware.c driver has an example sysfs knob a driver could use
on its own to load firmware. This is not as dynamic as you'd want, so I had
implemented an alternative interface which lets you customize hooks in userspace
first and then you just have a sync or async trigger for the test driver
data. It would seem this will not go upstream but you can look at it as an
example of what could be done:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20170605-driver-data
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/commit/?h=20170605-driver-data&id=3696afe8d4aba5606dc8f3c562aeae1687f3b53e

But take the warning above about using sysfs seriously, you don't want to break
userspace for users, and you want to see if you can first work towards something
more generic with the networking folks.

> - how to make sure different cards, which request the same file name
> can be served different default firmwares...

I believe your patch + the error path fix will handle this now, no?

Luis

>
> Thanks for the improved commit message!
>
> [1] HW links:
> https://www.hotchips.org/wp-content/uploads/hc_archives/hc25/HC25.60-Networking-epub/HC25.27.620-22nm-Flow-Proc-Stark-Netronome.pdf
> https://www.netronome.com/media/pdfs/NFP_Programming_Model_h6vxM7Y.pdf
> http://open-nfp.org/resources/
>

--
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg