Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

From: Luis R. Rodriguez
Date: Fri Jun 09 2017 - 17:12:28 EST


On Fri, Jun 09, 2017 at 09:40:47AM +0200, Martin Fuzzey wrote:
> On 09/06/17 03:57, Luis R. Rodriguez wrote:
> > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> > > > Android didn't send the signal, the kernel did (SIGCHLD).
> > > >
> > > > Like this:
> > > >
> > > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> > > > unrelated to firmware loading]
> > > > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> > > > ends up calling request_firmware() kernel side
> > > > 3) The firmware loading fallback mechanism is used, the request is sent to
> > > > userspace and pid 1 waits in the kernel on wait_*
> > > > 4) before firmware loading completes pid 42 dies (for any reason - in my
> > > > case normal termination)
> > Martin just to be clear, by "normal case termination" do you mean
> > completing successfully ?? Ie the firmware actually did make it onto
> > the device ?
>
> The firmware did *not* make it onto the device since the request_firmware()
> call returned an error
> (the code that would have transfered it to the device is only executed
> following a successful request_firmware)
>
> The process that terminates normally is unrelated to firmware loading as I
> said above.
>
> The only things that matter are:
> - It is a child process of the process that calls request_firmware()
> - It terminates *while* the the wait_ is still in progress
>
>
> Here is a way of reproducing the problem using the test_firmware module
> (which I only just saw) on normal linux with no Android or custom driver
>
>
> #!/bin/sh
> set -e
>
> # Make sure the system firmware loader doesn't get in the way
> /etc/init.d/udev stop
>
> modprobe test_firmware
>
> DIR=/sys/devices/virtual/misc/test_firmware
>
> echo 10 >/sys/class/firmware/timeout;
> sleep 2 &
> echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;
>
>
>
> If run with the "sleep 2 &" it terminates after 2 seconds
> If the sleep is commented it runs for the expected 10 seconds (the firmware
> loading timeout)
>
> Since the sleep process is a child of the script process requesting a
> firmware load its death causes a SIGCHLD causing request_firmware() to abort
> prematurely.

Thanks this could mean we also *should* trigger a failure if init is issuing
modprobe on a series of drivers and one completes before another while
request_firmware() is called on init or probe on a subsequent driver. If true
I'm surprised this never was reported back when the fallback mechanism was
popular, I suppose it was not an issue given most firmware *was* present on
/lib/firmware/ and the direct filesystem lookup first step always found the
firmware first, so this would only be an issue for folks relying on the
fallback mechanism exclusively.

Will include a test case based on your above script. Thanks!

Luis