Re: [PATCH v3 0/2] ALSA: core: Fix deadlock when shutdown a frozen userspace

From: Takashi Iwai
Date: Mon Nov 28 2022 - 04:54:02 EST


On Mon, 28 Nov 2022 10:26:36 +0100,
Ricardo Ribalda wrote:
>
> Hi Takashi
>
> Thanks for your prompt reply
>
> On Mon, 28 Nov 2022 at 10:24, Takashi Iwai <tiwai@xxxxxxx> wrote:
> >
> > On Mon, 28 Nov 2022 10:10:12 +0100,
> > Ricardo Ribalda wrote:
> > >
> > > Since 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
> > > we wait for userspace to close its fds.
> >
> > IMO, the fix above brought more problem. If you'd need to want to
> > avoid later accesses during shutdown, the driver should rather just
> > disconnect devices without waiting for the user-space completion.
> > And, for that, a simple call of snd_card_disconnect() should suffice.
> >
> > > But that will never occur with a frozen userspace (like during kexec()).
> > >
> > > Lets detect the frozen userpace and act accordingly.
> >
> > ... and skipping the user-space sync at snd_card_disconnect_sync() as
> > of this patch set is a dangerous move, I'm afraid. The user-space
> > gets frozen also at the normal suspend/resume, and it implies that the
> > sync will be lost even for the normal PM, too (although it must be a
> > very corner case).
> >
>
> And what about checking kexec_in_progress instead?

I still think that the call of snd_card_disconnect_sync() itself at
shutdown is somehow wrong. If this only comes from the SOF code path
above, we should address in that code path instead.

OTOH, you showed two code paths: one is

[ 84.943749] Freezing user space processes ... (elapsed 0.111 seconds) done.
[ 246.784446] INFO: task kexec-lite:5123 blocked for more than 122 seconds.
[ 246.819035] Call Trace:
[ 246.821782] <TASK>
[ 246.824186] __schedule+0x5f9/0x1263
[ 246.828231] schedule+0x87/0xc5
[ 246.831779] snd_card_disconnect_sync+0xb5/0x127
...
[ 246.889249] snd_sof_device_shutdown+0xb4/0x150
[ 246.899317] pci_device_shutdown+0x37/0x61
[ 246.903990] device_shutdown+0x14c/0x1d6
[ 246.908391] kernel_kexec+0x45/0xb9

and another is

[ 246.893222] INFO: task kexec-lite:4891 blocked for more than 122 seconds.
[ 246.927709] Call Trace:
[ 246.930461] <TASK>
[ 246.932819] __schedule+0x5f9/0x1263
[ 246.936855] ? fsnotify_grab_connector+0x5c/0x70
[ 246.942045] schedule+0x87/0xc5
[ 246.945567] schedule_timeout+0x49/0xf3
[ 246.949877] wait_for_completion+0x86/0xe8
[ 246.954463] snd_card_free+0x68/0x89
...
[ 247.001080] platform_device_unregister+0x12/0x35

The former is likely the SOF code path by the commit you mentioned
(but it's not 100% clear because you trimmed the stack trace), and
this should be reconsidered.

But, the latter seems to be independent from that. If that's the code
path where the unbind is triggered before kexec, your fix might not
work, either; it could be already at the wait_event*() when kexec
starts.

Maybe a simpler workaround would be to replace it with
wait_event_killable*() stuff. But whether we can discontinue the sync
there is still another thing to consider...


Takashi

>
> Thanks!
>
> >
> > thanks,
> >
> > Takashi
> >
> > >
> > > To: Jaroslav Kysela <perex@xxxxxxxx>
> > > To: Takashi Iwai <tiwai@xxxxxxxx>
> > > To: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
> > > To: Pavel Machek <pavel@xxxxxx>
> > > To: Len Brown <len.brown@xxxxxxxxx>
> > > To: Kai Vehmanen <kai.vehmanen@xxxxxxxxxxxxxxx>
> > > To: Ranjani Sridharan <ranjani.sridharan@xxxxxxxxxxxxxxx>
> > > To: Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx>
> > > To: Mark Brown <broonie@xxxxxxxxxx>
> > > Cc: alsa-devel@xxxxxxxxxxxxxxxx
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > > Cc: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
> > > Cc: linux-pm@xxxxxxxxxxxxxxx
> > > Signed-off-by: Ricardo Ribalda <ribalda@xxxxxxxxxxxx>
> > > ---
> > > Changes in v3:
> > > - Wrap pm_freezing in a function
> > > - Link to v2: https://lore.kernel.org/r/20221127-snd-freeze-v2-0-d8a425ea9663@xxxxxxxxxxxx
> > >
> > > Changes in v2:
> > > - Only use pm_freezing if CONFIG_FREEZER
> > > - Link to v1: https://lore.kernel.org/r/20221127-snd-freeze-v1-0-57461a366ec2@xxxxxxxxxxxx
> > >
> > > ---
> > > Ricardo Ribalda (2):
> > > freezer: Add processes_frozen()
> > > ALSA: core: Fix deadlock when shutdown a frozen userspace
> > >
> > > include/linux/freezer.h | 2 ++
> > > kernel/freezer.c | 11 +++++++++++
> > > sound/core/init.c | 13 +++++++++++++
> > > 3 files changed, 26 insertions(+)
> > > ---
> > > base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> > > change-id: 20221127-snd-freeze-1ee143228326
> > >
> > > Best regards,
> > > --
> > > Ricardo Ribalda <ribalda@xxxxxxxxxxxx>
> > >
>
>
>
> --
> Ricardo Ribalda
>