Re: [alsa-devel] WARNING in snd_pcm_hw_param_first

From: Takashi Iwai
Date: Tue Jan 02 2018 - 08:05:50 EST


On Mon, 01 Jan 2018 17:14:25 +0100,
Takashi Iwai wrote:
>
> On Mon, 01 Jan 2018 11:29:51 +0100,
> Lars-Peter Clausen wrote:
> >
> > On 01/01/2018 10:03 AM, Takashi Iwai wrote:
> > [...]
> > >> CPU: 0 PID: 3502 Comm: syzkaller781065 Not tainted 4.15.0-rc5+ #154
> > >> Hardware name: Google Google Compute Engine/Google Compute Engine,
> > >> BIOS Google 01/01/2011
> > >> Call Trace:
> > >> __dump_stack lib/dump_stack.c:17 [inline]
> > >> dump_stack+0x194/0x257 lib/dump_stack.c:53
> > >> panic+0x1e4/0x41c kernel/panic.c:183
> > >> __warn+0x1dc/0x200 kernel/panic.c:547
> > >> report_bug+0x211/0x2d0 lib/bug.c:184
> > >> fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
> > >> fixup_bug arch/x86/kernel/traps.c:247 [inline]
> > >> do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
> > >> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
> > >> invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1079
> > >> RIP: 0010:snd_pcm_hw_param_first+0x289/0x690 sound/core/pcm_lib.c:1635
> > >> RSP: 0018:ffff8801c013f1a0 EFLAGS: 00010293
> > >> RAX: ffff8801c03bc3c0 RBX: ffff8801bff08dc0 RCX: ffffffff841bee19
> > >> RDX: 0000000000000000 RSI: 00000000ffffffea RDI: ffffed0038027e28
> > >> RBP: ffff8801c013f1f0 R08: ffffed0038027d63 R09: ffff8801c013eb10
> > >> R10: 0000000000000001 R11: ffffed0038027d62 R12: 000000000000000d
> > >> R13: 00000000ffffffea R14: 0000000000000005 R15: 0000000000002000
> > >> snd_pcm_hw_param_near.constprop.27+0x78d/0x9a0 sound/core/oss/pcm_oss.c:457
> > >> snd_pcm_oss_change_params+0x17d3/0x3720 sound/core/oss/pcm_oss.c:969
> > >> snd_pcm_oss_make_ready+0xaa/0x130 sound/core/oss/pcm_oss.c:1128
> > >> snd_pcm_oss_sync+0x257/0x830 sound/core/oss/pcm_oss.c:1638
> > >> snd_pcm_oss_release+0x20b/0x280 sound/core/oss/pcm_oss.c:2431
> > >> __fput+0x327/0x7e0 fs/file_table.c:210
> > >> ____fput+0x15/0x20 fs/file_table.c:244
> > >> task_work_run+0x199/0x270 kernel/task_work.c:113
> > >> exit_task_work include/linux/task_work.h:22 [inline]
> > >> do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> > >> do_group_exit+0x149/0x400 kernel/exit.c:968
> > >> SYSC_exit_group kernel/exit.c:979 [inline]
> > >> SyS_exit_group+0x1d/0x20 kernel/exit.c:977
> > >> do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
> > >> do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
> > >> entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:129
> > >> RIP: 0023:0xf7f4ec79
> > >> RSP: 002b:00000000ffc2c18c EFLAGS: 00000292 ORIG_RAX: 00000000000000fc
> > >> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000080f0298
> > >> RDX: 0000000000000000 RSI: 00000000080d9b78 RDI: 00000000080f02a0
> > >> RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> > >> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > >> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > >> Dumping ftrace buffer:
> > >> (ftrace buffer empty)
> > >> Kernel Offset: disabled
> > >> Rebooting in 86400 seconds..
> > >
> > > This must be a superfluous WARN_ON() call invoked by snd_BUG_ON()
> > > check that can be safely ignored. A quick fix patch is below.
> >
> > I believe those snd_BUG_ON() are there because these calls to refine
> > should never fail based on the checks done earlier. And them triggering
> > indicates that something is wrong somewhere else.
>
> I guess it's a place where no hw_params was set properly (it's through
> OSS emulation) but at closing it syncs forcibly that triggers the
> refine snd_BUG_ON(). But maybe better to double-check the condition,
> yes.

The situation became clearer after looking at the reproducer and the
configs closely. This happens because the program tries to open and
set up the loopback PCM devices concurrently. Since the aloop driver
modifies the hw_params restriction dynamically depending on the
connection, the configuration space restriction does change as well
even while trying to determine the parameters. Although the whole
code has a proper protection and no race happens, the space itself
changes; that resulted in the unexpected hw_refine error which assumes
only the static rules.

So I still think that the proper way to address this is just to get
rid of snd_BUG_ON() checks. We can't remove the dynamic configuration
of aloop, as it's the core feature. At most, we may invalidate the
configuration in the other side, but still it's tricky and fragile,
risky for regressions. In anyway, what's wrong is the assumption of
static restrictions in hw params engine, after all.

I'll update the text in the patch to reflect that.


thanks,

Takashi