Re: [PATCH 4/6] ASoC: soc-pcm: serialize BE triggers

From: Marek Szyprowski
Date: Fri Dec 17 2021 - 06:57:53 EST


Dear All,

On 17.12.2021 11:11, Marek Szyprowski wrote:
> On 07.12.2021 18:37, Pierre-Louis Bossart wrote:
>> From: Takashi Iwai <tiwai@xxxxxxx>
>>
>> When more than one FE is connected to a BE, e.g. in a mixing use case,
>> the BE can be triggered multiple times when the FE are opened/started
>> concurrently. This race condition is problematic in the case of
>> SoundWire BE dailinks, and this is not desirable in a general
>> case.
>>
>> This patch relies on the existing BE PCM lock, which takes atomicity
>> into
>> account. The locking model assumes that all interactions start with
>> the FE, so that there is no deadlock between FE and BE locks.
>>
>> Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
>> [test, checkpatch fix and clarification of commit message by plbossart]
>> Signed-off-by: Pierre-Louis Bossart
>> <pierre-louis.bossart@xxxxxxxxxxxxxxx>
>> Reviewed-by: Kai Vehmanen <kai.vehmanen@xxxxxxxxxxxxxxx>
>> Reviewed-by: Bard Liao <yung-chuan.liao@xxxxxxxxxxxxxxx>
>> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@xxxxxxxxxxxxxxx>
>
> This patch recently landed in linux-next (next-20211215) as commit
> b2ae80663008 ("ASoC: soc-pcm: serialize BE triggers").

It turned out that the issue is caused by the previous patch "[PATCH
3/6] ASoC: soc-pcm: Fix and cleanup DPCM locking
<https://lore.kernel.org/all/20211207173745.15850-4-pierre-louis.bossart@xxxxxxxxxxxxxxx/#r>",
merged as commit b7898396f4bb ("ASoC: soc-pcm: Fix and cleanup DPCM
locking"), for some reasons 'git bisect' pointed me initially to the
wrong patch. I will re-report it under the proper patch. Reverting it
and the next patches on top of linux-next fixes the issue.


> I found that it triggers a warning on my test boards. This is the one
> from Exynos4412-based Odroid U3 board:
>
> # speaker-test -l1
>
> speaker-test 1.1.8
>
> Playback device is default
> Stream parameters are 48000Hz, S16_LE, 1 channels
> Using 16 octaves of pink noise
> Rate set to 48000Hz (requested 48000Hz)
> Buffer size range from 128 to 131072
> Period size range from 64 to 65536
> Using max buffer size 131072
> Periods = 4
> was set period_size = 32768
> was set buffer_size = 131072
>  0 - Front Left
>
> ============================================
> WARNING: possible recursive locking detected
> 5.16.0-rc1-00270-gb2ae80663008 #11109 Not tainted
> --------------------------------------------
> speaker-test/1312 is trying to acquire lock:
> c1d78ca4 (&group->lock){....}-{2:2}, at: dpcm_be_dai_trigger+0x80/0x300
>
> but task is already holding lock:
> c1d788a4 (&group->lock){....}-{2:2}, at:
> snd_pcm_action_lock_irq+0x68/0x7c
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&group->lock);
>   lock(&group->lock);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 1 lock held by speaker-test/1312:
>  #0: c1d788a4 (&group->lock){....}-{2:2}, at:
> snd_pcm_action_lock_irq+0x68/0x7c
>
> stack backtrace:
> CPU: 0 PID: 1312 Comm: speaker-test Not tainted
> 5.16.0-rc1-00270-gb2ae80663008 #11109
> Hardware name: Samsung Exynos (Flattened Device Tree)
> [<c0110b30>] (unwind_backtrace) from [<c010c618>] (show_stack+0x10/0x14)
> [<c010c618>] (show_stack) from [<c0b65d24>] (dump_stack_lvl+0x58/0x70)
> [<c0b65d24>] (dump_stack_lvl) from [<c0193740>]
> (__lock_acquire+0x15ac/0x319c)
> [<c0193740>] (__lock_acquire) from [<c0195dd8>]
> (lock_acquire+0x14c/0x424)
> [<c0195dd8>] (lock_acquire) from [<c0b745b8>]
> (_raw_spin_lock_irqsave+0x44/0x60)
> [<c0b745b8>] (_raw_spin_lock_irqsave) from [<c0926b6c>]
> (dpcm_be_dai_trigger+0x80/0x300)
> [<c0926b6c>] (dpcm_be_dai_trigger) from [<c0927004>]
> (dpcm_fe_dai_do_trigger+0x124/0x1e4)
> [<c0927004>] (dpcm_fe_dai_do_trigger) from [<c090728c>]
> (snd_pcm_action+0x74/0xb0)
> [<c090728c>] (snd_pcm_action) from [<c0907eac>]
> (snd_pcm_action_lock_irq+0x3c/0x7c)
> [<c0907eac>] (snd_pcm_action_lock_irq) from [<c02f13a0>]
> (sys_ioctl+0x568/0xd44)
> [<c02f13a0>] (sys_ioctl) from [<c0100060>] (ret_fast_syscall+0x0/0x1c)
> Exception stack(0xc4777fa8 to 0xc4777ff0)
> 7fa0:                   004f5210 b6e27394 00000004 00004142 004f5398
> 004f5398
> 7fc0: 004f5210 b6e27394 00020000 00000036 00000000 00000000 bee588e8
> 00008000
> 7fe0: b6e277c4 bee58874 b6d8e888 b6c751dc
> Time per period = 0.253397
> max98090 1-0010: PLL unlocked
> BUG: sleeping function called from invalid context at
> kernel/locking/rwsem.c:1526
> in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1312, name:
> speaker-test
> preempt_count: 1, expected: 0
> RCU nest depth: 0, expected: 0
> INFO: lockdep is turned off.
> irq event stamp: 8158
> hardirqs last  enabled at (8157): [<c0b747d0>]
> _raw_spin_unlock_irqrestore+0x5c/0x60
> hardirqs last disabled at (8158): [<c0b74570>]
> _raw_spin_lock_irq+0x58/0x5c
> softirqs last  enabled at (7854): [<c0101578>] __do_softirq+0x348/0x610
> softirqs last disabled at (7849): [<c012e7a4>] __irq_exit_rcu+0x144/0x1ec
> Preemption disabled at:
> [<00000000>] 0x0
> CPU: 0 PID: 1312 Comm: speaker-test Not tainted
> 5.16.0-rc1-00270-gb2ae80663008 #11109
> Hardware name: Samsung Exynos (Flattened Device Tree)
> [<c0110b30>] (unwind_backtrace) from [<c010c618>] (show_stack+0x10/0x14)
> [<c010c618>] (show_stack) from [<c0b65d24>] (dump_stack_lvl+0x58/0x70)
> [<c0b65d24>] (dump_stack_lvl) from [<c0158b04>]
> (__might_resched+0x1c0/0x288)
> [<c0158b04>] (__might_resched) from [<c0b71898>] (down_write+0x24/0x8c)
> [<c0b71898>] (down_write) from [<c030ed64>]
> (simple_recursive_removal+0x6c/0x370)
> [<c030ed64>] (simple_recursive_removal) from [<c04d07a4>]
> (debugfs_remove+0x38/0x4c)
> [<c04d07a4>] (debugfs_remove) from [<c0928784>]
> (dpcm_be_disconnect+0x160/0x2c4)
> [<c0928784>] (dpcm_be_disconnect) from [<c092895c>]
> (dpcm_fe_dai_cleanup+0x74/0xb0)
> [<c092895c>] (dpcm_fe_dai_cleanup) from [<c0928d90>]
> (dpcm_fe_dai_close+0xe8/0x14c)
> [<c0928d90>] (dpcm_fe_dai_close) from [<c090977c>]
> (snd_pcm_release_substream.part.0+0x3c/0xcc)
> [<c090977c>] (snd_pcm_release_substream.part.0) from [<c0909878>]
> (snd_pcm_release+0x54/0xa4)
> [<c0909878>] (snd_pcm_release) from [<c02dc400>] (__fput+0x88/0x258)
> [<c02dc400>] (__fput) from [<c014cd44>] (task_work_run+0x8c/0xc8)
> [<c014cd44>] (task_work_run) from [<c010c08c>]
> (do_work_pending+0x4a4/0x598)
> [<c010c08c>] (do_work_pending) from [<c0100088>]
> (slow_work_pending+0xc/0x20)
> Exception stack(0xc4777fb0 to 0xc4777ff8)
> 7fa0:                                     00000000 004f5260 004eaa9c
> 00000000
> 7fc0: 004f5260 004f536c 004f5210 00000006 004fb700 004e6e8c 004d6120
> bee58cc4
> 7fe0: b6e27e64 bee58928 b6d8eda4 b6d09ac0 60000050 00000004
>
> Let me know how I can help debugging this issue.
>
>> ---
>>   sound/soc/soc-pcm.c | 46 ++++++++++++++++++++++++++++-----------------
>>   1 file changed, 29 insertions(+), 17 deletions(-)
>>
>> diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
>> index 2e282c42bac2..7043857e30b1 100644
>> --- a/sound/soc/soc-pcm.c
>> +++ b/sound/soc/soc-pcm.c
>> @@ -46,12 +46,18 @@ static inline void
>> snd_soc_dpcm_stream_lock_irq(struct snd_soc_pcm_runtime *rtd,
>>       snd_pcm_stream_lock_irq(snd_soc_dpcm_get_substream(rtd, stream));
>>   }
>>   +#define snd_soc_dpcm_stream_lock_irqsave(rtd, stream, flags) \
>> +    snd_pcm_stream_lock_irqsave(snd_soc_dpcm_get_substream(rtd,
>> stream), flags)
>> +
>>   static inline void snd_soc_dpcm_stream_unlock_irq(struct
>> snd_soc_pcm_runtime *rtd,
>>                             int stream)
>>   {
>>       snd_pcm_stream_unlock_irq(snd_soc_dpcm_get_substream(rtd,
>> stream));
>>   }
>>   +#define snd_soc_dpcm_stream_unlock_irqrestore(rtd, stream, flags) \
>> + snd_pcm_stream_unlock_irqrestore(snd_soc_dpcm_get_substream(rtd,
>> stream), flags)
>> +
>>   #define DPCM_MAX_BE_USERS    8
>>     static inline const char *soc_cpu_dai_name(struct
>> snd_soc_pcm_runtime *rtd)
>> @@ -2079,6 +2085,7 @@ int dpcm_be_dai_trigger(struct
>> snd_soc_pcm_runtime *fe, int stream,
>>   {
>>       struct snd_soc_pcm_runtime *be;
>>       struct snd_soc_dpcm *dpcm;
>> +    unsigned long flags;
>>       int ret = 0;
>>         for_each_dpcm_be(fe, stream, dpcm) {
>> @@ -2087,9 +2094,11 @@ int dpcm_be_dai_trigger(struct
>> snd_soc_pcm_runtime *fe, int stream,
>>           be = dpcm->be;
>>           be_substream = snd_soc_dpcm_get_substream(be, stream);
>>   +        snd_soc_dpcm_stream_lock_irqsave(be, stream, flags);
>> +
>>           /* is this op for this BE ? */
>>           if (!snd_soc_dpcm_be_can_update(fe, be, stream))
>> -            continue;
>> +            goto next;
>>             dev_dbg(be->dev, "ASoC: trigger BE %s cmd %d\n",
>>               be->dai_link->name, cmd);
>> @@ -2099,77 +2108,80 @@ int dpcm_be_dai_trigger(struct
>> snd_soc_pcm_runtime *fe, int stream,
>>               if ((be->dpcm[stream].state !=
>> SND_SOC_DPCM_STATE_PREPARE) &&
>>                   (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
>>                   (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
>> -                continue;
>> +                goto next;
>>                 ret = soc_pcm_trigger(be_substream, cmd);
>>               if (ret)
>> -                goto end;
>> +                goto next;
>>                 be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
>>               break;
>>           case SNDRV_PCM_TRIGGER_RESUME:
>>               if ((be->dpcm[stream].state !=
>> SND_SOC_DPCM_STATE_SUSPEND))
>> -                continue;
>> +                goto next;
>>                 ret = soc_pcm_trigger(be_substream, cmd);
>>               if (ret)
>> -                goto end;
>> +                goto next;
>>                 be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
>>               break;
>>           case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
>>               if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
>> -                continue;
>> +                goto next;
>>                 ret = soc_pcm_trigger(be_substream, cmd);
>>               if (ret)
>> -                goto end;
>> +                goto next;
>>                 be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
>>               break;
>>           case SNDRV_PCM_TRIGGER_STOP:
>>               if ((be->dpcm[stream].state !=
>> SND_SOC_DPCM_STATE_START) &&
>>                   (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
>> -                continue;
>> +                goto next;
>>                 if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
>> -                continue;
>> +                goto next;
>>                 ret = soc_pcm_trigger(be_substream, cmd);
>>               if (ret)
>> -                goto end;
>> +                goto next;
>>                 be->dpcm[stream].state = SND_SOC_DPCM_STATE_STOP;
>>               break;
>>           case SNDRV_PCM_TRIGGER_SUSPEND:
>>               if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START)
>> -                continue;
>> +                goto next;
>>                 if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
>> -                continue;
>> +                goto next;
>>                 ret = soc_pcm_trigger(be_substream, cmd);
>>               if (ret)
>> -                goto end;
>> +                goto next;
>>                 be->dpcm[stream].state = SND_SOC_DPCM_STATE_SUSPEND;
>>               break;
>>           case SNDRV_PCM_TRIGGER_PAUSE_PUSH:
>>               if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START)
>> -                continue;
>> +                goto next;
>>                 if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
>> -                continue;
>> +                goto next;
>>                 ret = soc_pcm_trigger(be_substream, cmd);
>>               if (ret)
>> -                goto end;
>> +                goto next;
>>                 be->dpcm[stream].state = SND_SOC_DPCM_STATE_PAUSED;
>>               break;
>>           }
>> +next:
>> +        snd_soc_dpcm_stream_unlock_irqrestore(be, stream, flags);
>> +        if (ret)
>> +            break;
>>       }
>> -end:
>>       if (ret < 0)
>>           dev_err(fe->dev, "ASoC: %s() failed at %s (%d)\n",
>>               __func__, be->dai_link->name, ret);
>
> Best regards

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland