Re: [PATCH] ASoC: soc-pcm: Fix and cleanup DPCM locking

From: Greg KH
Date: Thu May 11 2023 - 09:33:40 EST


On Thu, May 11, 2023 at 04:06:42PM +0800, Yixuan Jiang wrote:
> Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> 於 2023年5月10日 週三 下午10:40寫道:
> >
> > On Wed, May 10, 2023 at 07:59:49PM +0800, Yixuan Jiang wrote:
> > > Greg KH <greg@xxxxxxxxx> 於 2023年5月6日 週六 下午1:56寫道:
> > > >
> > > > On Thu, May 04, 2023 at 05:21:42PM +0800, yixuanjiang wrote:
> > > > > From: Takashi Iwai <tiwai@xxxxxxx>
> > > > >
> > > > > The existing locking for DPCM has several issues
> > > > > a) a confusing mix of card->mutex and card->pcm_mutex.
> > > > > b) a dpcm_lock spinlock added inconsistently and on paths that could
> > > > > be recursively taken. The use of irqsave/irqrestore was also overkill.
> > > > >
> > > > > The suggested model is:
> > > > >
> > > > > 1) The pcm_mutex is the top-most protection of BE links in the FE. The
> > > > > pcm_mutex is applied always on either the top PCM callbacks or the
> > > > > external call from DAPM, not taken in the internal functions.
> > > > >
> > > > > 2) the FE stream lock is taken in higher levels before invoking
> > > > > dpcm_be_dai_trigger()
> > > > >
> > > > > 3) when adding and deleting a BE, both the pcm_mutex and FE stream
> > > > > lock are taken.
> > > > >
> > > > > Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
> > > > > [clarification of commit message by plbossart]
> > > > > Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx>
> > > > > Reviewed-by: Kai Vehmanen <kai.vehmanen@xxxxxxxxxxxxxxx>
> > > > > Reviewed-by: Bard Liao <yung-chuan.liao@xxxxxxxxxxxxxxx>
> > > > > Reviewed-by: Ranjani Sridharan <ranjani.sridharan@xxxxxxxxxxxxxxx>
> > > > > Link: https://lore.kernel.org/r/20211207173745.15850-4-pierre-louis.bossart@xxxxxxxxxxxxxxx
> > > > > Cc: stable@xxxxxxxxxxxxxxx # 5.15.x
> > > > > Signed-off-by: Mark Brown <broonie@xxxxxxxxxx>
> > > > > ---
> > > >
> > > > What is the git commit id of this patch in Linus's tree?
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > > Hi Greg,
> > > For this patch I think it is [3/6] b7898396f4bbe160f546d0c5e9fa17cca9a7d153
> > >
> > > >From https://lore.kernel.org/all/163953384515.1515253.13641477106348913835.b4-ty@xxxxxxxxxx/
> > > Seems there are total 6 patches.
> > >
> > > [1/6] ASoC: soc-pcm: use GFP_ATOMIC for dpcm structure
> > > commit: d8a9c6e1f6766a16cf02b4e99a629f3c5512c183
> > > [2/6] ASoC: soc-pcm: align BE 'atomicity' with that of the FE
> > > commit: bbf7d3b1c4f40eb02dd1dffb500ba00b0bff0303
> > > [3/6] ASoC: soc-pcm: Fix and cleanup DPCM locking
> > > commit: b7898396f4bbe160f546d0c5e9fa17cca9a7d153
> > > [4/6] ASoC: soc-pcm: serialize BE triggers
> > > commit: b2ae80663008a7662febe7d13f14ea1b2eb0cd51
> > > [5/6] ASoC: soc-pcm: test refcount before triggering
> > > commit: 848aedfdc6ba25ad5652797db9266007773e44dd
> > > [6/6] ASoC: soc-pcm: fix BE handling of PAUSE_RELEASE
> > > commit: 3aa1e96a2b95e2ece198f8dd01e96818971b84df
> > >
> > > These 6 patches could directly cherry-pick to in 5.15 without conflict.
> >
> > Then please submit them for stable inclusion after you have tested that
> > they all work properly. But first, what bug is actually needed to be
> > fixed here? What is not working that this patch series fixes?
> >
> > thanks,
> >
> > greg k-h
>
> Hi Greg,
>
> The bug is, in 5.15
> It will always deadlock after stop compress playback.
>
> The patch A
> ASoC: soc-compress: Reposition and add pcm_mutex commit:
> aa9ff6a4955fdba02b54fbc4386db876603703b7
> >From patch A comment it is about to fix the issue by adding lock hold
> becasue patch B will check if lock is held.
>
> The patch B
> ASoC: soc-pcm: Fix and cleanup DPCM locking commit:
> b7898396f4bbe160f546d0c5e9fa17cca9a7d153
> Patch B remove lock aquire then check if lock is already held.
>
> In 5.15 it only include patch A then cause the deadlock.
>
> [ 198.670679][ T1] Call trace:
> [ 198.670690][ T1] __switch_to+0x174/0x328
> [ 198.670744][ T1] __schedule+0x5d0/0xaec
> [ 198.670784][ T1] schedule+0xc8/0x134
> [ 198.670803][ T1] schedule_preempt_disabled+0x30/0x50
> [ 198.670820][ T1] __mutex_lock+0x39c/0xa70
> [ 198.670845][ T1] __mutex_lock_slowpath+0x1c/0x2c
> [ 198.670862][ T1] mutex_lock+0x4c/0x104
> [ 198.670878][ T1] soc_pcm_hw_clean+0x38/0x16c <--
> Patch B will remove lock aquire, if no patch B, it will aquire lock
> again then cause AA deadlock
> [ 198.670958][ T1] dpcm_be_dai_hw_free+0x17c/0x1b4
> [ 198.670983][ T1] soc_compr_free_fe+0x84/0x158 <--
> Patch A aquire the lock
> [ 198.671025][ T1] snd_compr_free+0xac/0x148
>
> So is it better by revert patch A because purpose of patch A doesn't
> exist in 5.15 ?
> Or just backport full 6 patches series B to 5.15 ?

A full backport is always best.

thanks,

greg k-h