Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic

From: Greg Kroah-Hartman
Date: Thu Oct 05 2023 - 11:53:04 EST


On Thu, Oct 05, 2023 at 10:03:11AM +0100, Lee Jones wrote:
> On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
>
> > On Wed, Oct 04, 2023 at 01:57:04PM +0100, Lee Jones wrote:
> > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > >
> > > > On Wed, Oct 04, 2023 at 09:57:20AM +0100, Lee Jones wrote:
> > > > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > > >
> > > > > > On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > > > > > > > Daniel, any thoughts?
> > > > > > >
> > > > > > > Our application of this protocol is only with specific modems to enable
> > > > > > > circuit switched operation (handling calls, selecting/querying networks,
> > > > > > > etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> > > > > > > The protocol was developed for such use cases.
> > > > > > >
> > > > > > > Regarding the issue itself:
> > > > > > > There was already an attempt to fix all this by switching from spinlocks to
> > > > > > > mutexes resulting in ~20% performance loss. However, the patch was reverted
> > > > > > > > as it did not handle the T1 timer, leading to sleeping while atomic within
> > > > > > > gsm_dlci_t1() on every mutex lock there.
> > > > >
> > > > > That's correct. When I initially saw this report, my initial thought
> > > > > > was to replace the spinlocks with mutexes, but having read the previous
> > > > > > accepted attempt and its subsequent reversion I started to think of
> > > > > other ways to solve this issue. This solution, unlike the last, does
> > > > > not involve adding sleep inducing locks into atomic contexts, nor
> > > > > should it negatively affect performance.
> > > > >
> > > > > > > There was also a suggestion to fix this in do_con_write() as
> > > > > > > tty_operations::write() appears to be documented as "not allowed to sleep".
> > > > > > > The patch for this was rejected. It did not fix the issue within n_gsm.
> > > > > > >
> > > > > > > Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@xxxxxxxxx/
> > > > > > > Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@xxxxxxxxxx/
> > > > > > > Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@xxxxxxxxxx/
> > > > > >
> > > > > > Ok, I thought I remembered this, I'll just drop this patch from my
> > > > > > review queue and wait for a better solution if it ever comes up as this
> > > > > > isn't a real issue that people are seeing on actual systems, but just a
> > > > > > syzbot report.
> > > > >
> > > > > What does the "better solution" look like?
> > > >
> > > > One that actually fixes the root problem here (i.e. does not break the
> > > > recursion loop, or cause a performance decrease for normal users, or
> > > > prevent this from being bound to the console).
> > >
> > > Does this solution break the recursion loop or affect performance?
> >
> > This solution broke the recursion by returning an error, right?
>
> This is the part I was least sure about.
>
> If this was considered valid and we were to go forward with a solution
> like this, what would a quality improvement look like? Should we have
> stayed in this function and waited for the previous occupant to leave
> before continuing through ->write()?

This isn't valid, as it obviously never shows up in real use.

The real solution should be to prevent binding a console to this line
discipline as it cannot handle the recursion that consoles require for
the write path.

Then, if consoles are really needed, the code can be fixed up to handle
such recursion. That's not a trivial thing to do, as can be seen by the
crazy gyrations that the n_tty line discipline does in its write path...
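
To make the first step concrete, refusing the bind could look roughly
like the user-space sketch below. It is only a model under stated
assumptions: struct model_tty, its is_console flag, model_mux_open() and
the MODEL_LDISC_* ids are made up for illustration and are not the
kernel's tty API. In the kernel the equivalent check would have to live
in the line discipline's open path, and how a tty is identified as a
console there is left open here.

```c
/*
 * User-space model only. Every name in this block is hypothetical and
 * exists purely to illustrate "refuse to attach to a console".
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LDISC_DEFAULT 0	/* assumed id of the default discipline */
#define MODEL_LDISC_MUX     1	/* assumed id of the mux discipline */

/* Stand-in for the few bits of tty state the check needs. */
struct model_tty {
	bool is_console;	/* device is also used as a console */
	int ldisc;		/* currently attached line discipline */
};

/*
 * Attach the mux discipline unless the tty is a console. A console
 * write path may re-enter write(), which this discipline cannot
 * handle, so the attach is rejected up front instead of failing later.
 */
static int model_mux_open(struct model_tty *tty)
{
	assert(tty != NULL);

	if (tty->is_console)
		return -EINVAL;	/* leave the existing discipline in place */

	tty->ldisc = MODEL_LDISC_MUX;
	return 0;
}
```

With this model, a tty that has is_console set keeps its current
discipline and the caller gets -EINVAL, while any other tty is switched
over and the call returns 0.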

thanks,

greg k-h