Re: [PATCH 4.19 56/81] kernel/sysctl.c: do not override max_threads provided by userspace

From: Michal Hocko
Date: Mon Nov 18 2019 - 10:52:05 EST


On Mon 18-11-19 16:25:58, Pavel Machek wrote:
> Hi!
>
> > > > From: Michal Hocko <mhocko@xxxxxxxx>
> > > >
> > > > commit b0f53dbc4bc4c371f38b14c391095a3bb8a0bb40 upstream.
> > > >
> > > > Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe
> > > > limits") because the patch is causing a regression to any workload which
> > > > needs to override the auto-tuning of the limit provided by kernel.
> > > >
> > > > set_max_threads is implementing a boot time guesstimate to provide a
> > > > sensible limit of the concurrently running threads so that runaways will
> > > > not deplete all the memory. This is a good thing in general but there
> > > > are workloads which might need to increase this limit for an application
> > > > to run (reportedly WebSpher MQ is affected) and that is simply not
> > > > possible after the mentioned change. It is also very dubious to
> > > > override an admin decision by an estimation that doesn't have any direct
> > > > relation to correctness of the kernel operation.
> > > >
> > > > Fix this by dropping set_max_threads from sysctl_max_threads so any
> > > > value is accepted as long as it fits into MAX_THREADS which is important
> > > > to check because allowing more threads could break internal robust futex
> > > > restriction. While at it, do not use MIN_THREADS as the lower boundary
> > > > because it is also only a heuristic for automatic estimation and admin
> > > > might have a good reason to stop new threads to be created even when
> > > > below this limit.
> > >
> > > Ok, why not, but I smell followup work could be done:
> > >
> > > > @@ -2635,7 +2635,7 @@ int sysctl_max_threads(struct ctl_table
> > > > if (ret || !write)
> > > > return ret;
> > > >
> > > > - set_max_threads(threads);
> > > > + max_threads = threads;
> > > >
> > >
> > > AFAICT set_max_threads can now become __init.
> >
> > Yes. Care to send a patch?
>
> I'm not usually hacking in that area. Could you do that?

I can put it on my ever growing todo list. But this should be a low
hanging fruit that doesn't really require a deep understanding of the
specific subsystem.

> > > Plus, I don't see any locking here, should this be WRITE_ONCE() at
> > > minimum?
> >
> > Why would that matter? Do you expect several root processes race to set
> > the value?
>
> Well, for example to warn humans that this code is accessing unlocked
> variable. Second, as is, code is not valid C and compilers are
> allowed to do strange stuff ("undefined behaviour"). Third, there are
> concurency checkers that will not like this one.

I do not see any undefined behahvior in assigning an integer in a
lockless manner if there are no actual consistency issues. If this is
not the case then please do describe them in a specific manner.
--
Michal Hocko
SUSE Labs