Re: threads-max observe limits

From: Eric W. Biederman
Date: Tue Sep 17 2019 - 13:26:43 EST


Michal Hocko <mhocko@xxxxxxxxxx> writes:

> On Tue 17-09-19 17:28:02, Heinrich Schuchardt wrote:
>>
>> On 9/17/19 12:03 PM, Michal Hocko wrote:
>> > Hi,
>> > I have just stumbled over 16db3d3f1170 ("kernel/sysctl.c: threads-max
>> > observe limits") and I am really wondering what is the motivation behind
>> > the patch. We've had a customer noticing the threads_max autoscaling
>> > differences between 3.12 and 4.4 kernels and wanted to override the auto
>> > tuning from userspace, just to find out that this is not possible.
>>
>> set_max_threads() sets the upper limit (max_threads_suggested) for
>> threads such that at most 1/8th of the total memory can be occupied
>> by the threads' administrative data (of size THREAD_SIZE each). On my
>> 32 GiB system this results in 254313 threads.
>
> This is quite arbitrary, isn't it? What would happen if the limit was
> twice as large?
>
>> With patch 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
>> a user cannot set an arbitrarily high number for
>> /proc/sys/kernel/threads-max which could lead to a system stalling
>> because the thread headers occupy all the memory.
>
> This is still a decision of the admin to make. You can consume the
> memory by other means and that is why we have measures in place. E.g.
> memcg accounting.
>
>> When developing the patch I remarked that on a system where memory is
>> installed dynamically it might be a good idea to recalculate this limit.
>> If you have a system that boots with let's say 8 GiB and then
>> dynamically installs a few TiB of RAM this might make sense. But such a
>> dynamic update of max_threads_suggested was left out for the sake of
>> simplicity.
>>
>> Anyway if more than 100,000 threads are used on a system, I would wonder
>> if the software should not be changed to use thread-pools instead.
>
> You do not change the software to overcome artificial bounds based on
> guessing.
>
> So can we get back to the justification of the patch? What kind of
> real life problem does it solve and why is it ok to override an admin
> decision?
> If there is no strong justification then the patch should be reverted
> because from what I have heard it has been noticed and it has broken
> a certain deployment. I am not really clear about technical details yet
> but it seems that there are workloads that believe they need to touch
> this tuning and complain if that is not possible.

Taking a quick look myself.

I am completely mystified by both sides of this conversation.

a) The logic to set the default number of threads in a system
has not changed since 2.6.12-rc2 (the start of the git history).

The implementation has changed but we should still get the same
value. So anyone seeing threads_max autoscaling differences
between kernels is either seeing a bug in the rewritten formula
or something else weird is going on.

Michal, is it a very small effect your customers are seeing?
Is it another bug somewhere else?
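
To make the "same value" point concrete, here is a minimal userspace
sketch of the two ways the default can be written. The constants are
assumptions (4 KiB pages and a 16 KiB THREAD_SIZE, as on x86_64), the
function names are made up for illustration, and the formulas are a
paraphrase of the 1/8th-of-memory rule rather than a quote of
kernel/fork.c. The overflow guard a real implementation needs is left out.

```c
#include <stdint.h>

/* Assumed values for illustration; they differ between architectures. */
#define SKETCH_PAGE_SIZE   4096ULL
#define SKETCH_THREAD_SIZE 16384ULL

/* Older style: divide the page count by the pages eight thread structures need. */
static inline uint64_t default_threads_old(uint64_t nr_pages)
{
	return nr_pages / (8 * SKETCH_THREAD_SIZE / SKETCH_PAGE_SIZE);
}

/* Rewritten style: total bytes divided by eight times the thread structure size. */
static inline uint64_t default_threads_new(uint64_t nr_pages)
{
	return (nr_pages * SKETCH_PAGE_SIZE) / (SKETCH_THREAD_SIZE * 8);
}
```

Under these assumptions both forms reduce to nr_pages / 32, so they
agree for every page count. A full 32 GiB (8388608 pages) would give
262144; the 254313 quoted above is what you get if the kernel counts
somewhat less than 32 GiB as total RAM.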

b) Not being able to bump threads_max to the physical limit of
the machine is very clearly a regression.

Limiting threads_max to MIN_THREADS on the low end and MAX_THREADS on
the high end is reasonable, because Linux can't cope with values
outside of that range. Limiting threads_max to the auto-scaling value
is a regression.
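
The difference between the two policies can be sketched as follows.
This is an illustration of the argument, not the sysctl handler: the
helper names are hypothetical, and the bounds are what I believe
kernel/fork.c uses for its minimum and maximum thread counts (20 and
FUTEX_TID_MASK, i.e. 0x3fffffff). A real handler may also reject an
out-of-range write instead of clamping it.

```c
#include <stdint.h>

/* Assumed hard limits, mirroring the fork.c minimum and maximum. */
#define SKETCH_MIN_THREADS 20ULL
#define SKETCH_MAX_THREADS 0x3fffffffULL

static inline uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Policy being objected to: a request can never exceed the auto-scaled value. */
static inline uint64_t threads_max_capped(uint64_t requested, uint64_t autoscaled)
{
	uint64_t v = requested > autoscaled ? autoscaled : requested;

	return clamp_u64(v, SKETCH_MIN_THREADS, SKETCH_MAX_THREADS);
}

/* Policy argued for: only the hard limits apply, the admin decides the rest. */
static inline uint64_t threads_max_hard_limits(uint64_t requested)
{
	return clamp_u64(requested, SKETCH_MIN_THREADS, SKETCH_MAX_THREADS);
}
```

With an auto-scaled value of 254313, a request for 500000 comes back
as 254313 under the first policy and as 500000 under the second.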

The point of limits like threads_max is to have something that 99%
of people won't hit and if they do it will indicate a bug in their
application. And to generally keep the kernel working when an
application bug happens.

But there are always cases where heuristics fail so it is completely
reasonable to allow these values to be manually tuned.

Eric