Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes

From: Linux regression tracking #adding (Thorsten Leemhuis)
Date: Fri Sep 22 2023 - 07:14:19 EST


[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 20.09.23 10:11, Jeremi Piotrowski wrote:
> On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
>> 6.1-stable review patch. If anyone has any objections, please let me know.
>>
>> ------------------
>
> Hi Greg/Michal,
>
> This commit breaks userspace which makes it a bad commit for mainline and an
> even worse commit for stable.
>
> We ingested 6.1.54 into our nightly testing and found that runc fails to gather
> cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
> into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
> fine.
>
>> Address this by wiping out the file completely and effectively get back to
>> pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
>
> On reads, the runc code checks for MEMCG_KMEM=n by checking
> kmem.usage_in_bytes. If it is present then runc expects the other cgroup files
> to be there (including kmem.limit_in_bytes). So this change is not effectively
> the same.
>
> Here's a link to the PR that would be needed to handle this change in userspace
> (not merged yet and would need to be propagated through the ecosystem):
>
> https://github.com/opencontainers/runc/pull/4018.
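
For readers who want the failure mode spelled out, here is a minimal
sketch of the probing logic described in the quoted report. It is not
runc's actual code (runc is written in Go); the helper name
read_kmem_stats and the tolerate_missing_limit flag are hypothetical,
and the assumption is simply that the presence of kmem.usage_in_bytes
is taken as a sign that the other kmem files exist as well.

```python
import os


def read_kmem_stats(cgroup_dir, tolerate_missing_limit):
    """Hypothetical helper mirroring the check described in the report.

    Returns None when kmem.usage_in_bytes is absent (treated as
    CONFIG_MEMCG_KMEM=n), otherwise a dict with 'usage' and 'limit'.
    """
    usage_path = os.path.join(cgroup_dir, "kmem.usage_in_bytes")
    if not os.path.exists(usage_path):
        # No kmem accounting files at all: skip kmem statistics.
        return None

    stats = {}
    with open(usage_path) as f:
        stats["usage"] = int(f.read().strip())

    # The strict variant assumes this file exists whenever the usage
    # file does, which no longer holds once the limit file is dropped.
    limit_path = os.path.join(cgroup_dir, "kmem.limit_in_bytes")
    try:
        with open(limit_path) as f:
            stats["limit"] = int(f.read().strip())
    except FileNotFoundError:
        if not tolerate_missing_limit:
            raise
        # Assumed userspace-side handling: report "no limit known".
        stats["limit"] = None
    return stats
```

With tolerate_missing_limit=False the helper raises FileNotFoundError
on a directory that has kmem.usage_in_bytes but no
kmem.limit_in_bytes, which corresponds to the reported breakage; with
True it degrades gracefully, roughly the kind of handling userspace
would need.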

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 86327e8eb94c52
#regzbot title mm, memcg: runc fails to gather cgroup statistics
#regzbot fix: mm, memcg: reconsider kmem.limit_in_bytes deprecation
#regzbot ignore-activity

FWIW, the proposed fix can be found here:
https://lore.kernel.org/all/ZQwnUpX7FlzIOWXP@xxxxxxxxxxxxxx/

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.