Aw: Re: [External] : nfsd: memory leak when client does many file operations

From: Jan Schunk
Date: Tue Mar 26 2024 - 13:07:31 EST


Before I start doing this on my own build, I tried it with the unmodified linux-image-6.6.13+bpo-amd64 from Debian 12.
I installed systemtap, linux-headers-6.6.13+bpo-amd64 and linux-image-6.6.13+bpo-amd64-dbg, then tried to run stap:

user@deb:~$ sudo stap -v --all-modules kmem_alloc.stp nfsd_file
WARNING: Kernel function symbol table missing [man warning::symbols]
Pass 1: parsed user script and 484 library scripts using 110120virt/96896res/7168shr/89800data kb, in 1360usr/1080sys/4963real ms.
WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
semantic error: resolution failed in DWARF builder

semantic error: while resolving probe point: identifier 'kernel' at kmem_alloc.stp:5:7
source: probe kernel.function("kmem_cache_alloc") {
              ^

semantic error: no match

Pass 2: analyzed script: 1 probe, 5 functions, 1 embed, 3 globals using 112132virt/100352res/8704shr/91792data kb, in 30usr/30sys/167real ms.
Pass 2: analysis failed. [man error::pass2]
Tip: /usr/share/doc/systemtap/README.Debian should help you get started.
user@deb:~$

user@deb:~$ grep -E 'CONFIG_DEBUG_INFO|CONFIG_KPROBES|CONFIG_DEBUG_FS|CONFIG_RELAY' /boot/config-6.6.13+bpo-amd64
CONFIG_RELAY=y
CONFIG_KPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_NONE is not set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
# CONFIG_DEBUG_INFO_REDUCED is not set
CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
# CONFIG_DEBUG_INFO_COMPRESSED_ZLIB is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
user@deb:~$

Do I need to enable other options?


> Sent: Tuesday, 26.03.2024 at 12:15
> From: "Benjamin Coddington" <bcodding@xxxxxxxxxx>
> To: "Chuck Lever III" <chuck.lever@xxxxxxxxxx>
> Cc: "Jan Schunk" <scpcom@xxxxxx>, "Jeff Layton" <jlayton@xxxxxxxxxx>, "Neil Brown" <neilb@xxxxxxx>, "Olga Kornievskaia" <kolga@xxxxxxxxxx>, "Dai Ngo" <dai.ngo@xxxxxxxxxx>, "Tom Talpey" <tom@xxxxxxxxxx>, "Linux NFS Mailing List" <linux-nfs@xxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>
> On 25 Mar 2024, at 16:11, Chuck Lever III wrote:
>
> >> On Mar 25, 2024, at 3:55 PM, Jan Schunk <scpcom@xxxxxx> wrote:
> >>
> >> The VM is now running 20 hours with 512MB RAM, no desktop, without the "noatime" mount option and without the "async" export option.
> >>
> >> Currently there is no issue, but the memory usage is still constantly growing. It may just take longer before something happens.
> >>
> >> top - 00:49:49 up 3 min, 1 user, load average: 0,21, 0,19, 0,09
> >> Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie
> >> %CPU(s): 0,2 us, 0,3 sy, 0,0 ni, 99,5 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> >> MiB Spch: 467,0 total, 302,3 free, 89,3 used, 88,1 buff/cache
> >> MiB Swap: 975,0 total, 975,0 free, 0,0 used. 377,7 avail Spch
> >>
> >> top - 15:05:39 up 14:19, 1 user, load average: 1,87, 1,72, 1,65
> >> Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
> >> %CPU(s): 0,2 us, 4,9 sy, 0,0 ni, 53,3 id, 39,0 wa, 0,0 hi, 2,6 si, 0,0 st
> >> MiB Spch: 467,0 total, 21,2 free, 147,1 used, 310,9 buff/cache
> >> MiB Swap: 975,0 total, 952,9 free, 22,1 used. 319,9 avail Spch
> >>
> >> top - 20:48:16 up 20:01, 1 user, load average: 5,02, 2,72, 2,08
> >> Tasks: 104 total, 5 running, 99 sleeping, 0 stopped, 0 zombie
> >> %CPU(s): 0,2 us, 46,4 sy, 0,0 ni, 11,9 id, 2,3 wa, 0,0 hi, 39,2 si, 0,0 st
> >> MiB Spch: 467,0 total, 16,9 free, 190,8 used, 271,6 buff/cache
> >> MiB Swap: 975,0 total, 952,9 free, 22,1 used. 276,2 avail Spch
> >
> > I don't see anything in your original memory dump that
> > might account for this. But I'm at a loss because I'm
> > a kernel developer, not a support guy -- I don't have
> > any tools or expertise that can troubleshoot a system
> > without rebuilding a kernel with instrumentation. My
> > first instinct is to tell you to bisect between v6.3
> > and v6.4, or at least enable kmemleak, but I'm guessing
> > you don't build your own kernels.
> >
> > My only recourse at this point would be to try to
> > reproduce it myself, but unfortunately I've just
> > upgraded my whole lab to Fedora 39, and there's a grub
> > bug that prevents booting any custom-built kernel
> > on my hardware.
> >
> > So I'm stuck until I can nail that down. Anyone else
> > care to help out?
>
> Sure - I can throw some stuff..
>
> Can we dig into which memory slabs might be growing? Something like:
>
> watch -d "cat /proc/slabinfo | grep nfsd"
>
> .. for a bit might show what is growing.
>
> Then use a systemtap script like the one below to trace the allocations - use:
>
> stap -v --all-modules kmem_alloc.stp <slab_name>
>
> Ben
>
>
> 8<---------------------------- save as kmem_alloc.stp ----------------------------
>
> # This script displays the number of allocations from the given slab and the backtraces leading up to them.
>
> global slab = @1
> global stats, stacks
> probe kernel.function("kmem_cache_alloc") {
>     if (kernel_string($s->name) == slab) {
>         stats[execname()] <<< 1
>         stacks[execname(),kernel_string($s->name),backtrace()] <<< 1
>     }
> }
> # Exit after 10 seconds
> # probe timer.ms(10000) { exit () }
> probe end {
>     printf("Number of %s slab allocations by process\n", slab)
>     foreach ([exec] in stats) {
>         printf("%s:\t%d\n",exec,@count(stats[exec]))
>     }
>     printf("\nBacktrace of processes when allocating\n")
>     foreach ([proc,cache,bt] in stacks) {
>         printf("Exec: %s Name: %s Count: %d\n",proc,cache,@count(stacks[proc,cache,bt]))
>         print_stack(bt)
>         printf("\n-------------------------------------------------------\n\n")
>     }
> }
>
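
The slab check quoted above (watch -d "cat /proc/slabinfo | grep nfsd") only highlights changes on screen. As a rough sketch, the same comparison could be scripted so that two snapshots are diffed and only the caches whose active object count went up are reported. The function names here (parse_slabinfo, slab_growth, sample_growth) are made up for illustration and are not part of any existing tool; the column positions assume the slabinfo version 2.1 layout (name, active_objs, num_objs, objsize, ...).

```python
import time


def parse_slabinfo(text, pattern="nfsd"):
    """Return {cache_name: (active_objs, objsize)} for caches whose name contains pattern.

    Assumes the slabinfo 2.1 column order: name active_objs num_objs objsize ...
    """
    caches = {}
    for line in text.splitlines():
        # Skip the version banner and the column header line.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4 or pattern not in fields[0]:
            continue
        caches[fields[0]] = (int(fields[1]), int(fields[3]))
    return caches


def slab_growth(before, after):
    """Return {cache_name: increase in active_objs} for caches that grew."""
    growth = {}
    for name, (objs, _objsize) in after.items():
        delta = objs - before.get(name, (0, 0))[0]
        if delta > 0:
            growth[name] = delta
    return growth


def sample_growth(interval_s, path="/proc/slabinfo", pattern="nfsd"):
    """Take two snapshots interval_s seconds apart and return the growth between them."""
    with open(path) as f:
        before = parse_slabinfo(f.read(), pattern)
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_slabinfo(f.read(), pattern)
    return slab_growth(before, after)
```

Calling sample_growth(600) as root (reading /proc/slabinfo normally requires it) while the client workload is running would return the nfsd-related caches that gained active objects over ten minutes. A cache that keeps growing across repeated samples is the one to pass to the stap script as <slab_name>.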