RE: [RFC 00/14] Dynamic Kernel Stacks

From: David Laight
Date: Tue Mar 12 2024 - 18:18:58 EST

Next message: Steven Rostedt: "[for-linus][PATCH 0/5] tracing/ring-buffer: Updates that should have made it for 6.8"
Previous message: Casey Schaufler: "Re: [PATCH v15 05/11] LSM: Create lsm_list_modules system call"
In reply to: Kent Overstreet: "Re: [RFC 00/14] Dynamic Kernel Stacks"
Next in thread: Matthew Wilcox: "Re: [RFC 00/14] Dynamic Kernel Stacks"

...
> I re-read my cover letter, and I do not see where "kernel memory" is
> mentioned. We are talking about kernel stacks overhead that is
> proportional to the user workload, as every active thread has an
> associated kernel stack. The idea is to save memory by not
> pre-allocating all pages of kernel-stacks, but instead use it as a
> safeguard when a stack actually becomes deep. Come-up with a solution
> that can handle rare deeper stacks only when needed. This could be
> done through faulting on the supported hardware (as proposed in this
> series), or via pre-map on every schedule event, and checking the
> access when thread goes off cpu (as proposed by Andy Lutomirski to
> avoid double faults on x86).
>
> In other words, this feature is only about one very specific type of
> kernel memory that is not even directly mapped (the feature required
> vmapped stacks).

Just out of interest, how big does the register save area get?
In the 'good old days' it could be allocated from the low end of the
stack memory. But AVX512 starts making it large - never mind some
other things that (IIRC) might get to 8k.
Even the task area is probably non-trivial since far fewer things
can be shared than one might hope.

I'm sure I remember someone contemplating not allocating stacks to
each thread. I think that requires waking up with a system call
restart for some system calls - plausibly possible for futex() and poll().

Another option is to do a proper static analysis of stack usage
and fix the paths that have deep stacks and remove all recursion.
I'm pretty sure objtool knows the stack offsets of every call instruction.
The indirect call hashes (FineIBT?) should allow indirect calls
to be handled as well as direct calls.
Processing the 'A calls B at offset n' records to generate a max depth
is just a SMOP.
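
To make the 'A calls B at offset n' processing concrete, here is a minimal
sketch in Python. It is not an objtool feature: the function names, offsets
and frame sizes are made up, and it assumes each record gives the caller's
stack usage at the call site (return address included). It walks the call
graph depth-first, memoises the result per function, and refuses to give an
answer if it finds recursion.

```python
# Hypothetical input: (caller, callee, offset) means "caller calls callee
# while caller has used `offset` bytes of stack (return address included)".
CALLS = [
    ("sys_write", "vfs_write", 64),
    ("vfs_write", "ext4_write", 96),
    ("vfs_write", "printk", 48),
    ("ext4_write", "printk", 160),
]

# Hypothetical total frame size of each function, used when the deepest
# point is reached inside the function itself rather than in a callee.
FRAME = {"sys_write": 80, "vfs_write": 112, "ext4_write": 200, "printk": 512}


def max_stack_depth(entry, calls, frame):
    """Return (depth, path) for the deepest call chain starting at entry."""
    edges = {}
    for caller, callee, offset in calls:
        edges.setdefault(caller, []).append((callee, offset))

    memo = {}        # function -> (depth, path), computed once per function
    on_path = set()  # functions on the current DFS path, to detect recursion

    def visit(fn):
        if fn in memo:
            return memo[fn]
        if fn in on_path:
            raise ValueError(f"recursion through {fn}: depth is unbounded")
        on_path.add(fn)
        best = (frame.get(fn, 0), [fn])
        for callee, offset in edges.get(fn, []):
            depth, path = visit(callee)
            if offset + depth > best[0]:
                best = (offset + depth, [fn] + path)
        on_path.discard(fn)
        memo[fn] = best
        return best

    return visit(entry)


if __name__ == "__main__":
    depth, path = max_stack_depth("sys_write", CALLS, FRAME)
    print(depth, " -> ".join(path))
```

With the made-up numbers above the deepest chain ends in printk(), and
adding an edge from printk() back to vfs_write() makes the function raise
instead of reporting a depth.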

At the moment I think all 'void (*)(void *)' functions have the same hash?
So the compiler would need a function attribute to seed the hash.
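
A rough Python illustration of why the seed matters for the analysis. It
assumes an indirect call can reach every function whose (prototype, seed)
key matches the call site, so the analyser has to take the worst case over
all of them. The function names, depths and the "seed" field are
hypothetical; no such compiler attribute is claimed to exist today.

```python
# Hypothetical table: function -> (prototype, seed, max stack depth in bytes).
UNSEEDED = {
    "timer_cb": ("void (*)(void *)", None, 120),
    "work_cb":  ("void (*)(void *)", None, 900),
    "irq_cb":   ("void (*)(void *)", None, 64),
}

# Same functions, but each class of callback carries its own (made-up) seed.
SEEDED = {
    "timer_cb": ("void (*)(void *)", "timer", 120),
    "work_cb":  ("void (*)(void *)", "work", 900),
    "irq_cb":   ("void (*)(void *)", "irq", 64),
}


def indirect_call_depth(call_key, funcs):
    """Worst-case depth of an indirect call whose site has key `call_key`.

    Every function with a matching (prototype, seed) key is a possible
    target, so the result is the maximum depth over all candidates.
    """
    candidates = sorted(
        name for name, (proto, seed, _) in funcs.items()
        if (proto, seed) == call_key
    )
    worst = max((funcs[name][2] for name in candidates), default=0)
    return worst, candidates


if __name__ == "__main__":
    # Without a seed, a timer call site must assume it can reach work_cb.
    print(indirect_call_depth(("void (*)(void *)", None), UNSEEDED))
    # With a seed, only the timer callback is a candidate.
    print(indirect_call_depth(("void (*)(void *)", "timer"), SEEDED))
```

Without seeds every call site of that type inherits the deepest callback;
with them the bound is per class of callback.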

With that you might be able to remove all the code paths that actually
use a lot of stack - instead of just guessing and limiting individual
stack frames.

My 'gut feel' from calculating the stack use that way for an embedded
system back in the early 1980s is that the max use will be inside
printk() in an obscure error path, and if you actually hit it
things will explode.
(We didn't have enough memory to allocate big enough stacks!)

David

