Re: [PATCH v7 3/4] mm,page_owner: Display all stacks and their count

From: Oscar Salvador
Date: Fri Feb 09 2024 - 16:51:49 EST


On Fri, Feb 09, 2024 at 09:00:40AM +0100, Marco Elver wrote:
> > +/**
> > + * stack_depot_get_next_stack - Returns all stacks, one at a time
>
> "Returns all stack_records" to be clear that this is returning the struct.

Fixed.


>
> > + *
> > + * @table: Current table we are checking
> > + * @bucket: Current bucket we are checking
> > + * @last_found: Last stack that was found
> > + *
> > + * This function finds first a non-empty bucket and returns the first stack
> > + * stored in it. On consequent calls, it walks the bucket to see whether
> > + * it contains more stacks.
> > + * Once we have walked all the stacks in a bucket, we check
> > + * the next one, and we repeat the same steps until we have checked all of them
>
> I think for this function it's important to say that no entry returned
> from this function can be evicted.
>
> I.e. the easiest way to ensure this is that the caller makes sure the
> entries returned are never passed to stack_depot_put() - which is
> certainly the case for your usecase because you do not use
> stack_depot_put().
>
> > + * Return: A pointer a to stack_record struct, or NULL when we have walked all
> > + * buckets.
> > + */
> > +struct stack_record *stack_depot_get_next_stack(unsigned long *table,
>
> To keep consistent, I'd also call this
> __stack_depot_get_next_stack_record(), so that we're clear this is
> more of an internal function not for general usage.
>
> > + struct list_head **bucket,
> > + struct stack_record **last_found);
> > +
> > /**
> > * stack_depot_fetch - Fetch a stack trace from stack depot
> > *
> > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > index 197c355601f9..107bd0174cd6 100644
> > --- a/lib/stackdepot.c
> > +++ b/lib/stackdepot.c
> > @@ -782,6 +782,52 @@ unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle)
> > }
> > EXPORT_SYMBOL(stack_depot_get_extra_bits);
> >
> > +struct stack_record *stack_depot_get_next_stack(unsigned long *table,
> > + struct list_head **curr_bucket,
> > + struct stack_record **last_found)
> > +{
> > + struct list_head *bucket = *curr_bucket;
> > + unsigned long nr_table = *table;
> > + struct stack_record *found = NULL;
> > + unsigned long stack_table_entries = stack_hash_mask + 1;
> > +
> > + rcu_read_lock_sched_notrace();
>
> We are returning pointers to stack_records out of the RCU-read
> critical section, which are then later used to continue the iteration.
> list_for_each_entry_continue_rcu() says this is fine if "... you held
> some sort of non-RCU reference (such as a reference count) ...".
> Updating the function's documentation to say none of these entries can
> be evicted via a stack_depot_put() is required.

Thinking about it some more, I think I made a mistake:

I am walking all buckets, and within those buckets there are not only
page_owner stack_records, which means that I could return a stack_record
from e.g: KASAN (which I think can evict stack_records) and then
everything goes off the rails.
Which means I cannot walk the buckets like that.

Actually, I think that having something like the following

struct list_stack_records {
struct stack_record *stack;
struct list_stack_records *next;
}

in page_owner would make sense.
Then the only thing I would have to do is to add a new record on every
new stack_record, and then I could just walk the list like a linked
list.

Which means that the function stack_depot_get_next_stack() could be
killed because everything would happen in page_owner code.

e.g:

static void inc_stack_record_count(depot_stack_handle_t handle)
{
struct stack_record *stack = __stack_depot_get_stack_record(handle);

if (stack) {
/*
* New stack_record's that do not use STACK_DEPOT_FLAG_GET start
* with REFCOUNT_SATURATED to catch spurious increments of their
* refcount.
* Since we do not use STACK_DEPOT_FLAG_{GET,PUT} API, let us
* set a refcount of 1 ourselves.
*/
if (refcount_read(&stack->count) == REFCOUNT_SATURATED) {
refcount_set(&stack->count, 1);
add_new_stack_record_into_the_list(stack)
}
refcount_inc(&stack->count);
}
}

and then just walk the list_stack_records list whenever we want to
show the stacktraces and their counting.

I think that overall this approach is cleaner and safer.

--
Oscar Salvador
SUSE Labs