Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust

From: Mark Rutland
Date: Fri Oct 22 2021 - 12:57:59 EST


On Fri, Oct 22, 2021 at 06:45:14PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 22, 2021 at 09:25:02AM -0700, Kees Cook wrote:
> > On Fri, Oct 22, 2021 at 05:09:35PM +0200, Peter Zijlstra wrote:
> > > /**
> > > * stack_trace_save_tsk - Save a task stack trace into a storage array
> > > * @task: The task to examine
> > > @@ -135,7 +142,6 @@ EXPORT_SYMBOL_GPL(stack_trace_save);
> > > unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
> > > unsigned int size, unsigned int skipnr)
> > > {
> > > - stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
> > > struct stacktrace_cookie c = {
> > > .store = store,
> > > .size = size,
> > > @@ -143,11 +149,8 @@ unsigned int stack_trace_save_tsk(struct
> > > .skip = skipnr + (current == tsk),
> > > };
> > >
> > > - if (!try_get_task_stack(tsk))
> > > - return 0;
> > > + task_try_func(tsk, try_arch_stack_walk_tsk, &c);
> >
> > Pardon my thin understanding of the scheduler, but I assume this change
> > doesn't mean stack_trace_save_tsk() stops working for "current", right?
> > In trying to answer this for myself, I couldn't convince myself what value
> > current->__state have here. Is it one of TASK_(UN)INTERRUPTIBLE ?
>
> current really shouldn't be using stack_trace_save_tsk(), and no you're
> quite right, it will not work for current, irrespective of ->__state,
> current will always be ->on_rq.

Heh, we raced to say the same thing. :)

> I started auditing stack_trace_save_tsk() users a few days ago, but
> didn't look for this particular issue. I suppose I'll have to start over
> with that.

FWIW, this shape of thing was one of the reasons I wanted to split
arch_stack_walk() into separate:

* arch_stack_walk_current()
* arch_stack_walk_current_regs()
* arch_stack_walk_blocked_task()

... with similar applying here, since otherwise people won't consider
the distinction between current / !current at the caller level, leading
to junk like this.

Thanks,
Mark.