Re: [PATCH] tracing: Choose static tp_printk buffer by explicit nesting count

From: Andy Lutomirski
Date: Wed May 25 2016 - 16:18:05 EST


On May 25, 2016 6:16 AM, "Peter Zijlstra" <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, May 24, 2016 at 03:52:28PM -0700, Andy Lutomirski wrote:
> > Currently, the trace_printk code chooses which static buffer to use based
> > on what type of atomic context (NMI, IRQ, etc.) it's in. Simplify the
> > code and make it more robust: simply count the nesting depth and choose
> > a buffer based on the current nesting depth.
> >
> > The new code will only drop an event if we nest more than 4 deep,
> > and the old code was guaranteed to malfunction if that happened.
> >
> > Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> > ---
> > kernel/trace/trace.c | 83 +++++++++++++++-------------------------------------
> > 1 file changed, 24 insertions(+), 59 deletions(-)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index a2f0b9f33e9b..4508f3bf4a97 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -1986,83 +1986,41 @@ static void __trace_userstack(struct trace_array *tr, unsigned long flags)
> >
> > /* created for use with alloc_percpu */
> > struct trace_buffer_struct {
> > - char buffer[TRACE_BUF_SIZE];
> > + int nesting;
> > + char buffer[4][TRACE_BUF_SIZE];
> > };
> >
> > static struct trace_buffer_struct *trace_percpu_buffer;
> > /*
> > + * This allows for lockless recording. If we're nested too deeply, then
> > + * this returns NULL.
> > */
> > static char *get_trace_buf(void)
> > {
> > + struct trace_buffer_struct *buffer = this_cpu_ptr(trace_percpu_buffer);
> >
> > + if (!buffer || buffer->nesting >= 4)
> > return NULL;
>
> This is buggy fwiw; you need to unconditionally increment
> buffer->nesting to match the unconditional decrement.
>
> Otherwise 5 'increments' and 5 decrements will land you at -1.

I did indeed mess up the error handling. I'll fix it.

>
> >
> > + return &buffer->buffer[buffer->nesting++][0];
> > +}
> > +
> > +static void put_trace_buf(void)
> > +{
> > + this_cpu_dec(trace_percpu_buffer->nesting);
> > }
>
> So I don't know about tracing; but for perf this construct would not
> work 'properly'.
>
> The per context counter -- which is lost in this scheme -- guards
> against in-context recursion.
>
> Only if we nest from another context do we allow generation of a new
> event.

What's the purpose of this feature?

I'm guessing that the idea is to prevent events that are triggered
synchronously during processing of another event. So, for example, if
you get a page fault or trigger a data breakpoint while generating a
callchain, it's not terribly helpful to emit events due to that fault
or breakpoint. In this respect, my patch is a regression:
watchpoints are synchronous events.

If that's the goal, then the current heuristic may be fairly good after all.

--Andy