Re: [rfc] x86, bts: fix crash

From: Markus Metzger
Date: Fri Mar 27 2009 - 13:33:46 EST


On Fri, 2009-03-27 at 17:50 +0100, Oleg Nesterov wrote:
> On 03/27, Metzger, Markus T wrote:
> >
> > >> @@ -752,6 +752,14 @@ void ds_release_bts(struct bts_tracer *t
> > >>
> > >> ds_suspend_bts(tracer);
> > >>
> > >> + /*
> > >> + * We must wait for the suspend to take effect before we may
> > >> + * free the tracer and the ds configuration.
> > >> + */
> > >> + if (tracer->ds.context->task &&
> > >> + (tracer->ds.context->task != current))
> > >> + wait_task_inactive(tracer->ds.context->task, 0);
> > >
> > >I am not sure I understand the problem. From the changelog:
> > >
> > > If the children are currently executing, the buffer
> > > may be freed while the hardware is still tracing.
> > > This might cause the hardware to overwrite memory.
> > >
> > >So, the problem is that ds.context->task must not be running before we
> > >can start to disable/free ds, yes? Something like ds_switch_to() should
> > >be completed, right?
> > >
> > >In that case I don't really understand how wait_task_inactive() can help.
> > >If the task is killed it can be scheduled again, right after
> > >wait_task_inactive() returns.
> >
> > We first call ds_suspend_bts().
> > This clears the branch tracing control bits for the traced task and already
> > writes the updated value to the msr, if running on the current cpu.
> > If the task is running on a different cpu, the updated value will be written
> > when the task is scheduled out.
> > By waiting for the task to become inactive, we know that it has been scheduled out
> > at least once after we changed the bits. So we know that the hardware will not use
> > the tracing configuration for that task and we can safely free the memory.
>
> Still can't understand...
>
> Let's suppose the traced task is scheduled again, right after
> wait_task_inactive() returns a before we set ds.context->bts_master = NULL.
>
> In this case, can't ds_switch_to() (which plays with ds_context) race
> with ds_put_context()->kfree(context) ?


You're right. Now I see the problem.

The bts_master pointer needs to be set to NULL while the traced task is
off the cpu.

The kfree will only happen if the context is not used. If another tracer
immediately reuses the context and traces the same task, it will
become the new master and overwrite the configuration.

I assumed that the traced task would always be stopped, since that's
the ptrace scenario. But it's not true for disable, which may be called
by an exiting tracer while the tracee is running.

I would still require it for ds_request_bts_task(), though. I will add
a comment regarding this.

Regarding the race on task->thread.ds_ctx between ds_release_bts() and
ds_switch_to(), how would I prevent a task from being rescheduled for
a small amount of time?


> > >Also. This function is called from ptrace_bts_exit_tracer(), when the
> > >tracee is not stopped. In this case wait_task_inactive() can spin forever.
> > >For example, if the tracee simply does "for (;;) ;" it never succeeds.
> >
> > As far as I understand, wait_task_inactive() returns when the task is scheduled out.
>
> Yes. But the task does above is never scheduled out, it is always running
> even if preempted by another task. wait_task_inactive() returns when
> ->on_rq == 0, iow when the task sleeps.

Oops.


> This means that the tracer can hang "forever" during exit, until the tracee
> does the blocking syscall or exits.
>
> This is not acceptable, imho.

Definitely not.


What I would need here (see above) is to get the traced task off any cpu
in order to clear the bts_master pointer.
Is there some existing function that I could use?
Could you point me to some code that does something similar?

thanks,
markus.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/