Re: [PATCH v6 2/2] tracing: Add support for preempt and irq enable/disable events

From: Joel Fernandes
Date: Mon Sep 25 2017 - 20:23:06 EST


Hi Peter,

On Mon, Sep 25, 2017 at 1:34 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Sep 22, 2017 at 11:28:30AM -0700, Joel Fernandes wrote:
[...]
>> >> void start_critical_timings(void)
>> >> {
>> >> +	if (this_cpu_read(tracing_preempt_cpu))
>> >> +		trace_preempt_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>> >> +
>> >> +	if (this_cpu_read(tracing_irq_cpu))
>> >> +		trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>> >> +
>> >> 	start_critical_timings_tracer();
>> >> }
>> >> EXPORT_SYMBOL_GPL(start_critical_timings);
>> >>
>> >> void stop_critical_timings(void)
>> >> {
>> >> +	if (this_cpu_read(tracing_preempt_cpu))
>> >> +		trace_preempt_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>> >> +
>> >> +	if (this_cpu_read(tracing_irq_cpu))
>> >> +		trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>> >> +
>> >> 	stop_critical_timings_tracer();
>> >> }
>> >> EXPORT_SYMBOL_GPL(stop_critical_timings);
>> >
>> > And I feel these yield somewhat odd semantics, does that need explaining
>> > somewhere?
>>
>> Maybe I can add a comment here, if you prefer that. When you meant
>> semantics, do you mean 'critical' vs 'atomic' thing or do you mean the
>> semantics/context of how this function is supposed to be used?
>
> I would add the comment to the tracepoint definition.
>
> On semantics, the whole stop/start excludes a fair bunch of walltime
> from our measurement, I feel that needs to be called out and enumerated
> (when time goes missing and why).
>
> Given that the idle thread runs with preempt-off I understand its
> purpose from the POV from the preempt-tracer, but its 'weird' behaviour
> if you're looking at it from a pure tracepoint pov.

I see your point: basically you have a problem with the "fake preempt
on" event that I'm generating when start_critical_timings is called
from the idle code path, which isn't really a preempt-on per se but
rather a side effect of entering the idle path.

Since we already have cpuidle events, one thing I could do is just
drop this part of the patch set and not generate any trace events
during start_critical_timings; one could then use the preempt_disable
trace event (introduced in this patch) together with the cpuidle trace
event to figure out that part of the preempt-disabled section is
actually the CPU idle path. Would that alleviate your concern here?

As a next step, Steven and I discussed in the hallway at LPC that we
could possibly add a start_critical_timings event as well, so after
this patch those new events could be added to give more visibility
into walltime that shouldn't be counted. Or I could add it in this
series as a separate patch; let me know if that makes sense and what
you'd like to do, thanks.

>> >> void trace_preempt_off(unsigned long a0, unsigned long a1)
>> >> {
>> >> +	if (this_cpu_read(tracing_preempt_cpu))
>> >> +		return;
>> >> +
>> >> +	this_cpu_write(tracing_preempt_cpu, 1);
>> >> +
>> >> +	trace_preempt_disable_rcuidle(a0, a1);
>> >> 	tracer_preempt_off(a0, a1);
>> >> }
>> >> #endif
>> >
>> > And here you assume things like double on / double off don't happen,
>> > which might well be so, but does seem somewhat fragile.
>> >
>>
>> We are handling the cases where these functions might be called twice,
>> but we are only interested in the first time they're called. I caught
>> a deadlock when I didn't add such protection to trace_hardirqs_off,
>> so I added it to the trace_hardirqs* and trace_preempt* ones as well,
>> just to be extra safe and keep it consistent. Hope I understood your
>> concern correctly; if not, please let me know, thanks.
>
> Under what conditions where they called twice? That seems like something
> that should not happen to begin with. Esp the one I left quoted above,

Ok, so the double call AIUI cannot happen with trace_preempt_on and
trace_preempt_off; I agree, and I'll drop the unnecessary per-CPU
variable check there, but...

> disabling when its already disabled sounds like fail. So please provide
> more details on the scenario you're working around.

It can still happen when local_irq_save is called while interrupts are already disabled:

The trace_hardirqs_off API can be called even when IRQs are already
off. This is unlike trace_hardirqs_on, which checks whether IRQs are
off (at least from some callsites); here are the definitions just for
reference [1]. I guess we could modify local_irq_disable and
local_irq_save to check what the HW flags were before calling
raw_local_irq_save, and only call trace_hardirqs_off if interrupts
were indeed on and are now being turned off, but that adds complexity:
we would also have to modify all the callsites in assembly code to
conditionally call trace_hardirqs_on/off :(.

Instead my patch uses a per-CPU variable in the trace_hardirqs_*
functions to track whether interrupts were already off when they were
called, thus avoiding uselessly emitting a trace event:

void trace_hardirqs_off(void)
{
	if (this_cpu_read(tracing_irq_cpu))
		return;

	this_cpu_write(tracing_irq_cpu, 1);

	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
	tracer_hardirqs_off();
}

I feel this is the cleanest path forward for the IRQ disable/enable
trace events.

Are you Ok with this?

thanks,

- Joel

[1] http://elixir.free-electrons.com/linux/latest/source/include/linux/irqflags.h#L89