Re: TIF_NOHZ can escape nonhz mask? (Was: [PATCH v3 6/8] x86: Split syscall_trace_enter into two phases)

From: Oleg Nesterov
Date: Sat Aug 02 2014 - 13:33:01 EST


On 07/31, Frederic Weisbecker wrote:
>
> On Thu, Jul 31, 2014 at 08:12:30PM +0200, Oleg Nesterov wrote:
> > > >
> > > > Yes sure. But context_tracking_cpu_set() is called by init task with PID 1, not
> > > > by "swapper".
> > >
> > > Are you sure? It's called from start_kernel() which is init/0.
> >
> > But do_initcalls() is called by kernel_init(), this is the init process which is
> > going to exec /sbin/init later.
> >
> > But this doesn't really matter,
>
> Yeah but tick_nohz_init() is not an initcall, it's a function called from start_kernel(),
> before initcalls.

Ah, indeed, and context_tracking_init() too. Even better, so we only need

--- x/kernel/context_tracking.c
+++ x/kernel/context_tracking.c
@@ -30,8 +30,10 @@ EXPORT_SYMBOL_GPL(context_tracking_enabl
 DEFINE_PER_CPU(struct context_tracking, context_tracking);
 EXPORT_SYMBOL_GPL(context_tracking);

-void context_tracking_cpu_set(int cpu)
+void __init context_tracking_cpu_set(int cpu)
 {
+	/* Called by "swapper" thread, all threads will inherit this flag */
+	set_thread_flag(TIF_NOHZ);
 	if (!per_cpu(context_tracking.active, cpu)) {
 		per_cpu(context_tracking.active, cpu) = true;
 		static_key_slow_inc(&context_tracking_enabled);

and now we can kill context_tracking_task_switch() ?

> > Yes, yes, this doesn't really matter. We can even add set(TIF_NOHZ) at the start
> > of start_kernel(). The question is, I still can't understand why do we want to
> > have the global TIF_NOHZ.
>
> Because then the flag is inherited in forks. It's better than inheriting it on
> context switch, since context switches happen much more often than forks.

This is clear, and that is why I suggested it. We just misunderstood each other:
when I said "global TIF_NOHZ" I meant the current situation, where every (running)
task has this bit set anyway. Sorry for the confusion.
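
To make the fork-vs-context-switch argument concrete, here is a minimal
userspace sketch, not kernel code. Every toy_* name and the TOY_TIF_NOHZ value
are made up for illustration; the only point is that a flag set once on the
boot task reaches every descendant with one copy per fork, while propagating
it at switch time costs one operation per switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread flag bit; not the kernel's value. */
#define TOY_TIF_NOHZ (1u << 0)

struct toy_task {
	unsigned int flags;
};

/* Count how often each scheme has to touch the flag. */
static unsigned long toy_fork_copies;
static unsigned long toy_switch_copies;

/* Scheme A: the child copies the parent's flags once, at fork time. */
static void toy_fork(const struct toy_task *parent, struct toy_task *child)
{
	child->flags = parent->flags;
	toy_fork_copies++;
}

/* Scheme B: the flag is propagated from prev to next on every switch. */
static void toy_task_switch(const struct toy_task *prev, struct toy_task *next)
{
	if (prev->flags & TOY_TIF_NOHZ)
		next->flags |= TOY_TIF_NOHZ;
	toy_switch_copies++;
}

/* Returns true if every task created via toy_fork() carries the flag. */
static bool toy_all_inherit(size_t ntasks, size_t nswitches)
{
	struct toy_task tasks[16] = { { 0 } };
	size_t i;

	assert(ntasks >= 1 && ntasks <= 16);
	tasks[0].flags = TOY_TIF_NOHZ;	/* set once on the boot ("swapper") task */

	/* Each new task is forked from the previous one in the chain. */
	for (i = 1; i < ntasks; i++)
		toy_fork(&tasks[i - 1], &tasks[i]);

	/* Under scheme B the same propagation is paid again on every switch. */
	for (i = 0; i < nswitches; i++) {
		struct toy_task scratch = { 0 };

		toy_task_switch(&tasks[i % ntasks], &scratch);
	}

	for (i = 0; i < ntasks; i++)
		if (!(tasks[i].flags & TOY_TIF_NOHZ))
			return false;
	return true;
}
```

With 8 tasks and 1000 switches, scheme A does 7 copies in total and scheme B
does 1000 flag operations, which is the whole argument for setting the flag
early and letting fork do the rest.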

> No, because preempt_schedule_irq() does the ctx_state save and restore with
> exception_enter/exception_exit.

Thanks again. Can't understand how I managed to miss that exception_enter/exit
in preempt_schedule_*.
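
For reference, the save/restore pattern being discussed can be modelled like
this. It is a toy userspace sketch under my own assumptions, not the kernel
implementation: the toy_* functions and the two-value state enum are
hypothetical, and the real helpers do more than flip a variable.

```c
#include <assert.h>

/* Simplified, hypothetical context states. */
enum toy_ctx_state { TOY_IN_KERNEL, TOY_IN_USER };

/* The state the tracking code currently believes this CPU is in. */
static enum toy_ctx_state toy_state = TOY_IN_KERNEL;

/* Remember the previous state, then account the CPU as being in the kernel. */
static enum toy_ctx_state toy_exception_enter(void)
{
	enum toy_ctx_state prev = toy_state;

	toy_state = TOY_IN_KERNEL;
	return prev;
}

/* Put back whatever state was in effect before toy_exception_enter(). */
static void toy_exception_exit(enum toy_ctx_state prev)
{
	assert(toy_state == TOY_IN_KERNEL);
	toy_state = prev;
}

/*
 * Model of a preemption path that brackets the scheduler call with the
 * save/restore pair. Returns the state observed while "scheduling".
 */
static enum toy_ctx_state toy_preempt_schedule(void)
{
	enum toy_ctx_state prev = toy_exception_enter();
	enum toy_ctx_state during = toy_state;

	/* the scheduler would run here */

	toy_exception_exit(prev);
	return during;
}
```

If the tracked state is TOY_IN_USER when toy_preempt_schedule() is entered,
the scheduler sees TOY_IN_KERNEL and the caller gets TOY_IN_USER back
afterwards, so the preemption path leaves the tracked state unchanged.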

Damn. And even after spending more time on this, I still have no idea how to
make this tracking cheaper.

Oleg.
