Re: [PATCH -tip] introduce sys_membarrier(): process-wide memory barrier (v9)

From: Mathieu Desnoyers
Date: Tue Mar 16 2010 - 10:16:30 EST


* Ingo Molnar (mingo@xxxxxxx) wrote:
>
> * Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> > * Ingo Molnar (mingo@xxxxxxx) wrote:
> > >
> > > * Nick Piggin <npiggin@xxxxxxx> wrote:
> > >
> > > > On Tue, Mar 16, 2010 at 08:36:35AM +0100, Ingo Molnar wrote:
> > > > >
> > > > > * Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > > Unless this question is answered, Ingo's SA_RUNNING signal proposal, as
> > > > > > appealing as it may look at a first glance, falls into the
> > > > > > "fundamentally broken" category. [...]
> > > > >
> > > > > How is it different from your syscall? I.e. which lines of code make the
> > > > > difference? We could certainly apply the (trivial) barrier change to
> > > > > context_switch().
> > > >
> > > > I think it is just easy for userspace to misuse or think it does something
> > > > that it doesn't (because of races).
> > >
> > > That wasnt my question though. The question i asked Mathieu was to show how
> > > SA_RUNNING is "fundamentally broken" for librcu use while sys_membarrier() is
> > > not?
> > >
> > > This is really what he claims above. (i preserved the quote)
> > >
> > > It must be a misunderstanding either on my side or on his side. (Once that is
> > > cleared we can discuss further usecases for SA_RUNNING.)
> >
> > Well, it's not broken for sys_membarrier() specifically if we add the proper
> > memory barriers to the scheduler, but it's broken when we try to use it for
> > anything else. [...]
>
> That's quite an important distinction to an unqualified "fundamentally
> broken", right?

OK, I guess "conceptually broken" would be more precise in this case. ;)

>
> > [...] What makes it broken is that it requires that the scheduler switch
> > guarantee to have the same side-effect on a running thread than execution on
> > the per-running-thread signal handler.
> >
> > What's different with the sys_membarrier system call is that it does not try
> > to make generic something that should probably stay case-specific due to its
> > close coupling with the scheduler.
>
> Yeah, that's a fair point.
>
> Without another realistic usecase SA_RUNNING would just essentially be a
> SA_BARRIER special-case. (IMO even in that case signal handling speedups
> driven via this usecase would still be tempting though.)
>
> But note that some other usecase is possible as well:
>
> In theory we could inject signals at context-switch time (if that signal is
> not pending yet) - signals are fairly atomic [with a preallocated pool] and
> the 'wakeup' property of signals is not needed as the to-be-running task is
> obviously up to execution. (so there's no deadlock. It doesnt have to run with
> the rq lock taken in any case - it can run from sched_tail() i suspect.)
>
> So all this could be done via the ret-to-user framework that KVM uses at
> essentially no extra scheduler overhead. I think :-) It would be a bit like
> SIGALRM for timers.

That could be an interesting approach to hook into the scheduler "return to
userspace" path. We have to consider that this signal should probably have a
very high priority if we expect it to effectively nest over other signal
handlers.

But it does not address the hook needed upon entry into the scheduler context
switch. I fear this one might be a bit harder to do without tons of extra
overhead.

>
> Plus another performance optimization would be useful as well: signals could
> be turned on/off without having to enter the kernel. This could be done via a
> in-user-memory enable/disable-signals flag/mask associated with each task. (it
> would pin a page of memory.)

Hrm, it makes me wonder if this optimization would not add a slight overhead to
the scheduler. By allowing this kind of enable/disable flag, we would have to
check for blocked signal delivery upon each return to userspace. With the
current system call used for masking signals, this check can accurately be done
only in the signal-related system calls. (but maybe the scheduler already has to
take part of this burden for other reasons I'm not aware of). But yes,
independently of the SA_RUNNING topic, this optimization might very well be
worth it. I've actually been thinking along the same lines for an enable-disable
"thread migration" flag too, but that's a completely different topic (and has
impact on scheduler migration and cpu hotplug, so it's not as easy as it seems).
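
To make the cost I am wondering about concrete, here is a rough user-space
model of the flag scheme. It is only a sketch under my own assumptions, not
kernel code: task_sig_page, signals_disabled, struct task and return_to_user()
are all made-up names, and the "kernel" side is just a plain function. The point
it tries to show is that once user space can flip the flag with a plain store,
the kernel never sees the unmask happen, so the pending check has to sit on
every return to userspace.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical word in a pinned user page, shared with the kernel. */
struct task_sig_page {
	atomic_bool signals_disabled;	/* written by user space, no syscall */
};

/* Toy stand-in for the kernel's per-task state (illustration only). */
struct task {
	struct task_sig_page *page;
	bool signal_pending;
	int delivered;			/* handler invocations so far */
};

/* User side: masking and unmasking are plain stores, no kernel entry. */
static void user_disable_signals(struct task_sig_page *p)
{
	atomic_store_explicit(&p->signals_disabled, true, memory_order_release);
}

static void user_enable_signals(struct task_sig_page *p)
{
	atomic_store_explicit(&p->signals_disabled, false, memory_order_release);
}

/*
 * Simulated kernel side: since the unmask is invisible to the kernel,
 * this check would have to run on every return to userspace.
 */
static void return_to_user(struct task *t)
{
	assert(t->page != NULL);
	if (!t->signal_pending)
		return;
	if (atomic_load_explicit(&t->page->signals_disabled, memory_order_acquire))
		return;			/* still masked: leave it pending */
	t->signal_pending = false;
	t->delivered++;			/* stands in for running the handler */
}
```

In this model a signal that becomes pending while the flag is set stays pending
after user_enable_signals() and is only delivered by the next return_to_user()
call, whereas with a masking system call the kernel can do that check at unmask
time.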

>
> The question is, do we want to enable user-space to trigger a signal upon
> context-switches?
>
> It probably cannot be a queued one, as preemption from the signal handler
> itself would be rather yucky. As long as concurrency control is involved,
> user-space only wants a callback for the _first_ reschedule - subsequent
> reschedules dont need to trigger a signal, until the signal handler has
> finished.

That could work for return to userspace. Any clever idea about how to deal with
the hook to call upon entry into the context switch?
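
Just to check that I read the "first reschedule only" semantics correctly,
here is a toy model of it. This is a sketch under my own assumptions: struct
resched_notify, on_context_switch() and handler_done() are hypothetical names,
and nothing here is real scheduler or signal code. It only models the
return-to-userspace side; the hook on entry into the context switch is not
covered.

```c
#include <stdbool.h>

/* Hypothetical per-task one-shot notification state. */
struct resched_notify {
	bool armed;	/* true: the next context switch raises a notification */
	int raised;	/* notifications raised so far */
	int switches;	/* all context switches seen */
};

/* Would be called once per context switch of the task (modelled here). */
static void on_context_switch(struct resched_notify *n)
{
	n->switches++;
	if (!n->armed)
		return;		/* handler has not finished: stay quiet */
	n->armed = false;	/* one-shot: disarm until the handler is done */
	n->raised++;
}

/* Would be called when the user-space handler returns: re-arm. */
static void handler_done(struct resched_notify *n)
{
	n->armed = true;
}
```

With armed initially true, three back-to-back context switches raise a single
notification in this model, and a further one is raised only after
handler_done() has re-armed it.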

Thanks,

Mathieu


>
> Ingo

--
Mathieu Desnoyers
Operating System Efficiency Consultant
EfficiOS Inc.
http://www.efficios.com