Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier (v3a)

From: Mathieu Desnoyers
Date: Mon Jan 11 2010 - 17:49:06 EST


* Peter Zijlstra (peterz@xxxxxxxxxxxxx) wrote:
> On Mon, 2010-01-11 at 17:04 -0500, Mathieu Desnoyers wrote:
> > * Peter Zijlstra (peterz@xxxxxxxxxxxxx) wrote:
> > > On Mon, 2010-01-11 at 15:52 -0500, Mathieu Desnoyers wrote:
> > > >
> > > > So the clear bit can occur far, far away in the future, we don't care.
> > > > We'll just send extra IPIs when unneeded in this time-frame.
> > >
> > > I think we should try harder not to disturb CPUs, particularly in the
> > > face of RT tasks and DoS scenarios. Therefore I don't think we should
> > > just wildly send to mm_cpumask(), but verify (although speculatively)
> > > that the remote tasks' mm matches ours.
> > >
> >
> > Well, my point of view is that if IPI TLB shootdown does not care about
> > disturbing CPUs running other processes in the time window of the lazy
> > removal, why should we ?
>
> while (1)
> 	sys_membarrier();
>
> is a very good reason, TLB shootdown doesn't have that problem.
>
> > We're adding an overhead very close to that of
> > an unrequired IPI shootdown which returns immediately without doing
> > anything.
>
> Except we don't clear the mask.
>

Good point. And I'm not at all confident that clearing it ourselves
would be safe.

> > The tradeoff here seems to be:
> > - more overhead within switch_mm() for more precise mm_cpumask.
> > vs
> > - lazy removal of the cpumask, which implies that some processors
> > running a different process can receive the IPI for nothing.
> >
> > I really doubt we could create an IPI DoS based on such a small
> > time window.
>
> What small window? When there's less runnable tasks than available mm
> contexts some architectures can go quite a long while without
> invalidating TLBs.

OK.

>
> So what again is wrong with:
>
> 	int cpu, this_cpu = get_cpu();
>
> 	smp_mb();
>
> 	for_each_cpu(cpu, mm_cpumask(current->mm)) {
> 		if (cpu == this_cpu)
> 			continue;
> 		if (cpu_curr(cpu)->mm != current->mm)
> 			continue;
> 		smp_send_call_function_single(cpu, do_mb, NULL, 1);
> 	}
>
> 	put_cpu();
>
> ?
>

Almost: it is missing an smp_mb() at the end. We also have to specify
that the smp_mb() we plan to require in switch_mm() should now surround:

- clear mask
- set mask
- ->mm update

Or, as a simpler way to protect the ->mm read, we can take the runqueue
spinlock.
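
To make the ordering above concrete, here is a rough user-space model of
it. This is only a sketch, not kernel code: struct mm_model,
cpu_curr_mm_model[], smp_mb_model() and switch_mm_model() are all
hypothetical names, a 64-bit word stands in for the cpumask, and a C11
seq_cst fence stands in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 64

/* Hypothetical stand-in for an mm; only its CPU mask is modelled. */
struct mm_model {
	_Atomic uint64_t cpumask;	/* bit n set: CPU n may run this mm */
};

/* Models the ->mm of the task currently running on each CPU. */
static _Atomic(struct mm_model *) cpu_curr_mm_model[NR_CPUS_MODEL];

/* Stand-in for smp_mb(): a full, sequentially consistent fence. */
static inline void smp_mb_model(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Model of the proposed ordering: the two full barriers surround the
 * mask clear, the mask set, and the ->mm update.
 */
static void switch_mm_model(int cpu, struct mm_model *prev,
			    struct mm_model *next)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	uint64_t bit = UINT64_C(1) << cpu;

	smp_mb_model();
	if (prev)	/* clear mask */
		atomic_fetch_and_explicit(&prev->cpumask, ~bit,
					  memory_order_relaxed);
	/* set mask */
	atomic_fetch_or_explicit(&next->cpumask, bit, memory_order_relaxed);
	/* ->mm update */
	atomic_store_explicit(&cpu_curr_mm_model[cpu], next,
			      memory_order_relaxed);
	smp_mb_model();
}
```

The three inner operations are deliberately relaxed in this model, so
that the only ordering against the rest of the program comes from the
two surrounding fences.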

Also, I'd like to use a send-to-many IPI rather than sending to single
CPUs one by one, because the former scales much better on architectures
supporting IPI broadcast. This, however, implies allocating a temporary
cpumask.
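
A rough user-space model of what I have in mind, for illustration only.
Everything here is a hypothetical stand-in: a 64-bit word plays the role
of the temporary cpumask, curr_mm_model[] plays cpu_curr(cpu)->mm,
send_ipi_many_model() is a mock that merely records the mask it was
given, and C11 seq_cst fences play the two smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

/* Hypothetical stand-in for an mm; only identity and mask matter. */
struct mm_model {
	uint64_t cpumask;	/* possibly stale: bits are cleared lazily */
};

/* Models cpu_curr(cpu)->mm for each CPU. */
static const struct mm_model *curr_mm_model[MODEL_NR_CPUS];

/* State recorded by the mock broadcast, for inspection. */
static uint64_t last_ipi_mask_model;
static unsigned int ipi_calls_model;

/* Mock send-to-many IPI: records the mask instead of sending anything. */
static void send_ipi_many_model(uint64_t mask)
{
	last_ipi_mask_model = mask;
	ipi_calls_model++;
}

/* Returns the set of CPUs that would have been interrupted. */
static uint64_t membarrier_model(const struct mm_model *mm, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);
	uint64_t tmp = 0;	/* the temporary cpumask */

	atomic_thread_fence(memory_order_seq_cst);	/* leading smp_mb() */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(mm->cpumask & (UINT64_C(1) << cpu)))
			continue;	/* not in the mm's mask */
		if (cpu == this_cpu)
			continue;	/* caller issues its own barriers */
		if (curr_mm_model[cpu] != mm)
			continue;	/* stale bit: leave this CPU alone */
		tmp |= UINT64_C(1) << cpu;
	}
	if (tmp)
		send_ipi_many_model(tmp);	/* one call, not one per CPU */
	atomic_thread_fence(memory_order_seq_cst);	/* trailing smp_mb() */
	return tmp;
}
```

The model ignores the race on the ->mm read; in the kernel that read
would need either the switch_mm() barriers or the runqueue spinlock
discussed above.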

Thanks,

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68