Re: [RFC PATCH 0/2] Introduce serialized smp_call_function APIs

From: Avi Kivity
Date: Wed Mar 13 2024 - 18:30:58 EST


On Wed, 2024-03-13 at 18:06 -0400, Mathieu Desnoyers wrote:
> On 2024-03-13 17:14, Avi Kivity wrote:
> > On Wed, 2024-03-13 at 16:56 -0400, Mathieu Desnoyers wrote:
> > > commit 944d5fe50f3f ("sched/membarrier: reduce the ability to
> > > hammer on sys_membarrier") introduces a mutex over all membarrier
> > > operations to reduce its ability to slow down the rest of the
> > > system.
> > >
> > > This RFC series has two objectives:
> > >
> > > 1) Move this mutex to the smp_call_function APIs so other system
> > >    calls using smp_call_function IPIs are limited in the same way,
> > >
> > > 2) Restore scalability of MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
> > >    with MEMBARRIER_CMD_FLAG_CPU, which targets specific CPUs with
> > >    IPIs. This may or may not be useful, and I would welcome
> > >    benchmarks from users of this feature to figure out if this is
> > >    worth it.
> > >
> > > This series applies on top of v6.8.
> > >
> >
> >
> > I see this doesn't restore scaling of
> > MEMBARRIER_CMD_PRIVATE_EXPEDITED, which I use (and wasn't aware was
> > broken).
>
> It's mainly a mitigation for IPI Storming: CVE-2024-26602 disclosed


Very interesting.


> as part of [1].
>
> >
> > I don't have comments on the patches, but do have ideas on how to
> > work around the problem in Seastar. So this was a useful heads-up
> > for me.
>
> Note that if you don't use membarrier private expedited too heavily,
> you should not notice any difference. But nevertheless I would be
> interested to hear about any regression on performance of real
> workloads resulting from commit 944d5fe50f3f.
>


In fact I did observe the claim in 944d5fe50f3f's commit message ("On
some systems, sys_membarrier can be very expensive, causing overall
slowdowns for everything") to be true [1]. So rather than causing a
regression, this commit prompted me to fix a problem.

The smp_call_function_many_cond() in [1] is very likely due to
sys_membarrier, and it's slow because the workload runs on a virtual
machine without posted-interrupt virtualization. We usually detect
virtual machines and call membarrier() less frequently, but on that
instance type (AWS d3en) the detection failed, triggering the IPI
storm.

My fix is simply to detect whether a concurrent membarrier() call is
already in flight and fall back to doing something else; I don't think
it's generally applicable.
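To make the shape of that workaround concrete, here is a minimal
sketch, not Seastar's actual code: a single atomic flag gates
membarrier() so that when one call is already in flight, concurrent
callers take a hypothetical cheaper fallback path instead of piling
more IPI-generating calls onto the system. The names
(try_begin_membarrier, end_membarrier) are illustrative assumptions.

```cpp
#include <atomic>

// Hypothetical helper (not Seastar's implementation): tracks whether a
// membarrier() system call is currently in flight.
static std::atomic<int> membarrier_in_flight{0};

// Returns true if the caller won the right to issue membarrier().
// A caller that gets false should use some cheaper fallback
// (e.g. per-thread signalling) rather than stacking another
// IPI-generating membarrier() on top of the running one.
inline bool try_begin_membarrier() {
    int expected = 0;
    return membarrier_in_flight.compare_exchange_strong(
        expected, 1, std::memory_order_acquire);
}

// Clears the flag once the membarrier() call has returned.
inline void end_membarrier() {
    membarrier_in_flight.store(0, std::memory_order_release);
}
```

A caller would wrap its membarrier() invocation between
try_begin_membarrier() and end_membarrier(); the compare-exchange
ensures exactly one thread at a time takes the expensive path.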

[1] https://github.com/scylladb/scylladb/issues/17207