Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

From: Borislav Petkov
Date: Wed Apr 29 2015 - 04:14:26 EST


On Wed, Apr 29, 2015 at 12:49:59AM +0000, Zheng, Lv wrote:
> > > We absolutely want to use atomic_add_unless() because we get to save us
> > > the expensive
> > >
> > > LOCK; CMPXCHG
> > >
> > > if the value was already 1. Which is exactly what this patch is trying
> > > to avoid - a thundering herd of cores CMPXCHGing a global variable.
> >
> > IMO, on most architectures, the "cmp" part should work just like what you've done with "if".
> > And on some architectures, if the "xchg" doesn't happen, the "cmp" part even won't cause a pipe line hazard.

Even if CMPXCHG is being split into several microops, they all still
need to flow down the pipe and require resources and tracking. And you
only know at retire time what the CMP result is and can "discard" the
XCHG part. Provided the uarch is smart enough to do that.

This is probably why CMPXCHG needs 5,6,7,10,22,... cycles depending on
uarch and vendor, if I can trust Agner Fog's tables. And I bet those
numbers are best-case only and in real-life they probably tend to fall
out even worse.

CMP needs only 1. On almost every uarch and vendor. And even that cycle
probably gets hidden with a good branch predictor.

> If you man the LOCK prefix, I understand now.

And that makes several times worse: 22, 40, 80, ... cycles.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/