Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection

From: Ingo Molnar
Date: Fri Jul 21 2017 - 03:50:35 EST



* Kees Cook <keescook@xxxxxxxxxxxx> wrote:

> On Thu, Jul 20, 2017 at 10:15 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > On Thu, Jul 20, 2017 at 2:11 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >> Could you please also create a tabulated quick-comparison of the three variants,
> >> of all key properties, about behavior, feature and tradeoff differences?
> >>
> >> Something like:
> >>
> >>                               !ARCH_HAS_REFCOUNT   ARCH_HAS_REFCOUNT=y   REFCOUNT_FULL=y
> >>
> >> avg fast path instructions:   5                    3                     10
> >> behavior on overflow:         unsafe, silent       safe, verbose         safe, verbose
> >> behavior on underflow:        unsafe, silent       unsafe, verbose       unsafe, verbose
> >> ...
> >>
> >> etc. - note that this table is just a quick mockup with wild guesses. (Please add
> >> more comparisons of other aspects as well.)
> >>
> >> Such a comparison would make it easier for arch, subsystem and distribution
> >> maintainers to decide on which variant to use/enable.
> >
> > Sure, I can write this up. I'm not sure "safe"/"unsafe" is quite that
> > clean. The differences between -full and -fast are pretty subtle, but
> > I think I can describe it using the updated LKDTM tests I've written
> > to compare the two. There are conditions that -fast doesn't catch, but
> > those cases aren't actually useful for the overflow defense.
> >
> > As for "avg fast path instructions", do you mean the resulting
> > assembly for each refcount API function? I think it's going to look
> > something like "1 2 45", but I'll write it up.
>
> So, doing a worst-case timing of a loop of inc() to INT_MAX and then
> dec_and_test() back to zero, I see this out of perf:
>
> atomic
> 25255.114805 task-clock (msec)
> 82249267387 cycles
> 11208720041 instructions
>
> refcount-fast
> 25259.577583 task-clock (msec)
> 82211446892 cycles
> 15486246572 instructions
>
> refcount-full
> 44625.923432 task-clock (msec)
> 144814735193 cycles
> 105937495952 instructions
>
> I'll still summarize all this in the v7 series, but I think that
> really clarifies the differences: 1.5x more instructions in -fast, but
> nearly identical cycles and clock. Using -full sees a large change (as
> expected).

Ok, that's pretty convincing - I'd suggest including a cycles row in the table
instead of an instructions row: the number of instructions is indeed slightly
misleading in this case.
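
For reference, the relative costs can be derived from the perf counters quoted
above. The snippet below is only an illustrative sketch: the numbers are copied
from the quoted measurements, while the `RUNS` table and the `ratios()` helper
are hypothetical names introduced here, not part of the patch series.

```python
# perf counters copied verbatim from the quoted measurements in this thread.
# RUNS and ratios() are hypothetical names used only for this illustration.
RUNS = {
    "atomic": {
        "task_clock_ms": 25255.114805,
        "cycles": 82249267387,
        "instructions": 11208720041,
    },
    "refcount-fast": {
        "task_clock_ms": 25259.577583,
        "cycles": 82211446892,
        "instructions": 15486246572,
    },
    "refcount-full": {
        "task_clock_ms": 44625.923432,
        "cycles": 144814735193,
        "instructions": 105937495952,
    },
}


def ratios(variant, baseline="atomic"):
    """Return each counter of `variant` divided by the same counter of `baseline`."""
    base = RUNS[baseline]
    return {key: RUNS[variant][key] / base[key] for key in base}


if __name__ == "__main__":
    for name in ("refcount-fast", "refcount-full"):
        r = ratios(name)
        print(
            f"{name}: cycles x{r['cycles']:.3f}, "
            f"instructions x{r['instructions']:.3f}, "
            f"task-clock x{r['task_clock_ms']:.3f}"
        )
```

With these inputs the instruction ratio for refcount-fast comes out at roughly
1.4x while its cycle ratio stays within a fraction of a percent of 1.0, and
refcount-full lands at roughly 1.8x in cycles, which is why a cycles row is the
more informative one.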

Thanks,

Ingo