Re: [PATCH] x86 rwsem optimization extreme

From: H. Peter Anvin
Date: Wed Feb 17 2010 - 21:00:35 EST


On 02/17/2010 05:53 PM, Linus Torvalds wrote:
>>
>> FWIW, I don't know of any microarchitecture where adc is slower than
>> add, *as long as* the setup time for the CF flag is already used up.
>
> Oh, I think there are lots.
>
> Look at just about any x86 latency/throughput table, and you'll see:
>
> - adc latencies are typically much higher than a single cycle
>
> But you are right that this is likely not an issue on any out-of-order
> chip, since the 'stc' will schedule perfectly.
>

STC actually tends to schedule poorly, since it has a partial register
stall. In-order or out-of-order doesn't really matter, though; what
matters is that the scoreboarding used for the flags has to settle, or
you will take a huge hit.

> - but adc _throughput_ is also typically much higher, which indicates
> that even if you do flag renaming, the 'adc' quite likely only
> schedules in a single ALU unit.
>
> For example, on a Pentium, adc/sbb can only go in the U pipe, and I think
> the same is true of 'stc'. Now, nobody likely cares about Pentiums any
> more, but the point is, 'adc' does often have constraints that a regular
> 'add' does not, and there's an example where a 'stc+adc' pair would at the
> very least have to be scheduled with an instruction in between.

No doubt. I doubt it much matters in this context, but either way I
think the patch is probably a bad idea, for much the same reason as my
incl hack was: since the code isn't actually inline, saving a handful of
bytes is not the right tradeoff.

-hpa
