RE: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

From: David Laight
Date: Tue Apr 11 2023 - 17:34:49 EST

In reply to: Dave Hansen: "Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg"

From: Dave Hansen
> Sent: 11 April 2023 14:44
>
> On 4/11/23 04:35, Mark Rutland wrote:
> > I agree it'd be nice to have performance figures, but I think those would only
> > need to demonstrate a lack of a regression rather than a performance
> > improvement, and I think it's fairly clear from eyeballing the generated
> > instructions that a regression isn't likely.
>
> Thanks for the additional context.
>
> I totally agree that there's zero burden here to show a performance
> increase. If anyone can think of a quick way to do _some_ kind of
> benchmark on the code being changed and just show that it's free of
> brown paper bags, it would be appreciated. Nothing crazy, just think of
> one workload (synthetic or not) that will stress the paths being changed
> and run it with and without these changes. Make sure there are not
> surprises.
>
> I also agree that it's unlikely to be brown paper bag material.

The only thing I can think of is that, on x86, the locked
variant may actually be faster!
Both require exclusive access to the cache line (the unlocked
variant always does the write, even when the compare fails [1]).
So if the cache line is contended between CPUs, the unlocked
variant might ping-pong the cache line twice!
Of course, if the line is shared like that then performance
is horrid anyway.

[1] I checked on an uncached PCIe address on which I can monitor
the TLPs. The write always happens, so you can use cmpxchg16b
with a 'known bad value' to do a 16 byte read as a single TLP
(without using an SSE register).
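
The 'known bad value' trick in [1] can be sketched in portable C11. This
is only an illustration under stated assumptions: it uses a 64-bit
userspace atomic as a stand-in for the 16 byte cmpxchg16b, the names
read_via_cmpxchg and KNOWN_BAD are hypothetical, and it says nothing
about which bus cycles or TLPs the hardware generates.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sentinel: a value the caller expects never to be stored. */
#define KNOWN_BAD UINT64_C(0xdeadbeefdeadbeef)

/*
 * Read *p by issuing a compare-exchange that is expected to fail.
 *
 * If *p != KNOWN_BAD, the exchange fails and the current value of *p is
 * copied into 'expected'. If *p happens to equal KNOWN_BAD, the exchange
 * succeeds and stores KNOWN_BAD back, so *p is unchanged and 'expected'
 * already holds the right value. Either way 'expected' is the value read.
 */
static uint64_t read_via_cmpxchg(_Atomic uint64_t *p)
{
	uint64_t expected = KNOWN_BAD;

	atomic_compare_exchange_strong(p, &expected, KNOWN_BAD);
	return expected;
}
```

The memory contents are never modified by this helper, which is why the
same idea works as a read primitive; on x86 a compiler would typically
emit a lock-prefixed cmpxchg for it.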

David

