Re: [PATCHv7 18/33] lib/vdso: Add unlikely() hint into vdso_read_begin()

From: Vincenzo Frascino
Date: Thu Oct 24 2019 - 05:28:38 EST


Hi Andrei,

On 10/24/19 7:13 AM, Andrei Vagin wrote:
> On Wed, Oct 16, 2019 at 12:24:14PM +0100, Vincenzo Frascino wrote:
>> On 10/11/19 2:23 AM, Dmitry Safonov wrote:
>>> From: Andrei Vagin <avagin@xxxxxxxxx>
>>>
>>> Place the branch with no concurrent write before contended case.
>>>
>>> Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
>>> (more clock_gettime() cycles - the better):
>>>         | before    | after
>>> -----------------------------------
>>>         | 150252214 | 153242367
>>>         | 150301112 | 153324800
>>>         | 150392773 | 153125401
>>>         | 150373957 | 153399355
>>>         | 150303157 | 153489417
>>>         | 150365237 | 153494270
>>> -----------------------------------
>>> avg     | 150331408 | 153345935
>>> diff %  | 2         | 0
>>> -----------------------------------
>>> stdev % | 0.3       | 0.1
>>>
>>> Signed-off-by: Andrei Vagin <avagin@xxxxxxxxx>
>>> Co-developed-by: Dmitry Safonov <dima@xxxxxxxxxx>
>>> Signed-off-by: Dmitry Safonov <dima@xxxxxxxxxx>
>>
>> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
>> Tested-by: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
>
> Hello Vincenzo,
>
> Could you test the attached patch on aarch64? On x86, it gives about 9%
> performance improvement for CLOCK_MONOTONIC and CLOCK_BOOTTIME.
>

I ran similar tests in the past with a previous version of the unified vDSO
library. Based on those results, the impact of "__always_inline" alone was
around 7% on arm64. In fact, my implementation [1] had a comment stating
"To improve performances, in this file, __always_inline it is used for the
functions called multiple times.".

[1] https://bit.ly/2W9zMxB
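
As a rough illustration of the two hints discussed in this thread, the sketch
below combines a forced-inline helper with an unlikely() annotation on a
sequence-counter read loop. It is a userspace approximation, not the lib/vdso
code: struct sketch_data, the sketch_* function names and the
always_inline_fn macro are made up for the example, and both macros assume
GCC or Clang builtins.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Userspace stand-ins for the kernel macros (assumption: GCC/Clang). */
#define unlikely(x)      __builtin_expect(!!(x), 0)
#define always_inline_fn static inline __attribute__((always_inline))

/* Hypothetical data page: an odd seq means a writer is mid-update. */
struct sketch_data {
	_Atomic uint32_t seq;
	_Atomic uint64_t value;
};

/* Returns an even sequence number; spins only while a writer is active. */
always_inline_fn uint32_t sketch_read_begin(struct sketch_data *d)
{
	uint32_t seq;

	/* The contended case is hinted as the rare one. */
	while (unlikely((seq = atomic_load_explicit(&d->seq,
						    memory_order_acquire)) & 1))
		;
	return seq;
}

/* Returns non-zero if the sequence changed since sketch_read_begin(). */
always_inline_fn int sketch_read_retry(struct sketch_data *d, uint32_t start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&d->seq, memory_order_relaxed) != start;
}

/* Reader: retry the whole read if a writer interfered. */
uint64_t sketch_read_value(struct sketch_data *d)
{
	uint32_t seq;
	uint64_t v;

	do {
		seq = sketch_read_begin(d);
		v = atomic_load_explicit(&d->value, memory_order_relaxed);
	} while (unlikely(sketch_read_retry(d, seq)));

	return v;
}
```

Whether the compiler actually lays out the uncontended path first, and how
much that is worth, depends on the architecture and compiler version, so any
gain has to be measured rather than assumed.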

I spent some time yesterday trying to find out why that approach did not make
the cut, but I could not infer it from the review process.

> Here is my test:
> https://github.com/avagin/vdso-perf
>
> It is calling clock_gettime() in a loop for three seconds and then
> reports a number of iterations.
>

I am happy to run the test on arm64 and provide some results.
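
For reference, the kind of measurement described above can be sketched as
follows. This is not the code from the vdso-perf repository; the helper name
count_clock_gettime_calls() and its parameters are hypothetical, and the
loop simply uses the clock under test to decide when the interval is over.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: count how many clock_gettime(clk) calls complete
 * in roughly `seconds` seconds. Returns 0 if the clock cannot be read.
 */
uint64_t count_clock_gettime_calls(clockid_t clk, double seconds)
{
	struct timespec start, now;
	uint64_t iterations = 0;
	double elapsed;

	if (clock_gettime(clk, &start) != 0)
		return 0;

	do {
		if (clock_gettime(clk, &now) != 0)
			return 0;
		iterations++;
		/* Elapsed time in seconds since the first reading. */
		elapsed = (double)(now.tv_sec - start.tv_sec) +
			  (double)(now.tv_nsec - start.tv_nsec) / 1e9;
	} while (elapsed < seconds);

	return iterations;
}
```

Calling count_clock_gettime_calls(CLOCK_MONOTONIC, 3.0) would correspond to
the three-second run; a higher count means a cheaper clock_gettime() call.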

> Thanks,
> Andrei
>

--
Regards,
Vincenzo