RE: [PATCH 07/32] mm: Bring back vmalloc_exec

From: David Laight
Date: Mon May 15 2023 - 06:29:30 EST


From: Kent Overstreet
> Sent: 14 May 2023 06:45
...
> dynamically generated unpack:
> rand_insert: 20.0 MiB with 1 threads in 33 sec, 1609 nsec per iter, 607 KiB per sec
>
> old C unpack:
> rand_insert: 20.0 MiB with 1 threads in 35 sec, 1672 nsec per iter, 584 KiB per sec
>
> the Eric Biggers special:
> rand_insert: 20.0 MiB with 1 threads in 35 sec, 1676 nsec per iter, 583 KiB per sec
>
> Tested two versions of your approach, one without a shift value, one
> where we use a shift value to try to avoid unaligned access - second was
> perhaps 1% faster

You won't notice any effect of avoiding unaligned accesses on x86.
I think then get split into 64bit accesses and again on 64 byte
boundaries (that is what I see for uncached access to PCIe).
The kernel won't be doing >64bit and the 'out of order'
pipeline will tend to cover the others (especially since you
get 2 reads/clock).

> so it's not looking good. This benchmark doesn't even hit on
> unpack_key() quite as much as I thought, so the difference is
> significant.

Beware: unless you manage to lock the cpu frequency (which is ~impossible
on some cpu) timings in nanoseconds are pretty useless.
You can use the performance counter to get accurate cycle times
(provided there isn't a cpu switch in the middle of a micro-benchmark).

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)