Re: [RFC PATCH 00/14] Prevent cross-cache attacks in the SLUB allocator

From: Matteo Rizzo
Date: Mon Sep 18 2023 - 08:09:36 EST


On Fri, 15 Sept 2023 at 18:30, Lameter, Christopher
<cl@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 15 Sep 2023, Dave Hansen wrote:
>
> > What's the cost?
>
> The only thing that I see is 1-2% on kernel compilations (and "more on
> machines with lots of cores")?

I used kernel compilation time (wall clock time) as a benchmark while
preparing the series. Lower is better.

Intel Skylake, 112 cores:

     LABEL     | COUNT |   MIN   |   MAX   |  MEAN   | MEDIAN  | STDDEV
---------------+-------+---------+---------+---------+---------+--------
SLAB_VIRTUAL=n |  150  | 49.700s | 51.320s | 50.449s | 50.430s | 0.29959
SLAB_VIRTUAL=y |  150  | 50.020s | 51.660s | 50.880s | 50.880s | 0.30495
               |       | +0.64%  | +0.66%  | +0.85%  | +0.89%  | +1.79%

AMD Milan, 256 cores:

     LABEL     | COUNT |   MIN   |   MAX   |  MEAN   | MEDIAN  | STDDEV
---------------+-------+---------+---------+---------+---------+--------
SLAB_VIRTUAL=n |  150  | 25.480s | 26.550s | 26.065s | 26.055s | 0.23495
SLAB_VIRTUAL=y |  150  | 25.820s | 27.080s | 26.531s | 26.540s | 0.25974
               |       | +1.33%  | +2.00%  | +1.79%  | +1.86%  | +10.55%
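For reference, each percentage row in the two tables is the SLAB_VIRTUAL=y
value relative to the SLAB_VIRTUAL=n value in the same column, i.e.
(y - n) / n * 100. The STDDEV cell is a change in run-to-run spread, not in
build time. A minimal userspace sketch of that arithmetic follows;
pct_change() and print_delta_row() are hypothetical helpers for
illustration, not part of the series:

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of `after` versus `baseline`, in percent. */
double pct_change(double baseline, double after)
{
	assert(baseline != 0.0);
	return (after - baseline) / baseline * 100.0;
}

/* Print one "+x.xx%" cell per column, like the last row of each table. */
void print_delta_row(const double *base, const double *with, int ncols)
{
	for (int i = 0; i < ncols; i++)
		printf(" %+.2f%%", pct_change(base[i], with[i]));
	printf("\n");
}
```

For example, pct_change(25.480, 25.820) gives about 1.33, the MIN cell of
the AMD Milan table.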

Are there any specific benchmarks that you would be interested in seeing or
that are usually used for SLUB?

> Problems:
>
> - Overhead due to more TLB lookups
>
> - Larger amounts of TLBs are used for the OS. Currently we are trying to
> use the maximum mappable TLBs to reduce their numbers. This presumably
> means using 4K TLBs for all slab access.

Yes, we are using 4K pages for the slab mappings, which is going to increase
TLB pressure. I also tried writing a version of the patch that uses 2M
pages, which had slightly better performance but came with its own problems.
For example, most slabs are much smaller than 2M, so we would need to create
and map multiple slabs at once, and we wouldn't be able to release the
physical memory until all slabs in the 2M page are unused, which increases
fragmentation.
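To put rough numbers on that trade-off: with 4K mappings, a 2M range of slab
memory needs up to 512 TLB entries instead of one, while a 2M page shared by
many slabs can only be freed once every one of them is empty. A minimal
sketch, assuming 4K base pages and slabs of 2^order pages; SZ_4K and SZ_2M
mirror the names in the kernel's <linux/sizes.h> but are redefined here, and
struct huge_chunk is a hypothetical illustration of the 2M variant, not code
from the series:

```c
#include <assert.h>
#include <stdbool.h>

#define SZ_4K (4096UL)
#define SZ_2M (2UL * 1024 * 1024)

/* 4K TLB entries needed to map `bytes` of slab memory (a 2M page needs 1). */
unsigned long tlb_entries_4k(unsigned long bytes)
{
	return (bytes + SZ_4K - 1) / SZ_4K;
}

/* How many slabs of 2^order base pages would share one 2M mapping. */
unsigned long slabs_per_2m(unsigned int order)
{
	assert((SZ_4K << order) <= SZ_2M);
	return SZ_2M / (SZ_4K << order);
}

/* Hypothetical bookkeeping for one 2M page carved into slabs. */
struct huge_chunk {
	unsigned long nr_slabs;   /* slabs carved out of this 2M page */
	unsigned long nr_in_use;  /* slabs that still hold live objects */
};

/* Free one slab; returns true only when the whole 2M page can be released. */
bool huge_chunk_free_slab(struct huge_chunk *c)
{
	assert(c->nr_in_use > 0);
	c->nr_in_use--;
	return c->nr_in_use == 0;
}
```

An order-3 slab (32K), for instance, would share its 2M page with 63 others,
and that memory could only be returned after the last of the 64 is freed.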

> - Memory may not be physically contiguous which may be required by some
> drivers doing DMA.

In the current implementation, each slab is backed by physically contiguous
memory, but different slabs that are adjacent in virtual memory might not
be physically contiguous. Treating objects allocated from two different
slabs as one contiguous chunk of memory is probably wrong anyway, right?

--
Matteo