RE: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86

From: tarumizu.kohei@xxxxxxxxxxx
Date: Fri Jun 17 2022 - 05:07:41 EST


Hi Linus,

Thanks for the comment.

> OK
>
> > A64FX and some Intel processors have implementation-dependent register
> > for controlling CPU's hardware prefetch behavior. A64FX has
> > IMP_PF_STREAM_DETECT_CTRL_EL0[1], and Intel processors have MSR 0x1a4
> > (MSR_MISC_FEATURE_CONTROL)[2].
>
> Hardware prefetch (I guess of memory contents) is a memory hierarchy feature.
>
> Linux has a memory hierarchy manager, conveniently named "mm", developed
> by some of the smartest people I know. The main problem addressed by that is
> paging, but prefetching into the CPU from the next lowest level in the memory
> hierarchy is just another memory hierarchy hardware feature, such as hard
> disks, primary RAM etc.
>
> > These registers cannot be accessed from userspace.
>
> Good. The kernel managed hardware. If the memory hierarchy people have
> userspace now doing stuff behind their back, through some special interface,
> that makes their world more complicated.
>
> This looks like it needs information from the generic memory manager, from the
> scheduler, and possibly all the way down from the block layer to do the right
> thing, so it has no business in userspace.
> Have you seen mm/damon for example? Access to statistics for memory
> access patterns seems really useful for tuning the behaviour of this hardware.
> Just my €0.01.

Thank you for the information. I will see if mm/damon statistics can
be used for tuning.

> If it does interact with userspace I suppose it should be using control groups,
> like everything else of this type, see e.g. mm/memcontrol.c, not custom sysfs
> files.

Hardware prefetch registers exist per core, and the settings are
independent for each cache. That is why I currently create the files
under /sys/devices/system/cpu/cpu*/cache/index*.
However, when users actually configure prefetching for an application,
they may want to set it on a per-process basis. Given that, I think
control groups are suitable for this usage.

For example, is the interface you have in mind something like the following?

```
/sys/fs/cgroup/memory/memory.hardware_prefetcher.enable
```

The cpuset controller knows which CPUs the processes in a group are
bound to, so the cpuset controller may be more appropriate.

Control groups have a hierarchical structure, so we need to consider
whether hardware prefetch behavior can be mapped onto it well.
Currently I have two concerns.
First, an upper level of the hierarchy contains the same CPUs as the
levels below it. In this case, it may not be possible to configure
independent settings at each level.
Second, context switches. This feature rewrites the value of a
register that exists per core. Therefore, the register value must be
changed at context switch time whenever the incoming process belongs
to a different group.
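
To make the two concerns concrete, here is a minimal userspace sketch in
C. It is only an illustration, not code from the patch series:
struct pf_group, pf_groups_conflict(), pf_switch_to() and the
fake_pf_reg[] array are hypothetical names, and the array stands in for
the real per-core register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical core count for this sketch */

/* Stand-in for the per-core prefetch control register (e.g. MSR 0x1a4). */
static uint64_t fake_pf_reg[NR_CPUS];
static unsigned long reg_writes;	/* number of simulated register writes */

/* Hypothetical per-group setting; this is not an existing cgroup structure. */
struct pf_group {
	uint64_t pf_ctrl;	/* register value wanted by tasks in this group */
	uint64_t cpu_mask;	/* CPUs the group may run on, one bit per CPU */
};

/*
 * Concern 1: two groups (e.g. parent and child) share at least one CPU
 * but ask for different register values.
 */
static bool pf_groups_conflict(const struct pf_group *a,
			       const struct pf_group *b)
{
	return (a->cpu_mask & b->cpu_mask) != 0 && a->pf_ctrl != b->pf_ctrl;
}

/*
 * Concern 2: at context switch, rewrite the register only when the
 * incoming task's group wants a value different from the current one.
 */
static void pf_switch_to(unsigned int cpu, const struct pf_group *next)
{
	assert(cpu < NR_CPUS);
	if (fake_pf_reg[cpu] != next->pf_ctrl) {
		fake_pf_reg[cpu] = next->pf_ctrl;	/* real code: register write */
		reg_writes++;
	}
}
```

With a parent group covering all CPUs and a child group on a subset with
a different value, pf_groups_conflict() reports the overlap, and
switching between tasks of the two groups on a shared CPU costs one
register write per switch.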

> Just an example from one of the patches:
>
> + - "* Adjacent Cache Line Prefetcher Disable (R/W)"
> + corresponds to the "adjacent_cache_line_prefetcher_enable"
>
> I might only be on "a little knowledge is dangerous" on the memory manager
> topics, but I know for sure that they at times adjust the members of structs to fit
> nicely on cache lines. And now this? It looks really useful for kernel machinery
> that know very well what needs to go into the cache line next and when.
>
> Talk to the people on linux-mm and memory maintainer Andrew Morton on how
> to do this right, it's a really interesting feature! Also given that people say that
> the memory hierarchy is an important part in the performance of the Apple
> M1 (M2) silicon, I expect that machine to have this too?

I think this proposal will be useful for users, so I will proceed
with concrete studies and talk to the MM people.