Re: [PATCH 5/8] x86/clear_page: add clear_page_uncached()

From: Ankur Arora
Date: Wed Oct 14 2020 - 23:22:33 EST


On 2020-10-14 2:07 p.m., Andy Lutomirski wrote:



On Oct 14, 2020, at 12:58 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:

On Wed, Oct 14, 2020 at 08:45:37AM -0700, Andy Lutomirski wrote:
On Wed, Oct 14, 2020 at 1:33 AM Ankur Arora <ankur.a.arora@xxxxxxxxxx> wrote:

Define clear_page_uncached() as an alternative_call() to clear_page_nt()
if the CPU sets X86_FEATURE_NT_GOOD and fallback to clear_page() if it
doesn't.

Similarly define clear_page_uncached_flush() which provides an SFENCE
if the CPU sets X86_FEATURE_NT_GOOD.

As long as you keep "NT" or "MOVNTI" in the names and keep functions
in arch/x86, I think it's reasonable to expect that callers understand
that MOVNTI has bizarre memory ordering rules. But once you give
something a generic name like "clear_page_uncached" and stick it in
generic code, I think the semantics should be more obvious.

Why does it have to be a separate call? Why isn't it behind the
clear_page() alternative machinery so that the proper function is
selected at boot? IOW, why does a user of clear_page functionality need
to know at all about an "uncached" variant?

I assume it’s for a little optimization of clearing more than one page
per SFENCE.

In any event, based on the benchmark data upthread, we only want to do
NT clears when they’re rather large, so this shouldn’t be just an
alternative. I assume this is because a page or two will fit in cache
and, for most uses that allocate zeroed pages, we prefer cache-hot
pages. When clearing 1G, on the other hand, cache-hot is impossible
and we prefer the improved bandwidth and less cache trashing of NT
clears.

Also, if we did extend clear_page() to take the page-size as parameter
we still might not have enough information (ex. a 4K or a 2MB page that
clear_page() sees could be part of a GUP of a much larger extent) to
decide whether to go uncached or not.

Perhaps SFENCE is so fast that this is a silly optimization, though,
and we don’t lose anything measurable by SFENCEing once per page.
Alas, no. I tried that and dropped about 15% performance on Rome.

Thanks
Ankur