Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

From: David Hildenbrand
Date: Mon Jun 07 2021 - 04:49:07 EST

Next message: Borislav Petkov: "Re: [patch V2 02/14] x86/fpu: Prevent state corruption in __fpu__restore_sig()"
Previous message: Jan Kara: "Re: [PATCH v7 1/6] writeback, cgroup: do not switch inodes with I_WILL_FREE flag"
In reply to: Oscar Salvador: "Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values"
Next in thread: Oscar Salvador: "Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 07.06.21 09:52, Oscar Salvador wrote:
> On Fri, Jun 04, 2021 at 09:41:45AM +0200, Oscar Salvador wrote:
>> On Thu, Jun 03, 2021 at 02:45:13PM +0200, Michal Hocko wrote:
>>> I believe we need to define the purpose of the locking first. The
>>
>> If you ask me, this locking is meant to make sure the zone's zone_start_pfn
>> or spanned_pages do not change under us, in case we __need__ the values to be
>> stable.
>>
>>> existing locking doesn't serve much purpose, does it? The state might
>>
>> Well, half-way. Currently, the lock is taken in write mode whenever
>> the zone is expanded or shrunk, and in read mode when called from
>> bad_range()->page_outside_zone_boundaries() (only with CONFIG_DEBUG_VM).
>>
>> But as you pointed out, that state might change right after the lock is
>> released, and all the work would be for nothing.
>> So indeed, the __whole__ operation should be enclosed by the lock in the caller.
>> The way it stands right now is not optimal.
>>
>>> change right after the lock is released and the caller cannot really
>>> rely on the result. So aside from the current implementation, I would
>>> argue that any locking has to be done at the caller layer.
>>>
>>> But the primary question is whether anybody actually cares about
>>> potential races in the first place.
>
> I have been checking move_freepages_block() and alloc_contig_pages(), which
> are two of the functions that call zone_spans_pfn().
>
> move_freepages_block() uses it to align the given pfn down and up to the
> pageblock boundaries, and then checks that the aligned pfns are still within
> the same zone. From a memory-hotplug perspective that's OK, as we know that we
> offline in multiples of PAGES_PER_SECTION pages (which implies whole pageblocks).
>
> alloc_contig_pages() (used by the hugetlb gigantic allocator) runs through a
> node's zonelist and checks whether zone->zone_start_pfn + nr_pages stays within
> the same zone.
> IMHO, a race between zone_spans_last_pfn() and memory hotplug would not be
> that bad, as it would be caught afterwards, e.g. by __alloc_contig_pages() when
> pages cannot be isolated because they are offline, etc.
>
> So, I would say we do not really need the lock, but I might be missing something.
> But if we choose to care about this, then the locking should be done right, not
> half-way as it is right now.
>
> Any thoughts on this? :-)
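
To make the two boundary checks discussed above concrete, here is a rough userspace model, not kernel code. struct zone_model, the model_* helpers and PAGEBLOCK_NR_PAGES = 512 are hypothetical stand-ins; the logic is a sketch of how zone_spans_pfn(), zone_spans_last_pfn(), the boundary handling in move_freepages_block() and the candidate walk in alloc_contig_pages() behave, assuming nr_pages is a power of two.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two zone fields in question. */
struct zone_model {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* Assumption: 512 pages per pageblock (2 MiB pageblocks, 4 KiB pages). */
#define PAGEBLOCK_NR_PAGES 512UL

/* Like zone_spans_pfn(): pfn in [zone_start_pfn, zone_start_pfn + spanned_pages)? */
static bool model_spans_pfn(const struct zone_model *z, unsigned long pfn)
{
	return z->zone_start_pfn <= pfn &&
	       pfn < z->zone_start_pfn + z->spanned_pages;
}

/* Like zone_spans_last_pfn(): does the last pfn of the range lie in the zone? */
static bool model_spans_last_pfn(const struct zone_model *z,
				 unsigned long start_pfn, unsigned long nr_pages)
{
	return model_spans_pfn(z, start_pfn + nr_pages - 1);
}

/*
 * Boundary handling as in move_freepages_block(): align pfn to its
 * pageblock, clamp the start to pfn if the block begins below the zone,
 * and give up if the block ends beyond the zone.
 */
static bool model_pageblock_range(const struct zone_model *z, unsigned long pfn,
				  unsigned long *start, unsigned long *end)
{
	unsigned long s = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
	unsigned long e = s + PAGEBLOCK_NR_PAGES - 1;

	if (!model_spans_pfn(z, s))
		s = pfn;
	if (!model_spans_pfn(z, e))
		return false;
	*start = s;
	*end = e;
	return true;
}

/*
 * Candidate walk as in alloc_contig_pages(): start at the zone start
 * aligned up to nr_pages (assumed to be a power of two) and step while
 * the whole range still ends inside the zone. Returns the candidate count.
 */
static unsigned long model_count_candidates(const struct zone_model *z,
					    unsigned long nr_pages)
{
	unsigned long pfn = (z->zone_start_pfn + nr_pages - 1) & ~(nr_pages - 1);
	unsigned long count = 0;

	while (model_spans_last_pfn(z, pfn, nr_pages)) {
		count++;
		pfn += nr_pages;
	}
	return count;
}
```

Both checks only compare a pfn against one (zone_start_pfn, spanned_pages) pair, which is why a stale but self-consistent pair is mostly harmless here: a wrong answer is caught later when the pages turn out to be offline or cannot be isolated.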

I'd like to point out that I think the seqlock is not in place to synchronize with actual growing/shrinking, but to get consistent zone ranges: like using atomics, except that we have two inter-dependent values here.
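
The "consistent pair of values" point can be illustrated with a minimal userspace seqcount built on C11 atomics. The struct and span_* names are hypothetical; in the kernel the corresponding helpers are zone_span_seqbegin()/zone_span_seqretry() on the reader side and zone_span_writelock()/zone_span_writeunlock() on the writer side, and the real seqlock_t also carries a spinlock that serializes writers, which this sketch simply assumes. The fences follow the commonly cited C11 seqlock recipe.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a zone's span plus its sequence counter. */
struct zone_span_model {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	atomic_ulong zone_start_pfn;
	atomic_ulong spanned_pages;
};

static void span_init(struct zone_span_model *z, unsigned long start,
		      unsigned long spanned)
{
	atomic_init(&z->seq, 0);
	atomic_init(&z->zone_start_pfn, start);
	atomic_init(&z->spanned_pages, spanned);
}

/* Reader entry: wait until no update is in progress, return the sequence. */
static unsigned int span_read_begin(struct zone_span_model *z)
{
	unsigned int seq;

	while ((seq = atomic_load_explicit(&z->seq, memory_order_acquire)) & 1)
		;	/* writer active: spin */
	return seq;
}

/* Reader exit: true if a writer ran meanwhile and the values must be re-read. */
static bool span_read_retry(struct zone_span_model *z, unsigned int seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&z->seq, memory_order_relaxed) != seq;
}

/* Writer: assumes writers are already serialized (one at a time). */
static void span_write(struct zone_span_model *z, unsigned long start,
		       unsigned long spanned)
{
	unsigned int seq = atomic_load_explicit(&z->seq, memory_order_relaxed);

	atomic_store_explicit(&z->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&z->zone_start_pfn, start, memory_order_relaxed);
	atomic_store_explicit(&z->spanned_pages, spanned, memory_order_relaxed);
	atomic_store_explicit(&z->seq, seq + 2, memory_order_release);
}

/* Read both fields as one consistent pair, retrying on a concurrent update. */
static void span_read(struct zone_span_model *z, unsigned long *start,
		      unsigned long *spanned)
{
	unsigned int seq;

	do {
		seq = span_read_begin(z);
		*start = atomic_load_explicit(&z->zone_start_pfn, memory_order_relaxed);
		*spanned = atomic_load_explicit(&z->spanned_pages, memory_order_relaxed);
	} while (span_read_retry(z, seq));
}
```

The seqcount only guarantees that span_read() returns either the old pair or the new pair, never a mix; the pair may still be stale by the time the caller uses it.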

If you obtain the zone range that way and properly use pfn_to_online_page(), there is hardly anything that can go wrong in practice. If the zone grew in the meantime, you can most probably just live with not processing that part for now. If the zone shrank in the meantime, pfn_to_online_page() will make you skip that part (it was offlined either way, so you most probably don't really care about it).
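
The "use the snapshot, but skip whatever is offline" pattern could look roughly like this in a userspace model. section_online[], MODEL_PAGES_PER_SECTION and model_pfn_online() are hypothetical; model_pfn_online() only stands in for a "pfn_to_online_page(pfn) != NULL" test, and the section size is shrunk to 0x1000 pages to keep the numbers small.

```c
#include <stdbool.h>

/* Assumption: tiny sections so the example numbers stay small. */
#define MODEL_PAGES_PER_SECTION 0x1000UL
#define MODEL_NR_SECTIONS 8UL

/* Hypothetical per-section online state (what hotplug would toggle). */
static bool section_online[MODEL_NR_SECTIONS];

/* Stand-in for "pfn_to_online_page(pfn) != NULL". */
static bool model_pfn_online(unsigned long pfn)
{
	unsigned long nr = pfn / MODEL_PAGES_PER_SECTION;

	return nr < MODEL_NR_SECTIONS && section_online[nr];
}

/*
 * Walk a zone range taken earlier (possibly stale by now) and count the
 * pfns that would actually be processed. Offline pfns are skipped; pfns
 * added after the snapshot are simply not visited.
 */
static unsigned long model_walk(unsigned long start_pfn, unsigned long spanned)
{
	unsigned long pfn, processed = 0;

	for (pfn = start_pfn; pfn < start_pfn + spanned; pfn++) {
		if (!model_pfn_online(pfn))
			continue;
		processed++;
	}
	return processed;
}
```

With sections 1-3 online, a walk over the snapshot [0x1000, 0x4000) visits 0x3000 pfns; offlining section 1 afterwards drops that to 0x2000, and onlining section 4 does not change it, matching the "skip what shrank, ignore what grew" reasoning.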

[pfn_to_online_page() is racy as well, but the race window is very small and we have never really seen a problem in practice.]

Without the seqlock, you might just read a garbage zone range and get false positives/negatives, even when just testing a simple range that is not in a hot(un)plugged range [which is the usual case when talking about compaction etc.].
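
The false positive/negative case can be reproduced deterministically by replaying one bad interleaving: the reader loads zone_start_pfn, a resize completes, and only then does the reader load spanned_pages. struct span, span_contains() and torn_read() are hypothetical helpers for this illustration only.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the two inter-dependent zone fields. */
struct span {
	unsigned long start;	/* zone_start_pfn */
	unsigned long spanned;	/* spanned_pages */
};

static bool span_contains(struct span s, unsigned long pfn)
{
	return s.start <= pfn && pfn < s.start + s.spanned;
}

/*
 * Replays an unsynchronized read that picked up zone_start_pfn before a
 * resize and spanned_pages after it.
 */
static struct span torn_read(struct span before, struct span after)
{
	struct span torn = { before.start, after.spanned };

	return torn;
}
```

For example, if the zone shrinks from [0x1000, 0x4000) to [0x2000, 0x4000), the torn range is [0x1000, 0x3000) and misses pfn 0x3800, which is in both consistent ranges (false negative). If it instead grows to [0x0, 0x4000), the torn range [0x1000, 0x5000) wrongly includes pfn 0x4800 (false positive). Neither pfn is in the hot(un)plugged part.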

--
Thanks,

David / dhildenb
