Re: [RFC PATCH] arm64: mm: Fix kernel page tables incorrectly deleted during memory removal

From: David Hildenbrand
Date: Wed Jul 26 2023 - 04:00:32 EST


On 26.07.23 08:20, mawupeng wrote:


On 2023/7/24 14:11, David Hildenbrand wrote:
On 24.07.23 07:54, Anshuman Khandual wrote:


On 7/24/23 06:55, mawupeng wrote:

On 2023/7/21 18:36, Will Deacon wrote:
On Mon, Jul 17, 2023 at 07:51:50PM +0800, Wupeng Ma wrote:
From: Ma Wupeng <mawupeng1@xxxxxxxxxx>

During our test, we found that kernel page table may be unexpectedly
cleared with rodata off. The root cause is that the kernel page is
initialized with pud size(1G block mapping) while offline is memory
block size(MIN_MEMORY_BLOCK_SIZE 128M), eg, if 2G memory is hot-added,
when offline a memory block, the call trace is shown below,

Is someone adding memory in 2 GiB granularity and then removing parts of it in 128 MiB granularity? That would be against what we support using the add_memory() / offline_and_remove_memory() API and that driver should be fixed instead.

Yes, this kind of situation.

The problem occurs in the following scenarios:
1. use mem=xxG to reserve memory.
2. add_momory to online memory.
3. offline part of the memroy via offline_and_remove_memory.

During my research, ACPI memory removal use memory_subsys_offline to offline memory section and
this will not delete page table entry which do not trigger this kind of problem.

So I understand what you are talking about.
1. 3rd-party driver shouldn't use add_memory/offline_and_remove_memory to online/offline memory.
If it have to use, this can be achieved by driver.
2. memory_subsys_offline is perfered to do such thing.

No, my point is that

1) If you use add_memory() and offline_and_remove_memory() in the *same
granularity* it has to be working, otherwise it has to be fixed.

2) If you use add_memory() and offline_and_remove_memory() in different
granularity (especially, add_memory() in bigger granularity) , then
change your code to do add_memory() in the same granularity.


If you run into 1), then we populated a PUD for boot memory that also covers yet unpopulated physical memory ranges that are later populated by add_memory(). If that's the case, then we can either fix it by

a) Not doing that. Use PMD tables instead for that piece of memory.

b) Detecting that that PUD still covers memory and refusing to remove
that PUD.

c) Rejecting to hotadd memory in this situation at that location. We
have mhp_get_pluggable_range() -> arch_get_mappable_range() to kind-
of handle something like that.

--
Cheers,

David / dhildenb