Re: [PATCH] remove the initrd resource in /proc/iomem as the initrd has freed the reserved memblock.

From: James Morse
Date: Wed Jul 03 2019 - 05:43:23 EST


Hi,

On 03/07/2019 10:16, huang.junhua@xxxxxxxxxx wrote:
>> On 02/07/2019 11:34, Yi Wang wrote:
>>> From: Junhua Huang <huang.junhua@xxxxxxxxxx>
>>> The 'commit 50d7ba36b916 ("arm64: export memblock_reserve()d regions via /proc/iomem")'
>>> shows the reserved memblock in /proc/iomem. But the initrd's reserved memblock
>>> will be freed in free_initrd_mem(), which executes after the reserve_memblock_reserved_regions().
>>> So there is some incorrect information shown in /proc/iomem. e.g.:
>>> 80000000-bbdfffff : System RAM
>>> 80080000-813effff : Kernel code
>>> 813f0000-8156ffff : reserved
>>> 81570000-817fcfff : Kernel data
>>> 83400000-83ffffff : reserved
>>> 90000000-90004fff : reserved
>>> b0000000-b2618fff : reserved
>>> b8c00000-bbbfffff : reserved
>>> In this case, the range from b0000000 to b2618fff is reserved for initrd, which should be
>>> removed from the resource tree after it was freed.
>>
>> (There was some discussion about this over-estimate on the list, but it didn't make it
>> into the commit message.) I think a reserved->free change is fine. If user-space thinks
>> it's still reserved nothing bad happens.

>>> As kexec-tool will collect the iomem reserved info
>>> and use it in second kernel, which causes error message generated a second time.

>> What error message?

> Sorry, it's my mistake. The kexec-tool could not use iomem reserved info in the second kernel.
> The error message I mean is that the initrd reserved memblock region will be shown in
> second kernel /proc/iomem. But this message comes from the dtb's memreserve node,
> not the first kernel /proc/iomem.

This doesn't sound right.
Is kexec-tool spraying anything reserved in /proc/iomem into the DT as memreserve?


These top-level 'nomap' and second-level 'reserved' entries exist to stop kexec-tools
trying to write the new kernel over the top of something important. This only matters
between 'load' and 'exec', while the #1-kernel is still running:

| kexec-tools reads /proc/iomem.
| kexec-tools tells #1-kernel "I want this 10MB image to be located at 0xf00".
| #1-kernel knows 0xf00 is in use, so it stores the data elsewhere until kexec-time.
[some time passes]
| #1-kernel kexec's, copying the image to 0xf00
| #2-kernel now owns the machine

This goes wrong if 0xf00 belonged to firmware (nomap), or contained something important
(UEFI memory map, ACPI tables, etc.).
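
To make the check concrete, here is a rough sketch of the kind of overlap test
user-space can do against /proc/iomem before picking a load address. This is
not kexec-tools' actual code: parse_iomem() and conflicts() are hypothetical
names, and the two-space nesting in the sample is an assumption, since the
listing quoted above lost its indentation.

```python
import re

# Sample modelled on the /proc/iomem listing quoted earlier in this thread.
# ASSUMPTION: the two-space indentation (second-level entries under
# "System RAM") is reconstructed; the quoted listing had none.
SAMPLE_IOMEM = """\
80000000-bbdfffff : System RAM
  80080000-813effff : Kernel code
  813f0000-8156ffff : reserved
  81570000-817fcfff : Kernel data
  83400000-83ffffff : reserved
  90000000-90004fff : reserved
  b0000000-b2618fff : reserved
  b8c00000-bbbfffff : reserved
"""

LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name); 'end' is inclusive."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a resource line
        indent, start, end, name = m.groups()
        depth = len(indent) // 2  # two spaces per nesting level
        entries.append((depth, int(start, 16), int(end, 16), name.strip()))
    return entries


def conflicts(entries, load_start, size):
    """List 'reserved' entries overlapping [load_start, load_start + size)."""
    load_end = load_start + size - 1
    return [
        (start, end, name)
        for _depth, start, end, name in entries
        if name == "reserved" and start <= load_end and load_start <= end
    ]


if __name__ == "__main__":
    entries = parse_iomem(SAMPLE_IOMEM)
    ten_mb = 10 << 20
    for addr in (0xB0000000, 0x84000000):
        hits = conflicts(entries, addr, ten_mb)
        print(hex(addr), "conflicts with", [(hex(s), hex(e)) for s, e, _ in hits])
```

With this sample, a 10MB image at 0xb0000000 collides with the (stale) initrd
entry, while 0x84000000 is clear. A stale 'reserved' entry only makes
user-space avoid memory that is in fact free.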

Once the second kernel has started running it should re-discover where this important
stuff is from the EFI and ACPI tables.

We deliberately over-estimate these second-level reserved regions as it's the simplest
thing to do. (e.g. the per-cpu chunk allocations get swept up too)


Does this mean the amount of usable memory in the system reduces each time you kexec? That
shouldn't be true!


Thanks,

James