Re: [PATCH] remove the initrd resource in /proc/iomem as theinitrdhas freed the reserved memblock.

From: James Morse
Date: Fri Jul 05 2019 - 06:51:09 EST


Hi,

(CC: +devicetree list:
memreserving the initrd, which linux then frees causes a zombie memreserve in all future
kexec'd kernels)

On 03/07/2019 12:42, huang.junhua@xxxxxxxxxx wrote:
>>>> On 02/07/2019 11:34, Yi Wang wrote:
>>>>> From: Junhua Huang <huang.junhua@xxxxxxxxxx>
>>>>> The 'commit 50d7ba36b916 ("arm64: export memblock_reserve()d regions via /proc/iomem")'
>>>>> show the reserved memblock in /proc/iomem. But the initrd's reserved memblock
>>>>> will be freed in free_initrd_mem(), which executes after the reserve_memblock_reserved_regions().
>>>>> So there are some incorrect information shown in /proc/iomem. e.g.:
>>>>> 80000000-bbdfffff : System RAM
>>>>> 80080000-813effff : Kernel code
>>>>> 813f0000-8156ffff : reserved
>>>>> 81570000-817fcfff : Kernel data
>>>>> 83400000-83ffffff : reserved
>>>>> 90000000-90004fff : reserved
>>>>> b0000000-b2618fff : reserved
>>>>> b8c00000-bbbfffff : reserved
>>>>> In this case, the range from b0000000 to b2618fff is reserved for initrd, which should be
>>>>> clean from the resource tree after it was freed.
>>>>
>>>> (There was some discussion about this over-estimate on the list, but it didn't make it
>>>> into the commit message.) I think a reserved->free change is fine. If user-space thinks
>>>> its still reserved nothing bad happens.
>>
>>>>> As kexec-tool will collect the iomem reserved info
>>>>> and use it in second kernel, which causes error message generated a second time.
>>
>>>> What error message?
>>
>>> Sorry, it's my mistake. The kexec-tool could not use iomem reserved info in the second kernel.
>>> The error message I mean is that the initrd reserved memblock region will be shown in
>>> second kernel /proc/iomem. But this message comes from the dtb's memreserve node,
>>> not the first kernel /proc/iomem.
>>
>> This doesn't sound right.
>> Is kexec-tool spraying anything reserved in /proc/iomem into the DT as memreserve?

> No, it isn't. The kexec-tool could not spray resserved info to DT as memreserve.

(well, it generates the second DT, and it has the reserved /proc/iomem entries on hand)


> After we started the kernel from uboot, the /sys/firmware/fdt in this kernel has been add some infos,
> incluing the dtb memreserve and ramdisk memreserve info.

Aha! Ugh.
arm64_memblock_init() memblock_reserves() this, as does fdt_init_reserved_mem().
But then we memblock_free() it in free_initrd_mem().


> And then we use this fdt as the kexec dtb, and the second kernel would read
> the memreserve node and reserve the memblock.

> So we can see the first kernel initrd reserved info from the second kernel /proc/iomem.

[...]

> This phenomenon is not caused by first kernel /proc/iomem or kexec, but dtb.

Right, so the options are:

(1) Delete the memreserve node from the DT when we free the initrd. I don't think this is
the sort of thing the kernel should be doing.

(2) Memreserving the initramfs implies 'keepinitrd' on the commandline. The DT has told us
to "exclude reserved memory from normal usage". If the bootloader didn't mean for us to
reserve this memory forever, it should drop the memreserve.

(3) kexec-tools determines this memreserve is no longer needed and removes it from the new
DT. It has the details to do this: /sys/firmware/fdt will show the physical addresses of
the initramfs. With your patch /proc/iomem will show that this region is regular memory.
(not covered by any reserved type). This is safe with older kernels.


My preference is 2 then 3.

[...]

> I agree. The kexec-tool will use the second-level reserved info to avoid the load address
> conflict with the important thing, It is true that if the image load address kexec-tools set
> would belong to other important thing, something would go wrong.
> So I think we need clean the initrd reserved info from /proc/iomem, it is useless.

If we've freed the memory and we can update that file, sure.
I think at the time of that patch the assumption was only the arch code does
memblock_reserve() early, so we never need to update /proc/iomem.


Thanks,

James