Re: [PATCH] x86_64, kexec: Avoid unnecessary identity mappings for kdump

From: Xunlei Pang
Date: Sun Mar 19 2017 - 22:20:12 EST


On 03/18/2017 at 01:38 AM, Eric W. Biederman wrote:
> Xunlei Pang <xlpang@xxxxxxxxxx> writes:
>
>> kexec setups identity mappings for all the memory mapped in 1st kernel,
>> this is not necessary for the kdump case. Actually it can cause extra
>> memory consumption for paging structures, which is quite considerable
>> on modern machines with huge memory.
>>
>> E.g. On our 24TB machine, it will waste around 96MB (around 4MB/TB)
>> from the reserved memory range if setting all the identity mappings.
>>
>> It also causes some trouble for distributions that use an intelligent
>> policy to evaluate the proper "crashkernel=X" for users.
>>
>> To solve it, in case of kdump, we only setup identity mappings for the
>> crash memory and the ISA memory(may be needed by purgatory/kdump
>> boot).
> How about instead we detect the presence of 1GiB pages and use them
> if they are available. We already use 2MiB pages. If we can do that
> we will only need about 192K for page tables in the case you have
> described and this all becomes a non-issue.
>
> I strongly suspect that the presence of 24TiB of memory in an x86 system
> strongly correlates to the presence of 1GiB pages.
>
> In principle we certainly can use a less extensive mapping but that
> should not be something that differs between the two kexec cases.

Ok, will try gbpages for the identity mapping.

Regards,
Xunlei

> I can see forcing the low 1MiB range in. But calling it ISA range is
> very wrong and misleading. The reasons that range are special during
> boot-up have nothing to do with ISA. But have everything to do with
> where legacy page tables are mapped, and where we need identity pages to
> start other cpus. I think the only user that actually cares is
> purgatory where it plays swapping games with the low 1MiB because we
> can't preload what we need to down there or it would mess up the running
> kernel. So saying anything about the old ISA bus is wrong and
> misleading. At the very very least we need accurate comments.
>
> Eric
>
>
>> Signed-off-by: Xunlei Pang <xlpang@xxxxxxxxxx>
>> ---
>> arch/x86/kernel/machine_kexec_64.c | 34 ++++++++++++++++++++++++++++++----
>> 1 file changed, 30 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 857cdbd..db77a76 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -112,14 +112,40 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>
>> level4p = (pgd_t *)__va(start_pgtable);
>> clear_page(level4p);
>> - for (i = 0; i < nr_pfn_mapped; i++) {
>> - mstart = pfn_mapped[i].start << PAGE_SHIFT;
>> - mend = pfn_mapped[i].end << PAGE_SHIFT;
>>
>> + if (image->type == KEXEC_TYPE_CRASH) {
>> + /* Always map the ISA range */
>> result = kernel_ident_mapping_init(&info,
>> - level4p, mstart, mend);
>> + level4p, 0, ISA_END_ADDRESS);
>> if (result)
>> return result;
>> +
>> + /* crashk_low_res may not be initialized when reaching here */
>> + if (crashk_low_res.end) {
>> + mstart = crashk_low_res.start;
>> + mend = crashk_low_res.end + 1;
>> + result = kernel_ident_mapping_init(&info,
>> + level4p, mstart, mend);
>> + if (result)
>> + return result;
>> + }
>> +
>> + mstart = crashk_res.start;
>> + mend = crashk_res.end + 1;
>> + result = kernel_ident_mapping_init(&info,
>> + level4p, mstart, mend);
>> + if (result)
>> + return result;
>> + } else {
>> + for (i = 0; i < nr_pfn_mapped; i++) {
>> + mstart = pfn_mapped[i].start << PAGE_SHIFT;
>> + mend = pfn_mapped[i].end << PAGE_SHIFT;
>> +
>> + result = kernel_ident_mapping_init(&info,
>> + level4p, mstart, mend);
>> + if (result)
>> + return result;
>> + }
>> }
>>
>> /*