Re: [PATCH 0/6] Boot-time switching between 4- and 5-level paging for 4.15, Part 1

From: Ingo Molnar
Date: Tue Oct 31 2017 - 05:47:40 EST



* Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:

> On Thu, Oct 26, 2017 at 09:37:52AM +0200, Ingo Molnar wrote:
> >
> > * Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
> >
> > > On Tue, Oct 24, 2017 at 02:47:41PM +0200, Ingo Molnar wrote:
> > > > > > > > > > Making a variable that 'looks' like a constant macro dynamic in a rare Kconfig
> > > > > > > > > > scenario is asking for trouble.
> > > > > > > > >
> > > > > > > > > > > We expect boot-time page mode switching to be enabled in the kernels of
> > > > > > > > > > > next-generation enterprise distros. It shouldn't be that rare.
> > > > > > > >
> > > > > > > > > > My point remains even with a not-so-rare Kconfig dependency.
> > > > > > > > >
> > > > > > > > > I don't follow how introducing a new variable that depends on a Kconfig
> > > > > > > > > option would help with the situation.
> > > > > >
> > > > > > A new, properly named variable or function (max_physmem_bits or
> > > > > > max_physmem_bits()) that is not all uppercase would make it abundantly clear that
> > > > > > it is not a constant but a runtime value.
> > > > >
> > > > > > > Would we need to rename every uppercase macro that depends on
> > > > > > > max_physmem_bits()? Like MAXMEM.
> > > >
> > > > MAXMEM isn't used in too many places either - what's the total impact of it?
> > >
> > > The impact is not very small. The tree of macros dependent on
> > > MAX_PHYSMEM_BITS:
> > >
> > > MAX_PHYSMEM_BITS
> > >   MAXMEM
> > >     KEXEC_SOURCE_MEMORY_LIMIT
> > >     KEXEC_DESTINATION_MEMORY_LIMIT
> > >     KEXEC_CONTROL_MEMORY_LIMIT
> > >   SECTIONS_SHIFT
> > >     ZONEID_SHIFT
> > >       ZONEID_PGSHIFT
> > >       ZONEID_MASK
> > >
> > > The total number of users of them is not large. It's doable. But I expect
> > > it to be somewhat ugly, since we're partly in generic code and it would
> > > require some kind of compatibility layer for other architectures.
> > >
> > > Do you want me to rename them all?
> >
> > Yeah, I think these former constants should be organized better.
> >
> > Here's their usage frequency:
> >
> > triton:~/tip> for N in MAX_PHYSMEM_BITS MAXMEM KEXEC_SOURCE_MEMORY_LIMIT \
> > KEXEC_DESTINATION_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT SECTIONS_SHIFT \
> > ZONEID_SHIFT ZONEID_PGSHIFT ZONEID_MASK; do printf " %-40s: " $N; git grep -w $N | grep -vE 'define| \* ' | wc -l; done
> >
> >  MAX_PHYSMEM_BITS                        : 10
> >  MAXMEM                                  : 5
> >  KEXEC_SOURCE_MEMORY_LIMIT               : 2
> >  KEXEC_DESTINATION_MEMORY_LIMIT          : 2
> >  KEXEC_CONTROL_MEMORY_LIMIT              : 2
> >  SECTIONS_SHIFT                          : 2
> >  ZONEID_SHIFT                            : 1
> >  ZONEID_PGSHIFT                          : 1
> >  ZONEID_MASK                             : 1
> >
> > So it's not too bad to clean up, I think.
> >
> > How about something like this:
> >
> > machine.physmem.max_bytes /* ex MAXMEM */
> > machine.physmem.max_bits /* bit count of the highest in-use physical address */
> > machine.physmem.zones.id_shift /* ZONEID_SHIFT */
> > machine.physmem.zones.pg_shift /* ZONEID_PGSHIFT */
> > machine.physmem.zones.id_mask /* ZONEID_MASK */
> >
> > machine.kexec.physmem_bytes_src /* KEXEC_SOURCE_MEMORY_LIMIT */
> > machine.kexec.physmem_bytes_dst /* KEXEC_DESTINATION_MEMORY_LIMIT */
> >
> > ( With perhaps 'physmem' being an alias to '&machine->physmem', so that
> > physmem->max_bytes and physmem->max_bits would be a natural thing to write. )
> >
> > I'd suggest doing this in a finegrained fashion, one step at a time, introducing
> > 'struct machine' and 'struct physmem' and extending it gradually with new fields.
>
> I don't think this design is reasonable.
>
> - It introduces memory references where we haven't had them before.
>
> At this point all the variables would fit in a cache line, which is not
> that bad. But I don't see what would stop the list from growing in the
> future.

Are any of these actually in a hot path?

Also, note the context: your changes turn some of these into variables. Yes, I
suggest structuring them all and turning them all into variables, exactly because
the majority are now dynamic, yet their _naming_ suggests that they are constants.
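
To make the naming point concrete, here is a rough sketch of the layout. It is
an illustration only: none of these types exist in the tree, the field names
just mirror the list quoted above, physmem_init() is a made-up helper, and the
46/52 bit values are the x86-64 4-level/5-level limits used as an example.

```c
#include <stdint.h>

/* Hypothetical layout: nothing below exists in the kernel tree. */
struct physmem_zones {
	unsigned int	id_shift;	/* ex ZONEID_SHIFT */
	unsigned int	pg_shift;	/* ex ZONEID_PGSHIFT */
	uint64_t	id_mask;	/* ex ZONEID_MASK */
};

struct physmem {
	uint64_t		max_bytes;	/* ex MAXMEM */
	unsigned int		max_bits;	/* ex MAX_PHYSMEM_BITS */
	struct physmem_zones	zones;
};

struct kexec_limits {
	uint64_t	physmem_bytes_src;	/* ex KEXEC_SOURCE_MEMORY_LIMIT */
	uint64_t	physmem_bytes_dst;	/* ex KEXEC_DESTINATION_MEMORY_LIMIT */
};

struct machine {
	struct physmem		physmem;
	struct kexec_limits	kexec;
};

static struct machine machine;

/* Alias, so that physmem->max_bits is the natural thing to write. */
static struct physmem *const physmem = &machine.physmem;

/*
 * Hypothetical init helper: fills in the values once the paging mode
 * is known at boot. 52 and 46 are illustrative x86-64 limits.
 */
static void physmem_init(int l5_enabled)
{
	physmem->max_bits  = l5_enabled ? 52 : 46;
	physmem->max_bytes = 1ULL << physmem->max_bits;
}
```

With something like this, a reader who sees physmem->max_bits in a setup path
has no reason to assume it is a compile-time constant.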

> - We lose the ability to optimize out the change with static branches
> (cpu_feature_enabled() instead of the pgtable_l5_enabled variable).
>
> It's probably not that big of an issue here, but if we are going to
> use the same approach for other dynamic macros in the patchset, it
> might be.

Here too I think the (vast) majority of the uses are for bootup/setup/init
purposes, where clarity and maintainability of code matter a lot.

> - AFAICS, it requires changes to all architectures to provide such
> structures, as we are now partly in generic code.
>
> Or to introduce some kind of compatibility layer, but it would make
> the kernel as a whole uglier rather than cleaner, especially given that
> nobody beyond x86 needs this.

Yes, all the uses should be harmonized (no compatibility layer) - but as you can
see from the histogram I generated, it's a few dozen uses, i.e. not too bad.

Thanks,

Ingo