Re: [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug

From: David Gibson
Date: Wed Jun 09 2021 - 02:09:17 EST


On Wed, Jun 09, 2021 at 02:30:36AM -0300, Leonardo Brás wrote:
> On Mon, 2021-06-07 at 15:20 +1000, David Gibson wrote:
> > On Fri, Apr 30, 2021 at 11:36:10AM -0300, Leonardo Bras wrote:
> > > During memory hotunplug, after each LMB is removed, the HPT may be
> > > resized-down if it would map a max of 4 times the current amount of
> > > memory.
> > > (2 shifts, due to introduced hysteresis)
> > >
> > > It usually is not an issue, but it can take a lot of time if HPT
> > > resizing-down fails. This happens because resize-down failures
> > > usually repeat at each LMB removal, until there are no more bolted
> > > entries conflict, which can take a while to happen.
> > >
> > > This can be solved by doing a single HPT resize at the end of
> > > memory hotunplug, after all requested entries are removed.
> > >
> > > To make this happen, it's necessary to temporarily disable all HPT
> > > resize-downs before hotunplug, re-enable them after hotunplug ends,
> > > and then resize-down HPT to the current memory size.
> > >
> > > As an example, hotunplugging 256GB from a 385GB guest took 621s
> > > without this patch, and 100s after applied.
> > >
> > > Signed-off-by: Leonardo Bras <leobras.c@xxxxxxxxx>
> >
> > Hrm.  This looks correct, but it seems overly complicated.
> >
> > AFAICT, the resize calls that this adds should in practice be the
> > *only* times we call resize, all the calls from the lower level code
> > should be suppressed. 
>
> That's correct.
>
> > In which case can't we just remove those calls
> > entirely, and not deal with the clunky locking and exclusion here.
> > That should also remove the need for the 'shrinking' parameter in
> > 1/3.
>
>
> If I get your suggestion correctly, you suggest something like:
> 1 - Never calling resize_hpt_for_hotplug() in
> hash__remove_section_mapping(), thus not needing the shrinking
> parameter.
> 2 - Functions in hotplug-memory.c that call dlpar_remove_lmb() would in
> fact call another function to do the batch resize_hpt_for_hotplug() for
> them

Basically, yes.

> If so, that assumes that no other function that currently calls
> resize_hpt_for_hotplug() under another path, or if they do, it does not
> need to actually resize the HPT.
>
> Is the above correct?
>
> There are some examples of functions that currently call
> resize_hpt_for_hotplug() by another path:
>
> add_memory_driver_managed
> virtio_mem_add_memory
> dev_dax_kmem_probe

Oh... virtio-mem. I didn't think of that.


> reserve_additional_memory
> balloon_process
> add_ballooned_pages

AFAICT this comes from drivers/xen, and Xen has never been a thing on
POWER.

> __add_memory
> probe_store

So this is a sysfs triggered memory add. If the user is doing this
manually, then I think it's reasonable for them to manually manage the
HPT size as well, which they can do through debugfs. I think it might
also be used by drmgr under pHyp, but pHyp doesn't support HPT
resizing.

> __remove_memory
> pseries_remove_memblock

Huh, this one comes through OF_RECONFIG_DETACH_NODE. I don't really
know when those happen, but I strongly suspect it's only under pHyp
again.

> remove_memory
> dev_dax_kmem_remove
> virtio_mem_remove_memory

virtio-mem again.

> memunmap_pages
> pci_p2pdma_add_resource
> virtio_fs_setup_dax

And virtio-fs in dax mode. Didn't think of that either.


Ugh, yeah, I'm used to the world where the platform provides the only
way of hotplugging memory, but virtio-mem does indeed provide another
one, and we could indeed need to manage the HPT size based on that.
Drat, so moving all the HPT resizing handling up into
pseries/hotplug-memory.c won't work.

I still think we can simplify the communication between the stuff in
the pseries hotplug code and the actual hash resizing. In your draft
there are kind of 3 ways the information is conveyed: the mutex
suppresses HPT shrinks, pre-growing past what we need prevents HPT
grows, and the 'shrinking' flag handles some edge cases.

I suggest instead a single flag that will suppress all the current
resizes. Not sure it technically has to be an atomic mutex, but
that's probably the obvious safe choice. Then have "resize up to
target" and "resize down to target" operations that ignore that
suppression and are no-ops if the target is in the other direction.

Then you should be able to make the path for pseries hotplugs be:

  1 - suppress other resizes
  2 - resize up to target
  3 - do the actual adds or removes
  4 - resize down to target
  5 - unsuppress other resizes
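
To make that concrete, here is a rough userspace sketch of the sequence
above. Everything in it is hypothetical: the names (resize_suppressed,
auto_resize(), resize_up_to(), resize_down_to(), hotplug_sketch()) are
made up for illustration and are not existing kernel functions, the HPT
is modelled as a bare shift value, and a plain bool stands in for
whatever atomic or lock the real code would end up needing.

```c
#include <assert.h>
#include <stdbool.h>

/* All names here are hypothetical; this models the idea, not kernel code. */
static bool resize_suppressed;        /* the single "suppress" flag */
static unsigned int hpt_shift = 24;   /* current HPT size, as a shift */
static unsigned int resize_count;     /* number of resizes actually done */

/* Stand-in for the real (expensive) resize. */
static void do_resize(unsigned int shift)
{
	if (shift != hpt_shift) {
		hpt_shift = shift;
		resize_count++;
	}
}

/* The existing automatic path (per-LMB, virtio-mem, ...): honours the flag. */
static void auto_resize(unsigned int shift)
{
	if (resize_suppressed)
		return;
	do_resize(shift);
}

/* Ignores the flag; no-op if the target is not above the current size. */
static void resize_up_to(unsigned int target)
{
	if (target > hpt_shift)
		do_resize(target);
}

/* Ignores the flag; no-op if the target is not below the current size. */
static void resize_down_to(unsigned int target)
{
	if (target < hpt_shift)
		do_resize(target);
}

/*
 * per_step[i] is the shift the automatic path would ask for after the
 * i-th add or remove; those requests are all swallowed by the flag.
 */
static void hotplug_sketch(unsigned int target,
			   const unsigned int *per_step, int n)
{
	assert(n >= 0);

	resize_suppressed = true;          /* suppress other resizes */
	resize_up_to(target);              /* resize up to target */
	for (int i = 0; i < n; i++)        /* do the actual adds or removes */
		auto_resize(per_step[i]);
	resize_down_to(target);            /* resize down to target */
	resize_suppressed = false;         /* unsuppress other resizes */
}
```

In this model, an unplug that starts at shift 24 and ends at a target of
22 does exactly one resize, however many per-LMB requests arrive in the
middle, while anything that calls auto_resize() outside the window
(virtio-mem, say) still behaves as it does today.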


--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
