Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource

From: Stefano Stabellini
Date: Thu Nov 18 2021 - 20:19:53 EST


On Wed, 10 Nov 2021, Oleksandr wrote:
> On 28.10.21 04:40, Stefano Stabellini wrote:
>
> Hi Stefano
>
> I am sorry for the late response.
>
> > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
> > >
> > > This patch implements arch_xen_unpopulated_init() on Arm where
> > > the extended regions (if any) are gathered from DT and inserted
> > > into passed Xen resource to be used as unused address space
> > > for Xen scratch pages by unpopulated-alloc code.
> > >
> > > The extended region (safe range) is a region of guest physical
> > > address space which is unused and could be safely used to create
> > > grant/foreign mappings instead of wasting real RAM pages from
> > > the domain memory for establishing these mappings.
> > >
> > > The extended regions are chosen by the hypervisor at the domain
> > > creation time and advertised to it via "reg" property under
> > > hypervisor node in the guest device-tree. As region 0 is reserved
> > > for grant table space (always present), the indexes for extended
> > > regions are 1...N.
> > >
> > > If arch_xen_unpopulated_init() fails for some reason the default
> > > behaviour will be restored (allocate xenballooned pages).
> > >
> > > This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
> > >
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
> > > ---
> > > Changes RFC -> V2:
> > > - new patch, instead of
> > > "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide
> > > unallocated space"
> > > ---
> > > arch/arm/xen/enlighten.c | 112
> > > +++++++++++++++++++++++++++++++++++++++++++++++
> > > drivers/xen/Kconfig | 2 +-
> > > 2 files changed, 113 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > index dea46ec..1a1e0d3 100644
> > > --- a/arch/arm/xen/enlighten.c
> > > +++ b/arch/arm/xen/enlighten.c
> > > @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
> > > static phys_addr_t xen_grant_frames;
> > > #define GRANT_TABLE_INDEX 0
> > > +#define EXT_REGION_INDEX 1
> > > uint32_t xen_start_flags;
> > > EXPORT_SYMBOL(xen_start_flags);
> > > @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
> > > #endif
> > > }
> > > +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > > +int arch_xen_unpopulated_init(struct resource *res)
> > > +{
> > > + struct device_node *np;
> > > + struct resource *regs, *tmp_res;
> > > + uint64_t min_gpaddr = -1, max_gpaddr = 0;
> > > + unsigned int i, nr_reg = 0;
> > > + struct range mhp_range;
> > > + int rc;
> > > +
> > > + if (!xen_domain())
> > > + return -ENODEV;
> > > +
> > > + np = of_find_compatible_node(NULL, NULL, "xen,xen");
> > > + if (WARN_ON(!np))
> > > + return -ENODEV;
> > > +
> > > + /* Skip region 0 which is reserved for grant table space */
> > > + while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
> > > + nr_reg++;
> > > + if (!nr_reg) {
> > > + pr_err("No extended regions are found\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
> > > + if (!regs)
> > > + return -ENOMEM;
> > > +
> > > + /*
> > > + * Create resource from extended regions provided by the hypervisor to
> > > be
> > > + * used as unused address space for Xen scratch pages.
> > > + */
> > > + for (i = 0; i < nr_reg; i++) {
> > > + rc = of_address_to_resource(np, i + EXT_REGION_INDEX,
> > > &regs[i]);
> > > + if (rc)
> > > + goto err;
> > > +
> > > + if (max_gpaddr < regs[i].end)
> > > + max_gpaddr = regs[i].end;
> > > + if (min_gpaddr > regs[i].start)
> > > + min_gpaddr = regs[i].start;
> > > + }
> > > +
> > > + /* Check whether the resource range is within the hotpluggable range
> > > */
> > > + mhp_range = mhp_get_pluggable_range(true);
> > > + if (min_gpaddr < mhp_range.start)
> > > + min_gpaddr = mhp_range.start;
> > > + if (max_gpaddr > mhp_range.end)
> > > + max_gpaddr = mhp_range.end;
> > > +
> > > + res->start = min_gpaddr;
> > > + res->end = max_gpaddr;
> > > +
> > > + /*
> > > + * Mark holes between extended regions as unavailable. The rest of
> > > that
> > > + * address space will be available for the allocation.
> > > + */
> > > + for (i = 1; i < nr_reg; i++) {
> > > + resource_size_t start, end;
> > > +
> > > + start = regs[i - 1].end + 1;
> > > + end = regs[i].start - 1;
> > > +
> > > + if (start > (end + 1)) {
> > Should this be:
> >
> > if (start >= end)
> >
> > ?
>
> Yes, we can do this here (since the checks are equivalent) but ...
>
> > > + rc = -EINVAL;
> > > + goto err;
> > > + }
> > > +
> > > + /* There is no hole between regions */
> > > + if (start == (end + 1))
> > Also here, shouldn't it be:
> >
> > if (start == end)
> >
> > ?
>
>    ... not here.
>
> As
>
> "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 == regs[i].start)"
>
> but
>
> "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"

OK. But the check:

if (start >= end)

Actually covers both cases so that's the only check we need?


> >
> > I think I am missing again something in termination accounting :-)
>
> If I understand correctly, we need to follow "end = start + size - 1" rule, so
> the "end" is the last address inside a range, but not the "first" address
> outside of a range))

yeah


> > > + continue;
> > > +
> > > + /* Check whether the hole range is within the resource range
> > > */
> > > + if (start < res->start || end > res->end) {
> > By definition I don't think this check is necessary as either condition
> > is impossible?
>
>
> This is a good question, let me please explain.
> Not all extended regions provided by the hypervisor can be used here. This is
> because the addressable physical memory range for which the linear mapping
> could be created has limits on Arm, and maximum addressable range depends on
> the VA space size (CONFIG_ARM64_VA_BITS_XXX). So we decided to not filter them
> in hypervisor as this logic could be quite complex as different OS may have
> different requirement, etc. This means that we need to make sure that regions
> are within the hotpluggable range to avoid a failure later on when a region is
> pre-validated by the memory hotplug path.
>
> The following code limits the resource range based on that:
>
> +    /* Check whether the resource range is within the hotpluggable range */
> +    mhp_range = mhp_get_pluggable_range(true);
> +    if (min_gpaddr < mhp_range.start)
> +        min_gpaddr = mhp_range.start;
> +    if (max_gpaddr > mhp_range.end)
> +        max_gpaddr = mhp_range.end;
> +
> +    res->start = min_gpaddr;
> +    res->end = max_gpaddr;
>
> In current loop (when calculating and inserting holes) we also need to make
> sure that resulting hole range is within the resource range (and adjust/skip
> it if not true) as regs[] used for the calculations contains raw regions as
> they described in DT so not updated. Otherwise insert_resource() down the
> function will return an error for the conflicting operations. Yes, I could
> took a different route and update regs[] in advance to adjust/skip
> non-suitable regions in front, but I decided to do it on the fly in the loop
> here, I thought doing it in advance would add some overhead/complexity. What
> do you think?

I understand now.


> So I am afraid this check is necessary here.
>
> For example in my environment the extended regions are:
>
> (XEN) Extended region 0: 0->0x8000000
> (XEN) Extended region 1: 0xc000000->0x30000000
> (XEN) Extended region 2: 0x40000000->0x47e00000
> (XEN) Extended region 3: 0xd0000000->0xe6000000
> (XEN) Extended region 4: 0xe7800000->0xec000000
> (XEN) Extended region 5: 0xf1200000->0xfd000000
> (XEN) Extended region 6: 0x100000000->0x500000000
> (XEN) Extended region 7: 0x580000000->0x600000000
> (XEN) Extended region 8: 0x680000000->0x700000000
> (XEN) Extended region 9: 0x780000000->0x10000000000
>
> *With* the check the holes are:
>
> holes [47e00000 - cfffffff]
> holes [e6000000 - e77fffff]
> holes [ec000000 - f11fffff]
> holes [fd000000 - ffffffff]
> holes [500000000 - 57fffffff]
> holes [600000000 - 67fffffff]
> holes [700000000 - 77fffffff]
>
> And they seem to look correct, you can see that two possible holes between
> extended regions 0-1 (8000000-bffffff) and 1-2 (30000000-3fffffff) were
> skipped as they entirely located below res->start
> which is 0x40000000 in my case (48-bit VA: 0x40000000 - 0x80003fffffff).
>
> *Without* the check these two holes won't be skipped and as the result
> insert_resource() will fail.
>
>
> **********
>
>
> I have one idea how we can simplify filter logic, we can drop all checks here
> (including confusing one) in Arm code and update common code a bit:
>
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 1a1e0d3..ed5b855 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>         struct resource *regs, *tmp_res;
>         uint64_t min_gpaddr = -1, max_gpaddr = 0;
>         unsigned int i, nr_reg = 0;
> -       struct range mhp_range;
>         int rc;
>
>         if (!xen_domain())
> @@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>                         min_gpaddr = regs[i].start;
>         }
>
> -       /* Check whether the resource range is within the hotpluggable range
> */
> -       mhp_range = mhp_get_pluggable_range(true);
> -       if (min_gpaddr < mhp_range.start)
> -               min_gpaddr = mhp_range.start;
> -       if (max_gpaddr > mhp_range.end)
> -               max_gpaddr = mhp_range.end;
> -
>         res->start = min_gpaddr;
>         res->end = max_gpaddr;
>
> @@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>                 if (start == (end + 1))
>                         continue;
>
> -               /* Check whether the hole range is within the resource range
> */
> -               if (start < res->start || end > res->end) {
> -                       if (start < res->start)
> -                               start = res->start;
> -                       if (end > res->end)
> -                               end = res->end;
> -
> -                       if (start >= (end + 1))
> -                               continue;
> -               }
> -
>                 tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>                 if (!tmp_res) {
>                         rc = -ENOMEM;
> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
> index 1f1d8d8..a5d3ebb 100644
> --- a/drivers/xen/unpopulated-alloc.c
> +++ b/drivers/xen/unpopulated-alloc.c
> @@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages)
>         void *vaddr;
>         unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>         int ret;
> +       struct range mhp_range;
>
>         res = kzalloc(sizeof(*res), GFP_KERNEL);
>         if (!res)
> @@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages)
>         res->name = "Xen scratch";
>         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>
> +       mhp_range = mhp_get_pluggable_range(true);
> +
>         ret = allocate_resource(target_resource, res,
> -                               alloc_pages * PAGE_SIZE, 0, -1,
> +                               alloc_pages * PAGE_SIZE, mhp_range.start,
> mhp_range.end,
>                                 PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>         if (ret < 0) {
>                 pr_err("Cannot allocate new IOMEM resource\n");
> (END)
>
> I believe, this will work on x86 as arch_get_mappable_range() is not
> implemented there,
> and the default option contains exactly what being used currently (0, -1).
>
> struct range __weak arch_get_mappable_range(void)
> {
>     struct range mhp_range = {
>         .start = 0UL,
>         .end = -1ULL,
>     };
>     return mhp_range;
> }
>
> And this is going to be more generic and clear, what do you think?

Yeah this is much better, good thinking!