Re: [PATCH 1/1] of: address: Fix address translation when address-size is greater than 2

From: Rob Herring
Date: Tue Oct 03 2023 - 17:12:42 EST


On Tue, Oct 3, 2023 at 1:53 AM Herve Codina <herve.codina@xxxxxxxxxxx> wrote:
>
> With the recent addition of of_pci_prop_ranges() in commit 407d1a51921e
> ("PCI: Create device tree node for bridge"), the ranges property can
> have a 3 cells child address, a 3 cells parent address and a 2 cells
> child size.

Sigh. I'm starting to regret applying this for 6.6... You failed to Cc
the AMD folks too. Lizhi now added.

What's different here from the test cases? Having 3 cells in both the
parent and the child?

>
> A range item property for a PCI device is filled as follows:
> <BAR_nbr> 0 0 <phys.hi> <phys.mid> <phys.low> <BAR_sizeh> <BAR_sizel>
> <-- Child --> <-- Parent (PCI definition) --> <- BAR size (64bit) -->
>
> This allows BAR addresses to be translated from the DT. For instance:
> pci@0,0 {
>     #address-cells = <0x03>;
>     #size-cells = <0x02>;
>     device_type = "pci";
>     compatible = "pci11ab,100\0pciclass,060400\0pciclass,0604";
>     ranges = <0x82000000 0x00 0xe8000000
>               0x82000000 0x00 0xe8000000
>               0x00 0x4400000>;
>     ...
>     dev@0,0 {
>         #address-cells = <0x03>;
>         #size-cells = <0x02>;
>         compatible = "pci1055,9660\0pciclass,020000\0pciclass,0200";
>         /* Translations for BAR0 to BAR5 */
>         ranges = <0x00 0x00 0x00 0x82010000 0x00 0xe8000000 0x00 0x2000000
>                   0x01 0x00 0x00 0x82010000 0x00 0xea000000 0x00 0x1000000
>                   0x02 0x00 0x00 0x82010000 0x00 0xeb000000 0x00 0x800000
>                   0x03 0x00 0x00 0x82010000 0x00 0xeb800000 0x00 0x800000
>                   0x04 0x00 0x00 0x82010000 0x00 0xec000000 0x00 0x20000
>                   0x05 0x00 0x00 0x82010000 0x00 0xec020000 0x00 0x2000>;
>         ...
>         pci-ep-bus@0 {
>             #address-cells = <0x01>;
>             #size-cells = <0x01>;
>             compatible = "simple-bus";
>             /* Translate 0xe2000000 to BAR0 and 0xe0000000 to BAR1 */
>             ranges = <0xe2000000 0x00 0x00 0x00 0x2000000
>                       0xe0000000 0x01 0x00 0x00 0x1000000>;

Why are you reusing a PCI bus address value for the child bus? I'm
wondering if this is some hackery because the child devices need PCI
addresses to work. What address does a device need for DMA for
example?

Also, I think each BAR should be a separate child. We need to
formalize this BAR addressing in a schema.

>             ...
>         };
>     };
> };
>
> During the translation process, the "default-flags" map() function is
> used to select the matching item in the ranges table and determine the
> address offset from this matching item.
> This map() function simply calls of_read_number() and when address-size
> is greater than 2, the map() function skips the extra high address part
> (i.e. the part over 64 bits). This leads to a wrong matching item and a
> wrong offset computation.
> Also during the translation itself, the extra high part related to the
> parent address is not present in the translated address.
>
> Fix the "default-flags" map() and translate() so that map() takes the
> child extra high address part into account and translate() takes the
> parent extra high address part into account. This gives a correct
> address translation for ranges patterns such as the one given in the
> example above.

Please add a test case for this.

>
> Signed-off-by: Herve Codina <herve.codina@xxxxxxxxxxx>
> ---
> drivers/of/address.c | 56 ++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 54 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index e692809ff822..8665de3a6344 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -100,6 +100,58 @@ static unsigned int of_bus_default_get_flags(const __be32 *addr)
> return IORESOURCE_MEM;
> }
>
> +static bool of_addr_is_equal(const __be32 *addr1, const __be32 *addr2, int na)
> +{
> +	int i;
> +	u32 a1;
> +	u32 a2;
> +
> +	for (i = 0; i < na; i++) {
> +		a1 = be32_to_cpup(addr1 + i);
> +		a2 = be32_to_cpup(addr2 + i);

You don't need to do the swap. Then isn't this just a memcmp?

> +		if (a1 == a2)
> +			continue;
> +		return false;
> +	}
> +	return true;
> +}
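
To illustrate the memcmp point: both addresses are stored in the same
(big-endian) cell order, so equality does not depend on swapping first.
A minimal userspace sketch follows. The cell_t typedef and the
cells_equal() name are assumptions made for this sketch and are not
kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's __be32 cell type. */
typedef uint32_t cell_t;

/*
 * Compare the first n cells of two addresses byte for byte.
 * Both sides use the same byte order, so no swap is needed
 * to decide equality.
 */
static bool cells_equal(const cell_t *a, const cell_t *b, int n)
{
	assert(n >= 0);
	return memcmp(a, b, (size_t)n * sizeof(*a)) == 0;
}
```

With this, the open-coded loop collapses to a single call on the first
"extra" cells of the address and the range entry.
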
> +
> +static u64 of_bus_default_flags_map(__be32 *addr, const __be32 *range, int na,
> +				    int ns, int pna)
> +{
> +	u64 cp, s, da;
> +	int extra = 0;
> +
> +	if (na > 2) {

I believe this clause works for na==2. It wouldn't for na==1 though.
All you really need is:

    if (na == 3)
        extra = 1;

But at this point, don't we know we have a 3rd cell? We've only
matched if we do, right? If "extra" is always 1, then this function is
pretty much the same as the PCI and ISA versions after the flags
comparison.

> +		/*
> +		 * Given address contains more than 2 cells.
> +		 * The address high extra part must match the range extra part
> +		 * and must be filtered-out from 64bit offset computation.
> +		 */
> +		extra = na - 2;
> +		if (!of_addr_is_equal(addr, range, extra))

A bit misleading as all you are doing is comparing the 3rd (top) cell.

> +			return OF_BAD_ADDR;
> +
> +		cp = of_read_number(range + extra, na - extra);
> +		s = of_read_number(range + na + pna, ns);
> +		da = of_read_number(addr + extra, na - extra);
> +	} else {
> +		cp = of_read_number(range, na);
> +		s = of_read_number(range + na + pna, ns);
> +		da = of_read_number(addr, na);
> +	}
> +	pr_debug("default flags map, extra=%d cp=%llx, s=%llx, da=%llx\n", extra, cp, s, da);
> +
> +	if (da < cp || da >= (cp + s))
> +		return OF_BAD_ADDR;
> +	return da - cp;
> +}
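
To make the failure mode and the na == 3 case concrete, here is a
userspace sketch using the BAR0 and BAR1 entries from the example
ranges above. It is a simplified model, not the kernel code:
read_number() imitates of_read_number() on host-endian cells, and
flags_map() and BAD_ADDR are names assumed for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

#define BAD_ADDR UINT64_MAX	/* stand-in for OF_BAD_ADDR (assumption) */

/* Model of of_read_number(): fold 'size' cells into a 64-bit value. */
static uint64_t read_number(const uint32_t *cell, int size)
{
	uint64_t r = 0;

	for (; size--; cell++)
		r = (r << 32) | *cell;	/* a 3rd cell is shifted out here */
	return r;
}

/*
 * Model of a flags-aware map() for na == 3: the top cell must match
 * exactly, and only the low two cells take part in the offset math.
 */
static uint64_t flags_map(const uint32_t *addr, const uint32_t *range,
			  int na, int ns, int pna)
{
	uint64_t cp, s, da;

	assert(na == 3);
	if (addr[0] != range[0])
		return BAD_ADDR;

	cp = read_number(range + 1, na - 1);
	s = read_number(range + na + pna, ns);
	da = read_number(addr + 1, na - 1);

	if (da < cp || da >= cp + s)
		return BAD_ADDR;
	return da - cp;
}
```

With a plain read over all three cells, the address <0x01 0x00 0x1000>
reads back as 0x1000, so the BAR number is lost and the BAR0 entry would
match. With the top cell compared separately, only the BAR1 entry
matches.
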
> +
> +static int of_bus_default_flags_translate(__be32 *addr, u64 offset, int na)

Same as of_bus_pci_translate() and of_bus_isa_translate(). The 3rd copy
gets to rename and consolidate them.

> +{
> +	/* Keep "flags" part in translated address */
> +	return of_bus_default_translate(addr + 1, offset, na - 1);
> +}
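
As a rough illustration of what a single shared helper would do, here
is a userspace sketch of a translate step that skips the top cell so
that it survives in the translated address. This is a simplified model
under the assumption of host-endian cells. The names read_number(),
translate() and flags_translate() are made up for the sketch and are
not the kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of of_read_number(); cells are host-endian in this sketch. */
static uint64_t read_number(const uint32_t *cell, int size)
{
	uint64_t r = 0;

	for (; size--; cell++)
		r = (r << 32) | *cell;
	return r;
}

/* Plain translate: add 'offset' and store the result in the low cells. */
static int translate(uint32_t *addr, uint64_t offset, int na)
{
	uint64_t a;

	assert(na >= 1);
	a = read_number(addr, na) + offset;
	memset(addr, 0, (size_t)na * sizeof(*addr));
	if (na > 1)
		addr[na - 2] = (uint32_t)(a >> 32);
	addr[na - 1] = (uint32_t)(a & 0xffffffffu);
	return 0;
}

/* Flags-aware variant: leave the top cell alone, translate the rest. */
static int flags_translate(uint32_t *addr, uint64_t offset, int na)
{
	assert(na >= 2);
	return translate(addr + 1, offset, na - 1);
}
```

Applied to the BAR1 parent address <0x82010000 0x00 0xea000000> with an
offset of 0x1000, the top cell stays 0x82010000 and only the low two
cells change.
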
>
> #ifdef CONFIG_PCI
> static unsigned int of_bus_pci_get_flags(const __be32 *addr)
> @@ -374,8 +426,8 @@ static struct of_bus of_busses[] = {
> .addresses = "reg",
> .match = of_bus_default_flags_match,
> .count_cells = of_bus_default_count_cells,
> - .map = of_bus_default_map,
> - .translate = of_bus_default_translate,
> + .map = of_bus_default_flags_map,
> + .translate = of_bus_default_flags_translate,
> .has_flags = true,
> .get_flags = of_bus_default_flags_get_flags,
> },
> --
> 2.41.0
>