Re: [PATCH RFC v2 11/27] arm64: mte: Reserve tag storage memory

From: Rob Herring
Date: Tue Dec 12 2023 - 13:44:35 EST


On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
<alexandru.elisei@xxxxxxx> wrote:
>
> Hi Rob,
>
> Thank you so much for the feedback. I'm not very familiar with device tree,
> and any comments are very useful.
>
> On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > <alexandru.elisei@xxxxxxx> wrote:
> > >
> > > Allow the kernel to get the size and location of the MTE tag storage
> > > regions from the DTB. This memory is marked as reserved for now.
> > >
> > > The DTB node for the tag storage region is defined as:
> > >
> > > tags0: tag-storage@8f8000000 {
> > > 	compatible = "arm,mte-tag-storage";
> > > 	reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > 	block-size = <0x1000>;
> > > 	memory = <&memory0>; // Associated tagged memory node
> > > };
> >
> > I skimmed thru the discussion some. If this memory range is within
> > main RAM, then it definitely belongs in /reserved-memory.
>
> Ok, will do that.
>
> If you don't mind, why do you say that it definitely belongs in
> reserved-memory? I'm not trying to argue otherwise; I'm just curious
> about the motivation.

Simply so that /memory nodes describe all possible memory and
/reserved-memory is just adding restrictions. It's also because
/reserved-memory is what gets handled early, and we don't need
multiple things to handle early.
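
For instance (rough sketch, not tested; the handler name and what it does
here are only illustrative), a compatible string on the /reserved-memory
node gets you an early hook via RESERVEDMEM_OF_DECLARE() without any
extra FDT scanning:

#include <linux/of_reserved_mem.h>

/* Illustrative early handler for the reserved tag storage region. */
static int __init mte_tag_storage_rmem_init(struct reserved_mem *rmem)
{
	/* rmem->base and rmem->size come from the node's 'reg' property */
	pr_info("MTE tag storage at %pa, size %pa\n", &rmem->base, &rmem->size);
	return 0;
}
RESERVEDMEM_OF_DECLARE(mte_tag_storage, "arm,mte-tag-storage",
		       mte_tag_storage_rmem_init);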

> Tag storage is not DMA and can live anywhere in memory.

Then why put it in DT at all? The only reason CMA is there is to set
the size. It's not even clear to me we need CMA in DT either. The
reasoning long ago was that the kernel didn't do a good job of moving and
reclaiming contiguous space, but that's supposed to be better now (and
most h/w figured out they need IOMMUs).

But for tag storage you know the size, as it is a function of the memory
size, right? After all, you are validating that the size is correct. I
guess there is still the aspect of whether you want to enable MTE or
not, which could be done in a variety of ways.
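
(For concreteness, using the 1:32 ratio the patch itself validates: 4 GiB
of tagged memory implies 4 GiB / 32 = 128 MiB of tag storage. As a
sketch, with a made-up helper name:)

#include <linux/types.h>

/* Tag storage covers 1/32 of the tagged memory it is associated with. */
static inline phys_addr_t mte_tag_storage_size(phys_addr_t tagged_mem_size)
{
	return tagged_mem_size / 32;
}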

> In arm64_memblock_init(), the kernel first removes from memblock the
> memory that it cannot address (for example, because it has been compiled
> with CONFIG_ARM64_VA_BITS_39=y), and only then calls
> early_init_fdt_scan_reserved_mem().
>
> What happens if reserved memory is above what the kernel can address?

I would hope the kernel handles it. That's the kernel's problem unless
there's some h/w limitation to access some region. The DT can't have
things dependent on the kernel config.

> From my testing, when the kernel is compiled with a 39 bit VA, if I use
> reserved memory to discover tag storage that lives above the virtual
> address limit and then I try to use CMA to manage the tag storage memory,
> I get a kernel panic:

Looks like we should handle that better...

> [ 0.000000] Reserved memory: created CMA memory pool at 0x0000010000000000, size 64 MiB
> [ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
> [ 0.000000] OF: reserved mem: 0x0000010000000000..0x0000010003ffffff (65536 KiB) map reusable linux,cma
> [..]
> [ 0.806945] Unable to handle kernel paging request at virtual address 00000001fe000000
> [ 0.807277] Mem abort info:
> [ 0.807277] ESR = 0x0000000096000005
> [ 0.807693] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 0.808110] SET = 0, FnV = 0
> [ 0.808443] EA = 0, S1PTW = 0
> [ 0.808526] FSC = 0x05: level 1 translation fault
> [ 0.808943] Data abort info:
> [ 0.808943] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> [ 0.809360] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 0.809776] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 0.810221] [00000001fe000000] user address but active_mm is swapper
> [..]
> [ 0.820887] Call trace:
> [ 0.821027] cma_init_reserved_areas+0xc4/0x378
>
> >
> > You need a binding for this too.
>
> By binding you mean having a yaml file in dt-schema [1] describing the tag
> storage node, right?

Yes, but in the kernel tree is fine.

[...]

> > > +static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg,
> > > +						int reg_len, struct range *range)
> > > +{
> > > +	int addr_cells = dt_root_addr_cells;
> > > +	int size_cells = dt_root_size_cells;
> > > +	u64 size;
> > > +
> > > +	if (reg_len / 4 > addr_cells + size_cells)
> > > +		return -EINVAL;
> > > +
> > > +	range->start = PHYS_PFN(of_read_number(reg, addr_cells));
> > > +	size = PHYS_PFN(of_read_number(reg + addr_cells, size_cells));
> > > +	if (size == 0) {
> > > +		pr_err("Invalid node");
> > > +		return -EINVAL;
> > > +	}
> > > +	range->end = range->start + size - 1;
> >
> > We have a function to read (and translate which you forgot) addresses.
> > Add what's missing rather than open code your own.
>
> I must have missed that there's already a function to read addresses. Would
> you mind pointing me in the right direction?

drivers/of/fdt_address.c

Though it doesn't provide getting the size, so that will have to be added.
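
Something along these lines (rough sketch; the size helper below is
hypothetical and only shows the shape of what would need to be added):

#include <linux/of.h>		/* OF_BAD_ADDR */
#include <linux/of_fdt.h>
#include <linux/pfn.h>
#include <linux/range.h>

static int __init tag_storage_get_range(unsigned long node, struct range *range)
{
	u64 addr, size;

	/* Existing helper: translates the first 'reg' address through 'ranges' */
	addr = of_flat_dt_translate_address(node);
	if (addr == OF_BAD_ADDR)
		return -EINVAL;

	/* Hypothetical addition to fdt_address.c that also returns the size */
	if (of_flat_dt_get_reg_size(node, &size) || size == 0)
		return -EINVAL;

	range->start = PHYS_PFN(addr);
	range->end = range->start + PHYS_PFN(size) - 1;
	return 0;
}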


> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int __init tag_storage_of_flat_get_tag_range(unsigned long node,
> > > +						     struct range *tag_range)
> > > +{
> > > +	const __be32 *reg;
> > > +	int reg_len;
> > > +
> > > +	reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > > +	if (reg == NULL) {
> > > +		pr_err("Invalid metadata node");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	return tag_storage_of_flat_get_range(node, reg, reg_len, tag_range);
> > > +}
> > > +
> > > +static int __init tag_storage_of_flat_get_memory_range(unsigned long node, struct range *mem)
> > > +{
> > > +	const __be32 *reg;
> > > +	int reg_len;
> > > +
> > > +	reg = of_get_flat_dt_prop(node, "linux,usable-memory", &reg_len);
> > > +	if (reg == NULL)
> > > +		reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > > +
> > > +	if (reg == NULL) {
> > > +		pr_err("Invalid memory node");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	return tag_storage_of_flat_get_range(node, reg, reg_len, mem);
> > > +}
> > > +
> > > +struct find_memory_node_arg {
> > > +	unsigned long node;
> > > +	u32 phandle;
> > > +};
> > > +
> > > +static int __init fdt_find_memory_node(unsigned long node, const char *uname,
> > > +				       int depth, void *data)
> > > +{
> > > +	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> > > +	struct find_memory_node_arg *arg = data;
> > > +
> > > +	if (depth != 1 || !type || strcmp(type, "memory") != 0)
> > > +		return 0;
> > > +
> > > +	if (of_get_flat_dt_phandle(node) == arg->phandle) {
> > > +		arg->node = node;
> > > +		return 1;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int __init tag_storage_get_memory_node(unsigned long tag_node, unsigned long *mem_node)
> > > +{
> > > +	struct find_memory_node_arg arg = { 0 };
> > > +	const __be32 *memory_prop;
> > > +	u32 mem_phandle;
> > > +	int ret, reg_len;
> > > +
> > > +	memory_prop = of_get_flat_dt_prop(tag_node, "memory", &reg_len);
> > > +	if (!memory_prop) {
> > > +		pr_err("Missing 'memory' property in the tag storage node");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	mem_phandle = be32_to_cpup(memory_prop);
> > > +	arg.phandle = mem_phandle;
> > > +
> > > +	ret = of_scan_flat_dt(fdt_find_memory_node, &arg);
> >
> > Do not use of_scan_flat_dt. It is a relic predating libfdt which can
> > get a node by phandle directly.
>
> I used that because that's what drivers/of/fdt.c uses. With reserved memory
> I shouldn't need it, because struct reserved_mem already includes a
> phandle.

Check again. Only some arch/ code (mostly powerpc) uses it. I've
killed off most of it.
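
For the flat tree, libfdt can do the phandle lookup directly, something
like this (sketch; the node "handles" used by of_get_flat_dt_prop() are
just libfdt offsets, so the result can be used the same way):

#include <linux/libfdt.h>
#include <linux/of_fdt.h>

static int __init tag_storage_get_memory_node(u32 phandle, unsigned long *mem_node)
{
	int offset;

	offset = fdt_node_offset_by_phandle(initial_boot_params, phandle);
	if (offset < 0)
		return -EINVAL;

	*mem_node = offset;
	return 0;
}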


> > > +	if (ret != 1) {
> > > +		pr_err("Associated memory node not found");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	*mem_node = arg.node;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int __init tag_storage_of_flat_read_u32(unsigned long node, const char *propname,
> > > +					       u32 *retval)
> >
> > If you are going to make a generic function, make it for everyone.
>
> Sure. If I still need it, should I put the function in
> include/linux/of_fdt.h?

Yes.
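
Something like this in drivers/of/fdt.c, with the declaration in
include/linux/of_fdt.h (sketch; the name is only a suggestion):

int __init of_flat_dt_read_u32(unsigned long node, const char *propname,
			       u32 *out)
{
	const __be32 *prop;

	prop = of_get_flat_dt_prop(node, propname, NULL);
	if (!prop)
		return -EINVAL;

	*out = be32_to_cpup(prop);
	return 0;
}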

> > > +{
> > > +	const __be32 *reg;
> > > +
> > > +	reg = of_get_flat_dt_prop(node, propname, NULL);
> > > +	if (!reg)
> > > +		return -EINVAL;
> > > +
> > > +	*retval = be32_to_cpup(reg);
> > > +	return 0;
> > > +}
> > > +
> > > +static u32 __init get_block_size_pages(u32 block_size_bytes)
> > > +{
> > > +	u32 a = PAGE_SIZE;
> > > +	u32 b = block_size_bytes;
> > > +	u32 r;
> > > +
> > > +	/* Find greatest common divisor using the Euclidean algorithm. */
> > > +	do {
> > > +		r = a % b;
> > > +		a = b;
> > > +		b = r;
> > > +	} while (b != 0);
> > > +
> > > +	return PHYS_PFN(PAGE_SIZE * block_size_bytes / a);
> > > +}
> > > +
> > > +static int __init fdt_init_tag_storage(unsigned long node, const char *uname,
> > > +				       int depth, void *data)
> > > +{
> > > +	struct tag_region *region;
> > > +	unsigned long mem_node;
> > > +	struct range *mem_range;
> > > +	struct range *tag_range;
> > > +	u32 block_size_bytes;
> > > +	u32 nid = 0;
> > > +	int ret;
> > > +
> > > +	if (depth != 1 || !strstr(uname, "tag-storage"))
> > > +		return 0;
> > > +
> > > +	if (!of_flat_dt_is_compatible(node, "arm,mte-tag-storage"))
> > > +		return 0;
> > > +
> > > +	if (num_tag_regions == MAX_TAG_REGIONS) {
> > > +		pr_err("Maximum number of tag storage regions exceeded");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	region = &tag_regions[num_tag_regions];
> > > +	mem_range = &region->mem_range;
> > > +	tag_range = &region->tag_range;
> > > +
> > > +	ret = tag_storage_of_flat_get_tag_range(node, tag_range);
> > > +	if (ret) {
> > > +		pr_err("Invalid tag storage node");
> > > +		return ret;
> > > +	}
> > > +
> > > +	ret = tag_storage_get_memory_node(node, &mem_node);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	ret = tag_storage_of_flat_get_memory_range(mem_node, mem_range);
> > > +	if (ret) {
> > > +		pr_err("Invalid address for associated data memory node");
> > > +		return ret;
> > > +	}
> > > +
> > > +	/* The tag region must exactly match the corresponding memory. */
> > > +	if (range_len(tag_range) * 32 != range_len(mem_range)) {
> > > +		pr_err("Tag storage region 0x%llx-0x%llx does not cover the memory region 0x%llx-0x%llx",
> > > +		       PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end),
> > > +		       PFN_PHYS(mem_range->start), PFN_PHYS(mem_range->end));
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
> > > +	if (ret || block_size_bytes == 0) {
> > > +		pr_err("Invalid or missing 'block-size' property");
> > > +		return -EINVAL;
> > > +	}
> > > +	region->block_size = get_block_size_pages(block_size_bytes);
> > > +	if (range_len(tag_range) % region->block_size != 0) {
> > > +		pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
> > > +		       PFN_PHYS(range_len(tag_range)), region->block_size);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);
> >
> > I was going to say we already have a way to associate memory nodes
> > other nodes using "numa-node-id", so the "memory" phandle property is
> > somewhat redundant. Maybe the tag node should have a numa-node-id.
> > With that, it looks like you don't even need to access the /memory
> > node. Avoiding that would be good for 2 reasons. It avoids parsing
> > memory nodes twice and it's not the kernel's job to validate the DT.
> > Really, if you want memory info, you should use memblock to get it
> > because all the special cases of memory layout are handled. For
> > example you can have memory nodes with multiple 'reg' entries or
> > multiple memory nodes or both, and then some of those could be
> > contiguous.
>
> I need to have a memory node associated with the tag storage node because
> there is a static relationship between a page from "normal" memory and its
> associated tag storage. If the code doesn't know that the memory region
> A..B has the corresponding tag storage in the region X..Y, then it doesn't
> know which tag storage to reserve when a page is allocated as tagged.
>
> In the example above, assuming that page P is allocated as tagged, the
> corresponding tag storage page that needs to be reserved is:
>
> tag_storage_pfn = (page_to_pfn(P) - PHYS_PFN(A)) / 32 + PHYS_PFN(X)
>
> numa-node-id is not enough for this, because as far as I know you can have
> multiple memory regions within the same numa node.

Okay.
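
So, restating that as code just to check my understanding (sketch only;
the field names are taken from the patch's struct tag_region, and the
ranges are in PFN units):

/* 32 data pages share one page of tag storage within a region. */
static unsigned long data_pfn_to_tag_pfn(struct tag_region *region,
					 unsigned long pfn)
{
	return (pfn - region->mem_range.start) / 32 + region->tag_range.start;
}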

Rob