Re: [PATCH RFC v2 11/27] arm64: mte: Reserve tag storage memory

From: Rob Herring
Date: Thu Dec 14 2023 - 13:55:34 EST


On Thu, Dec 14, 2023 at 9:45 AM Alexandru Elisei
<alexandru.elisei@xxxxxxx> wrote:
>
> Hi,
>
> On Wed, Dec 13, 2023 at 02:30:42PM -0600, Rob Herring wrote:
> > On Wed, Dec 13, 2023 at 11:44 AM Alexandru Elisei
> > <alexandru.elisei@xxxxxxx> wrote:
> > >
> > > On Wed, Dec 13, 2023 at 11:22:17AM -0600, Rob Herring wrote:
> > > > On Wed, Dec 13, 2023 at 8:51 AM Alexandru Elisei
> > > > <alexandru.elisei@xxxxxxx> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Wed, Dec 13, 2023 at 08:06:44AM -0600, Rob Herring wrote:
> > > > > > On Wed, Dec 13, 2023 at 7:05 AM Alexandru Elisei
> > > > > > <alexandru.elisei@xxxxxxx> wrote:
> > > > > > >
> > > > > > > Hi Rob,
> > > > > > >
> > > > > > > On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> > > > > > > > On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
> > > > > > > > <alexandru.elisei@xxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > Hi Rob,
> > > > > > > > >
> > > > > > > > > Thank you so much for the feedback, I'm not very familiar with device tree,
> > > > > > > > > and any comments are very useful.
> > > > > > > > >
> > > > > > > > > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > > > > > > > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > > > > > > > > > <alexandru.elisei@xxxxxxx> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > > > > > > > > regions from the DTB. This memory is marked as reserved for now.
> > > > > > > > > > >
> > > > > > > > > > > The DTB node for the tag storage region is defined as:
> > > > > > > > > > >
> > > > > > > > > > > >         tags0: tag-storage@8f8000000 {
> > > > > > > > > > > >                 compatible = "arm,mte-tag-storage";
> > > > > > > > > > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > > > > > > > > > >                 block-size = <0x1000>;
> > > > > > > > > > > >                 memory = <&memory0>; // Associated tagged memory node
> > > > > > > > > > > >         };
> > > > > > > > > >
> > > > > > > > > > I skimmed thru the discussion some. If this memory range is within
> > > > > > > > > > main RAM, then it definitely belongs in /reserved-memory.
> > > > > > > > >
> > > > > > > > > Ok, will do that.
> > > > > > > > >
> > > > > > > > > If you don't mind, why do you say that it definitely belongs in
> > > > > > > > > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > > > > > > > > motivation.
> > > > > > > >
> > > > > > > > Simply so that /memory nodes describe all possible memory and
> > > > > > > > /reserved-memory is just adding restrictions. It's also because
> > > > > > > > /reserved-memory is what gets handled early, and we don't need
> > > > > > > > multiple things to handle early.
> > > > > > > >
> > > > > > > > > Tag storage is not DMA and can live anywhere in memory.
> > > > > > > >
> > > > > > > > Then why put it in DT at all? The only reason CMA is there is to set
> > > > > > > > the size. It's not even clear to me we need CMA in DT either. The
> > > > > > > > reasoning long ago was the kernel didn't do a good job of moving and
> > > > > > > > reclaiming contiguous space, but that's supposed to be better now (and
> > > > > > > > most h/w figured out they need IOMMUs).
> > > > > > > >
> > > > > > > > But for tag storage you know the size as it is a function of the
> > > > > > > > memory size, right? After all, you are validating the size is correct.
> > > > > > > > > I guess there is still the aspect of whether you want to enable MTE or
> > > > > > > > not which could be done in a variety of ways.
> > > > > > >
> > > > > > > Oh, sorry, my bad, I should have been clearer about this. I don't want to
> > > > > > > put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.
> > > > > >
> > > > > > Yes, I understand, but my point remains. Why do you need this in DT?
> > > > > > If the location doesn't matter and you can calculate the size from the
> > > > > > memory size, what else is there to add to the DT?
> > > > >
> > > > > I am afraid there has been a misunderstanding. What do you mean by
> > > > > "location doesn't matter"?
> > > >
> > > > You said:
> > > > > Tag storage is not DMA and can live anywhere in memory.
> > > >
> > > > Which I took as the kernel can figure out where to put it. But maybe
> > > > you meant the h/w platform can hard code it to be anywhere in memory?
> > > > If so, then yes, DT is needed.
> > >
> > > Ah, I see, sorry for not being clear enough, you are correct: tag storage
> > > is a hardware property, and software needs a mechanism (in this case, the
> > > dt) to discover its properties.
> > >
> > > >
> > > > > At the very least, Linux needs to know the address and size of a memory
> > > > > region to use it. The series is about using the tag storage memory for
> > > > > data. Tag storage cannot be described as a regular memory node because it
> > > > > cannot be tagged (and normal memory can).
> > > >
> > > > If the tag storage lives in the middle of memory, then it would be
> > > > described in the memory node, but removed by being in reserved-memory
> > > > node.
> > >
> > > I don't follow. Would you mind going into more details?
> >
> > It goes back to what I said earlier about /memory nodes describing all
> > the memory. There's no reason to reserve memory if you haven't
> > described that range as memory to begin with. One could presumably
> > just have a memory node for each contiguous chunk and not need
> > /reserved-memory (ignoring the need to say what things are reserved
> > for). That would become very difficult to adjust. Note that the kernel
> > has a hardcoded limit of 64 reserved regions currently and that is not
> > enough for some people. Seems like a lot, but I have no idea how they
> > are (ab)using /reserved-memory.
>
> Ah, I see what you mean, reserved memory is about marking existing memory
> (from a /memory node) as special, not about adding new memory.
>
> After the memblock allocator is initialized, the kernel can use it for its
> own allocations. Kernel allocations are not movable.
>
> When a page is allocated as tagged, the associated tag storage cannot be
> used for data, otherwise the tags would corrupt that data. To avoid this,
> the requirement is that tag storage pages are only used for movable
> allocations. When a page is allocated as tagged, the data in the associated
> tag storage is migrated and the tag storage is taken from the page
> allocator (via alloc_contig_range()).
>
> My understanding is that the memblock allocator can use all the memory from
> a /memory node. If the tag storage memory is declared in a /memory node,
> there exists the possibility that Linux will use tag storage memory for its
> own allocation, which would make that tag storage memory unmovable, and
> thus unusable for storing tags.

No, because the tag storage would be reserved in /reserved-memory.

Of course, the arch code could do something between scanning /memory
nodes and /reserved-memory, but that would be broken arch code.
Ideally, there wouldn't be any arch code in between those 2 points,
but it's complicated. It used to mainly be powerpc, but we keep adding
to the complexity on arm64.
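
To make the layout concrete, here is a rough sketch of what I mean. The
memory0 base and size are made up for illustration; only the tag storage
node values come from the patch. Treat it as an assumption about how the
binding could look, not a settled one:

```dts
/dts-v1/;

/ {
	#address-cells = <2>;
	#size-cells = <2>;

	/* Hypothetical DRAM bank: describes all memory, tag storage included */
	memory0: memory@880000000 {
		device_type = "memory";
		reg = <0x08 0x80000000 0x00 0x7c000000>;
	};

	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* Carve-out inside memory0; values taken from the RFC node */
		tags0: tag-storage@8f8000000 {
			compatible = "arm,mte-tag-storage";
			reg = <0x08 0xf8000000 0x00 0x4000000>;
			block-size = <0x1000>;
		};
	};
};
```

Here the reserved region covers the last 0x4000000 bytes of memory0, so
/memory still describes everything and /reserved-memory only restricts it.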

> Looking at early_init_dt_scan_memory(), even if a /memory node is marked as
> hotpluggable, memblock will still use it, unless "movable_node" is set on
> the kernel command line.
>
> That's the reason why I'm not describing tag storage in a /memory node. Is
> there a way to tell the memblock allocator not to use memory from a /memory
> node?
>
> >
> > Let me give an example. Presumably using MTE at all is configurable.
> > If you boot a kernel with MTE disabled (or older and not supporting
> > it), then I'd assume you'd want to use the tag storage for regular
> > memory. Well, if tag storage is already part of /memory, then all you
> > have to do is ignore the tag reserved-memory region. Tweaking the
> > memory nodes would be more work.
>
> Right now, memory is added via memblock_reserve(), and if MTE is disabled
> (for example, via the kernel command line), the code calls
> free_reserved_page() for each tag storage page. I find that straightforward
> to implement.

But better to just not reserve the region in the first place. Also, it
needs to be simple enough to backport.

Also, does free_reserved_page() work on ranges outside of memblock
range (e.g. beyond end_of_DRAM())? If the tag storage happened to live
at the end of DRAM and you shorten the /memory node size to remove tag
storage, is it still going to work?
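
The case I'm asking about would look roughly like the sketch below. The
memory0 range is hypothetical, picked so that it ends exactly where the
tag storage from the patch begins:

```dts
/dts-v1/;

/ {
	#address-cells = <2>;
	#size-cells = <2>;

	/* Hypothetical DRAM bank shortened to stop where tag storage begins */
	memory0: memory@880000000 {
		device_type = "memory";
		reg = <0x08 0x80000000 0x00 0x78000000>;
	};

	/* Tag storage node as defined in the RFC, outside any /memory range */
	tags0: tag-storage@8f8000000 {
		compatible = "arm,mte-tag-storage";
		reg = <0x08 0xf8000000 0x00 0x4000000>;
		block-size = <0x1000>;
		memory = <&memory0>;
	};
};
```

With this layout the tag storage pages sit past the end of everything
described as memory.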

Rob