Re: [PATCH] driver: of: overlay: demote message to warning

From: Daniel Walker
Date: Mon Sep 12 2022 - 20:52:04 EST


On Mon, Sep 12, 2022 at 03:32:31PM -0500, Frank Rowand wrote:
> On 9/12/22 12:05, Daniel Walker wrote:
> > On Mon, Sep 12, 2022 at 01:45:40AM -0500, Frank Rowand wrote:
> >> On 9/8/22 12:55, Frank Rowand wrote:
> >>> On 9/7/22 19:35, Daniel Walker wrote:
> >>>> On Wed, Sep 07, 2022 at 06:54:02PM -0500, Frank Rowand wrote:
> >>>>> On 9/7/22 18:07, Daniel Walker wrote:
> >>>>>> This warning message shows by default on the vast majority of overlays
> >>>>>> applied. Despite the text identifying this as a warning it is marked
> >>>>>> with the loglevel for error. At Cisco we filter the loglevels to only
> >>>>>> show error messages. We end up seeing this message but it's not really
> >>>>>> an error.
> >>>>>>
> >>>>>> For this reason it makes sense to demote the message to the warning
> >>>>>> loglevel.
> >>>>>>
> >>>>>> Cc: xe-linux-external@xxxxxxxxx
> >>>>>> Signed-off-by: Daniel Walker <danielwa@xxxxxxxxx>
> >>>>>> ---
> >>>>>> drivers/of/overlay.c | 2 +-
> >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> >>>>>> index bd8ff4df723d..4ae276ed9a65 100644
> >>>>>> --- a/drivers/of/overlay.c
> >>>>>> +++ b/drivers/of/overlay.c
> >>>>>> @@ -358,7 +358,7 @@ static int add_changeset_property(struct overlay_changeset *ovcs,
> >>>>>> }
> >>>>>>
> >>>>>> if (!of_node_check_flag(target->np, OF_OVERLAY))
> >>>>>> - pr_err("WARNING: memory leak will occur if overlay removed, property: %pOF/%s\n",
> >>>>>> + pr_warn("WARNING: memory leak will occur if overlay removed, property: %pOF/%s\n",
> >>>>>> target->np, new_prop->name);
> >>>>>>
> >>>>>> if (ret) {
> >>>>>
> >>>>> NACK
> >>>>>
> >>>>> This is showing a real problem with the overlay.
> >>>>
> >>>> What's the real problem ?
> >>>>
> >>>> Daniel
> >>>
> >>> A memory leak when the overlay is removed.
> >>>
> >>> I'll send a patch to update the overlay file in Documentation/devicetree/ to provide
> >>> more information about this. If you don't see a patch by tomorrow, feel free to
> >>> ping me.
> >>>
> >>> -Frank
> >>
> >> The good news is that your question prodded me to start improving the in kernel documentation
> >> of overlays. The promised patch is a rough start at:
> >>
> >> https://lore.kernel.org/all/20220912062615.3727029-1-frowand.list@xxxxxxxxx/
> >>
> >> The bad news is that what I wrote doesn't explain the memory leak in any more detail.
> >> If an overlay adds a property to a node in the base device tree then the memory
> >> allocated to do the add will not be freed when the overlay is removed. Since it is
> >> possible to add and remove overlays multiple times, the ensuing size of the memory
> >> leak is potentially unbounded.
> >
> > Isn't this only a problem if you remove the overlay?
>
> Yes, but we don't know if the overlay will be removed. And I will not accept a
> change that suppresses the message if there is no expectation to remove the
> overlay.

I haven't researched the whole overlay system, but I noted one removal function,
I think in the link you provided above, called of_overlay_remove(). It appears
to call free_overlay_changeset(), which calls kfree().

So your API seems to deal with freeing the memory. I would think the expectation is that
people using the API would free the overlay through your API.

The only in-tree usage of your API (besides the unit test) was drm/rcar-du, which
had no ability to remove the overlay that I can see. That component of the driver was
removed several months ago.

> >
> > If the dt fixup driver does have the ability to remove the overlay, doesn't it
> > have the responsibility to free the memory? Or is it impossible to free the memory?
>
> It is difficult due to architectural issues. Reference counting occurs at the node
> level, and not at the property level. So memory related to properties is freed
> when the corresponding overlay node reference count leads to the node being freed.
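
To check that I follow the description above, here is a toy userspace model.
It is not kernel code: struct toy_node, struct toy_prop and every function name
are made up for illustration, and the "dead" list is only my assumption about
where an unlinked property ends up. The only thing that is counted is the node,
so a property added to a long-lived base node and later removed stays allocated
until the node itself goes away.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only; these are NOT the kernel's device tree structures. */
struct toy_prop {
	struct toy_prop *next;
	const char *name;
};

struct toy_node {
	int refcount;		/* only nodes are reference counted */
	struct toy_prop *props;	/* live properties */
	struct toy_prop *dead;	/* unlinked, but still allocated */
};

static int live_allocs;		/* property allocations not yet freed */

static void toy_add_prop(struct toy_node *np, const char *name)
{
	struct toy_prop *p = malloc(sizeof(*p));

	if (!p)
		abort();
	p->name = name;
	p->next = np->props;
	np->props = p;
	live_allocs++;
}

/* Removal can only unlink: nothing tracks who still uses the property. */
static void toy_remove_prop(struct toy_node *np)
{
	struct toy_prop *p = np->props;

	if (!p)
		return;
	np->props = p->next;
	p->next = np->dead;
	np->dead = p;
}

static void toy_free_list(struct toy_prop *p)
{
	while (p) {
		struct toy_prop *next = p->next;

		free(p);
		live_allocs--;
		p = next;
	}
}

/* Property memory is released only when the last node reference drops. */
static void toy_node_put(struct toy_node *np)
{
	assert(np->refcount > 0);
	if (--np->refcount)
		return;
	toy_free_list(np->props);
	toy_free_list(np->dead);
	np->props = np->dead = NULL;
}

/* Allocations still held after N apply/remove cycles on a live base node. */
int leak_after_cycles(int cycles)
{
	struct toy_node base = { .refcount = 1 };
	int leaked;

	live_allocs = 0;
	for (int i = 0; i < cycles; i++) {
		toy_add_prop(&base, "status");	/* overlay applied */
		toy_remove_prop(&base);		/* overlay removed */
	}
	leaked = live_allocs;
	toy_node_put(&base);	/* a real base node would never get here */
	return leaked;
}

/* Allocations left once the node itself has been released. */
int allocs_after_node_release(int cycles)
{
	(void)leak_after_cycles(cycles);
	return live_allocs;
}
```

In this model each apply/remove cycle strands one allocation on the base node,
which would match the "potentially unbounded" description if the base node is
never freed.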

How does of_overlay_remove() work then? It seems like it might not be possible
to do overlay removal, but your code has removal functions. I also see
of_overlay_remove_all().

It seems like your API supports removal. Is there an issue where your API is
maybe not complete or maybe doesn't currently work?

Maybe you could add a flag or other indicator to say that the overlay will never
be removed. Then your code could rely on this flag to tell whether the author
has considered the removal issues related to overlays.
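
Roughly what I have in mind, as a userspace sketch rather than a patch. All of
the names here (TOY_OVCS_NEVER_REMOVED, struct toy_changeset, both functions)
are hypothetical and, as far as I know, nothing like them exists in
drivers/of/overlay.c. The numeric levels are meant to mirror the kernel's error
(3) and warning (4) loglevels. Note that the message is only demoted, not
suppressed, and that removal is refused once the author has declared the
overlay permanent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not an existing kernel interface. */
enum {
	TOY_LOGLEVEL_NONE = -1,		/* no message needed */
	TOY_LOGLEVEL_ERR = 3,
	TOY_LOGLEVEL_WARNING = 4,
};

/* Set by the overlay author to promise the overlay is never removed. */
#define TOY_OVCS_NEVER_REMOVED	(1u << 0)

struct toy_changeset {
	unsigned int flags;
};

/*
 * Pick the level for the "memory leak will occur" message when a property
 * is added. target_is_overlay_node stands in for the OF_OVERLAY flag check
 * in the quoted patch.
 */
int toy_leak_msg_level(const struct toy_changeset *ovcs,
		       bool target_is_overlay_node)
{
	assert(ovcs);
	if (target_is_overlay_node)
		return TOY_LOGLEVEL_NONE;	/* freed along with the node */
	if (ovcs->flags & TOY_OVCS_NEVER_REMOVED)
		return TOY_LOGLEVEL_WARNING;	/* author accepted the risk */
	return TOY_LOGLEVEL_ERR;		/* today's behaviour */
}

/* Refuse removal of an overlay that was declared permanent. */
int toy_overlay_remove(const struct toy_changeset *ovcs)
{
	assert(ovcs);
	if (ovcs->flags & TOY_OVCS_NEVER_REMOVED)
		return -1;
	return 0;
}
```

With something like this, the default stays at the error loglevel, and only an
overlay whose author has explicitly opted in would log at warning.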

Daniel