Re: [PATCH v1] clk: Convert managed get functions to devm_add_action API

From: Robin Murphy
Date: Mon Dec 02 2019 - 08:51:13 EST


On 02/12/2019 9:25 am, Marc Gonzalez wrote:
On 02/12/2019 02:42, Dmitry Torokhov wrote:

On Thu, Nov 28, 2019 at 10:56:30AM -0800, Bjorn Andersson wrote:

On Tue 26 Nov 08:13 PST 2019, Marc Gonzalez wrote:

Date: Tue, 26 Nov 2019 13:56:53 +0100

Using devm_add_action_or_reset() produces simpler code and smaller
object size:

1 file changed, 16 insertions(+), 46 deletions(-)

     text    data     bss     dec     hex filename
-    1797      80       0    1877     755 drivers/clk/clk-devres.o
+    1499      56       0    1555     613 drivers/clk/clk-devres.o

Signed-off-by: Marc Gonzalez <marc.w.gonzalez@xxxxxxx>

Looks neat

Reviewed-by: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx>

This, however, increases the runtime cost, as each custom action costs
us an extra pointer. Given that a system is likely to have many clocks
managed by devres, I am not sure that this code saving actually gives
us an overall win. It might still; I just want to understand how we
are allocating/packing devres structures.

I'm not 100% sure what you are saying.

You reduce the text size by a constant amount, at the cost of
allocating twice as much runtime data per clock (struct action_devres
vs. void*). Assuming 64-bit pointers, that means that in principle your
~320-byte saving would be cancelled out at ~40 managed clocks.

However, that's also assuming that the minimum allocation granularity
is no larger than a single pointer, which generally isn't true, so in
reality it depends on whether the difference in data pushes the total
struct devres allocation over the next ARCH_KMALLOC_MINALIGN boundary -
if it doesn't, the difference comes entirely for free; if it does, the
memory cost tradeoff gets even worse.
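
To put rough numbers on that, here is a small userspace sketch (not
kernel code). The helper names are made up for illustration, and the
header size and granule passed in are assumptions rather than the real
struct devres layout or any particular ARCH_KMALLOC_MINALIGN value; the
322-byte figure is simply the "dec" difference from the size output
quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of granule. */
static size_t round_up_to(size_t n, size_t granule)
{
	assert(granule != 0);
	return (n + granule - 1) / granule * granule;
}

/*
 * Number of managed clocks at which the extra per-clock data has
 * consumed a fixed, one-off saving in object size.
 */
static size_t break_even_clocks(size_t saved_bytes, size_t extra_per_clock)
{
	assert(extra_per_clock != 0);
	return saved_bytes / extra_per_clock;
}

/*
 * Extra bytes really allocated per clock once each allocation is
 * rounded up to the allocator granule. "header" stands in for the
 * bookkeeping part of the devres allocation (hypothetical size).
 */
static size_t extra_alloc_per_clock(size_t header, size_t old_payload,
				    size_t new_payload, size_t granule)
{
	return round_up_to(header + new_payload, granule) -
	       round_up_to(header + old_payload, granule);
}
```

With 8-byte pointers, break_even_clocks(322, 8) gives 40. With an
assumed 24-byte header and a 64-byte granule,
extra_alloc_per_clock(24, 8, 16, 64) is 0, i.e. the second pointer is
free; with an assumed 56-byte header and the same granule it is 64,
i.e. each clock costs a whole extra granule.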

Robin.

Are you arguing that the proposed patch increases the run-time cost of
devm_clk_put() so much that the listed improvements (simpler source code,
smaller object size) are not worth it?

AFAIU, the release action is only called
- explicitly, when devm_clk_put() is called
- implicitly, when the device is removed

How often are clocks removed?

In a hot code path (called hundreds of times per second), it makes sense
to write more complex code to shave a few cycles off every iteration. But
in a cold code path, I think it's better to write short/simple code.

Regards.
