Re: [tip: timers/core] clocksource/drivers/timer-probe: Avoid creating dead devices

From: Saravana Kannan
Date: Thu Mar 26 2020 - 14:09:02 EST


On Thu, Mar 26, 2020 at 8:02 AM Rob Herring <robh+dt@xxxxxxxxxx> wrote:
>
> On Thu, Mar 26, 2020 at 4:34 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >
> > Saravana Kannan <saravanak@xxxxxxxxxx> writes:
> > > On Wed, Mar 25, 2020 at 2:47 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > >
> > >> Saravana Kannan <saravanak@xxxxxxxxxx> writes:
> > >> > On Tue, Mar 24, 2020 at 11:34 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > >> > I took a closer look. So two different drivers [1] [2] are saying they
> > >> > know how to handle "arm,vexpress-sysreg" and are expecting to run at
> > >> > the same time. That seems a bit unusual to me. I wonder if this is a
> > >> > violation of the device-driver model because this expectation would
> > >> > never be allowed if these device drivers were actual drivers
> > >> > registered with driver-core. But that's a discussion for another time.
> > >> >
> > >> > To fix this issue you are facing, this patch should work:
> > >> > https://lore.kernel.org/lkml/20200324195302.203115-1-saravanak@xxxxxxxxxx/T/#u
> > >>
> > >> Sorry, that's not a fix. That's a crude hack.
> > >
> > > If device nodes are being handled by drivers without binding a driver
> > > to struct devices, then not setting OF_POPULATED is wrong. So the
> > > original patch sets it. There are also very valid reasons for allowing
> > > OF_POPULATED to be cleared by a driver as discussed here [1].
> > >
> > > The approach of the original patch (setting the flag and letting the
> > > driver sometimes clear it) is also followed by many other frameworks
> > > like irq, clk, i2c, etc. Even ingenic-timer.c already does it for the
> > > exact same reason.
> > >
> > > So, why is the vexpress fix a crude hack?
> >
> > If it's the right thing to do and accepted by the DT folks, then the
> > changelog should provide a proper explanation for it. The one you
> > provided just baffles me. Plus the clearing of the flag really needs a
> > big fat comment.
>
> IMO, commit 4f41fe386a946 should be reverted and be done with it.
> There's no way the timer core can know whether a specific node should
> be scanned or not. If you really want to avoid a struct device, then
> set OF_POPULATED in specific timer drivers. But I'd rather not see
> more places mucking with OF_POPULATED. It's really only bus code that
> should be touching it.

Since most drivers don't need the struct device, my patch sets the
flag in the timer core. And for the few exception cases where the
device is needed, we can clear the flag in the driver. That'll reduce
the number of places mucking with OF_POPULATED.

Does this seem okay to you?

> Is having a struct device really a problem? If we want to save memory
> usage, I have some ideas that would save much more than 1 or 2 struct
> devices.

Keep in mind that struct devices cost more than one struct device of
memory. They also create a bunch of sysfs nodes, uevents, etc.

> > It still does not make any sense to me.
> >
> > arm,vexpress-sysreg is a MFD device, so can the ARM people please
> > explain, why the sched clock part is not just another MFD sub-device or
> > simply has it's own DT match?
>
> The issue is DT nodes and Linux drivers aren't necessarily 1-1. That
> would be nice, but hardware is messy and DT doesn't abstract that
> away. If we tried to always make things 1-1, then if/when the Linux
> driver structure changes we'd have to change the DT.

I agree with this. I'm definitely not asking to create a node just
because we want another struct device.

> If we decided to
> add a node now, we'd still have to support the old DT for backwards
> compatibility.

Right, I agree it's too late to fix this DT for vexpress-sysreg now.

> We also have to consider the structure for another OS
> may be different.
>
> Generally, if I see a node with a compatible only it gets NAKed as
> that's a sure sign of someone just trying to bind a driver and not
> describing the h/w. We only do MFD sub-devices if those devices
> provide or consume other DT resources.

If we have a timer MFD, it'd technically consume a fixed-clock and not
be empty. vexpress-sysreg just assumes the clock is 24 MHz right now.

My point about the vexpress DT nodes being weird is not about Linux
devices, rather it's that:
1. It's already a MFD
2. Most of the functions are separated clearly into their functional
device nodes
3. However, the timer functionality is combined with the parent MFD
device when the parent MFD device implements a completely separate
function (gpio?). Why?

If one wants to split out the functions, do it fully. If one wants it
all under one mega driver (ugh!) then combine it all. Going halfway
causes these weird issues. That's my complaint about how the DT layout
is for this device.

Having said all that, I'd rather manipulate the OF_POPULATED flag than
break DT backward compatibility at this point.

But in general, I think we should try to reject two separate drivers
claiming to be compatible with the same device and expecting to work
at the same time. That's just weird.

-Saravana