Re: [PATCH] spi: bcm2835: do not unregister controller in shutdown handler

From: Florian Fainelli
Date: Mon Oct 04 2021 - 12:36:45 EST


On 10/4/21 9:31 AM, Mark Brown wrote:
> On Mon, Oct 04, 2021 at 12:44:36PM -0300, Jason Gunthorpe wrote:
>> On Mon, Oct 04, 2021 at 03:12:20PM +0100, Mark Brown wrote:
>>> On Mon, Oct 04, 2021 at 10:17:56AM -0300, Jason Gunthorpe wrote:
>
>>>> When something like kexec happens we need the machine to be in a state
>>>> where random DMAs are not corrupting memory.
>
>>> That's all well and good but there's no point in implementing something
>>> half baked that's opening up a whole bunch of opportunities to crash the
>>> system if more work comes in after it's half broken the device setup.
>
>> Well, that is up to the driver implementing this. It looks like device
>> shutdown is called before the userspace is all nuked so yes,
>> concurrency with userspace is a possible concern here.
>
> It's not just userspace that can initiate things - interrupts are also
> an issue, someone could press a button or whatever. Frankly for SPI the
> quiescing part doesn't seem like logic that should be implemented in
> drivers, it's a subsystem level thing since there's nothing driver
> specific about it.

Surely the SPI subsystem can help avoid queuing new transfers towards
the SPI controller while the controller can shut down the resources that
only it knows about.
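
To make that split concrete, here is a rough user-space model. It is not
kernel code, and every name in it (model_ctlr, model_quiesce, and so on)
is made up for illustration; the real SPI core structures and locking
would look different. The only point is the ordering: the subsystem
stops accepting transfers and reports whether the queue has drained, and
only then does the driver turn off the resources that it alone knows
about.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: none of these names are real kernel APIs. */
struct model_ctlr {
	bool accepting;   /* subsystem-owned: may new transfers be queued? */
	int  in_flight;   /* subsystem-owned: transfers queued but not done */
	bool clk_enabled; /* driver-owned: stands in for clock/PLL state */
};

/* Subsystem side: refuse new work once quiescing has begun. */
static int model_queue_transfer(struct model_ctlr *c)
{
	if (!c->accepting)
		return -ESHUTDOWN;
	c->in_flight++;
	return 0;
}

/* Subsystem side: account for a finished transfer. */
static void model_complete_transfer(struct model_ctlr *c)
{
	assert(c->in_flight > 0);
	c->in_flight--;
}

/* Subsystem side: close the gate, report whether the queue is empty. */
static bool model_quiesce(struct model_ctlr *c)
{
	c->accepting = false;
	return c->in_flight == 0;
}

/* Driver side: touch the hardware only once the subsystem is idle. */
static int model_driver_shutdown(struct model_ctlr *c)
{
	if (!model_quiesce(c))
		return -EBUSY;
	c->clk_enabled = false; /* driver-specific teardown goes here */
	return 0;
}
```

In this model a shutdown attempt made while a transfer is outstanding
returns -EBUSY and leaves the clock on, but it still closes the gate, so
later queue attempts fail with -ESHUTDOWN. A real implementation would
need locking and a way to wait for the queue to drain, both of which
this sketch leaves out.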

>
>>>> Due to the emergency sort of nature it is not appropriate to do
>>>> locking complicated sorts of things like struct device unregistrations
>>>> here.
>
>>> That's just not what's actually implemented in a bunch of places, nor
>>> something one would infer from the documentation ("Called at shut-down
>>> to quiesce the device", no mention of emergency cases which I'd guess
>>> would just be kdump) -
>
>> Drivers misunderstanding stuff is not new..
>
> Not just drivers, entire subsystems. And like I say given the
> documentation I'd be hard pressed to say that it's a misunderstanding.
>
>>> that's a different thing and definitely abusing the API. I would guess
>>> that a good proportion of people implementing it are more worried about
>>> clean system shutdown than they are about kdump.
>
>> The other important case is to get the device cleaned up enough to
>> pass back to firmware for platforms that use a firmware
>> shutdown/reboot path.
>
> Right, so the other cases I'm aware of are doing pretty much that -
> bringing things down to a state where the system can reboot cleanly.
> That can definitely include things like blocking for some hardware, and
> you're going to need some concurrency handling which means a combination
> of locking and infrequently tested lockless code paths.
>
> In the case of this specific driver I'm still not clear that the best
> thing isn't just to delete the shutdown callback and let any ongoing
> transfers complete, though I guess there'd be issues in kexec cases with
> long enough transfers.

No, please don't. I arguably should have justified the reasons better,
but the main one is that one of the platforms on which this driver is
used has received extensive power management analysis and changes, and
shutting down every bit of hardware, including something as small as a
SPI controller, its clock, and its PLL, helped meet stringent power
targets.

TBH, I still wonder why we have .shutdown() at all and don't simply use
.remove(), which would reduce the amount of work that people have to do
to validate that the hardware is put in a low power state and would also
reduce the burden on the various subsystems.
--
Florian