Re: [PATCH 1/2] devcoredump: Remove devcoredump device if failing device is gone

From: Johannes Berg
Date: Mon Jan 29 2024 - 16:51:54 EST


On Mon, 2024-01-29 at 16:29 -0500, Rodrigo Vivi wrote:
> >
> > > On top of that, for PCI devices, the unbind of the device will
> > > call the pci .remove void function, that cannot fail. At that
> > > time, our device is pretty much gone, but the read and free
> > > functions are alive trough the devcoredump device and they
> > ^ through, I guess
> >
> > > can get some NULL dereferences or use after free.
> >
> > Not sure I understand this part, how's this related to PCI's .remove?
>
> Well, this is my secondary concern that the idea of the link_auto_removal
> doesn't cover.
>
> If the failing_device is gone, the 'data cookie' it used to register with
> dev_coredumpm(... void *data,...), is also likely gone on a clean removal.

That's on the user. You'll always be able to shoot yourself in the foot.

> And to be honest, we shouldn't even count on the registered *read()
> function pointer being valid anymore.

That's not true: the module cannot be removed, there's a reference to it
if you're using dev_coredumpm() correctly (which is to say: pass
THIS_MODULE to the struct module *owner argument).

> Well, we could indeed. And that would unblock our CI, but I'm afraid
> it wouldn't protect the final user from bad memory access on a direct
> $ cat /sys/class/devcoredump/devcd<n>/data
>
> Shouldn't we consider this critical in itself to justify this entire
> removal?

No? IMHO that's totally on the user. If you absolutely cannot make a
standalone dump 'data' pointer (why not?! you can always stick the
actual data into a vmalloc chunk and use dev_coredumpv()?) then maybe we
can offer ways of removing it when you need to? But I'd rather not, it
feels weird to have a need for it.
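
To make the "standalone dump" idea concrete, here is a rough userspace
sketch in C. It is not kernel code and not what any driver does today:
fake_device, fake_coredump and the fake_coredump_*() helpers are all
hypothetical names, and malloc()/free() merely stand in for the
vmalloc()ed buffer that dev_coredumpv() takes ownership of. The only
point is that the dump copies what it needs at crash time, so reading it
later never touches memory owned by the failing device.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for driver state that goes away on unbind. */
struct fake_device {
	char fw_state[32];
};

/* Hypothetical stand-in for a dump entry that owns its own buffer. */
struct fake_coredump {
	void *data;
	size_t len;
};

/*
 * Copy the device state into a buffer owned by the dump.
 * In a real driver this would be a vmalloc() buffer handed to
 * dev_coredumpv(); malloc() is only a userspace substitute here.
 * Returns 0 on success, -1 if the allocation fails.
 */
int fake_coredump_snapshot(struct fake_coredump *cd,
			   const struct fake_device *dev)
{
	cd->len = sizeof(dev->fw_state);
	cd->data = malloc(cd->len);
	if (!cd->data) {
		cd->len = 0;
		return -1;
	}
	memcpy(cd->data, dev->fw_state, cd->len);
	return 0;
}

/*
 * Read up to 'count' bytes starting at 'offset', much like a read of
 * the sysfs 'data' file. Only the dump's own buffer is accessed.
 * Returns the number of bytes copied into 'buf'.
 */
size_t fake_coredump_read(const struct fake_coredump *cd, char *buf,
			  size_t offset, size_t count)
{
	if (offset >= cd->len)
		return 0;
	if (count > cd->len - offset)
		count = cd->len - offset;
	memcpy(buf, (const char *)cd->data + offset, count);
	return count;
}

/* Release the dump's buffer; independent of the device's lifetime. */
void fake_coredump_free(struct fake_coredump *cd)
{
	free(cd->data);
	cd->data = NULL;
	cd->len = 0;
}
```

With this shape, the device can be freed right after the snapshot and a
later read of the dump still works, because there is no 'data cookie'
pointing back into driver state.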

johannes