Re: [v1 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

From: Dan Williams
Date: Sat Apr 20 2019 - 17:02:26 EST


On Sat, Apr 20, 2019 at 10:02 AM Pavel Tatashin
<pasha.tatashin@xxxxxxxxxx> wrote:
>
> > > Thank you for looking at this. Are you saying that if drv.remove()
> > > returns a failure, it is simply ignored and unbind proceeds?
> >
> > Yeah, that's the problem. I've looked at making unbind able to fail,
> > but that can lead to general bad behavior in device-drivers. I.e. why
> > spend time unwinding allocated resources when the driver can simply
> > fail unbind? About the best a driver can do is make unbind wait on
> > some event, but any return results in device-unbind.
>
> Hm, just tested, and it is indeed so.
>
> I see the following options:
>
> 1. Move hot remove code to some other interface, that can fail. Not
> sure what that would be, but outside of unbind/remove_id. Any
> suggestion?
> 2. Option two is don't attempt to offline memory in unbind. Do
> hot-remove memory in unbind if every section is already offlined.
> Basically, do a walk through memblocks, and if every section is
> offlined, also do the cleanup.

I think something like option-2 could work just as long as the user is
ok with failure and prepared to handle it. It's already the case that
the request_region() in kmem permanently prevents the memory range
from being reused by any other driver. So if the hot-unplug fails it
could skip the corresponding release_region() and effectively it's the
same as what we have now in terms of reuse protection. In your flow,
if the memory remove fails, then the conversion attempt from devdax to
raw mode would also fail, and presumably you could fall back to doing
a full reboot / rebuild of the application state?
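
To make the option-2 flow concrete, here is a rough userspace model of
it, not kernel code. Every name in it (struct mem_block, struct
kmem_range, all_blocks_offline(), kmem_unbind_model()) is hypothetical
and exists only for illustration. The region_reserved flag stands in
for the request_region() reservation, and clearing it stands in for
release_region(). The model never tries to offline anything itself: it
only tears down when every block is already offline, and otherwise
leaves the range reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct mem_block {
	bool online;		/* true while the block is still in use as RAM */
};

struct kmem_range {
	struct mem_block *blocks;
	size_t nr_blocks;
	bool region_reserved;	/* models the request_region() reservation */
	bool memory_added;	/* models the range still being hot-added */
};

/* Walk the blocks; report true only if every one is already offline. */
static bool all_blocks_offline(const struct kmem_range *r)
{
	for (size_t i = 0; i < r->nr_blocks; i++)
		if (r->blocks[i].online)
			return false;
	return true;
}

/*
 * Model of the unbind path under option 2. It never offlines memory.
 * If any block is still online, the hot-remove is skipped and so is
 * the release of the region, so the range stays protected from reuse.
 * The return value only reports whether cleanup ran; in this thread's
 * premise the driver core proceeds with unbind either way.
 */
static bool kmem_unbind_model(struct kmem_range *r)
{
	assert(r != NULL);

	if (!all_blocks_offline(r))
		return false;		/* keep memory added and region reserved */

	r->memory_added = false;	/* models the hot-remove succeeding */
	r->region_reserved = false;	/* models release_region() */
	return true;
}
```

In this model a caller that sees "false" is in the failure case
described above: the range is still reserved, so nothing else can claim
it, and the caller has to fall back to some other recovery path.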