Re: [PATCH v2 1/3] nvme-hwmon: Return error on kzalloc failure

From: Serge Semin
Date: Tue Oct 04 2022 - 10:51:00 EST


On Fri, Sep 30, 2022 at 08:57:02AM -0600, Keith Busch wrote:
> On Fri, Sep 30, 2022 at 12:52:47PM +0300, Serge Semin wrote:
> > On Thu, Sep 29, 2022 at 05:53:43PM -0600, Keith Busch wrote:
> > > On Fri, Sep 30, 2022 at 01:46:46AM +0300, Serge Semin wrote:
> > > > Inability to allocate a buffer is a critical error which shouldn't be
> > > > tolerated since most likely the rest of the driver won't work correctly.
> > > > Thus instead of returning the zero status let's return the -ENOMEM error
> > > > if the nvme_hwmon_data structure instance couldn't be created.
> > >
> >
> > > Nak for this one. The hwmon is not necessary for the rest of the driver to
> > > function, so having the driver detach from the device seems a bit harsh.
> >
> > Even if it is as you say, neither the method semantic nor the way it's
> > called imply that. Any failures except the allocation one are
> > perceived as erroneous.
>

> This is called by nvme_init_ctrl_finish(), and returns the error to
> nvme_reset_work() only if it's < 0, which indicates we can't go on and the
> driver unbinds.

That's obvious. One of the my question was that what makes the no
memory error different from the rest of the errors causing the
nvme_hwmon_init() method to fail?

>
> This particular condition for hwmon is not something that prevents us from
> making forward progress.

If you consider the hwmon functionality as optional (AFAIU you are),
then just ignore the return value no matter the reason. If the problem
caused the hwmon initialization process to fail turns to be critical
it will be raised in some other place which is required for the NVME
driver to work properly. Otherwise the hwmon module initialization may
still cause the probe procedure to halt, which makes it not optional.
That's what I meant when was saying about "the function and its
caller semantics not implying that".

>
> > > The
> > > driver can participate in memory reclaim, so failing on a low memory condition
> > > can make matters worse.
> >
> > Yes it can, so can many other places in the driver utilizing kmalloc
> > with just GFP_KERNEL flag passed including on the same path as the
> > nvme_hwmon_init() execution. Kmalloc will make sure the reclaim is
> > either finished or executed in background anyway in all cases.
>
> This path is in the first initialization before we've set up a namespace that
> can be used as a reclaim destination.
>
> > Don't
> > really see why memory allocation failure is less worse in this case
> > than in many others in the same driver especially seeing as I said
>
> The other initialization kmalloc's are required to make forward progress toward
> setting up a namespace. This one is not required.

Anyway what you say seems still contradicting. First you said that the
hwmon functionality was optional, but the only error being ignored was
the no-memory one which was very rare and turned to be not ignored in
the most of the other places. Second you got to accept the second
patch of the series, which introduced a one more kmalloc followed
right after the first one in the same function nvme_hwmon_init(). That
kmalloc failure wasn't ignored but caused the nvme_hwmon_init()
function to return an error. If you suggest to forget about the first
part (which IMO still counts, but AFAICS is a common pattern in the
NVME core driver, i.e. nvme_configure_apst() and
nvme_configure_host_options()), the second part still applies.

-Sergey