Re: [PATCH v2] cpu/hotplug: Do not bail-out in DYING/STARTING sections

From: Vincent Donnefort
Date: Mon Jun 13 2022 - 11:55:34 EST


On Mon, Jun 13, 2022 at 02:36:18PM +0200, Thomas Gleixner wrote:
> Vincent,
>
> On Mon, May 23 2022 at 17:05, Vincent Donnefort wrote:
> > +static int _cpuhp_invoke_callback_range(bool bringup,
> > +					unsigned int cpu,
> > +					struct cpuhp_cpu_state *st,
> > +					enum cpuhp_state target,
> > +					bool nofail)
> >  {
> >  	enum cpuhp_state state;
> > -	int err = 0;
> > +	int ret = 0;
> >
> >  	while (cpuhp_next_state(bringup, &state, st, target)) {
> > +		int err;
> > +
> >  		err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL);
> > -		if (err)
> > +		if (!err)
> > +			continue;
> > +
> > +		if (nofail) {
> > +			pr_warn("CPU %u %s state %s (%d) failed (%d)\n",
> > +				cpu, bringup ? "UP" : "DOWN",
> > +				cpuhp_get_step(st->state)->name,
> > +				st->state, err);
> > +			ret = -1;
>
> I have a hard time mapping this to the changelog:
>
> > those sections. In that case, there's nothing the hotplug machinery can do,
> > so let's just proceed and log the failures.
>
> That's still returning an error code at the end. Confused.

It is, but what happens next depends on the caller: on that error,
cpuhp_invoke_callback_range_nofail() only raises a warning, whereas
cpuhp_invoke_callback_range() stops the hotplug machinery. How about this
changelog?

The DYING/STARTING callbacks are not expected to fail. However, as reported by
Derek, drivers such as tboot are still free to return errors within those
sections, which halts the hot(un)plug and leaves the CPU in an unrecoverable
state.

As no rollback is possible there, let's only log the failures and proceed
with the following steps. This restores the hotplug behaviour prior to
commit 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()").

>
> Thanks,
>
> tglx