RE: [PATCH V2] PCI/DOE: Detect on stack work items automatically

From: Dan Williams
Date: Fri Nov 18 2022 - 13:14:51 EST


David Laight wrote:
> From: ira.weiny@xxxxxxxxx
> > Sent: 18 November 2022 00:05
> >
> > Work item initialization needs to be done with either
> > INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is
> > allocated.
> >
> > The callers of pci_doe_submit_task() allocate struct pci_doe_task on the
> > stack and pci_doe_submit_task() incorrectly used INIT_WORK().
> >
> > Jonathan suggested creating doe task allocation macros such as
> > DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work
> > function is not known to the callers and must be initialized correctly.
> >
> > A follow up suggestion was to have an internal 'pci_doe_work' item
> > allocated by pci_doe_submit_task().[2] This requires an allocation which
> > could restrict the context where tasks are used.
> >
> > Another idea was to have an intermediate step to initialize the task
> > struct with a new call.[3] This added a lot of complexity.
> >
> > Lukas pointed out that object_is_on_stack() is available to detect this
> > automatically.
> >
> > Use object_is_on_stack() to determine the correct init work function to
> > call.
>
> This is all a bit strange.
> The 'onstack' flag is needed for the diagnostic check:
> 	is_on_stack = object_is_on_stack(addr);
> 	if (is_on_stack == onstack)
> 		return;
> 	pr_warn(...);
> 	WARN_ON(1);
>
> So setting the flag to the location of the buffer just subverts the check.
> If that is sane there ought to be a proper way to do it.
>
> OTOH using an on-stack structure for INIT_WORK seems rather strange.
> Since the kernel thread must sleep waiting for the 'work' to complete
> why not just perform the required code there.

To have the option to support both async and sync flows through this
driver interface. It is similar to the internal distinction between:

submit_bio_wait()

...and:

submit_bio()

Where the former just layers an on-stack completion over the
asynchronous submit_bio().
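
For illustration only, here is a rough userspace sketch of that layering,
using pthreads in place of the kernel primitives. The names struct request,
submit_request() and submit_request_wait() are hypothetical stand-ins, not
the block layer or DOE API, and the struct completion here is a simplified
imitation of the kernel one. The point is just that the sync variant adds
nothing but an on-stack completion and a wait on top of the async submit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified userspace imitation of the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical async request; end_io() is called when the work is done. */
struct request {
	int input;
	int result;
	void (*end_io)(struct request *req);
	void *private;
	pthread_t worker;
};

static void *worker_fn(void *arg)
{
	struct request *req = arg;

	req->result = req->input * 2;	/* placeholder for the real work */
	req->end_io(req);
	return NULL;
}

/* Async flavor: start the work and return without waiting for it. */
static int submit_request(struct request *req)
{
	return pthread_create(&req->worker, NULL, worker_fn, req);
}

static void wait_end_io(struct request *req)
{
	complete(req->private);
}

/* Sync flavor: the async submit plus an on-stack completion. */
static int submit_request_wait(struct request *req)
{
	struct completion done;	/* lives on this caller's stack */
	int rc;

	init_completion(&done);
	req->private = &done;
	req->end_io = wait_end_io;

	rc = submit_request(req);
	if (rc == 0) {
		wait_for_completion(&done);
		/* The worker must be finished with 'done' before we return. */
		pthread_join(req->worker, NULL);
	}

	pthread_cond_destroy(&done.cond);
	pthread_mutex_destroy(&done.lock);
	return rc;
}
```

The on-stack object is safe here only because submit_request_wait() does
not return until the worker can no longer touch it, which is the same
constraint the sync callers of an async interface have to honor.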

> Also you really don't want to OOPS with anything from the stack
> linked into global kernel data structures.
> While wait queues are pretty limited in scope and probably ok,
> this looks like a big accident waiting to happen.

I do not see the cause for alarm; this sync-wait design pattern is not
new.