Re: [PATCH] dmaengine: idxd: Set workqueue state to disabled before trying to re-enable

From: Dave Jiang
Date: Wed Aug 24 2022 - 18:20:00 EST



On 8/24/2022 3:07 PM, Jerry Snitselaar wrote:
On Wed, 2022-08-24 at 14:59 -0700, Dave Jiang wrote:
On 8/24/2022 2:16 PM, Jerry Snitselaar wrote:
On Wed, Aug 24, 2022 at 01:29:03PM -0700, Dave Jiang wrote:
On 8/24/2022 12:29 PM, Jerry Snitselaar wrote:
For a software reset idxd_device_reinit() is called, which will
walk
the device workqueues to see which ones were enabled, and try
to
re-enable them. It keys off wq->state being iDXD_WQ_ENABLED,
but the
first thing idxd_enable_wq() will do is see that the state of
the
workqueue is enabled, and return 0 instead of attempting to
issue
a command to enable the workqueue.

So once a workqueue is found that needs to be re-enabled,
set the state to disabled prior to calling idxd_enable_wq().
This would accurately reflect the state if the enable fails
as well.

Cc: Fenghua Yu <fenghua.yu@xxxxxxxxx>
Cc: Dave Jiang <dave.jiang@xxxxxxxxx>
Cc: Vinod Koul <vkoul@xxxxxxxxxx>
Cc: dmaengine@xxxxxxxxxxxxxxx
Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel
data accelerators")
Signed-off-by: Jerry Snitselaar <jsnitsel@xxxxxxxxxx>
---
   drivers/dma/idxd/irq.c | 1 +
   1 file changed, 1 insertion(+)

diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index 743ead5ebc57..723eeb5328d6 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -52,6 +52,7 @@ static void idxd_device_reinit(struct
work_struct *work)
                struct idxd_wq *wq = idxd->wqs[i];
                if (wq->state == IDXD_WQ_ENABLED) {
+                       wq->state = IDXD_WQ_DISABLED;
Might be better off to insert this line in
idxd_wq_disable_cleanup(). I
think that should put it in sane state.
I don't think that is called in the code path that I was lookng at.
I've been
looking at this bit of process_misc_interrupts():

halt:
        gensts.bits = ioread32(idxd->reg_base +
IDXD_GENSTATS_OFFSET);
        if (gensts.state == IDXD_DEVICE_STATE_HALT) {
                idxd->state = IDXD_DEV_HALTED;
                if (gensts.reset_type ==
IDXD_DEVICE_RESET_SOFTWARE) {
                        /*
                         * If we need a software reset, we will
throw the work
                         * on a system workqueue in order to allow
interrupts
                         * for the device command completions.
                         */
                        INIT_WORK(&idxd->work, idxd_device_reinit);
                        queue_work(idxd->wq, &idxd->work);
                } else {
                        idxd->state = IDXD_DEV_HALTED;
                        idxd_wqs_quiesce(idxd);
                        idxd_wqs_unmap_portal(idxd);
                        spin_lock(&idxd->dev_lock);
                        idxd_device_clear_state(idxd);
                        dev_err(&idxd->pdev->dev,
                                "idxd halted, need %s.\n",
                                gensts.reset_type ==
IDXD_DEVICE_RESET_FLR ?
                                "FLR" : "system reset");
                        spin_unlock(&idxd->dev_lock);
                        return -ENXIO;
                }
        }

        return 0;
}

So it sees that the device is halted, and sticks
idxd_device_reinint() on that
workqueue. The idxd_device_reinit() has this loop to re-enable the
idxd wqs:
idxd_device_reinit() should called idxd_device_reset() first. And
that
should at some point call idxd_wq_disable_cleanup() and clean up the
states.

https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/irq.c#L42

https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L725

https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L711

https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L376

So if we stick the wq state reset in there, it should show up as
"disabled" by the time we try to enable the WQs again. Does that look
reasonable?

Ah, yeah I see that now. So, if it does set the state to disabled in
idxd_wq_disable_cleanup(), does it have another means to track which
wqs need to be re-enabled for that loop that happens after the
idxd_device_reset() call?

Oh I see what you mean... So we can either do what you did or create a mask and mark the WQ that are "enabled" before reset. Maybe that's cleaner rather than relying on the side effect of the WQ state isn't cleared? Thoughts?



        for (i = 0; i < idxd->max_wqs; i++) {
                struct idxd_wq *wq = idxd->wqs[i];

                if (wq->state == IDXD_WQ_ENABLED) {
                        wq->state = IDXD_WQ_DISABLED;
                        rc = idxd_wq_enable(wq);
                        if (rc < 0) {
                                dev_warn(dev, "Unable to re-enable
wq %s\n",
                                         dev_name(wq_confdev(wq)));
                        }
                }
        }

Once you go into idxd_wq_enable() though you get this check at the
beginning:

        if (wq->state == IDXD_WQ_ENABLED) {
                dev_dbg(dev, "WQ %d already enabled\n", wq->id);
                return 0;
        }

So IIUC it sees the device is halted, goes to reset it, figures out
a wq
should be re-enabled, calls idxd_wq_enable() which hits the check,
returns
0 and the wq is never really re-enabled, though it will still have
wq state
set to IDXD_WQ_ENABLED.

Or am I missing something?

Regards,
Jerry

                        rc = idxd_wq_enable(wq);
                        if (rc < 0) {
                                dev_warn(dev, "Unable to re-
enable wq %s\n",