Re: [PATCH 3/8] dmaengine: hisilicon: Add multi-thread support for a DMA channel

From: Vinod Koul
Date: Mon Jun 27 2022 - 02:21:38 EST


On 25-06-22, 15:44, Jie Hai wrote:
> When we get a DMA channel and try to use it in multiple threads it
> will cause oops and hanging the system.
>
> % echo 100 > /sys/module/dmatest/parameters/threads_per_chan
> % echo 100 > /sys/module/dmatest/parameters/iterations
> % echo 1 > /sys/module/dmatest/parameters/run
> [383493.327077] Unable to handle kernel paging request at virtual
> address dead000000000108
> [383493.335103] Mem abort info:
> [383493.335103] ESR = 0x96000044
> [383493.335105] EC = 0x25: DABT (current EL), IL = 32 bits
> [383493.335107] SET = 0, FnV = 0
> [383493.335108] EA = 0, S1PTW = 0
> [383493.335109] FSC = 0x04: level 0 translation fault
> [383493.335110] Data abort info:
> [383493.335111] ISV = 0, ISS = 0x00000044
> [383493.364739] CM = 0, WnR = 1
> [383493.367793] [dead000000000108] address between user and kernel
> address ranges
> [383493.375021] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [383493.437574] CPU: 63 PID: 27895 Comm: dma0chan0-copy2 Kdump:
> loaded Tainted: GO 5.17.0-rc4+ #2
> [383493.457851] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT
> -SSBS BTYPE=--)
> [383493.465331] pc : vchan_tx_submit+0x64/0xa0
> [383493.469957] lr : vchan_tx_submit+0x34/0xa0
>
> This happens because of data race. Each thread rewrite channels's
> descriptor as soon as device_issue_pending is called. It leads to
> the situation that the driver thinks that it uses the right
> descriptor in interrupt handler while channels's descriptor has
> been changed by other thread.
>
> With current fixes channels's descriptor changes it's value only
> when it has been used. A new descriptor is acquired from
> vc->desc_issued queue that is already filled with descriptors
> that are ready to be sent. Threads have no direct access to DMA
> channel descriptor. Now it is just possible to queue a descriptor
> for further processing.
>
> Fixes: e9f08b65250d ("dmaengine: hisilicon: Add Kunpeng DMA engine support")
> Signed-off-by: Jie Hai <haijie1@xxxxxxxxxx>
> ---
> drivers/dma/hisi_dma.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dma/hisi_dma.c b/drivers/dma/hisi_dma.c
> index 0a0f8a4d168a..0385419be8d5 100644
> --- a/drivers/dma/hisi_dma.c
> +++ b/drivers/dma/hisi_dma.c
> @@ -271,7 +271,6 @@ static void hisi_dma_start_transfer(struct hisi_dma_chan *chan)
>
> vd = vchan_next_desc(&chan->vc);
> if (!vd) {
> - dev_err(&hdma_dev->pdev->dev, "no issued task!\n");

how is this a fix?

> chan->desc = NULL;
> return;
> }
> @@ -303,7 +302,7 @@ static void hisi_dma_issue_pending(struct dma_chan *c)
>
> spin_lock_irqsave(&chan->vc.lock, flags);
>
> - if (vchan_issue_pending(&chan->vc))
> + if (vchan_issue_pending(&chan->vc) && !chan->desc)

This looks good

> hisi_dma_start_transfer(chan);
>
> spin_unlock_irqrestore(&chan->vc.lock, flags);
> @@ -442,11 +441,10 @@ static irqreturn_t hisi_dma_irq(int irq, void *data)
> chan->qp_num, chan->cq_head);
> if (FIELD_GET(STATUS_MASK, cqe->w0) == STATUS_SUCC) {
> vchan_cookie_complete(&desc->vd);
> + hisi_dma_start_transfer(chan);

Why should this fix the error reported?

> } else {
> dev_err(&hdma_dev->pdev->dev, "task error!\n");
> }
> -
> - chan->desc = NULL;
> }
>
> spin_unlock(&chan->vc.lock);
> --
> 2.33.0

--
~Vinod