RE: [PATCH v2] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags

From: Kashyap Desai
Date: Mon Dec 13 2021 - 08:15:19 EST


> On 09/12/2021 13:52, Kashyap Desai wrote:
> > + scsi mailing list
> >
> >> On Mon, 18 Oct 2021 17:41:23 +0800, John Garry wrote:
> >>> Since it is now possible for a tagset to share a single set of tags,
> >>> the iter function should not re-iter the tags for the count of #hw
> >>> queues in that case. Rather it should just iter once.
> > John - Recently we found an issue where the error handler thread was
> > never kicked off, and this patch fixes the issue.
> > Without this patch, the SCSI error handler will not find the correct
> > host_busy counter.
> >
> > Take one simple case. There is one I/O outstanding and it times out.
> > Now the SML wants to wake up the EH thread only if the condition below
> > is met:
> > "scsi_host_busy(shost) == shost->host_failed"
> >
> > Without this patch, the megaraid_sas driver with shared host tags
> > enabled will find host_busy = actual outstanding * nr_hw_queues.
> > The error handler thread will never be kicked off.
> >
> > This patch is mandatory for fixing the shared host tag feature and
> > needs to be part of the stable kernel.
> >
> > Do you need more data for posting to the stable kernel?
>
> To be clear, are you saying that you see the issue which patch "blk-mq:
> Fix blk_mq_tagset_busy_iter() for shared tags" fixes before v5.16-rc?
>
> This patch (now commit 0994c64eb415) and the commit which it is supposed
> to fix, e155b0c238b2, will only be in v5.16, so I don't see anything which
> is needed in stable.

Hi John

Yes. There is no need to post this to stable. There is still an issue which
we are tracking. It is not always reproducible. I am injecting artificial
task aborts on my setup to reproduce it.
It happens on RHEL 8.5 most of the time. It is a timing issue, so I am
thinking of reproducing it on other kernels as well.
I suspect the issue might be due to a missing commit,
67f3b2f822b7e71cfc9b42dbd9f3144fa2933e0b ("blk-mq: avoid to iterate over
stale request").

Whenever I notice the issue, the symptom is that host_busy is being counted
for each hctx individually. Let me collect more data and I will start
another thread.

Kashyap

>
> Thanks,
> John
