scsi regression that after months is still not addressed and now bothering 6.1.y users, too

From: Thorsten Leemhuis
Date: Tue Nov 21 2023 - 04:51:08 EST


* @SCSI maintainers: could you please look into below please?

* @Stable team: you might want to take a look as well and consider a
revert in 6.1.y (yes, I know, those are normally avoided, but here it
might make sense).

Hi everyone!

TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
hangs for a while) that was reported months ago already but is still not
fixed. Not only that, it apparently more and more users run into this
recently, as the culprit was recently integrated into 6.1.y; I wonder if
it would be best to revert it there, unless a fix for mainline comes
into reach soon.

Details:

Quite a few machines with Adaptec controllers seems to hang for a few
tens of seconds to a few minutes before things start to work normally
again for a while:
https://bugzilla.kernel.org/show_bug.cgi?id=217599

That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
commit despite a warning of mine to Sasha recently made it into 6.1.53
-- and that way apparently recently reached more users recently, as
quite a few joined that ticket.

The culprit is authored by Sagar Biradar who unless I missed something
never replied even once to the ticket or earlier mails about it. Lore
has no messages from him since early June.

Hannes Reinecke at least tried to fix it a few weeks ago (many thx), but
that didn't work out (see the ticket for details). Since then things
look stalled again, which is, ehh, unfortunate when it comes to
regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.