Re: [PATCH 6.4 000/227] 6.4.7-rc1 review

From: Guenter Roeck
Date: Thu Jul 27 2023 - 10:40:01 EST


On 7/27/23 07:06, Paul E. McKenney wrote:
> On Thu, Jul 27, 2023 at 09:26:52AM -0400, Joel Fernandes wrote:
>>
>>
>> On Jul 27, 2023, at 7:35 AM, Pavel Machek <pavel@xxxxxxx> wrote:
>>
>>> Hi!
>>>
>>>>> This is the start of the stable review cycle for the 6.4.7 release.
>>>>> There are 227 patches in this series, all will be posted as a response
>>>>> to this one. If anyone has any issues with these being applied, please
>>>>> let me know.
>>>>>
>>>>> Responses should be made by Thu, 27 Jul 2023 10:44:26 +0000.
>>>>> Anything received after that time might be too late.
>>>>>
>>>>> The whole patch series can be found in one patch at:
>>>>> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.7-rc1.gz
>>>>> or in the git tree and branch at:
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y
>>>>> and the diffstat can be found below.
>>>>
>>>> I saw this when running rcutorture; this one happened in the TREE04
>>>> configuration. This is likely due to the stuttering issues we are
>>>> discussing in the other thread. Anyway, I am just making a note here
>>>> while I continue to look into it.
>>>
>>> So is the stuttering new in 6.4.7?
>>
>> No, it is an old feature in the RCU torture tests, but it is dependent on
>> timing. Something changed in recent kernels that is making the issues with
>> it more likely. It's hard to bisect, as a failure sometimes takes hours to
>> show up.
>>
>>>> Other than that, all tests pass:
>>>> Tested-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
>>>
>>> ...or do you still believe 6.4.7 is okay to release?
>>
>> As such, it should be OK. However, naturally I am not happy that the RCU
>> testing is intermittently failing. These issues have been seen in the last
>> several 6.4 stable releases, so since those were released, maybe this one
>> can be too?
>> The fix for stuttering is currently being reviewed.
>
> Or, to look at it another way, the stuttering fix is specific to torture
> testing. Would we really want to hold up a -stable release only because
> rcutorture occasionally gives a false-positive failure on certain types
> of systems?


No. However (and unrelated to this release), in linux-next the RCU tests
sometimes result in apparent hangs or long runtimes.

[ 0.778841] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[ 0.779011] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[ 0.797998] Running RCU synchronous self tests
[ 0.798209] Running RCU synchronous self tests
[ 0.912368] smpboot: CPU0: AMD Opteron 63xx class CPU (family: 0x15, model: 0x2, stepping: 0x0)
[ 0.923398] RCU Tasks: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1.
[ 0.925419] Running RCU-tasks wait API self tests

(hangs until aborted). This happens primarily with Opteron CPUs, but also
with others such as Haswell, Icelake-Server, and pentium3. It is all but
impossible to bisect because it doesn't happen every time. All I was able to
figure out was that it has to do with the RCU changes in linux-next.
I'd be much more concerned about that.

Guenter