Re: [PATCH 6.4 000/227] 6.4.7-rc1 review

From: Guenter Roeck
Date: Mon Jul 31 2023 - 00:17:12 EST


On 7/30/23 20:54, Paul E. McKenney wrote:
On Thu, Jul 27, 2023 at 09:22:52PM -0700, Guenter Roeck wrote:
On 7/27/23 13:33, Paul E. McKenney wrote:
[ ... ]

So which of the following Kconfig options is defined in your .config?
CONFIG_TASKS_RCU, CONFIG_TASKS_RUDE_RCU, and CONFIG_TASKS_TRACE_RCU.
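One way to answer this from a build tree is a minimal shell sketch like the one below. It assumes a standard kernel `.config`, where an enabled option appears as `CONFIG_FOO=y` and a disabled one is either absent or written as `# CONFIG_FOO is not set`. The function name `check_tasks_rcu` is hypothetical.

```shell
# check_tasks_rcu [CONFIG_FILE]
# Hypothetical helper: prints, for each Tasks RCU flavor, whether it is
# enabled (=y) in the given kernel config file (default: ./.config).
check_tasks_rcu() {
    config="${1:-.config}"
    for opt in CONFIG_TASKS_RCU CONFIG_TASKS_RUDE_RCU CONFIG_TASKS_TRACE_RCU; do
        # Anchor on the full "NAME=y" line so one option cannot match another.
        if grep -q "^${opt}=y\$" "$config"; then
            echo "${opt}=y"
        else
            echo "${opt} is not set"
        fi
    done
}
```

Running `check_tasks_rcu` in the top of a configured tree prints one line per option.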


Only CONFIG_TASKS_RCU. I added another log message after call_rcu_tasks().
It never returns from that function.

[ 1.168993] Running RCU synchronous self tests
[ 1.169219] Running RCU synchronous self tests
[ 1.285795] smpboot: CPU0: Intel Xeon Processor (Cascadelake) (family: 0x6, model: 0x55, stepping: 0x6)
[ 1.302827] RCU Tasks: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1.
[ 1.304526] Running RCU Tasks wait API self tests

... and then nothing for at least 10 minutes (then I gave up and stopped the test).

Qemu command line:

qemu-system-x86_64 -kernel \
arch/x86/boot/bzImage -M q35 -cpu Cascadelake-Server -no-reboot \
-snapshot -device e1000e,netdev=net0 -netdev user,id=net0 -m 256 \
-drive file=rootfs.iso,format=raw,if=ide,media=cdrom \
--append "earlycon=uart8250,io,0x3f8,9600n8 panic=-1 slub_debug=FZPUA root=/dev/sr0 rootwait console=ttyS0 noreboot" \
-d unimp,guest_errors -nographic -monitor none

Again, this doesn't happen all the time. With Cascadelake-Server
I see it maybe once every 5 boot attempts. I tried with qemu v8.0
and v8.1. Note that it does seem to happen with various CPU types,
though for some it seems more likely to happen than for others (so maybe the
CPU type was a red herring). It does seem to depend on the system
load, and happen more often if the system is under heavy load.
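Because the hang shows up only in roughly one of five boots, a retry loop with a wall-clock limit makes the failure rate measurable. The following is a hedged sketch, not part of the original test setup. `count_hangs` is a hypothetical helper. It assumes GNU coreutils `timeout`, which exits with status 124 when it has to kill the command. It also assumes that a healthy boot ends with QEMU exiting on its own within the limit, which depends on what the root filesystem does after booting.

```shell
# count_hangs RUNS SECS CMD [ARGS...]
# Hypothetical helper: runs CMD up to RUNS times, kills any run still alive
# after SECS seconds, and prints how many runs had to be killed.
count_hangs() {
    runs=$1
    secs=$2
    shift 2
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # stdin comes from /dev/null so a command that reads the terminal
        # (such as qemu -nographic) is not stopped while under timeout.
        timeout "$secs" "$@" </dev/null >/dev/null 2>&1
        status=$?
        # GNU timeout reports a killed (timed-out) command as status 124.
        if [ "$status" -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs/$runs runs hung"
}
```

For example, `count_hangs 20 600 qemu-system-x86_64 -kernel arch/x86/boot/bzImage ...` with the remaining options from the command line above would report how many of 20 boots were still running after 10 minutes. Console output is discarded here; redirect it to per-run log files instead if the log of a hung boot is needed.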

Hmmm... What kernel are you using as your qemu/KVM hypervisor?


Not sure I understand the question. KVM is disabled in my systems.
The host CPUs are Ryzen 3900X and 5900X, but I don't really see why
that would matter.

And I echo Joel's requests for your .config file.


Did you see the e-mail I sent about this problem earlier today?

https://lore.kernel.org/lkml/3da81a5c-700b-8e21-1bde-27dd3a0b8945@xxxxxxxxxxxx/

I think I'll declare this to be a problem with my test environment and disable
RCU debugging.

Thanks,
Guenter