Re: Commit 282d8998e997 (srcu: Prevent expedited GPs and blocking readers from consuming CPU) cause qemu boot slow

From: zhangfei.gao@xxxxxxxxxxx
Date: Sun Jun 12 2022 - 11:00:03 EST


Hi, Paul

On 2022/6/12 下午9:36, Paul E. McKenney wrote:
On Sun, Jun 12, 2022 at 03:40:30PM +0800, zhangfei.gao@xxxxxxxxxxx wrote:
Hi, Paul

On 2022/6/12 上午12:59, Paul E. McKenney wrote:
On Sun, Jun 12, 2022 at 12:32:59AM +0800, Zhangfei Gao wrote:
Hi, Paul

When verifying qemu with acpi rmr feature on v5.19-rc1, the guest kernel
stuck for several minutes.
Stuck for several minutes but then continues normally? Or stuck for
several minutes before you kill qemu?
qemu boot stuck for several minutes, then guest can bootup normally, just
slower.
And I have to ask... What happened without the ACPI RMR feature?
If no ACPI, qemu boot quickly without stuck.
build/aarch64-softmmu/qemu-system-aarch64 -machine
virt,gic-version=3,iommu=smmuv3 \
-enable-kvm -cpu host -m 1024 \
-kernel Image -initrd mini-rootfs.cpio.gz -nographic -append \
"rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000 kpti=off"

Adding acpi=force & -bios QEMU_EFI.fd, qemu boot stuck for several minutes.

By the way, my hardware platform is aarch64.
Thank you for the information! The problem is excessive delay rather
than a hang, and it is configuration-dependent. Good to know!

Only change this can solve the stuck issue.

--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -524,6 +524,10 @@ static unsigned long srcu_get_delay(struct srcu_struct
*ssp)
 {
        unsigned long jbase = SRCU_INTERVAL;

+       if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq),
READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
+               return 0;
+       return SRCU_INTERVAL;
I am glad that you have a workaround for this problem, but this change
would re-introduce the problem that commit 282d8998e997 ("srcu: Prevent
expedited GPs and blocking readers from consuming CPU") was intended
to fix. For one example, your change can prevent kernel live patching
from applying a patch. So something else is needed.
Understand, it is just debug where has issue.

Does changing the value of SRCU_MAX_INTERVAL to (say) 3 decrease the delay
significantly? (This is not a fix, either, but instead a debug check.)
No use.

Your change always returns zero if another SRCU grace period is needed.
Let's look at the callers of srcu_get_delay():

o cleanup_srcu_struct() uses it to check whether there is an
expedited grace period pending, leaking the srcu_struct if so.
This should not affect boot delay. (Unless you are invoking
init_srcu_struct() and cleanup_srcu_struct() really really
often.)

o srcu_gp_end() uses it to determine whether or not to allow
a one-jiffy delay before invoking callbacks at the end of
a grace period.

o srcu_funnel_gp_start() uses it to determine whether or not to
allow a one-jiffy delay before starting the process of checking
for the end of an SRCU grace period.

o try_check_zero() uses it to add an additional short delay
(instead of a long delay) between checks of reader state.

o process_srcu() uses it to calculate the long delay between
checks of reader state.

These add one-jiffy delays, except for process_srcu(), which adds a delay
of up to 10 jiffies. Even given HZ=100 (as opposed to the HZ=1000 that
I normally use), this requires thousands of such delays to add up to the
several minutes that you are seeing. (In theory, the delays could also
be due to SRCU readers, except that in that case adjusting timeouts in
the grace-period processing would not make things go faster.)

So, does acpi=force & -bios QEMU_EFI.fd add SRCU grace periods? If so,
it would be very good make sure that this code is using SRCU efficiently.
One way to check would be to put a printk() into synchronize_srcu(),
though maintaining a counter and printing (say) every 1000th invocation
might be easier on the console output.
good idea.

And on 5.18, there is no such problem.

After revert this patch, the issue solved.
Commit 282d8998e997 (srcu: Prevent expedited GPs and blocking readers from
consuming CPU)


qemu cmd:
build/aarch64-softmmu/qemu-system-aarch64 -machine
virt,gic-version=3,iommu=smmuv3 \
-enable-kvm -cpu host -m 1024 \
-kernel Image -initrd mini-rootfs.cpio.gz -nographic -append \
"rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000 kpti=off acpi=force" \
-bios QEMU_EFI.fd

log:
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7AA4D040
add-symbol-file /home/linaro/work/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC48/AARCH64/NetworkPkg/IScsiDxe/IScsiDxe/DEBUG/IScsiDxe.dll
0x75459000
Loading driver at 0x00075458000 EntryPoint=0x00075459058 IScsiDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 7AA4DE98
ProtectUefiImageCommon - 0x7AA4D040
  - 0x0000000075458000 - 0x000000000003F000
SetUefiImageMemoryAttributes - 0x0000000075458000 - 0x0000000000001000
(0x0000000000004008)
SetUefiImageMemoryAttributes - 0x0000000075459000 - 0x000000000003B000
(0x0000000000020008)
SetUefiImageMemoryAttributes - 0x0000000075494000 - 0x0000000000003000
(0x0000000000004008)
InstallProtocolInterface: 18A031AB-B443-4D1A-A5C0-0C09261E9F71 754952C8
InstallProtocolInterface: 107A772C-D5E1-11D4-9A46-0090273FC14D 75495358
InstallProtocolInterface: 6A7A5CFF-E8D9-4F70-BADA-75AB3025CE14 75495370
InstallProtocolInterface: 18A031AB-B443-4D1A-A5C0-0C09261E9F71 754952F8
InstallProtocolInterface: 107A772C-D5E1-11D4-9A46-0090273FC14D 75495358
InstallProtocolInterface: 6A7A5CFF-E8D9-4F70-BADA-75AB3025CE14 75495370
InstallProtocolInterface: 59324945-EC44-4C0D-B1CD-9DB139DF070C 75495348
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 754953E8
InstallProtocolInterface: 330D4706-F2A0-4E4F-A369-B66FA8D54385 7AA4D728


Not sure it is either reported or solved.
This is the first I have heard of it, so thank you for reporting it.

Do you have a way of collecting something sysrq-t output?
Do you mean "echo t > /proc/sysrq-trigger",
There are too much output and kernel dump can not stop.
OK. What other tools do you have to work out what is happening during
temporary hangs such as this one?

The question to be answered: "Is there usually at least one task waiting
in synchronize_srcu() during these hangs, and if so, which srcu_struct
is passed to those synchronize_srcu() calls?"

As you suggested, add print in __synchronize_srcu, 1000 times print once.

With acpi=force & -bios QEMU_EFI.fd

When qemu stuck in
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7AA4D040
add-symbol-file /home/linaro/work/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC48/AARCH64/NetworkPkg/IScsiDxe/IScsiDxe/DEBUG/IScsiDxe.dll 0x75459000

The print in  __synchronize_srcu is print from 0 t0 9001.

[   94.271350] gzf __synchronize_srcu loop=1001
....

[  222.621659]  __synchronize_srcu loop=9001
[  222.621664] CPU: 96 PID: 2294 Comm: qemu-system-aar Not tainted 5.19.0-rc1-15071-g697f40b5235f-dirty #615
[  222.621666] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V5.B133.01 03/25/2021
[  222.621667] Call trace:
[  222.621668]  dump_backtrace+0xe4/0xf0
[  222.621670] show_stack+0x20/0x70
[  222.621672] dump_stack_lvl+0x8c/0xb8
[  222.621674]  dump_stack+0x18/0x34
[  222.621676] __synchronize_srcu+0x120/0x128
[  222.621678] synchronize_srcu_expedited+0x2c/0x40
[  222.621680] kvm_swap_active_memslots+0x130/0x198
[  222.621683] kvm_activate_memslot+0x40/0x68
[  222.621684]  kvm_set_memslot+0x184/0x3b0
[  222.621686]  __kvm_set_memory_region+0x288/0x438
[  222.621688] kvm_set_memory_region+0x3c/0x60
[  222.621689]  kvm_vm_ioctl+0x5a0/0x13e0
[  222.621691]  __arm64_sys_ioctl+0xb0/0xf8
[  222.621693]  invoke_syscall+0x4c/0x110
[  222.621696] el0_svc_common.constprop.0+0x68/0x128
[  222.621698]  do_el0_svc+0x34/0xc0
[  222.621701]  el0_svc+0x30/0x98
[  222.621704]  el0t_64_sync_handler+0xb8/0xc0
[  222.621706]  el0t_64_sync+0x18c/0x190


If no acpi=force, no print at all, 1000 times one print.

Thanks