Re: probable quota bug introduced in 6.6-rc1

From: Bagas Sanjaya
Date: Wed Oct 04 2023 - 02:58:19 EST


On Tue, Oct 03, 2023 at 08:11:11PM -0400, Eric Whitney wrote:
> When run on my test hardware, generic/270 triggers hung task timeouts when
> run on a 6.6-rc1 (or -rc2, -rc3, -rc4) kernel with kvm-xfstests using the
> nojournal test scenario. The test always passes, but about 60% of the time
> the running time of the test increases by an order of magnitude or more and
> one or more of the hung task timeout warnings included below can be found in
> the log.
>
> This does not reproduce on 6.5. Bisection leads to this patch:
>
> dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should
> provide")

Can you revert the culprit to see if it helps?

>
> >From the log:
>
> generic/270 306s ... [20:08:45][ 311.322318] run fstests generic/270 at 2023-10-03 20:08:45
> [ 311.579641] EXT4-fs (vdc): mounted filesystem d0e542a0-3342-4d43-aa1f-c918cc92aafa r/w without journal. Quota mode: writeback.
> [ 311.587978] EXT4-fs (vdc): re-mounted d0e542a0-3342-4d43-aa1f-c918cc92aafa ro. Quota mode: writeback.
> [ 311.592725] EXT4-fs (vdc): re-mounted d0e542a0-3342-4d43-aa1f-c918cc92aafa r/w. Quota mode: writeback.
> [ 335.491107] 270 (3092): drop_caches: 3
> [ 491.167988] INFO: task quotaon:3450 blocked for more than 122 seconds.
> [ 491.168334] Not tainted 6.4.0+ #13
> [ 491.168544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 491.168936] task:quotaon state:D stack:0 pid:3450 ppid:3092 flags:0x00004000
> [ 491.169363] Call Trace:
> [ 491.169503] <TASK>
> [ 491.169620] __schedule+0x394/0xd40
> [ 491.169813] schedule+0x5d/0xd0
> [ 491.169981] schedule_timeout+0x1a7/0x1c0
> [ 491.170191] ? lock_release+0x139/0x280
> [ 491.170395] ? lock_acquire+0xb9/0x180
> [ 491.170605] ? do_raw_spin_unlock+0x4b/0xa0
> [ 491.170837] __wait_for_common+0xb6/0x1e0
> [ 491.171046] ? __pfx_schedule_timeout+0x10/0x10
> [ 491.171324] __flush_work+0x2da/0x430
> [ 491.171517] ? __pfx_wq_barrier_func+0x10/0x10
> [ 491.171747] ? 0xffffffff81000000
> [ 491.171932] dquot_disable+0x3e5/0x670
> [ 491.172134] ext4_quota_off+0x50/0x1a0
> [ 491.172332] __x64_sys_quotactl+0x87/0x1c0
> [ 491.172545] do_syscall_64+0x38/0x90
> [ 491.172731] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [ 491.172990] RIP: 0033:0x7f3c7c79eada
> [ 491.173176] RSP: 002b:00007ffed2ff4478 EFLAGS: 00000246 ORIG_RAX: 00000000000000b3
> [ 491.173558] RAX: ffffffffffffffda RBX: 000055886a3997d0 RCX: 00007f3c7c79eada
> [ 491.173915] RDX: 0000000000000000 RSI: 000055886bf43de0 RDI: 0000000080000301
> [ 491.174271] RBP: 000055886bf43de0 R08: 0000000000000001 R09: 0000000000000002
> [ 491.174657] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
> [ 491.175014] R13: 000055886bf43ea0 R14: 0000000000000001 R15: 0000000000000000
> [ 491.175373] </TASK>
> [ 491.175491]
> [ 491.175491] Showing all locks held in the system:
> [ 491.176706] 1 lock held by rcu_tasks_kthre/12:
> [ 491.178126] #0: ffffffff82763970 (rcu_tasks.tasks_gp_mutex){....}-{3:3}, at: rcu_tasks_one_gp+0x30/0x3f0
> [ 491.180955] 1 lock held by rcu_tasks_rude_/13:
> [ 491.182394] #0: ffffffff827636f0 (rcu_tasks_rude.tasks_gp_mutex){....}-{3:3}, at: rcu_tasks_one_gp+0x30/0x3f0
> [ 491.194388] 1 lock held by khungtaskd/26:
> [ 491.196153] #0: ffffffff82764020 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0xe/0x110
> [ 491.199676] 2 locks held by kworker/u4:4/59:
> [ 491.200722] #0: ffff88800385cd38 ((wq_completion)events_unbound){....}-{0:0}, at: process_one_work+0x1f6/0x550
> [ 491.201600] #1: ffffc90000513e80 ((quota_release_work).work){....}-{0:0}, at: process_one_work+0x1f6/0x550
> [ 491.202746] 1 lock held by quotaon/3450:
> [ 491.203184] #0: ffff88800afd60e0 (&type->s_umount_key#33){....}-{3:3}, at: user_get_super+0xd3/0x100
> [ 491.204217]
> [ 491.204373] =============================================
>

Anyway, thanks for the regression report. I'm adding it to regzbot:

#regzbot ^introduced: dabc8b20756601
#regzbot title: dqput() fix causes kvm-xfstests nojournal test time longer

--
An old man doll... just what I always wanted! - Clara

Attachment: signature.asc
Description: PGP signature