Re: [PATCH v4 00/13] Generalized Priority Inheritance via Proxy Execution v3

From: K Prateek Nayak
Date: Mon Jun 12 2023 - 23:14:15 EST


Hello John,

On 6/13/2023 12:22 AM, John Stultz wrote:
> On Mon, Jun 12, 2023 at 10:21 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>> On 6/1/2023 11:28 AM, John Stultz wrote:
>>> [..snip..]
>>>
>>> Issues still to address:
>>> ------------
>>> * Occasionally getting null scheduler entities from pick_next_entity() in
>>> CFS. I’m a little stumped as to where this is going awry just yet, and
>>> delayed sending this out, but figured it was worth getting it out for
>>> review on the other issues while I chase this down.
>>
>> I'm consistently hitting the above issue early during boot while
>> testing the series on a dual-socket 3rd Generation EPYC platform
>> (2 x 64C/128T). Sharing the trace below:
>>
>> [ 20.821808] ------------[ cut here ]------------
>> [ 20.826432] kernel BUG at kernel/sched/core.c:7462!
>> [ 20.831322] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>> [ 20.836545] CPU: 250 PID: 0 Comm: swapper/250 Not tainted 6.4.0-rc4-proxy-execution-v4+ #474
>> [ 20.844976] Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
>> [ 20.852544] RIP: 0010:__schedule+0x18b6/0x20a0
>> [ 20.856998] Code: 0f 85 51 04 00 00 83 ad 50 ff ff ff 01 0f 85 05 e9 ff ff f3 0f 1e fa 48 8b 35 0e 0c fe 00 48 c7 c7 33 a1 c1 85 e8 ca 37 23 ff <0f> 0b 4c 89 ff 4c 8b 6d 98 e8 1c 82 00 00 4c 89 f7 e8 14 82 00 00
>> [ 20.875744] RSP: 0018:ffffbd1e4d1d7dd0 EFLAGS: 00010082
>> [ 20.880970] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000005
>> [ 20.888104] RDX: ffff9d4d0006b000 RSI: 0000000000000200 RDI: ffff9d4d0004d400
>> [ 20.895235] RBP: ffffbd1e4d1d7e98 R08: 0000000000000024 R09: ffffffffff7edbd0
>> [ 20.902369] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9d4d12e25a20
>> [ 20.909501] R13: ffff9dcbffab3840 R14: ffffbd1e4d1d7e50 R15: ffff9dcbff2b3840
>> [ 20.916632] FS: 0000000000000000(0000) GS:ffff9dcbffa80000(0000) knlGS:0000000000000000
>> [ 20.924709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 20.930449] CR2: 00007f92120c4800 CR3: 000000011477a002 CR4: 0000000000770ee0
>> [ 20.937581] PKRU: 55555554
>> [ 20.940292] Call Trace:
>> [ 20.942741] <TASK>
>> [ 20.944845] ? show_regs+0x6e/0x80
>> [ 20.948259] ? die+0x3c/0xa0
>> [ 20.951146] ? do_trap+0xd4/0xf0
>> [ 20.954377] ? do_error_trap+0x75/0xa0
>> [ 20.958129] ? __schedule+0x18b6/0x20a0
>> [ 20.961971] ? exc_invalid_op+0x57/0x80
>> [ 20.965808] ? __schedule+0x18b6/0x20a0
>> [ 20.969648] ? asm_exc_invalid_op+0x1f/0x30
>> [ 20.973835] ? __schedule+0x18b6/0x20a0
>> [ 20.977672] ? cpuidle_enter_state+0xde/0x710
>> [ 20.982033] schedule_idle+0x2e/0x50
>> [ 20.985614] do_idle+0x15d/0x240
>> [ 20.988845] cpu_startup_entry+0x24/0x30
>> [ 20.992772] start_secondary+0x126/0x160
>> [ 20.996695] secondary_startup_64_no_verify+0x10b/0x10b
>> [ 21.001924] </TASK>
>> [ 21.004117] Modules linked in: sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler msr ramoops reed_solomon pstore_blk pstore_zone efi_pstore
>> ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mgag200
>> i2c_algo_bit drm_shmem_helper drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect aesni_intel sysimgblt crypto_simd crc32_pclmul cryptd crct10dif_pclmul sha512_ssse3 xhci_pci tg3 drm
>> xhci_pci_renesas megaraid_sas wmi
>> [ 21.055707] Dumping ftrace buffer:
>> [ 21.059291] ---------------------------------
>> [ 21.063697] <idle>-0 250dn.2. 21175635us : __schedule: JDB: BUG!!! pick next retry_count > 50
>> [ 21.072915] ---------------------------------
>> [ 21.077282] ---[ end trace 0000000000000000 ]---
>>
>> $ sed -n 7460,7462p kernel/sched/core.c
>>         if (retry_count++ > 50) {
>>                 trace_printk("JDB: BUG!!! pick next retry_count > 50\n");
>>                 BUG();
>> Hope this helps with the debugging. If you have a fix in mind that you
>> would like me to test, please do let me know.
>
> Thank you for the testing and feedback here! I really appreciate it!
> And my apologies that you're hitting trouble here!

No worries! Better to hit the snags now than later :)

>
>>> * Better deadlock handling in proxy(): With the ww_mutex issues
>>> resolved, we shouldn’t see circular blocked_on references, but a
>>> number of the bugs I’ve been chasing recently come from getting stuck
>>> with proxy() returning null, forcing a reselection over and over. These
>>> are still bugs to address, but my current thinking is that if we get
>>> stuck like this, we can start to remove the selected mutex-blocked
>>> tasks from the rq and let them be woken from the mutex waiters list
>>> as is done currently? Thoughts here would be appreciated.
>>>
>>> * More work on migration correctness (RT/DL load balancing, etc.). I’m
>>> still seeing occasional trouble as cpu counts go up which seems to be
>>> due to a bunch of tasks being proxy migrated to a cpu, then having to
>>> migrate them all away at once (seeing lots of pick again iterations).
>>> This may actually be correct, due to chain migration, but it ends up
>>> looking similar to a deadlock.
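
For the deadlock case above, just to make sure I understand the fallback
you're describing, is it something like the below? (A rough, untested
sketch on my end; I'm assuming the series' task_is_blocked() helper and
reusing the existing deactivate_task() path.)

        /*
         * If proxy() keeps failing to resolve the blocked_on chain,
         * give up on proxying the selected mutex-blocked task:
         * dequeue it from the rq so pick_next_task() stops selecting
         * it, and rely on the mutex owner's unlock path to wake it
         * from the mutex waiters list, as is done today without
         * proxy execution.
         */
        if (task_is_blocked(p))
                deactivate_task(rq, p, DEQUEUE_SLEEP);
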
>
> So I suspect what you're seeing is a combination of the two items
> above. With 128 threads, my deadlock detection BUG() at 50 is probably
> far too low, as migration chains can get pretty long.
> Clearly, BUG'ing at a fixed count is the wrong approach (but it was
> helpful for quickly catching problems and debugging in my
> environment).

Ah! I see. Thank you for clarifying. Let me check whether moving to commit
259a8134aa27 ("sched: Potential fixup, not sure why rq_selected() is used
here") helps.

>
> My main priority is trying to get the null se crashes resolved (almost
> have my hands around it, but took a little break from it end of last
> week), and once I have something there I'll update and share my WIP
> tree:
> https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-WIP

I'll keep an eye out for any updates on the branch.

>
> It will include some extra trace logging, and I'll reach out to see if
> you can capture the issue again.
>
> Thanks so much again for your interest and help in testing!
> -john


--
Thanks and Regards,
Prateek