Re: [PATCH] sched/fair: Don't balance migration disabled tasks

From: Valentin Schneider
Date: Tue May 23 2023 - 07:48:01 EST


On 16/05/23 19:10, Yicong Yang wrote:
> Hi Valentin,
> Sorry for the late reply. Yes it can be reproduced on the upstream kernel (tested below on
> 6.4-rc1). Since it happens occasionally with the normal setup, I wrote a test kthread
> with migration enable/disable periodically:
>
> static int workload_func(void *data)
> {
> cpumask_var_t cpumask;
> int i;
>
> if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
> return -ENOMEM;
>
> for (i = 0; i < 8; i++)
> cpumask_set_cpu(i, cpumask);
>
> set_cpus_allowed_ptr(current, cpumask);
> free_cpumask_var(cpumask);
>
> while (!kthread_should_stop()) {
> migrate_disable();
> mdelay(1000);
> cond_resched();
> migrate_enable();
> mdelay(1000);
> }
>
> return -1;
> }
>
> Launching this and bind another workload to the same CPU it's currently running like
> `taskset -c $workload_cpu stress-ng -c 1` will trigger the issue. In fact, the problem
> is not because of the migration disable mechanism which works well, but because the
> balancing policy after found all the tasks on the source CPU are pinned. With below
> debug print added:
>
> @@ -8527,6 +8527,20 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> if (kthread_is_per_cpu(p))
> return 0;
>
> + if (is_migration_disabled(p)) {
> + if (!p->on_cpu && cpumask_test_cpu(env->dst_cpu, p->cpus_ptr))
> + pr_err("dst_cpu %d on_cpu %d cpus_ptr %*pbl cpus_mask %*pbl",
> + env->dst_cpu, p->on_cpu, cpumask_pr_args(p->cpus_ptr),
> + cpumask_pr_args(&p->cpus_mask));
> + }
> +
> if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
> int cpu;
>
> I got below output:
>
> [ 686.135619] dst_cpu 1 on_cpu 0 cpus_ptr 1 cpus_mask 0-7
> [ 686.148809] ------------[ cut here ]------------
> [ 686.169505] WARNING: CPU: 64 PID: 0 at kernel/sched/core.c:3210 set_task_cpu+0x190/0x250
> [ 686.186537] Modules linked in: kthread_workload(O) bluetooth rfkill xt_CHECKSUM iptable_mangle xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi crct10dif_ce hns_roce_hw_v2 arm_spe_pmu sbsa_gwdt sm4_generic sm4 xts ecb hisi_hpre hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_trng_v2 rng_core hisi_uncore_pmu spi_dw_mmio hisi_zip hisi_sec2 hisi_qm uacce hclge hns3 hnae3 hisi_sas_v3_hw hisi_sas_main libsas [last unloaded: kthread_workload(O)]
> [ 686.293937] CPU: 64 PID: 0 Comm: swapper/64 Tainted: G O 6.4.0-rc1-migration-race-debug+ #24
> [ 686.314616] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B211.01 11/10/2021
> [ 686.333285] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 686.347930] pc : set_task_cpu+0x190/0x250
> [...]
>
> It shows that we're going to balance the task to its current CPU (CPU 1) rather than
> the balancer CPU (CPU 64). It's because we're going to find a new dst_cpu if the task
> on the src_cpu is pinned, and the new_dst_cpu happens to be the task's current CPU.
>

Nicely found! Thanks for having spent time on this. I haven't been able to
retrigger the issue using your reproducer, but I think you have indeed
found the root cause of it.

> So the right way to solve this maybe avoid selecting the src_cpu as the new_dst_cpu and
> below patch works to solve this issue.
>
> @@ -8550,7 +8564,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>
> /* Prevent to re-select dst_cpu via env's CPUs: */
> for_each_cpu_and(cpu, env->dst_grpmask, env->cpus) {
> - if (cpumask_test_cpu(cpu, p->cpus_ptr)) {
> + if (cpumask_test_cpu(cpu, p->cpus_ptr) && cpu != env->src_cpu) {
> env->flags |= LBF_DST_PINNED;
> env->new_dst_cpu = cpu;
> break;
>

Other than having some better inplace cpumask helpers, I don't think we can
make this look better. Could you send this change as a proper patch, please?

> Thanks,
> Yicong