Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete

From: Waiman Long
Date: Tue Sep 06 2022 - 17:02:37 EST



On 9/6/22 16:50, Peter Zijlstra wrote:
On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:

I've not followed the earlier stuff due to being unreadable; just
reacting to this..

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 838623b68031..5d9ea1553ec0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct
task_struct *p,
                if (cpumask_equal(&p->cpus_mask, new_mask))
                        goto out;

-               if (WARN_ON_ONCE(p == current &&
-                                is_migration_disabled(p) &&
-                                !cpumask_test_cpu(task_cpu(p), new_mask)))
{
+               if (is_migration_disabled(p) &&
+                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
+                       WARN_ON_ONCE(p == current);
                        ret = -EBUSY;
                        goto out;
                }
@@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct
task_struct *p,
        if (flags & SCA_USER)
                user_mask = clear_user_cpus_ptr(p);

-       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
+               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+       } else {
+               task_rq_unlock(rq, p, rf);
+       }
This cannot be right. There might be previous set_cpus_allowed_ptr()
callers that are blocked and waiting for the task to land on a valid
CPU.

You are probably right. I haven't fully understand all the migration disable code yet. However, if migration is disabled, there are some corner cases we need to handle properly.

Cheers,
Longman