Re: UBSAN: shift-out-of-bounds in load_balance

From: Vincent Guittot
Date: Tue Feb 23 2021 - 08:46:41 EST


On Tue, 23 Feb 2021 at 13:03, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
>
> +Vincent
>
> On 22/02/21 09:12, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit: 31caf8b2 Merge branch 'linus' of git://git.kernel.org/pub/..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16ab2682d00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=b81388f0b32761d4
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d7581744d5fd27c9fbe1
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1277457f500000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+d7581744d5fd27c9fbe1@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > ================================================================================
> > UBSAN: shift-out-of-bounds in kernel/sched/fair.c:7712:14
> > shift exponent 149 is too large for 64-bit type 'long unsigned int'
>
> That 149 is surprising.

Yes, surprising. But is it really a problem in itself ? shifting left
would be a problem because of the overflow but here we shift right to
divide and the result is correct

Beside this, it seems that a significant number of previous attempts
to balance load has been done with another migration_type otherwise it
would have raised a problem earlier (at 65) if previous LB were also
migration_load. It would be good to understand why the 148 previous
ones failed

>
> sd->cache_nice_tries is \in {1, 2}, and sd->nr_balanced_failed should be in
> the same ballpark.
>
> A successful load_balance() resets it to 0; a failed one increments
> it. Once it gets to sd->cache_nice_tries + 3, this should trigger an active
> balance, which will either set it to sd->cache_nice_tries+1 or reset it to
> 0. There is this one condition that could let it creep up uncontrollably:
>
> /*
> * Don't kick the active_load_balance_cpu_stop,
> * if the curr task on busiest CPU can't be
> * moved to this_cpu:
> */
> if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
> raw_spin_unlock_irqrestore(&busiest->lock,
> flags);
> goto out_one_pinned;
> }
>
> So despite the resulting sd->balance_interval increase, repeatedly hitting
> this might yield the above. Would we then want something like this?
>
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a8bd7b13634..b65c24b5ae91 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7422,6 +7422,11 @@ struct lb_env {
> struct list_head tasks;
> };
>
> +static inline unsigned int sd_balance_failed_cap(struct sched_domain *sd)
> +{
> + return sd->cache_nice_tries + 3;
> +}
> +
> /*
> * Is this task likely cache-hot:
> */
> @@ -9493,7 +9498,7 @@ imbalanced_active_balance(struct lb_env *env)
> * threads on a system with spare capacity
> */
> if ((env->migration_type == migrate_task) &&
> - (sd->nr_balance_failed > sd->cache_nice_tries+2))
> + (sd->nr_balance_failed >= sd_balance_failed_cap(sd)))
> return 1;
>
> return 0;
> @@ -9737,8 +9742,10 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> * frequent, pollute the failure counter causing
> * excessive cache_hot migrations and active balances.
> */
> - if (idle != CPU_NEWLY_IDLE)
> - sd->nr_balance_failed++;
> + if (idle != CPU_NEWLY_IDLE) {
> + sd->nr_balance_failed = min(sd->nr_balance_failed + 1,
> + sd_balance_failed_cap(sd));

nr_balance_failed is an interesting metric that we want to monitor
sometimes and we would like to be able to divide higher than
2^(sd->cache_nice_tries + 3).

If we really want to prevent out of bound shift, The below is more
appropriate IMO:

index 636741fa27c9..4d0b3fa30849 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7707,7 +7707,7 @@ static int detach_tasks(struct lb_env *env)
* migrate.
*/

- if ((load >> env->sd->nr_balance_failed) >
env->imbalance)
+ if ((load >> min_t(int,
env->sd->nr_balance_failed, BITS_PER_LONG)) > env->imbalance)
goto next;

env->imbalance -= load;


> + }
>
> if (need_active_balance(&env)) {
> unsigned long flags;
>