Re: [PATCH 2/2] sched: update runqueue clock before migrations away

From: bsegall
Date: Mon Dec 09 2013 - 13:13:39 EST


Chris Redpath <chris.redpath@xxxxxxx> writes:

> If we migrate a sleeping task away from a CPU which has the
> tick stopped, then both the clock_task and decay_counter will
> be out of date for that CPU and we will not decay load correctly
> regardless of how often we update the blocked load.
>
> This is only an issue for tasks which are not on a runqueue
> (because otherwise that CPU would be awake) and simultaneously
> the CPU the task previously ran on has had the tick stopped.
>
> Signed-off-by: Chris Redpath <chris.redpath@xxxxxxx>

This looks like it is basically correct, but it seems unfortunate to
take any rq lock for these ttwus. I don't know enough about the nohz
machinery to know if that's at all avoidable.
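
For reference, here is a rough userspace sketch of the arithmetic at stake
(plain floating point with the same half-life as the per-entity load
tracking code, i.e. contributions halve every 32 ~1ms periods, rather than
the kernel's fixed-point decay_load()). It shows how much decay goes
missing if the old CPU's clock_task stopped, say, 128 periods before the
migration (build with -lm):

#include <math.h>
#include <stdio.h>

/* Sketch only: contributions halve every 32 periods, as in decay_load(). */
static unsigned long decay(unsigned long load, unsigned int periods)
{
	return (unsigned long)(load * pow(0.5, periods / 32.0) + 0.5);
}

int main(void)
{
	unsigned long contrib = 1024;	/* load_avg_contrib when the task slept */

	printf("contrib decayed over   0 periods: %lu\n", decay(contrib, 0));
	printf("contrib decayed over 128 periods: %lu\n", decay(contrib, 128));
	return 0;
}

This prints 1024 and 64, i.e. in this example the contribution added to
removed_load would stay roughly a factor of 16 too large without the
stale-clock update.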


> ---
> kernel/sched/fair.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b7e5945..0af1dc2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4324,6 +4324,7 @@ unlock:
> return new_cpu;
> }
>
> +static int nohz_test_cpu(int cpu);
> /*
> * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
> * cfs_rq_of(p) references at time of call are still valid and identify the
> @@ -4343,6 +4344,25 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
> * be negative here since on-rq tasks have decay-count == 0.
> */
> if (se->avg.decay_count) {
> + /*
> + * If we migrate a sleeping task away from a CPU
> + * which has the tick stopped, then both the clock_task
> + * and decay_counter will be out of date for that CPU
> + * and we will not decay load correctly.
> + */
> + if (!se->on_rq && nohz_test_cpu(task_cpu(p))) {
This should presumably be p->on_rq, although se->on_rq must already be
false for set_task_cpu() to be called at all. That said, barring bugs like
the one you fixed in patch 1, I think decay_count != 0 should also imply
!p->on_rq.
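
If that invariant does hold, the on_rq test looks redundant and the hunk
could presumably be reduced to something like this (untested sketch, just
your code with the on_rq check dropped):

	if (se->avg.decay_count) {
		/* decay_count != 0 already means the task is not queued */
		if (nohz_test_cpu(task_cpu(p))) {
			struct rq *rq = cpu_rq(task_cpu(p));
			unsigned long flags;

			/* must hold rq->lock before poking at its clocks */
			raw_spin_lock_irqsave(&rq->lock, flags);
			update_rq_clock(rq);
			update_cfs_rq_blocked_load(cfs_rq, 0);
			raw_spin_unlock_irqrestore(&rq->lock, flags);
		}
		/* ... rest of the hunk unchanged ... */
	}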

> + struct rq *rq = cpu_rq(task_cpu(p));
> + unsigned long flags;
> + /*
> + * Current CPU cannot be holding rq->lock in this
> + * circumstance, but another might be. We must hold
> + * rq->lock before we go poking around in its clocks
> + */
> + raw_spin_lock_irqsave(&rq->lock, flags);
> + update_rq_clock(rq);
> + update_cfs_rq_blocked_load(cfs_rq, 0);
> + raw_spin_unlock_irqrestore(&rq->lock, flags);
> + }
> se->avg.decay_count = -__synchronize_entity_decay(se);
> atomic_long_add(se->avg.load_avg_contrib,
> &cfs_rq->removed_load);
> @@ -6507,6 +6527,11 @@ static struct {
> unsigned long next_balance; /* in jiffy units */
> } nohz ____cacheline_aligned;
>
> +static int nohz_test_cpu(int cpu)
> +{
> + return cpumask_test_cpu(cpu, nohz.idle_cpus_mask);
> +}
> +
> static inline int find_new_ilb(int call_cpu)
> {
> int ilb = cpumask_first(nohz.idle_cpus_mask);
> @@ -6619,6 +6644,11 @@ static int sched_ilb_notifier(struct notifier_block *nfb,
> return NOTIFY_DONE;
> }
> }
> +#else
> +static int nohz_test_cpu(int cpu)
> +{
> + return 0;
> +}
> #endif
>
> static DEFINE_SPINLOCK(balancing);