Re: [RFC] Extend mwait idle to optimize away IPIs when possible

From: Venki Pallipadi
Date: Wed Feb 08 2012 - 18:28:44 EST


On Tue, Feb 7, 2012 at 10:51 PM, Yong Zhang <yong.zhang0@xxxxxxxxx> wrote:
> On Mon, Feb 06, 2012 at 12:42:13PM -0800, Venkatesh Pallipadi wrote:
>> smp_call_function_single and ttwu_queue_remote sends unconditional IPI
>> to target CPU. However, if the target CPU is in mwait based idle, we can
>> do IPI-less wakeups using the magical powers of monitor-mwait.
>> Doing this has certain advantages:
>
> Actually I'm trying to do a similar thing on MIPS.
>
> The difference is that I want task_is_polling() to do something. The basic
> idea is:
>
>> +                     if (ipi_pending()) {
>> +                             clear_ipi_pending();
>> +                             local_bh_disable();
>> +                             local_irq_disable();
>> +                             generic_smp_call_function_single_interrupt();
>> +                             scheduler_wakeup_self_check();
>> +                             local_irq_enable();
>> +                             local_bh_enable();
>
> I let cpu_idle() check whether there is anything to do, as in your code above.
>
> And task_is_polling() handles the rest, with the patch below:
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5255c9d..09f633d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -527,15 +527,16 @@ void resched_task(struct task_struct *p)
>                smp_send_reschedule(cpu);
>  }
>
> -void resched_cpu(int cpu)
> +int resched_cpu(int cpu)
>  {
>        struct rq *rq = cpu_rq(cpu);
>        unsigned long flags;
>
>        if (!raw_spin_trylock_irqsave(&rq->lock, flags))
> -               return;
> +               return 0;
>        resched_task(cpu_curr(cpu));
>        raw_spin_unlock_irqrestore(&rq->lock, flags);
> +       return 1;
>  }

Two points -
rq->lock: I tried something similar first. One hurdle with checking
task_is_polling() is that you need rq->lock to check it, and adding
the lock+unlock (even without contention) in the wakeup path ended up
being no net gain compared to the IPI. And when we actually end up
spinning on that lock, that adds overhead in the common path. That is
why I switched to an atomic compare-exchange, moving any waiting onto
the target CPU as it comes out of idle.
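
For reference, this is the rough shape of the cmpxchg approach (a
minimal sketch, not the actual RFC code; the per-CPU state word and
the helper name are made up for illustration):

/*
 * Hypothetical per-CPU word that the idle loop arms with
 * monitor/mwait. The sender flips IDLE_MWAIT -> IDLE_IPI_PENDING
 * with a single cmpxchg; the monitored store itself brings the
 * target out of mwait, so no IPI is needed on success. Any waiting
 * happens on the target CPU coming out of idle, not on the sender.
 */
enum { IDLE_NOT_IDLE, IDLE_MWAIT, IDLE_IPI_PENDING };

static DEFINE_PER_CPU(atomic_t, idle_ipi_state);

static bool try_ipiless_wakeup(int cpu)
{
	/* One atomic op in the wakeup path; no rq->lock taken. */
	return atomic_cmpxchg(&per_cpu(idle_ipi_state, cpu),
			      IDLE_MWAIT, IDLE_IPI_PENDING) == IDLE_MWAIT;
}

The sender falls back to smp_send_reschedule() when the cmpxchg
loses, i.e. when the target is not in mwait idle.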

resched_task: ttwu_queue_remote() does not imply that the remote CPU
will do a resched. Today there is an IPI, and the IPI handler calls
into check_preempt_wakeup(); if the current task has higher precedence
than the waking task, the new task is merely activated and there is no
resched. Using resched_task() as above bypasses check_preempt_wakeup()
and forces a resched on the remote CPU on every queued wakeup, which
would be a change in behavior.
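
To spell out the existing path: the IPI lands in scheduler_ipi(),
which drains rq->wake_list, activates each queued task, and only sets
need_resched if the scheduling class decides the woken task should
preempt. Roughly (a simplified sketch, not the exact source):

/*
 * Simplified sketch of what the IPI handler does per queued task:
 * activate it, then let the sched class decide about preemption.
 * check_preempt_curr() sets need_resched only when the woken task
 * should preempt the task currently running on that CPU.
 */
static void ttwu_remote_sketch(struct rq *rq, struct task_struct *p)
{
	activate_task(rq, p, 0);
	check_preempt_curr(rq, p, WF_MIGRATED);
}

resched_cpu() skips that decision entirely and unconditionally asks
whatever is running there to reschedule.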

Thanks,
Venki

>
>  #ifdef CONFIG_NO_HZ
> @@ -1484,7 +1485,8 @@ void scheduler_ipi(void)
>
>  static void ttwu_queue_remote(struct task_struct *p, int cpu)
>  {
> -       if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
> +       if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list) &&
> +                                               !resched_cpu(cpu))
>                smp_send_reschedule(cpu);
>  }
>
> Thoughts?
>
> Thanks,
> Yong