Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

From: Viresh Kumar
Date: Tue May 29 2018 - 05:32:30 EST


Hi Daniel,

Thanks for yet another version :)

On 25-05-18, 11:49, Daniel Lezcano wrote:
> +++ b/drivers/powercap/idle_injection.c
> +static void idle_injection_wakeup(struct idle_injection_device *ii_dev)
> +{
> + struct idle_injection_thread *iit;
> + int cpu;
> +
> + for_each_cpu_and(cpu, ii_dev->cpumask, cpu_online_mask) {
> + iit = per_cpu_ptr(&idle_injection_thread, cpu);
> + iit->should_run = 1;
> + wake_up_process(iit->tsk);
> + }
> +}
> +
> +/**
> + * idle_injection_wakeup_fn - idle injection timer callback
> + * @timer: a hrtimer structure
> + *
> + * This function is called when the idle injection timer expires which
> + * will wake up the idle injection tasks and these ones, in turn, play
> + * idle a specified amount of time.
> + *
> + * Return: HRTIMER_NORESTART.
> + */
> +static enum hrtimer_restart idle_injection_wakeup_fn(struct hrtimer *timer)
> +{
> + struct idle_injection_device *ii_dev =
> + container_of(timer, struct idle_injection_device, timer);
> +
> + idle_injection_wakeup(ii_dev);
> +
> + return HRTIMER_NORESTART;
> +}
> +
> +/**
> + * idle_injection_fn - idle injection routine
> + * @cpu: the CPU number the tasks belongs to
> + *
> + * The idle injection routine will stay idle the specified amount of
> + * time
> + */
> +static void idle_injection_fn(unsigned int cpu)
> +{
> + struct idle_injection_device *ii_dev;
> + struct idle_injection_thread *iit;
> + int run_duration_ms, idle_duration_ms;
> +
> + ii_dev = per_cpu(idle_injection_device, cpu);
> +
> + if (WARN_ON_ONCE(!ii_dev))

Yes, this is marked as "unlikely" and is kind of not that harmful, but
I would suggest to just drop it and let the kernel crash if that
serious of a bug is present in this code where ii_dev can be NULL
here.

> + return;
> +
> + iit = per_cpu_ptr(&idle_injection_thread, cpu);

See, we don't check this one, why check only ii_dev ? :)

> +
> + /*
> + * Boolean used by the smpboot main loop and used as a
> + * flip-flop in this function
> + */
> + iit->should_run = 0;
> +
> + atomic_inc(&ii_dev->count);
> +
> + idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms);
> + if (idle_duration_ms)
> + play_idle(idle_duration_ms);
> +
> + /*
> + * The last CPU waking up is in charge of setting the timer. If
> + * the CPU is hotplugged, the timer will move to another CPU
> + * (which may not belong to the same cluster) but that is not a
> + * problem as the timer will be set again by another CPU
> + * belonging to the cluster. This mechanism is self adaptive.
> + */

I am afraid that the above comment may not be completely true all the
time. For a quad-core platform, it is possible for 3 CPUs (0,1,2) to
run this function as soon as the kthread is woken up, but one of the
CPUs (3) may be stuck servicing an IRQ, Deadline or RT activity.
Because you do atomic_inc() also in this function (above) itself,
below decrement may return a true value for the CPU2 and that will
restart the hrtimer, while one of the CPUs never got a chance to
increment count in the first place.

The fix is simple though, do the increment in idle_injection_wakeup()
and things should be fine then.

> + if (!atomic_dec_and_test(&ii_dev->count))
> + return;
> +
> + run_duration_ms = atomic_read(&ii_dev->run_duration_ms);
> + if (run_duration_ms) {
> + hrtimer_start(&ii_dev->timer, ms_to_ktime(run_duration_ms),
> + HRTIMER_MODE_REL_PINNED);
> + return;
> + }
> +
> + complete(&ii_dev->stop_complete);

So you call complete() after hrtimer is potentially restarted. This
can happen if idle_injection_stop() is called right after the above
atomic_read() has finished :)

IOW, this doesn't look safe now as well.

> +}

> +/**
> + * idle_injection_stop - stops the idle injections
> + * @ii_dev: a pointer to an idle injection_device structure
> + *
> + * The function stops the idle injection by resetting the idle and
> + * running durations and wait for the threads to complete. If we are
> + * in the process of injecting an idle cycle, then this will wait the
> + * end of the cycle.
> + */
> +void idle_injection_stop(struct idle_injection_device *ii_dev)
> +{
> + pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",
> + cpumask_pr_args(ii_dev->cpumask));
> +
> + init_completion(&ii_dev->stop_complete);

This looks completely Borken (yeah, broken :)). complete() may be
running in parallel under spinlock and updating x->done while you just
set it to 0 here without any locking in place. init_completion()
should be used only once after ii_dev is allocated and I don't see
that being done either, so that looks incorrect as well.

> +
> + idle_injection_set_duration(ii_dev, 0, 0);
> +
> + wait_for_completion_interruptible(&ii_dev->stop_complete);
> +}
> +
> +/**
> + * idle_injection_setup - initialize the current task as a RT task
> + * @cpu: the CPU number where the kthread is running on (not used)
> + *
> + * Called one time, this function is in charge of setting the task
> + * scheduler parameters.
> + */
> +static void idle_injection_setup(unsigned int cpu)
> +{
> + struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
> +
> + set_freezable();
> +
> + sched_setscheduler(current, SCHED_FIFO, &param);
> +}
> +
> +/**
> + * idle_injection_should_run - function helper for the smpboot API
> + * @cpu: the CPU number where the kthread is running on
> + *
> + * Return: a boolean telling if the thread can run.
> + */
> +static int idle_injection_should_run(unsigned int cpu)
> +{
> + struct idle_injection_thread *iit =
> + per_cpu_ptr(&idle_injection_thread, cpu);
> +
> + return iit->should_run;
> +}
> +
> +static struct idle_injection_device *ii_dev_alloc(void)
> +{
> + struct idle_injection_device *ii_dev;
> +
> + ii_dev = kzalloc(sizeof(*ii_dev), GFP_KERNEL);
> + if (!ii_dev)
> + return NULL;
> +
> + if (!alloc_cpumask_var(&ii_dev->cpumask, GFP_KERNEL)) {
> + kfree(ii_dev);
> + return NULL;
> + }
> +
> + return ii_dev;
> +}
> +
> +static void ii_dev_free(struct idle_injection_device *ii_dev)
> +{
> + free_cpumask_var(ii_dev->cpumask);
> + kfree(ii_dev);
> +}
> +
> +/**
> + * idle_injection_register - idle injection init routine
> + * @cpumask: the list of CPUs managed by the idle injection device
> + *
> + * This is the initialization function in charge of creating the
> + * initializing of the timer and allocate the structures. It does not
> + * starts the idle injection cycles.
> + *
> + * Return: NULL if an allocation fails.
> + */
> +struct idle_injection_device *idle_injection_register(struct cpumask *cpumask)
> +{
> + struct idle_injection_device *ii_dev;
> + int cpu, cpu2;
> +
> + ii_dev = ii_dev_alloc();
> + if (!ii_dev)
> + return NULL;
> +
> + cpumask_copy(ii_dev->cpumask, cpumask);
> + hrtimer_init(&ii_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + ii_dev->timer.function = idle_injection_wakeup_fn;
> +
> + for_each_cpu(cpu, ii_dev->cpumask) {
> +
> + if (per_cpu(idle_injection_device, cpu)) {

Maybe add unlikely here ?

> + pr_err("cpu%d is already registered\n", cpu);
> + goto out_rollback_per_cpu;
> + }
> +
> + per_cpu(idle_injection_device, cpu) = ii_dev;
> + }
> +
> + return ii_dev;
> +
> +out_rollback_per_cpu:
> + for_each_cpu(cpu2, ii_dev->cpumask) {
> + if (cpu == cpu2)

And is this really required? Perhaps this conditional is making it
worse and just setting the per-cpu variable forcefully to NULL would
be much faster :)

Maybe move this for-loop into iid_dev_free() and you can make this
look simple then.

> + break;
> +
> + per_cpu(idle_injection_device, cpu2) = NULL;
> + }
> +
> + ii_dev_free(ii_dev);
> +
> + return NULL;
> +}

--
viresh