Re: [PATCH 11/14] sched/kcpustat: Introduce vtime-aware kcpustat accessor for CPUTIME_SYSTEM

From: Peter Zijlstra
Date: Thu Oct 24 2019 - 07:51:03 EST


On Wed, Oct 16, 2019 at 04:56:57AM +0200, Frederic Weisbecker wrote:

> +static int kcpustat_field_vtime(u64 *cpustat,
> + struct vtime *vtime,
> + enum cpu_usage_stat usage,
> + int cpu, u64 *val)
> +{
> + unsigned int seq;
> + int err;
> +
> + do {
> + seq = read_seqcount_begin(&vtime->seqcount);
> +
> + /*
> + * We raced against context switch, fetch the
> + * kcpustat task again.
> + */
> + if (vtime->cpu != cpu && vtime->cpu != -1) {
> + err = -EAGAIN;
> + continue;

Did that want to be break?

> + }
> +
> + /*
> + * Two possible things here:
> + * 1) We are seeing the scheduling out task (prev) or any past one.
> + * 2) We are seeing the scheduling in task (next) but it hasn't
> + * passed though vtime_task_switch() yet so the pending
> + * cputime of the prev task may not be flushed yet.
> + *
> + * Case 1) is ok but 2) is not. So wait for a safe VTIME state.
> + */
> + if (vtime->state == VTIME_INACTIVE) {
> + err = -EAGAIN;
> + continue;

Idem.

If so, you can do return -EAGAIN here, and return 0 at the end and get
rid of err.

Also, if you're spin-waiting here, there should probably be a
cpu_relax() before the return -EAGAIN.

And in case that is so, you probably want the rcu_read_lock() section
below _inside_ the do{}while loop, such that the RCU section doesn't
cover the entire spin-wait.

> + }
> +
> + err = 0;
> +
> + *val = cpustat[usage];
> +
> + if (vtime->state == VTIME_SYS)
> + *val += vtime->stime + vtime_delta(vtime);
> +
> + } while (read_seqcount_retry(&vtime->seqcount, seq));
> +
> + return err;
> +}
> +
> +u64 kcpustat_field(struct kernel_cpustat *kcpustat,
> + enum cpu_usage_stat usage, int cpu)
> +{
> + u64 val;
> + int err;
> + u64 *cpustat = kcpustat->cpustat;
> +
> + if (!vtime_accounting_enabled_cpu(cpu))
> + return cpustat[usage];
> +
> + /* Only support sys vtime for now */
> + if (usage != CPUTIME_SYSTEM)
> + return cpustat[usage];
> +
> + rcu_read_lock();
> +
> + do {
> + struct rq *rq = cpu_rq(cpu);
> + struct task_struct *curr;
> + struct vtime *vtime;
> +
> + curr = rcu_dereference(rq->curr);

This is indeed safe now (relies on commit

5311a98fef7d ("tasks, sched/core: RCUify the assignment of rq->curr")

)

> + if (WARN_ON_ONCE(!curr)) {
> + val = cpustat[usage];
> + break;
> + }
> +
> + vtime = &curr->vtime;
> + err = kcpustat_field_vtime(cpustat, vtime, usage, cpu, &val);
> + } while (err == -EAGAIN);
> +
> + rcu_read_unlock();
> +
> + return val;
> +}
> +
> #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
> --
> 2.23.0
>