Re: [PATCH v2 6/7] sched: Shard per-LLC shared runqueues

From: Peter Zijlstra
Date: Tue Jul 11 2023 - 06:50:37 EST


On Mon, Jul 10, 2023 at 03:03:41PM -0500, David Vernet wrote:

> +struct shared_runq_shard {
>  	struct list_head list;
>  	spinlock_t lock;
>  } ____cacheline_aligned;
>
> +struct shared_runq {
> +	u32 num_shards;
> +	struct shared_runq_shard shards[];
> +} ____cacheline_aligned;
> +
> +/* This would likely work better as a configurable knob via debugfs */
> +#define SHARED_RUNQ_SHARD_SZ 6
> +
>  #ifdef CONFIG_SMP
>  static struct shared_runq *rq_shared_runq(struct rq *rq)
>  {
>  	return rq->cfs.shared_runq;
>  }
>
> -static struct task_struct *shared_runq_pop_task(struct rq *rq)
> +static struct shared_runq_shard *rq_shared_runq_shard(struct rq *rq)
> +{
> +	return rq->cfs.shard;
> +}
> +
> +static int shared_runq_shard_idx(const struct shared_runq *runq, int cpu)
> +{
> +	return cpu % runq->num_shards;

I would suggest either:

	(cpu >> 1) % num_shards

or keeping num_shards even, to give SMT siblings a fighting chance to
hit the same bucket.

(I've no idea how SMT4 (or worse SMT8) is typically enumerated, so
someone from the Power/Sparc/MIPS world would have to go play with that
if they so care)
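
To illustrate, a quick user-space sketch (not kernel code) comparing the
two mappings, assuming SMT siblings get consecutive CPU ids 2k/2k+1 --
the layout the >> 1 trick relies on. The function names are made up for
illustration:

/*
 * Standalone sketch; shard_idx_naive() is the expression from the
 * patch, shard_idx_smt() the shifted variant suggested above.
 */
#include <stdio.h>

static int shard_idx_naive(int cpu, int num_shards)
{
	return cpu % num_shards;	/* as in the quoted hunk */
}

static int shard_idx_smt(int cpu, int num_shards)
{
	return (cpu >> 1) % num_shards;	/* siblings 2k/2k+1 collapse */
}

int main(void)
{
	int num_shards = 2;	/* e.g. a 12-CPU LLC at SHARD_SZ 6 */

	for (int cpu = 0; cpu < 12; cpu += 2)
		printf("cpus %2d/%2d -> naive %d/%d, shifted %d/%d\n",
		       cpu, cpu + 1,
		       shard_idx_naive(cpu, num_shards),
		       shard_idx_naive(cpu + 1, num_shards),
		       shard_idx_smt(cpu, num_shards),
		       shard_idx_smt(cpu + 1, num_shards));
	return 0;
}

With num_shards == 2 the naive mapping always splits a sibling pair
across shards, while the shifted one keeps each pair in the same shard.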

> +}

> +		num_shards = max(per_cpu(sd_llc_size, i) /
> +				 SHARED_RUNQ_SHARD_SZ, 1);
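
As a sanity check on the arithmetic, a user-space sketch of what that
integer division yields for a few LLC sizes (num_shards_for() is a
made-up wrapper around the quoted expression):

#include <stdio.h>

#define SHARED_RUNQ_SHARD_SZ 6	/* from the patch */
#define max(a, b) ((a) > (b) ? (a) : (b))

/* shard count exactly as computed in the quoted hunk */
static int num_shards_for(int llc_size)
{
	return max(llc_size / SHARED_RUNQ_SHARD_SZ, 1);
}

int main(void)
{
	int sizes[] = { 4, 6, 16, 18, 64 };

	for (int i = 0; i < 5; i++)
		printf("llc_size %2d -> %d shard(s)\n",
		       sizes[i], num_shards_for(sizes[i]));
	return 0;
}

That prints 1, 1, 2, 3 and 10 shards respectively; note an 18-CPU LLC
comes out at 3 shards, so the expression as written does not keep
num_shards even on its own -- that would need an explicit rounding step.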

> +		shared_runq->num_shards = num_shards;