Re: [RFC PATCH v2 0/2] sched/fair migration reduction features

From: Mathieu Desnoyers
Date: Mon Nov 06 2023 - 11:32:00 EST

Next message: Alexander Lobakin: "Re: [alobakin:pfcp 11/19] include/linux/bitmap.h:642:17: warning: array subscript [1, 1024] is outside array bounds of 'long unsigned int[1]'"
Previous message: Rafael J. Wysocki: "[PATCH v3 0/7] ACPI: scan: MIPI DisCo for Imaging support"
In reply to: K Prateek Nayak: "Re: [RFC PATCH v2 0/2] sched/fair migration reduction features"

On 2023-10-26 23:27, K Prateek Nayak wrote:
[...]

> --
> It is a mixed bag of results, as expected. I would love to hear your
> thoughts on the results. Meanwhile, I'll try to get some more data
> from other benchmarks.

I suspect that workloads that exhibit a client-server (1:1) pairing
pattern are hurt by the bias towards leaving tasks on their prev
runqueue: they benefit from moving the client and server tasks as close
together as possible, so that they share either the same core or a
common cache.

The hackbench workload is also client-server, but there are N client
and N server threads, creating an N:N relationship which really does
not work well when trying to pull tasks on sync wakeup: tasks then
bounce all over the place.

It's tricky though. If we try to fix the "1:1" client-server pattern with a heuristic, we may miss scenarios which are close to 1:1 but don't exactly match.
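
To illustrate why an exact 1:1 match is fragile, here is a minimal
sketch of a pairing heuristic based on how often a waker switches
wakee. It is loosely in the spirit of the wakee_flips bookkeeping used
by wake_wide() in kernel/sched/fair.c, but it is not kernel code: the
struct wake_stat type, the record_wakeup() and looks_paired() helpers,
and the percentage threshold are all hypothetical names introduced for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task wakeup bookkeeping; not a kernel structure. */
struct wake_stat {
	int last_wakee;       /* id of the most recently woken task, -1 if none */
	unsigned int wakeups; /* total number of wakeups issued by this task */
	unsigned int flips;   /* wakeups whose wakee differed from the previous one */
};

/* Account one wakeup issued by the task described by @w. */
static void record_wakeup(struct wake_stat *w, int wakee)
{
	if (w->last_wakee != -1 && w->last_wakee != wakee)
		w->flips++;
	w->last_wakee = wakee;
	w->wakeups++;
}

/*
 * Treat the waker as "paired" if at most @max_flip_pct percent of its
 * wakeups switched to a different wakee. A threshold of 0 only accepts
 * an exact 1:1 pattern; a small non-zero value tolerates near-1:1.
 */
static bool looks_paired(const struct wake_stat *w, unsigned int max_flip_pct)
{
	assert(max_flip_pct <= 100);
	if (w->wakeups == 0)
		return false;
	return (unsigned long long)w->flips * 100 <=
	       (unsigned long long)w->wakeups * max_flip_pct;
}
```

With a threshold of 0 a single stray wakeup disqualifies the pair,
which is the "close to 1:1 but don't exactly match" case; any non-zero
threshold is a tunable that some workload will fall just outside of.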

I'm working on a rewrite of select_task_rq_fair, aiming to tackle the
more general task placement problem, taking into account the following:

- We want to converge towards a task placement that moves tasks with
  the most waker/wakee interactions as close together as possible in
  the cache topology.
- We can use the core util_est/capacity metrics to calculate whether we
  have capacity left to enqueue a task in a core's runqueue.
- The underlying assumption is that work conserving [1] is not a good
  characteristic to aim for, because it does not take into account the
  overhead associated with migrations, and thus the loss of cache
  locality.
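
As a rough illustration of the first two points (not the actual
rewrite), the sketch below picks the core nearest to the waker in the
cache topology among those whose estimated utilization leaves room for
the task. Everything here is an assumption made for the example:
struct core_stat, its cache_dist field, core_has_room() and pick_core()
are hypothetical, and util_est and capacity are simply assumed to be on
the same scale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-core snapshot; not a kernel structure. */
struct core_stat {
	unsigned long util_est;  /* estimated utilization already enqueued */
	unsigned long capacity;  /* compute capacity, same scale as util_est */
	unsigned int cache_dist; /* distance to the waker; 0 = same core */
};

/* True if the core can absorb @task_util without exceeding its capacity. */
static bool core_has_room(const struct core_stat *c, unsigned long task_util)
{
	assert(c->capacity > 0);
	return c->util_est + task_util <= c->capacity;
}

/*
 * Return the index of the core closest to the waker that still has
 * room for the task, or -1 if no core has room. On equal distance the
 * first candidate wins, unless the later one is @prev, which avoids a
 * migration.
 */
static int pick_core(const struct core_stat *cores, size_t n, size_t prev,
		     unsigned long task_util)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!core_has_room(&cores[i], task_util))
			continue;
		if (best < 0 ||
		    cores[i].cache_dist < cores[best].cache_dist ||
		    (cores[i].cache_dist == cores[best].cache_dist && i == prev))
			best = (int)i;
	}
	return best;
}
```

Note that this deliberately does not pick the idlest core: a farther
core with more headroom loses to a nearer one that merely has enough,
which is where it departs from a work-conserving policy.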

Thanks,

Mathieu

[1] I use the definition of "work conserving" found here:
https://people.ece.ubc.ca/sasha/papers/eurosys16-final29.pdf

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
