Re: [RFC PATCH v6] sched: Fix performance regression introduced by mm_cid

From: Mathieu Desnoyers
Date: Fri Apr 14 2023 - 10:11:27 EST

Next message: Peter Xu: "Re: [PATCH 1/6] mm/hugetlb: Fix uffd-wp during fork()"
Previous message: Matthieu Baerts: "[PATCH net-next 5/5] mptcp: fastclose msk when cleaning unaccepted sockets"
In reply to: Aaron Lu: "Re: [RFC PATCH v6] sched: Fix performance regression introduced by mm_cid"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2023-04-14 10:07, Aaron Lu wrote:

> On Thu, Apr 13, 2023 at 06:33:56PM -0400, Mathieu Desnoyers wrote:
>
> > Introduce per-mm/cpu current concurrency id (mm_cid) to fix a PostgreSQL
> > sysbench regression reported by Aaron Lu.
> >
> > Keep track of the currently allocated mm_cid for each mm/cpu rather than
> > freeing them immediately on context switch. This eliminates most atomic
> > operations when context switching back and forth between threads
> > belonging to different memory spaces in multi-threaded scenarios (many
> > processes, each with many threads). The per-mm/per-cpu mm_cid values are
> > serialized by their respective runqueue locks.
> >
> > Thread migration is handled by introducing a task-work executed
> > periodically, similarly to NUMA work, which delays reclaim of cid
> > values when they are unused for a period of time.
> >
> > Keep track of the allocation time for each per-cpu cid, and let the task
> > work clear them when they are observed to be older than
> > SCHED_MM_CID_PERIOD_NS and unused.
> >
> > This fix is going for a task-work and delayed reclaim approach rather
> > than adding hooks to migrate-from and migrate-to because migration
> > happens to be a hot path for various real-world workloads.
> >
> > Because we want to ensure the mm_cid converges towards the smaller
> > values as migrations happen, the prior optimization that was done when
> > context switching between threads belonging to the same mm is removed,
> > because it could delay the lazy release of the destination runqueue
> > mm_cid after it has been replaced by a migration. Removing this prior
> > optimization is not an issue performance-wise because the introduced
> > per-mm/per-cpu mm_cid tracking also covers this more specific case.
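
For readers following the thread, the delayed-reclaim scheme in the quoted
description can be modeled in user space roughly as below. This is a minimal
single-threaded sketch, not the patch's code: struct mm_model,
model_sched_in(), model_sched_out(), model_reclaim(), NR_CPUS = 8 and the
100 ms value given to SCHED_MM_CID_PERIOD_NS are all assumptions made for
illustration, and the runqueue locking mentioned in the description is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8                            /* hypothetical machine size */
#define SCHED_MM_CID_PERIOD_NS 100000000ULL  /* name from the patch; 100 ms is assumed */
#define CID_UNSET (-1)

/* Per-mm, per-cpu slot: the cid cached on that cpu for this mm. */
struct pcpu_cid {
    int cid;            /* cached cid, or CID_UNSET */
    uint64_t alloc_ns;  /* time at which the cid was allocated */
    bool in_use;        /* a thread of this mm is running on the cpu */
};

struct mm_model {
    uint32_t cidmask;               /* bit n set: cid n is allocated */
    struct pcpu_cid pcpu[NR_CPUS];
};

void model_init(struct mm_model *mm)
{
    mm->cidmask = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        mm->pcpu[cpu].cid = CID_UNSET;
        mm->pcpu[cpu].alloc_ns = 0;
        mm->pcpu[cpu].in_use = false;
    }
}

/* Schedule a thread of this mm in on @cpu and return its cid. */
int model_sched_in(struct mm_model *mm, int cpu, uint64_t now_ns)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct pcpu_cid *pc = &mm->pcpu[cpu];

    if (pc->cid == CID_UNSET) {
        /* Slow path: take the lowest free cid from the mask. */
        for (int cid = 0; cid < NR_CPUS; cid++) {
            if (!(mm->cidmask & (1u << cid))) {
                mm->cidmask |= 1u << cid;
                pc->cid = cid;
                pc->alloc_ns = now_ns;
                break;
            }
        }
    }
    /* Fast path falls through: the cached cid is reused, mask untouched. */
    pc->in_use = true;
    return pc->cid;
}

/* Schedule out: the cid stays cached on the cpu instead of being freed. */
void model_sched_out(struct mm_model *mm, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    mm->pcpu[cpu].in_use = false;
}

/*
 * Periodic work: free cids that are unused and older than the period.
 * Returns the number of cids reclaimed.
 */
int model_reclaim(struct mm_model *mm, uint64_t now_ns)
{
    int reclaimed = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        struct pcpu_cid *pc = &mm->pcpu[cpu];

        if (pc->cid == CID_UNSET || pc->in_use)
            continue;
        if (now_ns - pc->alloc_ns <= SCHED_MM_CID_PERIOD_NS)
            continue;
        mm->cidmask &= ~(1u << pc->cid);
        pc->cid = CID_UNSET;
        reclaimed++;
    }
    return reclaimed;
}
```

In this model a thread that runs on cpu 0, is scheduled out, and runs there
again gets the same cid without touching cidmask; bits are only cleared by
model_reclaim(), which stands in for the periodic task work.
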

> I was wondering, if a thread was migrated to all possible cpus in the
> SCHED_MM_CID_PERIOD_NS window, its mm_cidmask will be full. For user
> space, if cid can be the full set of cpus, then it will have to prepare
> storage for the full set. Then what's the point of doing compaction? Or
> do I understand it wrong?
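
The scenario in this question can be made concrete with a small user-space
model. It is only a sketch under stated assumptions: max_cid_seen() and
MAX_CPUS are hypothetical names, a single thread hops from cpu 0 to cpu
ncpus-1 at a fixed interval, each cpu keeps the cid it allocated, and a
reclaim pass that frees cids older than the period runs before every hop.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 64     /* hypothetical upper bound for this model */
#define CID_UNSET (-1)

/*
 * Largest cid handed to a single thread that visits cpus 0..ncpus-1,
 * one hop every hop_ns, when cids left behind on earlier cpus are
 * reclaimed only once they are older than period_ns.
 */
int max_cid_seen(int ncpus, uint64_t hop_ns, uint64_t period_ns)
{
    int cid_of_cpu[MAX_CPUS];
    uint64_t alloc_ns[MAX_CPUS];
    uint64_t mask = 0;          /* bit n set: cid n is allocated */
    int max_cid = CID_UNSET;

    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        cid_of_cpu[cpu] = CID_UNSET;
        alloc_ns[cpu] = 0;
    }

    for (int cpu = 0; cpu < ncpus; cpu++) {
        uint64_t now = (uint64_t)cpu * hop_ns;

        /* Reclaim pass: cids on cpus the thread has left are unused. */
        for (int old = 0; old < cpu; old++) {
            if (cid_of_cpu[old] != CID_UNSET &&
                now - alloc_ns[old] > period_ns) {
                mask &= ~(1ULL << cid_of_cpu[old]);
                cid_of_cpu[old] = CID_UNSET;
            }
        }

        /* Allocate the lowest free cid for the destination cpu. */
        int cid = 0;
        while (mask & (1ULL << cid))
            cid++;
        mask |= 1ULL << cid;
        cid_of_cpu[cpu] = cid;
        alloc_ns[cpu] = now;

        if (cid > max_cid)
            max_cid = cid;
    }
    return max_cid;
}
```

With 8 cpus and a 100 ms period, hopping every microsecond leaves every
stale cid allocated, so the thread ends up with cid 7; hopping every 200 ms
lets each stale cid be reclaimed first, so the thread always gets cid 0.
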

Yes, that's a limitation of this approach that I am aware of. I'm
currently trying to combine the best parts of v5 and v6 to add back a
low-overhead migration hook that will preserve compactness in those
migration scenarios.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
