Re: fuse uring / wake_up on the same core

From: Bernd Schubert
Date: Thu Apr 27 2023 - 09:36:05 EST

In reply to: Bernd Schubert: "Re: fuse uring / wake_up on the same core"
Next in thread: Bernd Schubert: "Re: fuse uring / wake_up on the same core"

On 4/27/23 14:24, Hillf Danton wrote:
> On 26 Apr 2023 22:40:32 +0000 Bernd Schubert <bschubert@xxxxxxx>
>> My issue is now that these patches are not enough and contrary to
>> previous testing, forcefully disabling cpu migration using
>> migrate_disable() before wait_event_* in fuse's request_wait_answer()
>> and enabling it after does not help either - my process to create files
>> (bonnie++) somewhere migrates to another cpu at a later time.
>
> Less than 2 migrates every ten minutes?

The test does not run that long. The process migrates almost immediately,
I think in less than a second.
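
As a rough illustration of how quickly a task changes cores, here is a minimal
userspace sketch that polls sched_getcpu() and counts changes. The helper name
count_cpu_changes() is hypothetical and not part of fuse or bonnie++. It only
sees migrations that happen between two polls, and a busy-polling loop is
scheduled differently from a process that blocks on file system requests, so
treat the number as a hint rather than a measurement.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <time.h>

/*
 * Hypothetical helper, not part of fuse or bonnie++: poll sched_getcpu()
 * for about duration_ms milliseconds and count how often the reported
 * CPU changes. Returns the count, or -1 if a call fails.
 */
long count_cpu_changes(long duration_ms)
{
	struct timespec start, now;
	long changes = 0;
	long elapsed_ms;
	int prev, cpu;

	assert(duration_ms >= 0);

	prev = sched_getcpu();
	if (prev < 0 || clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1;

	do {
		cpu = sched_getcpu();
		if (cpu < 0)
			return -1;
		if (cpu != prev) {
			/* The task was seen on a different CPU than last time. */
			changes++;
			prev = cpu;
		}
		if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
			return -1;
		elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
			     (now.tv_nsec - start.tv_nsec) / 1000000L;
	} while (elapsed_ms < duration_ms);

	return changes;
}
```

Calling count_cpu_changes(1000) from the workload thread would report how many
core changes were observed within roughly one second.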

>
>> The only workaround I currently have is to set the ring thread
>> processing vfs/fuse events in userspace to SCHED_IDLE. In combination
>> with WF_CURRENT_CPU performance then goes from ~2200 to ~9000 file
>> creates/s for a single thread in the latest branch (should be scalable).
>> Which is very close to binding the bonnie++ process to a single core
>> (~9400 creates/s).
>
> The scheduler is good at dispatching tasks to CPUs at least, and it works
> better with userspace hints as both Prateek and Andrei's works propose. 9400
> shows positive feedback from kernel, and the question is, is it feasible
> in your production environment to set CPU affinity? If yes, what else do
> you want?

Well, this is the fuse file system - each and every user would need to do that
and get core affinity right. I'm personally not setting core affinity for
any 'cp' or 'rsync' I'm doing.
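
For reference, this is roughly what every application would have to do on its
own to get the pinned behaviour. It is a minimal sketch using
sched_setaffinity(); the helper name pin_self_to_cpu() is hypothetical, and
choosing the "right" core is left to the caller, which is exactly the part
users would have to get right.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success or a negative errno value.
 */
int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -EINVAL;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	assert(CPU_COUNT(&set) == 1);	/* exactly one CPU selected */

	/* pid 0 means the calling thread. */
	if (sched_setaffinity(0, sizeof(set), &set) != 0)
		return -errno;

	return 0;
}
```

From the shell, "taskset -c <cpu> <command>" has the same effect for a whole
command.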

Btw, a very hackish way to 'solve' the issue is this:

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index cd7aa679c3ee..dd32effb5010 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -373,6 +373,26 @@ static void request_wait_answer(struct fuse_req *req)
 	int err;
 	int prev_cpu = task_cpu(current);

+	/* When running over uring with core-affine userspace threads, we
+	 * do not want the request-submitting process to be migrated away.
+	 * The issue is that even after waking up on the right core, processes
+	 * that have submitted requests might get migrated away, because
+	 * the ring thread is still doing a bit of work or is in the process
+	 * of going to sleep. The assumption here is that processes are started
+	 * on the right core (i.e. idle cores) and can then stay on that core
+	 * when they come and do file system requests.
+	 * An alternative is to set SCHED_IDLE for ring threads,
+	 * but that would have an issue if there are other processes keeping
+	 * the cpu busy.
+	 * SCHED_IDLE or this hack here result in about a factor of 3.5 for
+	 * max meta request performance.
+	 *
+	 * Ideal would be to tell the scheduler that ring threads are not
+	 * disturbing and that migration away from their core should rarely happen.
+	 */
+	if (fc->ring.ready)
+		migrate_disable();
+
 	if (!fc->no_interrupt) {
 		/* Any signal may interrupt this */
 		err = wait_event_interruptible(req->waitq,

So it disables migration and never re-enables it...
I'm still digging to see whether there is a better way; any
hints are very welcome.
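
For comparison, the SCHED_IDLE alternative mentioned in the patch comment lives
entirely in userspace. Below is a minimal sketch of what a ring thread could
call at startup, assuming it uses sched_setscheduler() directly. The helper
name set_sched_idle() is hypothetical and not taken from the fuse userspace
code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical helper: switch a thread to the SCHED_IDLE policy.
 * tid 0 means the calling thread. Returns 0 on success or a negative
 * errno value.
 */
int set_sched_idle(pid_t tid)
{
	struct sched_param param;

	assert(tid >= 0);

	/* sched_priority must be 0 for SCHED_IDLE. */
	memset(&param, 0, sizeof(param));

	if (sched_setscheduler(tid, SCHED_IDLE, &param) != 0)
		return -errno;

	return 0;
}
```

As the patch comment notes, this has its own drawback: an idle-policy ring
thread can be starved if other processes keep that cpu busy.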

Thanks,
Bernd