Re: [PATCH] io_uring: Use io_schedule* in cqring wait

From: Pavel Begunkov
Date: Mon Jul 24 2023 - 15:23:52 EST


On 7/24/23 16:58, Jens Axboe wrote:
On 7/24/23 9:50?AM, Jens Axboe wrote:
On 7/24/23 9:48?AM, Greg KH wrote:
On Mon, Jul 24, 2023 at 04:35:43PM +0100, Phil Elwell wrote:
Hi Andres,

With this commit applied to the 6.1 and later kernels (others not
tested) the iowait time ("wa" field in top) in an ARM64 build running
on a 4 core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core
is permanently blocked on I/O. The change can be observed after
installing mariadb-server (no configuration or use is required). After
reverting just this commit, "wa" drops to zero again.

This has been discussed already:
https://lore.kernel.org/r/12251678.O9o76ZdvQC@xxxxxxxxxxxxxx

It's not a bug, mariadb does have pending I/O, so the report is correct,
but the CPU isn't blocked at all.

Indeed - only thing I can think of is perhaps mariadb is having a
separate thread waiting on the ring in perpetuity, regardless of whether
or not it currently has IO.

But yes, this is very much ado about nothing...

Current -git and having mariadb idle:

Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: all 0.00 0.00 0.04 12.47 0.04 0.00 0.00 0.00 0.00 87.44
Average: 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 2 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0.00 0.00 99.67
Average: 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 4 0.00 0.00 0.33 0.00 0.00 0.00 0.00 0.00 0.00 99.67
Average: 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 6 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00

which is showing 100% iowait on one cpu, as mariadb has a thread waiting
on IO. That is obviously a valid use case, if you split submission and
completion into separate threads. Then you have the latter just always
waiting on something to process.

With the suggested patch, we do eliminate that case and the iowait on
that task is gone. Here's current -git with the patch and mariadb also
running:

09:53:49 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
09:53:50 AM all 0.00 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00 99.25
09:53:50 AM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:53:50 AM 1 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 99.00
09:53:50 AM 2 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 99.00
09:53:50 AM 3 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 99.00
09:53:50 AM 4 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 99.01
09:53:50 AM 5 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 99.00
09:53:50 AM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:53:50 AM 7 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 99.00


Even though I don't think this is an actual problem, it is a bit
confusing that you get 100% iowait while waiting without having IO
pending. So I do think the suggested patch is probably worthwhile
pursuing. I'll post it and hopefully have Andres test it too, if he's
available.

Emmm, what's the definition of the "IO" state? Unless we can say what exactly
it is there will be no end to adjustments, because I can easily argue that
CQ waiting by itself is IO.
Do we consider sleep(N) to be "IO"? I don't think the kernel uses io
schedule around that, and so it'd be different from io_uring waiting for
a timeout request. What about epoll waiting, etc.?

--
Pavel Begunkov