Re: [PATCH 6.4 800/800] io_uring: Use io_schedule* in cqring wait

From: Andres Freund
Date: Sun Jul 23 2023 - 14:59:08 EST


Hi,

On 2023-07-23 20:06:19 +0200, Oleksandr Natalenko wrote:
> On Sunday, 23 July 2023 19:43:50 CEST Genes Lists wrote:
> > On 7/23/23 11:31, Jens Axboe wrote:
> > ...
> > > Just read the first one, but this is very much expected. It's now just
> > > correctly reflecting that one thread is waiting on IO. IO wait being
> > > 100% doesn't mean that one core is running 100% of the time, it just
> > > means it's WAITING on IO 100% of the time.
> > >
> >
> > Seems reasonable, thank you.
> >
> > Question - do you expect the iowait to stay high for a freshly created
> > mariadb doing nothing (as far as I can tell anyway) until the process
> > exits? Or would you expect it to drop in this case before the process
> > exits?
> >
> > For example I tried the following - is the output what you expect?
> >
> > Create a fresh mariadb with no databases - monitor the core showing the
> > iowaits with:
> >
> > mpstat -P ALL 2 100
> >
> > # rm -f /var/lib/mysql/*
> > # mariadb-install-db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
> >
> > # systemctl start mariadb (iowaits -> 100%)
> >
> >
> > # iotop -bo |grep maria (shows no output, iowait stays 100%)
> >
> > (this persists until mariadb process exits)
> >
> >
> > # systemctl stop mariadb (iowait drops to 0%)
>
> This is a visible userspace behaviour change with no changes in the
> userspace itself, so we cannot just ignore it. If for some reason this is
> how it should be now, how do we explain it to MariaDB devs to get this
> fixed?

Just to confirm I understand: Your concern is how it looks in mpstat, not
performance or anything like that?

As far as I can tell, mariadb submits a bunch of IOs, which all have
completed:
...
mariadbd 438034 [000] 67593.094595: io_uring:io_uring_submit_req: ring 0xffff8887878ac800, req 0xffff888929df2400, user_data 0x55d5eaf29488, opcode READV, flags 0x0, sq_thread 0
mariadbd 438034 [000] 67593.094604: io_uring:io_uring_submit_req: ring 0xffff8887878ac800, req 0xffff888929df2500, user_data 0x55d5eaf29520, opcode READV, flags 0x0, sq_thread 0
mariadbd 438034 [000] 67593.094690: io_uring:io_uring_complete: ring 0xffff8887878ac800, req 0xffff888929df2400, user_data 0x55d5eaf29488, result 16384, cflags 0x0 extra1 0 extra2 0
mariadbd 438034 [000] 67593.094699: io_uring:io_uring_complete: ring 0xffff8887878ac800, req 0xffff888929df2500, user_data 0x55d5eaf29520, result 16384, cflags 0x0 extra1 0 extra2 0

It then waits for io_uring events:
mariadbd 438032 [003] 67593.095282: io_uring:io_uring_cqring_wait: ring 0xffff8887878ac800, min_events 1

There won't be any completions without further IO being submitted.
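
To make the "nothing is in flight" reading of the trace checkable, here is a
minimal sketch that pairs io_uring_submit_req and io_uring_complete events by
their req pointer. The function name and the regular expression are my own
illustration, not part of perf or io_uring, and the sketch assumes the trace
line format shown above.

```python
import re

# Matches the two tracepoints shown in the trace; the req pointer is the join key.
# Other io_uring events (e.g. io_uring_cqring_wait) do not match and are skipped.
EVENT_RE = re.compile(
    r"io_uring:(io_uring_submit_req|io_uring_complete): .*?\breq (0x[0-9a-f]+)"
)


def outstanding_requests(trace_lines):
    """Return the set of req pointers that were submitted but not completed."""
    in_flight = set()
    for line in trace_lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue
        event, req = match.groups()
        if event == "io_uring_submit_req":
            in_flight.add(req)
        else:
            in_flight.discard(req)
    return in_flight
```

Fed the four submit/complete lines quoted above, this returns an empty set,
i.e. the subsequent io_uring_cqring_wait has nothing pending to wait for.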

io_uring could have logic to report a different state in such a case (where
there won't be any completions until new IOs are submitted), but that would
likely not be free.
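
For reference, the iowait number under discussion comes from the per-CPU
counters in /proc/stat, where iowait is the fifth value on each cpuN line.
Below is a rough sketch of turning two snapshots into a percentage. The
function name is made up for illustration, and the formula is an
approximation of what tools like mpstat report, not their exact code.

```python
def iowait_percent(before, after):
    """Per-CPU iowait share (in percent) between two /proc/stat snapshots.

    Both arguments are the text contents of /proc/stat taken at different
    times. Only per-CPU lines ("cpu0", "cpu1", ...) are considered.
    """

    def parse(text):
        cpus = {}
        for line in text.splitlines():
            fields = line.split()
            # Per-CPU lines: cpuN user nice system idle iowait irq softirq steal ...
            if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
                cpus[fields[0]] = [int(value) for value in fields[1:9]]
        return cpus

    old_counters, new_counters = parse(before), parse(after)
    result = {}
    for cpu, new in new_counters.items():
        old = old_counters.get(cpu)
        if old is None:
            continue
        deltas = [n - o for n, o in zip(new, old)]
        total = sum(deltas)
        # Index 4 is iowait: time the CPU was idle while a task was in IO wait.
        result[cpu] = 100.0 * deltas[4] / total if total > 0 else 0.0
    return result
```

Reading /proc/stat twice a couple of seconds apart and passing both strings
in gives one value per CPU. A CPU whose only activity in the interval was a
task blocked in the cqring wait would show up at or near 100, without that
CPU doing any work.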

Greetings,

Andres Freund