Re: [GIT PULL] pipe: nonblocking rw for io_uring

From: Linus Torvalds
Date: Mon Apr 24 2023 - 17:05:42 EST


On Fri, Apr 21, 2023 at 7:02 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> This contains Jens' work to support FMODE_NOWAIT and thus IOCB_NOWAIT
> for pipes ensuring that all places can deal with non-blocking requests.
>
> To this end, pass down the information that this is a nonblocking
> request so that pipe locking, allocation, and buffer checking correctly
> deal with those.

Ok, I pulled this, but then I unpulled it again.

Doing conditional locking for O_NONBLOCK and friends is not ok. Yes,
it's been done, and I may even have let some slip through, but it's
just WRONG.

There is absolutely no situation where a "ok, so the lock on this data
structure was taken, we'll go to some async model" is worth it.

Every single time I've seen this, it's been some developer who thinks
that O_NONBLOCk is somehow some absolute "this cannot schedule" thing.
And every single time it's been broken and horrid crap that just made
everything more complicated and slowed things down.

If some lock wait is a real problem, then the lock needs to be just
fixed. Not "ok, let's make a non-blocking version and fall back if
it's held".

Note that FMODE_NOWAIT does not mean (and *CANNOT* mean) that you'd
somehow be able to do the IO in some atomic context anyway. Many of
our kernel locks don't even support that (eg mutexes).

So thinking that FMODE_NOWAIT is that kind of absolute is the wrong
kind of thinking entirely.

FMODE_NOWAIT should mean that no *IO* gets done. And yes, that might
mean that allocations fail too. But not this kind of "let's turn
locking into 'trylock' stuff".

The whoe flag is misnamed. It should have been FMODE_NOIO, the same
way we have IOCB_NOIO.

If you want FMODE_ATOMIC, then that is something entirely and
completely different, and is probably crazy.

We have done it in one area (RCU pathname lookup), and it was worth it
there, and it was a *huge* undertaking. It was worth it, but it was
worth it because it was a serious thing with some serious design and a
critical area.

Not this kind of "conditional trylock" garbage which just means that
people will require 'poll()' to now add the lock to the waitqueue, or
make all callers go into some "let's use a different thread instead"
logic.

Linus