Re: [PATCH RFC 2/2] io_uring: add support for getdents

From: Dominique Martinet
Date: Mon Apr 24 2023 - 04:41:51 EST


Thanks!

Clay Harris wrote on Mon, Apr 24, 2023 at 02:29:46AM -0500:
> This also seems like a good place to bring up a point I made with
> the last attempt at this code. You're missing an optimization here.
> getdents knows whether it is returning a buffer because the next entry
> won't fit versus because there are no more entries. As it doesn't
> return that information, callers must always keep calling it back
> until EOF. This means a completely unnecessary call is made for
> every open directory. In other words, for a directory scan where
> the buffers are large enough to not overflow, that literally twice
> as many calls are made to getdents as necessary. As io_uring is
> in-kernel, it could use an internal interface to getdents which would
> return an EOF indicator along with the (probably non-empty) buffer.
> io_uring would then return that flag with the CQE.

Sorry I didn't spot that comment in the last iteration of the patch,
that sounds interesting.

This isn't straightforward even in-kernel though: the ctx.actor callback
(filldir64) isn't called when we're done, so we only know we couldn't
fill in the buffer.
We could have the callback record 'buffer full' and consider we're done
if the buffer is full, or just single-handedly declare we are if we have
more than `MAXNAMLEN + sizeof(struct linux_dirent64)` left over, but I
assume a filesystem is allowed to return what it has readily available
and expect the user to come back later?
In which case we cannot use this as an heuristic...

So if we do this, it'll require a way for filesystems to say they're
filling in as much as they can, or go the sledgehammer way of adding an
extra dir_context dir_context callback, either way I'm not sure I want
to deal with all that immediately unless I'm told all filesystems will
fill as much as possible without ever failing for any temporary reason
in the middle of iterate/iterate_shared().
Call me greedy but I believe such a flag in the CQE could also be added
later on without any bad side effects (as it's optional to check on it
to stop calling early and there's no harm in not setting it)?


> (* As an aside, the only place I've ever seen a non-zero lseek on a
> directory, is in a very resource limited environment, e.g. too small
> open files limit. In the case of a depth-first directory scan, it
> must close directories before completely reading them, and reopen /
> lseek to their previous position in order to continue. This scenario
> is certainly not worth bothering with for io_uring.)

(I also thought of userspace NFS/9P servers are these two at least get
requests from clients with an arbitrary offset, but I'll be glad to
forget about them for now...)

--
Dominique Martinet | Asmadeus