Re: [PATCH RFC 2/2] io_uring: add support for getdents

From: Dominique Martinet
Date: Mon Apr 24 2023 - 06:56:05 EST


Clay Harris wrote on Mon, Apr 24, 2023 at 04:20:55AM -0500:
> > This isn't straightforward even in-kernel though: the ctx.actor callback
> > (filldir64) isn't called when we're done, so we only know we couldn't
> > fill in the buffer.
> > We could have the callback record 'buffer full' and consider we're done
> > if the buffer is full, or just single-handedly declare we are if we have
> > more than `MAXNAMLEN + sizeof(struct linux_dirent64)` left over, but I
> > assume a filesystem is allowed to return what it has readily available
> > and expect the user to come back later?
> > In which case we cannot use this as an heuristic...
> >
> > So if we do this, it'll require a way for filesystems to say they're
> > filling in as much as they can, or go the sledgehammer way of adding an
> > extra dir_context dir_context callback, either way I'm not sure I want
> > to deal with all that immediately unless I'm told all filesystems will
> > fill as much as possible without ever failing for any temporary reason
> > in the middle of iterate/iterate_shared().
>
> I don't have a complete understanding of this area, but my thought was
> not that we would look for any buffer full condition, but rather that
> an iterator could be tested for next_entry == EOF.

I'm afraid I don't see any such iterator readily available in the common
code, so that would also have to be per fs as far as I can see :/

Also some filesystems just don't have a next_entry iterator readily
available; for example on 9p the network protocol is pretty much exactly
like getdents and a readdir call is issued continuously until either
filldir64 returns an error (e.g. buffer full) or the server returned 0
(or an error); if you leave the v9fs_dir_readdir{,_dotl}() call you
loose this information immediately.
Getting this information out would be doubly appreciable because the
"final getdents" to confirm the dir has been fully read is redundant
with that previous last 9p readdir which ended the previous iteration
(could also be fixed by caching...), but that'll really require fixing
up the iterate_shared() operation to signal such a thing...

> > Call me greedy but I believe such a flag in the CQE could also be added
> > later on without any bad side effects (as it's optional to check on it
> > to stop calling early and there's no harm in not setting it)?
>
> Certainly it could be added later, but I wanted to make sure some thought
> was put into it now. It would be nice to have it sooner rather than later
> though...

Yes, if there's an easy way forward I overlooked I'd be happy to spend a
bit more time on it; and discussing never hurts.
In for a penny, in for a pound :)

--
Dominique Martinet | Asmadeus