Re: [PATCH v7 1/8] fs: introduce kernel_pread_file* support

From: Scott Branden
Date: Tue Jun 09 2020 - 18:55:36 EST


Hi Matthew,

On 2020-06-09 6:21 a.m., Matthew Wilcox wrote:
On Mon, Jun 08, 2020 at 03:29:22PM -0700, Scott Branden wrote:
Hi Matthew,

I am requesting the experts in the filesystem subsystem to come to a
consensus here.
This is not my area of expertise at all but every time I have addressed all
of the
outstanding concerns someone else comes along and raises another one.
I appreciate it's frustrating for you, but this is the nature of
patch review. I haven't even read the first five or so submissions.
I can see them in my inbox and they look like long threads. I'm not
particularly inclined to read them. I happened to read v6, and reacted
to the API being ugly.
Thanks for the review. Yes, I do see the enum being ugly now
and have removed it in v8 of the patch. Hopefully it addresses
your concerns. More comments below.

Please see me comments below.

On 2020-06-06 8:52 a.m., Matthew Wilcox wrote:
On Fri, Jun 05, 2020 at 10:04:51PM -0700, Scott Branden wrote:
-int kernel_read_file(struct file *file, void **buf, loff_t *size,
- loff_t max_size, enum kernel_read_file_id id)
-{
- loff_t i_size, pos;
Please note that how checkpatch generated the diff here. The code
modifications
below are for a new function kernel_pread_file, they do not modify the
existing API
kernel_read_file. kernel_read_file requests the ENTIRE file is read. So we
need to be
able to differentiate whether it is ok to read just a portion of the file or
not.
You've gone about this in entirely the wrong way though. This enum to
read the entire file or a partial is just bad design.
Your point on the enum is valid.
I've removed it from design. Hopefully it is cleaner now.

+int kernel_pread_file(struct file *file, void **buf, loff_t *size,
+ loff_t pos, loff_t max_size,
+ enum kernel_pread_opt opt,
+ enum kernel_read_file_id id)
So, to share common code a new kernel_pread_opt needed to be added in order
to specify whether
it was ok to read a partial file or not, and provide an offset into the file
where to begin reading.
The meaning of parameters doesn't change in the bonkers API. max_size still
means max size, etc.
These options are needed so common code can be shared with kernel_read_file
api.
Does pread() in userspace take seven parameters? No. It takes four.
What you're doing is taking all the complexity of all of the interfaces
and stuffing it all down into the bottom function instead of handling
some of the complexity in the wrapper functions. For example, you
could support the functionality of 'max_size' in kernel_read_file()
and leave it out of the kernel_pread_file() interface.
I have removed the enum necessary in the kernel pread call now,
so it is down to 6.
The other 2 parameters are necessary as they are in kernel read.

max_size makes no sense to remove - it serves the same purpose
as in userspace pread and read functions. To specify the max size
to read.
I think what we actually want is:

ssize_t vmap_file_range(struct file *, loff_t start, loff_t end, void **bufp);
void vunmap_file_range(struct file *, void *buf);

If end > i_size, limit the allocation to i_size. Returns the number
of bytes allocated, or a negative errno. Writes the pointer allocated
to *bufp. Internally, it should use the page cache to read in the pages
(taking appropriate reference counts). Then it maps them using vmap()
instead of copying them to a private vmalloc() array.
kernel_read_file() can be converted to use this API. The users will
need to be changed to call kernel_read_end(struct file *file, void *buf)
instead of vfree() so it can call allow_write_access() for them.

vmap_file_range() has a lot of potential uses. I'm surprised we don't
have it already, to be honest.
Such a change sounds like it could be done in a later patch series.
It's an incomplete solution. It would work for some of the needed
operations but not others.
For kernel_read_file, I don't see how in your new API it indicates if the
end of the file was reached or not.
That's the point. It doesn't. If a caller needs that, then they can
figure that out themselves.
No, they can't. The caller only calls kernel_read_file once and expects
the whole file to be read. The kernel_read_file doesn't work like userspace.
There is no tracking like userspace of where in the file you read?

Also, please note that buffers may be preallocated and shouldn't be freed
by the kernel in some cases and
allocated and freed by the kernel in others.
You're trying to build the swiss army knife of functions. Swiss army
knives are useful, but they're no good for carving a steak.
Hopefully I'm carving steak now.
I would like the experts here to decide on what needs to be done so we can
move forward
and get kernel_pread_file support added soon.
You know, you haven't even said _why_ you want this. The cover letter
just says "I want this", and doesn't say why it's needed.
Cover letter updated.

Thanks,
Scott