Re: [PATCH v5 2/3] fs/proc/task_mmu: Implement IOCTL to get and/or the clear info about PTEs

From: Michał Mirosław
Date: Tue Nov 08 2022 - 11:00:33 EST


On Tue, 8 Nov 2022 at 15:25, Muhammad Usama Anjum
<usama.anjum@xxxxxxxxxxxxx> wrote:
>
> Hi Michał,
>
> Thank you so much for reviewing.
>
> On 11/7/22 5:26 PM, Michał Mirosław wrote:
> >> +
> >> +/*
> >> + * struct page_region - Page region with bitmap flags
> >> + * @start: Start of the region
> >> + * @len: Length of the region
> >> + * bitmap: Bits sets for the region
> >> + */
> >> +struct page_region {
> >> + __u64 start;
> >> + __u64 len;
> >> + __u32 bitmap;
> >> + __u32 __reserved;
> >
> > "u64 flags"? If an extension is needed it would already require a new
> > ioctl or something in the `arg` struct.
> I feel like the masks must have the same type as this bitmap variable as
> the return_mask specifies the flags to be returned in bitmap. All the
> masks are of type __u32. This is why I'd kept the bitmap of type _u32 as
> well. I've kept them of 32 bit size as currently we are adding support
> for 4 flags and there is still room to add 28 more bits in the future.
> Do you still think that I should update the masks and bitmap to _u64?

I agree that the `bitmap` (I'd rather call it `flags` though) should
have the type matching the masks in the request. But the size I'm not
sure about if u32 is enough compared to what is used (or will be in
the future) for page flags in MM code. I suppose the ioctl() is not
expected to be a fast path, so I would go with u64 and assume that
overhead of the extra bytes read by the kernel won't matter.

> >> + * @start: Starting address of the region
> >> + * @len: Length of the region (All the pages in this length are included)
> >> + * @vec: Address of page_region struct array for output
> >> + * @vec_len: Length of the page_region struct array
> >> + * @max_pages: Optional max return pages (It must be less than vec_len if specified)
> >
> > I think we discussed that this is not counting the same things as
> > vec_len, so there should not be a reference between the two. The limit
> > is whatever fits under both conditions (IOW: n_vecs <= vec_len &&
> > (!max_pages || n_pages <= max_pages).
> In worse case when pages cannot be folded into the page_region, the one
> page_region may have information of only one page. This is why I've
> compared them. I want to communicate to the user that if max_pages is
> used, the vec_len should be of equal or greater size (to cater worse
> case which can happen at any time). Otherwise in worse case, the api can
> return without finding the max_pages number of pages. I don't know how
> should I put this in the comment.

I'm not sure you need to, as this conclusion follows from the range vs
page distinction.
A user who wants to cater for the worst case will provide big-enough
`vec` array, but another, who might be memory-constrained, could
instead just retry the call with `start` updated to just after the
last returned page until the ioctl() returns less ranges than
`vec_len` allows.

Best Regards
Michał Mirosław