Re: [PATCH v28 2/6] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs

From: Michał Mirosław
Date: Sat Aug 12 2023 - 10:22:26 EST


On Fri, Aug 11, 2023 at 08:30:16PM +0500, Muhammad Usama Anjum wrote:
> On 8/11/23 6:32 PM, Michał Mirosław wrote:
> > On Fri, Aug 11, 2023 at 05:02:44PM +0500, Muhammad Usama Anjum wrote:
> >> Now we are walking the entire range walk_page_range(). We don't break loop
> >> when we get -ENOSPC as this error may only mean that the temporary buffer
> >> is full. So we need check if max pages have been found or output buffer is
> >> full or ret is 0 or any other error. When p.arg.vec_len = 1 is end
> >> condition as the last entry is in cur. As we have walked over the entire
> >> range, cur must be full after which the walk returned.
> >>
> >> So current condition is necessary. I've double checked it. I'll change it
> >> to `p.arg.vec_len == 1`.
> > If we have walked the whole range, then the loop will end anyway due to
> > `walk_start < walk_end` not held in the `for()`'s condition.
> Sorry, for not explaining to-the-point.
> Why would we walk the entire range when we should recognize that the output
> buffer is full and break the loop?
>
> I've test cases written for this case. If I remove `p.arg.vec_len == 1`
> check, there is infinite loop for walking. So we are doing correct thing here.

It seems there is a bug somewhere then. I'll take a look at v29.

> > [...]
> >>>> +/*
> >>>> + * struct pm_scan_arg - Pagemap ioctl argument
> >>>> + * @size: Size of the structure
> >>>> + * @flags: Flags for the IOCTL
> >>>> + * @start: Starting address of the region
> >>>> + * @end: Ending address of the region
> >>>> + * @walk_end Address where the scan stopped (written by kernel).
> >>>> + * walk_end == end informs that the scan completed on entire range.
> >>>
> >>> Can we ensure this holds also for the tagged pointers?
> >> No, we cannot.
> > So this need explanation in the comment here. (Though I'd still like to
> > know how the address tags are supposed to be used from someone that
> > knows them.)
> I've looked at some documentations (presentations/talks) about tags. Tags
> is more like userspace feature. Kernel should just ignore them for our use
> case. I'll add comment.

Kernel does ignore them when reading, but what about returning a tagged
pointer? How that should work? In case of `walk_end` we can safely copy
the tag from `end` or `start` when we return exactly on of those. But what
about other addresses? When fed back as `start` any tag will work, so
the question is only what to do with pointers in the middle? We can clear
those of course - this should be mentioned in the doc - so userspace always
gets a predictable value (note: 'predictable' does not require treating
`start` and `end` the same way as addresses between them, just that what
happens is well defined). (I think making `walk_end` == `end` work
regardless of pointer tagging will make userspace happier, but I guess
doc will also make it workable. And I'm repeating myself. ;-)

Best Regards
Michał Mirosław