Re: fs/proc/task_mmu: Implement IOCTL for efficient page table scanning

From: Michał Mirosław
Date: Fri Jul 21 2023 - 20:22:31 EST


On Fri, Jul 21, 2023 at 10:50:19PM +0500, Muhammad Usama Anjum wrote:
> On 7/21/23 4:23 PM, Michał Mirosław wrote:
> > On Fri, Jul 21, 2023 at 03:48:22PM +0500, Muhammad Usama Anjum wrote:
> >> On 7/21/23 12:28 AM, Michał Mirosław wrote:
[...]
> >>> d. change the ioctl to be a SCAN with optional WP. Addressing the
> >>> original use-case, GetWriteWatch() can be implemented as:
> >> As I've mentioned several times previously (without the name of
> >> ResetWriteWatch()) that we need exclusive WP without GET. This could be
> >> implemented with UFFDIO_WRITEPROTECT. But when we use UFFDIO_WRITEPROTECT,
> >> we hit some special case and performance is very slow. So with UFFD WP
> >> expert, Peter Xu we have decided to put exclusive WP in this IOCTL for
> >> implementation of ResetWriteWatch().
> >>
> >> A lot of simplification of the patch is made possible because of not
> >> keeping exclusive WP. (You have also written some quality code, more better.)
> >>>
> >>> memset(&args, 0, sizeof(args));
> >>> args.start = lpBaseAddress;
> >>> args.end = lpBaseAddress + dwRegionSize;
> >>> args.max_pages = *lpdwCount;
> >>> *lpdwGranularity = PAGE_SIZE;
> >>> args.flags = PM_SCAN_CHECK_WPASYNC;
> >>> if (dwFlags & WRITE_WATCH_FLAG_RESET)
> >>> args.flags |= PM_SCAN_WP_MATCHED;
> >>> args.categories_mask = PAGE_IS_WRITTEN;
> >>> args.return_mask = PAGE_IS_WRITTEN;
> >
> > For ResetWriteWatch() you would:
> >
> > args.flags = PM_SCAN_WP_MATCHING;
> > args.categories_mask = PAGE_IS_WPASYNC | PAGE_IS_WRITTEN;
> > args.return_mask = 0;
> >
> > Or (if you want to error out if the range doesn't have WP enabled):
> >
> > args.flags = PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC;
> > args.categories_mask = PAGE_IS_WRITTEN;
> > args.return_mask = 0;
> >
> > (PM_SCAN_CHECK_WPASYNC is effectively adding PAGE_IS_WPASYNC to the
> > required categories.)
> Right. But we don't want to perform GET in case of ResetWriteWatch(). It'll
> be wasted effort to perform GET as well when we don't care about the dirty
> status of the pages.

This doesn't really do GET: return_mask == 0 means that there won't be any
ranges reported (and you can pass {NULL, 0} for arg.{vec, vec_len}). But
I've changed the no-GET criteria to vec == NULL (requires vec_len == 0).

> Is it possible for you to fix the above mentioned 3 things and send the
> patch for my testing:
> 1 Make GET optional
> 2 Detect partial THP WP operation and split
> 3 Optimization of moving this interesting logic to output() function
>
> Please let me know if you cannot make the above fixes. I'll mix my patch
> version and your patch and fix these things up.

Sending as a reply.

Best Regards
Michał Mirosław