Re: Re: [PATCH v2 01/14] mm: Batch-copy PTE ranges during fork()

From: Barry Song
Date: Mon Nov 27 2023 - 03:42:44 EST


>> + for (i = 0; i < nr; i++, page++) {
>> + if (anon) {
>> + /*
>> + * If this page may have been pinned by the
>> + * parent process, copy the page immediately for
>> + * the child so that we'll always guarantee the
>> + * pinned page won't be randomly replaced in the
>> + * future.
>> + */
>> + if (unlikely(page_try_dup_anon_rmap(
>> + page, false, src_vma))) {
>> + if (i != 0)
>> + break;
>> + /* Page may be pinned, we have to copy. */
>> + return copy_present_page(
>> + dst_vma, src_vma, dst_pte,
>> + src_pte, addr, rss, prealloc,
>> + page);
>> + }
>> + rss[MM_ANONPAGES]++;
>> + VM_BUG_ON(PageAnonExclusive(page));
>> + } else {
>> + page_dup_file_rmap(page, false);
>> + rss[mm_counter_file(page)]++;
>> + }
>> }
>> - rss[MM_ANONPAGES]++;
>> - } else if (page) {
>> - folio_get(folio);
>> - page_dup_file_rmap(page, false);
>> - rss[mm_counter_file(page)]++;
>> +
>> + nr = i;
>> + folio_ref_add(folio, nr);
>
> You're changing the order of mapcount vs. refcount increment. Don't.
> Make sure your refcount >= mapcount.
>
> You can do that easily by doing the folio_ref_add(folio, nr) first and
> then decrementing in case of error accordingly. Errors due to pinned
> pages are the corner case.
>
> I'll note that it will make a lot of sense to have batch variants of
> page_try_dup_anon_rmap() and page_dup_file_rmap().
>

i still don't understand why it is not a entire map+1, but an increment
in each basepage.

as long as it is a CONTPTE large folio, there is no much difference with
PMD-mapped large folio. it has all the chance to be DoubleMap and need
split.

When A and B share a CONTPTE large folio, we do madvise(DONTNEED) or any
similar things on a part of the large folio in process A,

this large folio will have partially mapped subpage in A (all CONTPE bits
in all subpages need to be removed though we only unmap a part of the
large folioas HW requires consistent CONTPTEs); and it has entire map in
process B(all PTEs are still CONPTES in process B).

isn't it more sensible for this large folios to have entire_map = 0(for
process B), and subpages which are still mapped in process A has map_count
=0? (start from -1).

> Especially, the batch variant of page_try_dup_anon_rmap() would only
> check once if the folio maybe pinned, and in that case, you can simply
> drop all references again. So you either have all or no ptes to process,
> which makes that code easier.
>
> But that can be added on top, and I'll happily do that.
>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry