Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

From: Jeremy Fitzhardinge
Date: Thu Apr 05 2007 - 02:53:44 EST


Matt Mackall wrote:
>> +/*
>> + * Scan a region of virtual memory, filling in page tables as necessary
>> + * and calling a provided function on each leaf page table.
>> + */
>>
>
> But I'm not sure what the use case is that wants filling in the page
> table..? If both modes really make sense, perhaps a flag could unify
> these differences.
>

Well, two reasons:

One is the general point that if you're traversing ptes, they need to
exist in order to be traversed (for example, if you're creating new
mappings). Obviously, if you just want to visit existing mappings, then
instantiating new pagetables is not the right thing to do (and I could
make use of that mode too).

The other is that there are various places in the Xen hypervisor API
where you pass in a reference to a pte entry for the hypervisor to put
mappings into, and the rest of the pagetable needs to exist. The Xen
code uses the side-effect of apply_to_page_range() to create the
pagetable for these calls.
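
To make the two modes concrete, here is a rough user-space sketch of a
walker that takes a flag, along the lines Matt suggests. It is a toy
model, not kernel code: every toy_* name, the two-level layout and the
"create" parameter are made up for illustration, and there is no
locking at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy user-space model only; none of these names are kernel API. */
#define TOY_PTRS       4                      /* entries per table level */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)

typedef unsigned long toy_pte_t;
typedef int (*toy_pte_fn_t)(toy_pte_t *pte, unsigned long addr, void *data);

struct toy_mm {
	toy_pte_t *leaf[TOY_PTRS];   /* NULL means "leaf table not instantiated" */
};

/*
 * Walk [addr, end) one page at a time. With create != 0, missing leaf
 * tables are allocated before the callback runs (the "fill in" mode).
 * With create == 0, holes are skipped and only existing tables are
 * visited.
 */
static int toy_apply_to_range(struct toy_mm *mm, unsigned long addr,
			      unsigned long end, toy_pte_fn_t fn,
			      void *data, int create)
{
	assert(fn != NULL);

	for (; addr < end; addr += TOY_PAGE_SIZE) {
		unsigned long idx = addr >> TOY_PAGE_SHIFT;
		unsigned long top = idx / TOY_PTRS;
		int err;

		if (top >= TOY_PTRS)
			return -EINVAL;   /* outside the toy address space */

		if (!mm->leaf[top]) {
			if (!create)
				continue; /* visit-only mode: skip the hole */
			mm->leaf[top] = calloc(TOY_PTRS, sizeof(toy_pte_t));
			if (!mm->leaf[top])
				return -ENOMEM;
		}

		err = fn(&mm->leaf[top][idx % TOY_PTRS], addr, data);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: count how many entries the walker handed us. */
static int toy_count(toy_pte_t *pte, unsigned long addr, void *data)
{
	(void)pte;
	(void)addr;
	++*(int *)data;
	return 0;
}
```

On an empty toy_mm, a walk with create == 0 never calls the callback,
while the same walk with create == 1 allocates the leaf table first and
then calls it once per page. The second behaviour is the side-effect
the Xen code relies on.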

>> +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long addr,
>> + void *data);
>>
>
> I'd gotten the impression that these sorts of typedefs were out of
> fashion.
>

In general yes, but for function pointers the syntax is so clumsy that I
think typedefs are OK.
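
As a small illustration of what I mean by clumsy, here is the same
callback type spelled with and without a typedef. This is a standalone
sketch: toy_pte_t stands in for pte_t, the pmd_page argument is dropped
for brevity, and all the function names are invented.

```c
#include <stddef.h>

typedef unsigned long toy_pte_t;   /* stand-in for pte_t, not the kernel type */

/* With a typedef, the callback type has a short, readable name. */
typedef int (*toy_pte_fn_t)(toy_pte_t *pte, unsigned long addr, void *data);

static int with_typedef(toy_pte_t *pte, unsigned long addr,
			toy_pte_fn_t fn, void *data)
{
	return fn(pte, addr, data);
}

/* Without it, the whole declarator is repeated at every use. */
static int without_typedef(toy_pte_t *pte, unsigned long addr,
			   int (*fn)(toy_pte_t *pte, unsigned long addr,
				     void *data),
			   void *data)
{
	return fn(pte, addr, data);
}

/* Sample callback: store the address with a made-up "present" bit set. */
static int set_pte(toy_pte_t *pte, unsigned long addr, void *data)
{
	(void)data;
	*pte = addr | 1UL;
	return 0;
}

/* Returning a callback is where the raw syntax gets hardest to read. */
static toy_pte_fn_t pick_typedef(void)
{
	return set_pte;
}

static int (*pick_raw(void))(toy_pte_t *, unsigned long, void *)
{
	return set_pte;
}
```

Both spellings declare exactly the same type, so pick_raw() can be
passed wherever a toy_pte_fn_t is expected; the typedef only changes
how much of the declarator the reader has to parse.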

>> +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
>> + unsigned long addr, unsigned long end,
>> + pte_fn_t fn, void *data)
>> +{
>> + pte_t *pte;
>> + int err;
>> + struct page *pmd_page;
>> + spinlock_t *ptl;
>> +
>> + pte = (mm == &init_mm) ?
>> + pte_alloc_kernel(pmd, addr) :
>> + pte_alloc_map_lock(mm, pmd, addr, &ptl);
>> + if (!pte)
>> + return -ENOMEM;
>>
>
> Seems a bit awkward to pass mm all the way down the tree just for this
> quirk. Which is a bit awkward as it means that whether or not a lock
> is held in the callback is context dependent.
>

Well, it would need mm just for pte_alloc_map_lock() anyway.

> smaps, clear_ref, and my pagemap code all use the callback at the
> pmd_range level, which a) localizes the pte-level locking concerns
> with the user b) amortizes the indirection overhead and c)
> (unfortunately) makes the user a bit more complex.
>
> We should try to measure whether (b) actually makes a difference.
>

I'll need to look closely at your code again. It would be nice to have
one pagewalker.

J