Re: [PATCH v6 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings

From: Peter Zijlstra
Date: Tue May 02 2023 - 07:14:58 EST


On Tue, May 02, 2023 at 12:11:49AM +0100, Lorenzo Stoakes wrote:
> @@ -95,6 +96,77 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
> return folio;
> }
>
> +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
> +static bool stabilise_mapping_rcu(struct folio *folio)
> +{
> +	struct address_space *mapping = READ_ONCE(folio->mapping);
> +
> +	rcu_read_lock();
> +
> +	return mapping == READ_ONCE(folio->mapping);

This doesn't make sense; why bother reading the same thing twice?

Who cares if the thing changed from before? What you care about is that
the value you see has stable storage, and this doesn't help with that.

> +}
> +
> +static void unlock_rcu(void)
> +{
> +	rcu_read_unlock();
> +}
> +#else
> +static bool stabilise_mapping_rcu(struct folio *)
> +{
> +	return true;
> +}
> +
> +static void unlock_rcu(void)
> +{
> +}
> +#endif

Anyway, this can all go away. RCU can't progress while you have
interrupts disabled.

> +/*
> + * Used in the GUP-fast path to determine whether a FOLL_PIN | FOLL_LONGTERM |
> + * FOLL_WRITE pin is permitted for a specific folio.
> + *
> + * This assumes the folio is stable and pinned.
> + *
> + * Writing to pinned file-backed dirty tracked folios is inherently problematic
> + * (see comment describing the writeable_file_mapping_allowed() function). We
> + * therefore try to avoid the most egregious case of a long-term mapping doing
> + * so.
> + *
> + * This function cannot be as thorough as that one as the VMA is not available
> + * in the fast path, so instead we whitelist known good cases.
> + *
> + * The folio is stable, but the mapping might not be. When truncating for
> + * instance, a zap is performed which triggers TLB shootdown. IRQs are disabled
> + * so we are safe from an IPI, but some architectures use an RCU lock for this
> + * operation, so we acquire an RCU lock to ensure the mapping is stable.
> + */
> +static bool folio_longterm_write_pin_allowed(struct folio *folio)
> +{
> +	bool ret;
> +
> +	/* hugetlb mappings do not require dirty tracking. */
> +	if (folio_test_hugetlb(folio))
> +		return true;
> +

This:

> +	if (stabilise_mapping_rcu(folio)) {
> +		struct address_space *mapping = folio_mapping(folio);

And this is the 3rd read of folio->mapping, just for giggles?

> +
> +		/*
> +		 * Neither anonymous nor shmem-backed folios require
> +		 * dirty tracking.
> +		 */
> +		ret = folio_test_anon(folio) ||
> +			(mapping && shmem_mapping(mapping));
> +	} else {
> +		/* If the mapping is unstable, fallback to the slow path. */
> +		ret = false;
> +	}
> +
> +	unlock_rcu();
> +
> +	return ret;

then becomes:


	if (folio_test_anon(folio))
		return true;

	/*
	 * Having IRQs disabled (as per GUP-fast) also inhibits RCU
	 * grace periods from making progress, IOW, they imply
	 * rcu_read_lock().
	 */
	lockdep_assert_irqs_disabled();

	/*
	 * Inodes and thus address_space are RCU freed and thus safe to
	 * access at this point.
	 */
	mapping = folio_mapping(folio);
	if (mapping && shmem_mapping(mapping))
		return true;

	return false;

> +}
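
For anyone who wants to poke at the resulting control flow outside the
kernel, here is a minimal userspace sketch of the simplified check. It is
not kernel code: every mock_* name below is a hypothetical stand-in, the
assert() only models what lockdep_assert_irqs_disabled() is there to
document, and nothing here provides real RCU protection. The hugetlb test is
carried over from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space. */
struct mock_mapping {
	bool is_shmem;			/* models shmem_mapping() */
};

/* Hypothetical stand-in for struct folio. */
struct mock_folio {
	bool hugetlb;			/* models folio_test_hugetlb() */
	bool anon;			/* models folio_test_anon() */
	struct mock_mapping *mapping;	/* NULL models a truncated folio */
};

/* Models "IRQs are off", which is what GUP-fast guarantees its callees. */
static bool mock_irqs_disabled;

static bool mock_longterm_write_pin_allowed(const struct mock_folio *folio)
{
	const struct mock_mapping *mapping;

	/* hugetlb and anonymous folios do not need dirty tracking. */
	if (folio->hugetlb)
		return true;
	if (folio->anon)
		return true;

	/* Stand-in for lockdep_assert_irqs_disabled(). */
	assert(mock_irqs_disabled);

	/* One read of the mapping; no re-read, no comparison. */
	mapping = folio->mapping;
	if (mapping && mapping->is_shmem)
		return true;

	/* Everything else falls back to the slow path. */
	return false;
}
```

The point of the shape is that the mapping pointer is loaded exactly once
and only dereferenced under the condition that keeps its storage alive;
comparing two loads of the pointer adds nothing to that.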