Re: [PATCH] mm: remove unintentional voluntary preemption in get_mmap_lock_carefully

From: Mateusz Guzik
Date: Sun Aug 20 2023 - 08:48:58 EST


On 8/20/23, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> On Sun, Aug 20, 2023 at 12:43:03PM +0200, Mateusz Guzik wrote:
>> Should the trylock succeed (and thus blocking was avoided), the routine
>> wants to ensure blocking was still legal to do. However, the method
>> used ends up calling __cond_resched injecting a voluntary preemption
>> point with the freshly acquired lock.
>>
>> One can hack around it using __might_sleep instead of mere might_sleep,
>> but since threads keep going off CPU here, I figured it is better to
>> accommodate it.
>
> Except now we search the exception tables every time we call it.
> The now-deleted comment (c2508ec5a58d) suggests this is slow:
>

I completely agree it is a rather unfortunate side-effect.

> - /*
> - * Kernel-mode access to the user address space should only occur
> - * on well-defined single instructions listed in the exception
> - * tables. But, an erroneous kernel fault occurring outside one of
> - * those areas which also holds mmap_lock might deadlock attempting
> - * to validate the fault against the address space.
> - *
> - * Only do the expensive exception table search when we might be at
> - * risk of a deadlock. This happens if we
> - * 1. Failed to acquire mmap_lock, and
> - * 2. The access did not originate in userspace.
> - */
>
> Now, this doesn't mean we're doing it on every page fault. We skip
> all of this if we're able to handle the fault under the VMA lock,
> so the effect is probably much smaller than it was. But I'm surprised
> not to see you send any data quantifying the effect of this change!
>

Going off CPU *after* taking the lock sounds like an obviously bad
thing to happen, so I did not think it warranted any measurements.

My first patch looked like this:
diff --git a/mm/memory.c b/mm/memory.c
index 1ec1ef3418bf..8662fd69eae8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5259,7 +5259,9 @@ static inline bool get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs
 {
 	/* Even if this succeeds, make it clear we *might* have slept */
 	if (likely(mmap_read_trylock(mm))) {
-		might_sleep();
+#if defined(CONFIG_DEBUG_ATOMIC_SLEEP)
+		__might_sleep(__FILE__, __LINE__);
+#endif
 		return true;
 	}

This keeps assertions while dodging __cond_resched.

But then I figured someone might complain about added scheduling latency
that was not there prior to the patch.

Between the two not-so-great choices I went with what I considered the
safer one.

However, now that I mention it, I wonder if the search could be
circumvented on x86-64. The idea would be to check whether SMAP is
disabled (assuming the CPU supports it) -- if so and the faulting
address belongs to userspace, assume it's all good. This is less
precise in that SMAP can be disabled over more than the intended user
access, but it would be fine as far as I'm concerned if the search is
such a big deal.

--
Mateusz Guzik <mjguzik gmail.com>