Re: [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock()

From: Florian Weimer
Date: Mon Nov 27 2017 - 06:27:14 EST


On 10/26/2017 02:27 PM, Paul E. McKenney wrote:
> But just for completeness, one way to make this work across the board
> might be to instead use call_rcu(), with the callback function kicking
> off a workqueue handler to do the rest of the unmount. Of course,
> in saying that, I am ignoring any mutexes that you might be holding
> across this whole thing, and also ignoring any problems that might arise
> when returning to userspace with some portion of the unmount operation
> still pending. (For example, someone unmounting a filesystem and then
> immediately remounting that same filesystem.)

You really need to complete all side effects of deallocating a resource before returning to user space. Otherwise, it will never be possible to allocate and deallocate resources in a tight loop: either you get spurious failures because too many unaccounted deallocations are stuck somewhere in the system (and the user can't tell that this is due to a race), or you get an OOM because the user manages to queue up too much state.
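
To make the first failure mode concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: struct pool, pool_alloc(), pool_release_sync(), pool_release_deferred(), pool_drain() and tight_loop() are all hypothetical names made up for illustration. The assumption is a resource with a fixed limit, where a deferred release still counts against the limit until some background work finishes the teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy resource pool; illustrative only, not kernel code. */
struct pool {
    size_t limit;    /* maximum number of accounted resources */
    size_t in_use;   /* resources currently held by the caller */
    size_t pending;  /* released, but teardown has not finished yet */
};

/* Fails once held plus not-yet-torn-down resources reach the limit. */
static bool pool_alloc(struct pool *p)
{
    if (p->in_use + p->pending >= p->limit)
        return false; /* spurious if only `pending` fills the limit */
    p->in_use++;
    return true;
}

/* Synchronous release: all side effects are complete on return. */
static void pool_release_sync(struct pool *p)
{
    assert(p->in_use > 0);
    p->in_use--;
}

/* Deferred release: returns early, teardown is still outstanding. */
static void pool_release_deferred(struct pool *p)
{
    assert(p->in_use > 0);
    p->in_use--;
    p->pending++;
}

/* Stand-in for the background work that eventually finishes teardown. */
static void pool_drain(struct pool *p)
{
    p->pending = 0;
}

/* Allocate and release in a tight loop; return the number of failed
 * allocations. The caller never holds more than one resource. */
static size_t tight_loop(struct pool *p, size_t iterations, bool deferred)
{
    size_t failures = 0;

    for (size_t i = 0; i < iterations; i++) {
        if (!pool_alloc(p)) {
            failures++;
            continue;
        }
        if (deferred)
            pool_release_deferred(p);
        else
            pool_release_sync(p);
    }
    return failures;
}
```

With synchronous release the loop never fails, no matter how many iterations it runs. With deferred release and nothing draining in between, every allocation fails once `limit` releases are pending, although the caller never holds more than one resource at a time. From the caller's side this looks exactly like a real limit violation.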

We already have this problem with RLIMIT_NPROC, where waitpid etc. return before the process is completely gone. On some kernels/configurations, the resulting race is so wide that parallel make no longer works reliably because it runs into fork failures.
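
A rough way to look for this from user space is a fork/waitpid loop that counts EAGAIN failures, sketched below. struct fork_stats and fork_wait_loop() are hypothetical names, and the sketch does not set RLIMIT_NPROC itself; it assumes the caller already runs close to its limit (for example after lowering it with ulimit -u). Whether any failures show up depends on the kernel and configuration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result record for the loop below. */
struct fork_stats {
    size_t ok;      /* child forked and reaped */
    size_t eagain;  /* fork() failed with EAGAIN (e.g. RLIMIT_NPROC hit) */
    size_t other;   /* fork() failed for some other reason */
};

/* Fork a child that exits immediately, reap it, and repeat.
 * At most one child exists from the caller's point of view, so an
 * EAGAIN here suggests that earlier, already-reaped children are
 * still being counted against the process limit. */
static struct fork_stats fork_wait_loop(size_t iterations)
{
    struct fork_stats st = { 0, 0, 0 };

    for (size_t i = 0; i < iterations; i++) {
        pid_t pid = fork();

        if (pid == 0)
            _exit(0); /* child: exit right away, no atexit handlers */

        if (pid < 0) {
            if (errno == EAGAIN)
                st.eagain++;
            else
                st.other++;
            continue;
        }

        /* Parent: reap the child, retrying if a signal interrupts us. */
        int status;
        while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
            ;
        st.ok++;
    }
    return st;
}
```

If waitpid only returned once the child no longer counted against the limit, eagain would stay at zero for a caller that is one process below its limit. A nonzero count in that situation is the race described above.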

Thanks,
Florian