Re: [PATCH v2] proc: Allow pid_revalidate() during LOOKUP_RCU

From: Eric W. Biederman
Date: Tue Dec 15 2020 - 16:48:12 EST


Stephen Brennan <stephen.s.brennan@xxxxxxxxxx> writes:

> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
>
>> Stephen Brennan <stephen.s.brennan@xxxxxxxxxx> writes:
>>
>>> The pid_revalidate() function requires dropping from RCU into REF lookup
>>> mode. When many threads are resolving paths within /proc in parallel,
>>> this can result in heavy spinlock contention as each thread tries to
>>> grab a reference to the /proc dentry lock (and drop it shortly
>>> thereafter).
>>
>> I am feeling dense at the moment. Which lock specifically are you
>> referring to? The only locks I can thinking of are sleeping locks,
>> not spinlocks.
>
> The lock in question is the d_lockref field (aliased as d_lock) of
> struct dentry. It is contended in this code path while processing the
> "/proc" dentry, switching from RCU to REF mode.
>
> walk_component()
> lookup_fast()
> d_revalidate()
> pid_revalidate() // returns -ECHILD
> unlazy_child()
> lockref_get_not_dead(&nd->path.dentry->d_lockref)
>
>>
>>> diff --git a/fs/proc/base.c b/fs/proc/base.c
>>> index ebea9501afb8..833d55a59e20 100644
>>> --- a/fs/proc/base.c
>>> +++ b/fs/proc/base.c
>>> @@ -1830,19 +1846,22 @@ static int pid_revalidate(struct dentry *dentry, unsigned int flags)
>>> {
>>> struct inode *inode;
>>> struct task_struct *task;
>>> + int rv = 0;
>>>
>>> - if (flags & LOOKUP_RCU)
>>> - return -ECHILD;
>>> -
>>> - inode = d_inode(dentry);
>>> - task = get_proc_task(inode);
>>> -
>>> - if (task) {
>>> - pid_update_inode(task, inode);
>>> - put_task_struct(task);
>>> - return 1;
>>> + if (flags & LOOKUP_RCU) {
>>
>> Why do we need to test flags here at all?
>> Why can't the code simply take an rcu_read_lock unconditionally and just
>> pass flags into do_pid_update_inode?
>>
>
> I don't have any good reason. If it is safe to update the inode without
> holding a reference to the task struct (or holding any other lock) then
> I can consolidate the whole conditional.


I am not certain if there is anything that makes it safe to change uid
and gid on the inode during lookup. The current code might be buggy in
that respect.

However it is absoltely safe to read from the task_struct with just the
rcu_read_lock. Which means it is only do_pid_update_inode that needs
locking to update the inode.

>>> + inode = d_inode_rcu(dentry);
>>> + task = pid_task(proc_pid(inode), PIDTYPE_PID);
>>> + if (task)
>>> + rv = do_pid_update_inode(task, inode, flags);
>>> + } else {
>>> + inode = d_inode(dentry);
>>> + task = get_proc_task(inode);
>>> + if (task) {
>>> + rv = do_pid_update_inode(task, inode, flags);
>>> + put_task_struct(task);
>>> + }
>>
>>> }
>>> - return 0;
>>> + return rv;
>>> }
>>>
>>> static inline bool proc_inode_is_dead(struct inode *inode)
>>
>> Eric

Eric