Re: [PATCH v3 2/2] kernfs: Reduce contention around global per-fs kernfs_rwsem.

From: Imran Khan
Date: Wed Feb 02 2022 - 10:10:40 EST


Hi Tejun,

On 14/1/22 3:42 am, Tejun Heo wrote:
> Hello,
>
> On Thu, Jan 13, 2022 at 09:42:59PM +1100, Imran Khan wrote:
>> @@ -748,11 +749,14 @@ int kernfs_add_one(struct kernfs_node *kn)
>> goto out_unlock;
>>
>> /* Update timestamps on the parent */
>> + rwsem = iattr_rwsem_ptr(parent);
>> + down_write(rwsem);
>> ps_iattr = parent->iattr;
>> if (ps_iattr) {
>> ktime_get_real_ts64(&ps_iattr->ia_ctime);
>> ps_iattr->ia_mtime = ps_iattr->ia_ctime;
>> }
>> + up_write(rwsem);
>>
>> up_write(&root->kernfs_rwsem);
>
> Hmmm, so the additions / removals are still fs-global lock protected. Would
> it be possible to synchronize them through hashed locks too? We can provide
> double locking helpers - look up locks for both parent and child and if
> different lock in the defined order (parent first most likely) and record
> what happened in a token so that it can be undone later.
>
> Without going through the code carefully, I don't remember whether there's
> something which depends on global locking but I'm sure we can fix them too.
> It'd be really nice if we can make all operations similarly scalable cuz
> with heavy stacking addition/removals can get pretty hot too.
>

I have replaced the global rwsem with a hashed version in v4 of the
patch set at [1].
I have tried to avoid nested locking because of the following deadlock
scenario:

Say node N11 has parent node N1 and node N22 has parent node N2. Also,
N11 and N2 hash to the same lock, and N1 and N22 hash to the same lock.
In this case, suppose we have 2 parallel contexts such that one is
locking N11 and its parent and the other is locking N22 and its parent,
and execution happens as below:

Thread 1                        Thread 2
--------                        --------
Take lock of N11
                                Take lock of N22
Wait for lock of N1
                                Wait for lock of N2

Each thread then waits for a lock that the other already holds, so
neither can make progress.
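
To make the scenario concrete, here is a minimal user-space sketch (not
code from the patch set). It models the two acquisition orders and
checks whether they can form an ABBA cycle. Everything in it is
hypothetical and for illustration only: struct node, NR_LOCKS,
lock_idx() and the helper names are assumptions, and the hash field
stands in for whatever pointer hash picks the lock. It also shows that
ordering by lock index, rather than by node or parent, would break the
cycle if nested locking were used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_LOCKS 2u /* hypothetical; tiny so collisions are easy to set up */

/* Simplified stand-in for struct kernfs_node. */
struct node {
	const struct node *parent;
	unsigned int hash; /* stand-in for a pointer hash of the node */
};

/* Token recording which hashed locks are taken, and in which order. */
struct lock_pair {
	unsigned int first;
	unsigned int second;
	bool nested; /* false when both nodes hash to the same lock */
};

static unsigned int lock_idx(const struct node *n)
{
	return n->hash % NR_LOCKS;
}

/* Order from the scenario above: the node's lock, then its parent's. */
static struct lock_pair order_node_then_parent(const struct node *n)
{
	struct lock_pair p;

	assert(n && n->parent);
	p.first = lock_idx(n);
	p.second = lock_idx(n->parent);
	p.nested = p.first != p.second;
	return p;
}

/* Alternative order: always take the lower-indexed lock first. */
static struct lock_pair order_by_index(const struct node *n)
{
	struct lock_pair p = order_node_then_parent(n);

	if (p.nested && p.first > p.second) {
		unsigned int tmp = p.first;

		p.first = p.second;
		p.second = tmp;
	}
	return p;
}

/* True when two contexts take the same two locks in opposite orders. */
static bool can_abba(struct lock_pair a, struct lock_pair b)
{
	return a.nested && b.nested &&
	       a.first == b.second && a.second == b.first;
}
```

With N11 and N2 given one hash value and N1 and N22 the other,
can_abba() reports a possible cycle for order_node_then_parent() and
none for order_by_index(). Taking the parent first would not help
either, since the parents also hash to opposite locks.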

The tests that I have run with v4 are:

1. Multiple boots with systemd and udevd in place to create/remove
sysfs and cgroupfs entries

2. CPU hotplug and reading topology attributes from sysfs in parallel

3. sysfs LTP tests.

4. The above 3 tests with lockdep- and KASAN-enabled kernels

I will wait for your feedback on the approach taken in v4 of the patch
set [1].

[1]:
https://lore.kernel.org/lkml/20220202145027.723733-1-imran.f.khan@xxxxxxxxxx/

Thanks
-- Imran