Re: [PATCH v1 0/8] Deferred dput() and iput() -- reducing lock contention

From: Mike Waychison
Date: Wed Jan 21 2009 - 01:21:17 EST


Andi Kleen wrote:
> Mike Waychison <mikew@xxxxxxxxxx> writes:
>
>> livelock on dcache_lock/inode_lock (specifically in atomic_dec_and_lock())
>
> I'm not sure how something can livelock in atomic_dec_and_lock which
> doesn't take a spinlock itself? Are you saying you run into NUMA memory
> unfairness here? Or did I misparse you?

By atomic_dec_and_lock, I really meant _atomic_dec_and_lock(). It takes the
spinlock when the atomic_add_unless() fast path (a cmpxchg loop) fails, i.e.
on the final reference.
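
For reference, the slow path I'm talking about looks roughly like this
(paraphrased from lib/dec_and_lock.c of that era, not verbatim):

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
	{
		/* Fast path: drop the count unless it would hit zero. */
		if (atomic_add_unless(atomic, -1, 1))
			return 0;

		/* Slow path: take the lock and re-check under it. */
		spin_lock(lock);
		if (atomic_dec_and_test(atomic))
			return 1;
		spin_unlock(lock);
		return 0;
	}

Every final dput()/iput() ends up in the slow path, so they all serialize
on dcache_lock/inode_lock.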

There are likely NUMA unfairness issues at play as well, but that's not the
main worry at this point.


>> This patchset is an attempt to try and reduce the locking overheads associated
>> with final dput() and final iput(). This is done by batching dentries and
>> inodes into per-process queues and processing them in 'parallel' to consolidate
>> some of the locking.
>
> I was wondering what this does to the latencies when dput/iput
> is only done for very objects. Does it increase costs then
> significantly?

very objects?
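
To restate what the series does (a rough sketch of the idea only, not the
actual patch code -- the names here are made up): final references get
parked on a per-task list and then released in one pass, so the lock is
taken once per batch instead of once per object:

	/* Hypothetical illustration of the per-task deferred dput idea. */
	void deferred_dput(struct dentry *dentry)
	{
		/* Park the dentry instead of doing the final dput() now. */
		list_add(&dentry->d_deferred, &current->deferred_dentries);
	}

	void flush_deferred_dputs(void)
	{
		struct dentry *dentry, *next;

		/* One dcache_lock acquisition covers the whole batch rather
		 * than one atomic_dec_and_lock() per dentry. */
		spin_lock(&dcache_lock);
		list_for_each_entry_safe(dentry, next,
				&current->deferred_dentries, d_deferred) {
			/* ... final-dput processing for each entry ... */
		}
		spin_unlock(&dcache_lock);
	}

The same idea applies to iput() and inode_lock.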


> As a high level comment it seems like a lot of work to work
> around global locks, like the inode_lock, where it might be better
> to just split the lock up? Mind you I don't have a clear proposal
> how to do that, but surely it's doable somehow.


Perhaps. The only plausible way I can see to do that would be to rework the
global resources (the global inode_unused LRU list, plus how inode state
transitions are handled), but even then some sort of consistency needs to be
maintained at the super_block level. That means the smallest I can see the
lock becoming is per-super_block, which doesn't solve the problem afaict.
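
Just to make the granularity concrete, the per-sb variant I can picture
would be something like the following (purely hypothetical sketch, field
names made up -- not something I'm proposing):

	struct super_block {
		/* ... existing fields ... */
		spinlock_t	 s_inode_lock;	 /* would replace inode_lock
						  * for this sb's inodes */
		struct list_head s_inode_unused; /* per-sb LRU replacing the
						  * global inode_unused list */
	};

And that's about as fine-grained as I can see it getting without reworking
the inode state transitions themselves.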