Re: [PATCH 0/6] kill-the-BKL/reiserfs3: performance improvements,faster than Bkl based scheme

From: Linus Torvalds
Date: Fri May 01 2009 - 17:18:12 EST



On Fri, 1 May 2009, Ingo Molnar wrote:
> * Ingo Molnar <mingo@xxxxxxx> wrote:
>
> > The BKL is clearly removed at a faster reate with such debugging
> > measures in place. With such measures the BKL _really_ hurts, and
> > very visibly so - and that results in active removal.
>
> Btw., if you can think of any way to create a cleaner debug tool
> here i'd be glad to start a new tree that adds only _that_ tool -
> and that could perhaps be dealt with independently and possibly
> pulled upstream too, if clean enough.

Quite frankly, for something as clearly defined as as a filesystem, I
would literally suggest starting out with a few trivial stages:

Stage 1:
- raw search-and-replace of [un]lock_kernel()

sed 's/unlock_kernel()/reiserfs_unlock(sb)/g'
sed 's/lock_kernel()/reiserfs_lock(sb)/g'

and then obviously fix it up so that 'sb' always exists (ie you would
need to add a a few

struct super_block *sb = inode->i_sb;

lines manually to make the end result work.

- add that 'reiserfs_[un]lock()' pair around every single VFS entry-point
that currently gets called with lock_kernel(). They'd _still_ get
called with lock-kernel too (you're not fixing that), but now they'd
_also_ take the new lock.

- make reiserfs_unlock(sb) just do [un]lock_kernel(), and use that as a
known good starting point. Nothing has really changed (and you in fact
_increased_ the nesting on the BKL, since now the BKL is gotten twice
in those areas that were called from the VFS with the BKL held), but
you now have a mindless and pretty trivial conversion where you still
have a working system, and nothing really changed - except you now have
the beginnings of a nice abstraction.

IOW, "stage 1" is all totally mindless, and never breaks anything. It's
very much designed to be "obviously correct".

Stage 2:
- switch over reiserfs_[un]lock() to a per-superblock mutex, and enable
lockdep, and start looking for nesting problems.

IOW, "stage 2" is when you make a _minimal_ change to now enable all the
good lockdep infrastructure. But it's also designed to just do _one_
thing: look at nesting. Nothing else.

Stage 3:
- once you've fixed all nesting problems (well, most of them - enable a
swapfile on that filesystem and I bet you'll find a few more), you
might want to look at performance issues. In particular, you want to
drop the lock over at least the most _common_ blocking operations, if
at all possible.

And stage 3 is where it would make absolutely _tons_ of sense to just add
some very simple infrastructure for having a per-thread "IO warning
counter", and then simply increment/decrement that counter in
reiserfs_[un]lock().

But exactly because we do NOT want to get warnings about BKL use in all
the _other_ subsystems, we don't want to mix this up with the BKL counter
that we already have. You could probably even use the tracing
infrastructure to do this: enable tracing in reiserfs_lock(), disable it
in reiserfs_unlock(), and look at what you catch in between.

I dunno. That's how I would go about it if I just wanted to clean up a
specific subsystem, and didn't want to tie it together with getting
everything else right too.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/