Re: [PATCH 1/2] dcache: convert dentry_stat.nr_unused to per-cpu counters

From: Dave Chinner
Date: Tue Aug 09 2011 - 06:48:17 EST


On Mon, Aug 08, 2011 at 10:08:20AM +0300, Pekka Enberg wrote:
> On Mon, Aug 8, 2011 at 10:03 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > Before we split up the dcache_lru_lock, the unused dentry counter
> > needs to be made independent of the global dcache_lru_lock. Convert
> > it to per-cpu counters to do this.
> >
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  fs/dcache.c |   17 ++++++++++++++---
> >  1 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index b05aac3..5e5208d 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -116,6 +116,7 @@ struct dentry_stat_t dentry_stat = {
> >  };
> >
> >  static DEFINE_PER_CPU(unsigned int, nr_dentry);
> > +static DEFINE_PER_CPU(unsigned int, nr_dentry_unused);
> >
> >  #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
> >  static int get_nr_dentry(void)
> > @@ -127,10 +128,20 @@ static int get_nr_dentry(void)
> >        return sum < 0 ? 0 : sum;
> >  }
> >
> > +static int get_nr_dentry_unused(void)
> > +{
> > +       int i;
> > +       int sum = 0;
> > +       for_each_possible_cpu(i)
> > +               sum += per_cpu(nr_dentry_unused, i);
> > +       return sum < 0 ? 0 : sum;
> > +}
>
> Why not use struct percpu_counter and percpu_counter_sum_positive() for this?

That's what I originally implemented for all these VFS per-cpu
counters, e.g.:

cffbc8a fs: Convert nr_inodes and nr_unused to per-cpu counters

but then this happened:

commit 86c8749ede0c59e590de9267066932a26f1ce796
Author: Nick Piggin <npiggin@xxxxxxxxx>
Date: Fri Jan 7 17:49:18 2011 +1100

vfs: revert per-cpu nr_unused counters for dentry and inodes

The nr_unused counters count the number of objects on an LRU, and as such they
are synchronized with LRU object insertion and removal and scanning, and
protected under the LRU lock.

Making it per-cpu does not actually get any concurrency improvements because of
this lock, and summing the counter is much slower, and
incrementing/decrementing it costs more code size and is slower too.

These counters should stay per-LRU, which currently means global.

Signed-off-by: Nick Piggin <npiggin@xxxxxxxxx>

and then, because the per-cpu counters were actually necessary, they
were re-introduced a couple of patches later in the same series:

commit 3e880fb5e4bb6a012035e3edd0586ee2817c2e24
Author: Nick Piggin <npiggin@xxxxxxxxx>
Date: Fri Jan 7 17:49:19 2011 +1100

fs: use fast counters for vfs caches

percpu_counter library generates quite nasty code, so unless you need
to dynamically allocate counters or take fast approximate value, a
simple per cpu set of counters is much better.

The percpu_counter can never be made to work as well, because it has an
indirection from pointer to percpu memory, and it can't use direct
this_cpu_inc interfaces because it doesn't use static PER_CPU data, so
code will always be worse.

....
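
To make the comparison concrete, the two increment paths boil down to
something like this (just a sketch for illustration; the helper names
and the percpu_counter variant are made up here, not code that exists
in the tree):

#include <linux/percpu.h>
#include <linux/percpu_counter.h>

/*
 * Static per-cpu counter, as in the patch above: this_cpu_inc()
 * operates directly on the static per-cpu data.
 */
static DEFINE_PER_CPU(unsigned int, nr_dentry_unused);

static inline void dentry_lru_inc(void)		/* hypothetical helper */
{
	this_cpu_inc(nr_dentry_unused);
}

/*
 * percpu_counter alternative: the counter's per-cpu memory is
 * dynamically allocated (it needs a percpu_counter_init() call at
 * runtime), so every increment goes through a pointer indirection
 * and the library's batching logic.
 */
static struct percpu_counter nr_dentry_unused_pc;	/* hypothetical */

static inline void dentry_lru_inc_pc(void)
{
	percpu_counter_inc(&nr_dentry_unused_pc);
}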

No point in making arguments about how using and improving the
generic counter helps everyone and keeps the maintenance burden
lower, or that we were summing the counters on every single shrinker
call (i.e. potentially thousands of times a second) and so needed
fast approximate values.

The change to per-sb LRUs actually takes away the need to sum the
VFS inode and dentry counters on every shrinker call, so now the
roll-your-own counters make a bit of sense because they are only
summed when someone reads /proc/sys/fs/dentry-state. It sure as hell
didn't make sense back when this code was merged, though....
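
FWIW, with the per-sb LRUs the only place the per-cpu values need to
be summed is the dentry-state sysctl handler, i.e. something along
these lines (a sketch based on the existing proc_nr_dentry() handler
in fs/dcache.c; the nr_unused line is my assumption about the part of
the diff not quoted above):

static int proc_nr_dentry(ctl_table *table, int write, void __user *buffer,
			  size_t *lenp, loff_t *ppos)
{
	dentry_stat.nr_dentry = get_nr_dentry();
	/* sum the per-cpu values only when userspace actually reads
	 * /proc/sys/fs/dentry-state */
	dentry_stat.nr_unused = get_nr_dentry_unused();
	return proc_dointvec(table, write, buffer, lenp, ppos);
}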

Cheers,

Dave.

--
Dave Chinner
david@xxxxxxxxxxxxx