Re: System CPU increasing on idle 2.6.36

From: Mark Moseley
Date: Fri Jan 07 2011 - 20:30:56 EST

Next message: David Miller: "Re: [stable] [069/152] econet: Do the correct cleanup after anunprivileged SIOCSIFADDR."
Previous message: mahatma: "Re: [PATCH] fix miscompiling with GCC 4.5 -finline-functions"
In reply to: Simon Kirby: "Re: System CPU increasing on idle 2.6.36"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Jan 7, 2011 at 4:52 PM, Simon Kirby <sim@xxxxxxxxxx> wrote:
> On Fri, Jan 07, 2011 at 10:05:39AM -0800, Mark Moseley wrote:
>
>> NOTE: NFS/Kernel guys: I've left the 2.6.36.2 box still thrashing
>> about in case there's something you'd like me to look at.
>>
>> Ok, both my 2.6.36.2 kernel box and my 2.6.35->2.6.36 bisect box (was
>> bisected to ce7db282a3830f57f5b05ec48288c23a5c4d66d5 -- this is my
>> first time doing bisect, so I'll preemptively apologize for doing
>
> That's a merge commit, so that doesn't sound right (it doesn't contain
> changes by itself). How were you running the bisect? Were you
> definitely seeing a good case since v2.6.35? Could you post your
> "git bisect log" so I can see if I can follow it?

Here's 'git bisect log':

# git bisect log
git bisect start
# bad: [f6f94e2ab1b33f0082ac22d71f66385a60d8157f] Linux 2.6.36
git bisect bad f6f94e2ab1b33f0082ac22d71f66385a60d8157f
# good: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect good 9fe6206f400646a2322096b56c59891d530e8d51
# good: [78417334b5cb6e1f915b8fdcc4fce3f1a1b4420c] Merge branch
'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
git bisect good 78417334b5cb6e1f915b8fdcc4fce3f1a1b4420c
# good: [14a4fa20a10d76eb98b7feb25be60735217929ba] Merge branch
'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect good 14a4fa20a10d76eb98b7feb25be60735217929ba
# bad: [ce7db282a3830f57f5b05ec48288c23a5c4d66d5] Merge branch
'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
git bisect bad ce7db282a3830f57f5b05ec48288c23a5c4d66d5

I did an initial checkout of stable
(git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git)
and (after git bisect start), did a 'git bisect good v2.6.35; git
bisect bad v2.6.36' but that seemed to be way too old. I tried to
approximate 2.6.35.9->2.6.36 by doing 'git bisect good
ea8a52f9f4bcc3420c38ae07f8378a2f18443970; git bisect bad
f6f94e2ab1b33f0082ac22d71f66385a60d8157f' but that had compilation
errors so I did 'git bisect good
9fe6206f400646a2322096b56c59891d530e8d51' which compiled ok -- heh,
did I mention this was my first outing with 'git bisect' :)

> Yeah, I was seeing load proportional to these, but I don't think their
> growth is anything wrong. It just seems to be something that is taking
> up CPU from flush proceses as they grow, and something which is causing a
> thundering herd of reclaim with lots of spinlock contention.
>
> Hmmm.. I tried to see if drop_caches would clear nfs_inode_cache and thus
> the CPU usage. The top of slabtop before doing anything:
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 1256343 1221799 97% 0.95K 38071 33 1218272K nfs_inode_cache
>
> # echo 2 > /proc/sys/vm/drop_caches
>
> this took about 40 seconds to execute, and perf top showed this for
> pretty much the whole time:
>
> samples pcnt function DSO
> _______ _____ ________________________ ___________
>
> 1188.00 45.1% _raw_spin_lock [kernel]
> 797.00 30.2% rpcauth_cache_shrinker [kernel]
> 132.00 5.0% shrink_slab [kernel]
> 60.00 2.3% queue_io [kernel]
> 59.00 2.2% _cond_resched [kernel]
> 44.00 1.7% nfs_writepages [kernel]
> 35.00 1.3% writeback_single_inode [kernel]
> 34.00 1.3% writeback_sb_inodes [kernel]
> 23.00 0.9% do_writepages [kernel]
> 20.00 0.8% bit_waitqueue [kernel]
> 15.00 0.6% redirty_tail [kernel]
> 11.00 0.4% __iget [kernel]
> 11.00 0.4% __GI_strstr libc-2.7.so
>
> slabtop then showed:
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 1256211 531720 42% 0.95K 38067 33 1218144K nfs_inode_cache
>
> and this doesn't change further with drop_caches=3 or 1. It's strange
> that most of them were freed, but not all... We do actually have
> RPC_CREDCACHE_DEFAULT_HASHBITS set to 10 at build time, which might
> explain a bigger credcache and time to free that. Did you change your
> auth_hashtable_size at all? Anyway, even with all of that, system CPU
> doesn't change.

Oh, now that's interesting. I'm setting RPC_CREDCACHE_DEFAULT_HASHBITS
to 8 (and have been since Jan, so like 2.6.33 or something) -- I was
having the same issue as in this thread,
http://www.spinics.net/lists/linux-nfs/msg09261.html. Bumping up
RPC_CREDCACHE_DEFAULT_HASHBITS was like a miracle on some of our
systems. I was going to get around to switching to the 'right way' to
do that as a boot param one of these days, instead of a hack.

In the bisected kernel, I definitely did *not* touch
RPC_CREDCACHE_DEFAULT_HASHBITS and it blew up anyway. If it hadn't
completely died, it'd have been nice to see what 'perf top' said.

> Ok, weird, Running "sync" actually caused "vmstat 1" to go from 25% CPU
> spikes every 5 seconds with nfs_writepages in the "perf top" every 5
> seconds to being idle! /proc/vmstat showed only nr_dirty 8 and
> nr_writeback 0 before I ran "sync". Either that, or it was just good
> timing.
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 0 0 6767892 9696 41088 0 0 5 180 15 70 62 7 29 1
> 0 0 0 6767892 9696 41088 0 0 0 0 1124 1978 0 0 100 0
> 0 0 0 6767892 9696 41088 0 0 0 0 1105 1967 0 0 100 0
> 0 0 0 6767892 9696 41088 0 0 0 0 1180 1984 0 2 98 0
> 2 0 0 6767892 9696 41088 0 0 0 0 2102 1968 0 22 78 0
> 0 0 0 6767768 9704 41084 0 0 0 52 1504 2022 0 8 91 1
> 0 0 0 6767768 9704 41084 0 0 0 0 1059 1965 0 0 100 0
> 0 0 0 6767768 9704 41088 0 0 0 0 1077 1982 0 0 100 0
> 0 0 0 6767768 9704 41088 0 0 0 0 1166 1972 1 2 97 0
> 0 0 0 6767768 9704 41088 0 0 0 0 1353 1990 0 6 94 0
> 0 0 0 6767644 9704 41100 0 0 0 0 2531 2050 0 27 72 0
> 0 0 0 6767644 9704 41100 0 0 0 0 1065 1974 0 0 100 0
> 1 0 0 6767644 9704 41100 0 0 0 0 1090 1967 0 0 100 0
> 0 0 0 6767644 9704 41100 0 0 0 0 1175 1971 0 2 98 0
> 0 0 0 6767644 9704 41100 0 0 0 0 1321 1967 0 6 94 0
> 1 0 0 6767644 9712 41100 0 0 0 52 1995 1995 0 18 81 1
> 0 0 0 6767644 9712 41100 0 0 0 0 1342 1964 0 6 94 0
> 0 0 0 6767644 9712 41100 0 0 0 0 1064 1974 0 0 100 0
> 0 0 0 6767644 9712 41100 0 0 0 0 1163 1968 0 2 98 0
> 0 0 0 6767644 9712 41100 0 0 0 0 1390 1976 1 6 94 0
> 0 0 0 6767768 9712 41112 0 0 0 0 1282 2018 0 1 99 0
> 0 0 0 6767768 9712 41112 0 0 0 0 2166 1972 0 24 76 0
> 0 0 0 6786260 9712 41112 0 0 0 0 1092 1989 0 0 100 0
> 0 0 0 6786260 9712 41112 0 0 0 0 1039 2003 0 2 98 0
> 0 0 0 6774588 9712 41112 0 0 0 0 1524 2015 3 9 89 0
> 0 1 0 6770124 9716 41112 0 0 0 48 1315 1990 3 0 87 11
> 1 0 0 6769628 9720 41112 0 0 0 4 2196 1966 0 20 79 0
> 0 0 0 6768364 9724 41108 0 0 36 0 1403 2031 1 5 94 1
> 0 0 0 6768388 9724 41144 0 0 0 0 1185 1969 0 2 98 0
> 0 0 0 6768388 9724 41144 0 0 0 0 1351 1975 0 6 94 0
> 0 0 0 6768140 9724 41156 0 0 0 0 1261 2020 0 1 99 0
> 0 0 0 6768140 9724 41156 0 0 0 0 1053 1973 0 0 100 0
> 1 0 0 6768140 9724 41156 0 0 0 0 2194 1971 1 26 73 0
> 0 0 0 6768140 9732 41148 0 0 0 52 1283 2088 0 4 96 0
> 2 0 0 6768140 9732 41156 0 0 0 0 1184 1965 0 2 98 0
> 0 0 0 6786368 9732 41156 0 0 0 0 1297 2000 0 4 97 0
> 0 0 0 6786384 9732 41156 0 0 0 0 998 1949 0 0 100 0
> (stop vmstat 1, run sync, restart vmstat 1)
> 1 0 0 6765304 9868 41736 0 0 0 0 1239 1967 0 0 100 0
> 0 0 0 6765304 9868 41736 0 0 0 0 1174 2035 0 0 100 0
> 0 0 0 6765304 9868 41736 0 0 0 0 1084 1976 0 0 100 0
> 1 0 0 6765172 9868 41736 0 0 0 0 1100 1975 0 0 100 0
> 0 0 0 6765304 9868 41760 0 0 0 0 1501 1997 1 1 98 0
> 0 0 0 6765304 9868 41760 0 0 0 0 1092 1964 0 0 100 0
> 1 0 0 6765304 9868 41760 0 0 0 0 1096 1963 0 0 100 0
> 0 0 0 6765304 9868 41760 0 0 0 0 1083 1967 0 0 100 0
> 0 0 0 6765304 9876 41752 0 0 0 76 1101 1980 0 0 99 1
> 0 0 0 6765304 9876 41760 0 0 0 4 1199 2097 0 2 98 0
> 0 0 0 6765304 9876 41760 0 0 0 0 1055 1964 0 0 100 0
> 0 0 0 6765304 9876 41760 0 0 0 0 1064 1977 0 0 100 0
> 0 0 0 6765304 9876 41760 0 0 0 0 1073 1968 0 0 100 0
> 0 0 0 6765304 9876 41760 0 0 0 0 1080 1976 0 0 100 0
> 0 0 0 6765156 9884 41772 0 0 0 20 1329 2034 1 1 98 1
> 0 0 0 6765180 9884 41780 0 0 0 0 1097 1971 0 1 99 0
> 0 0 0 6765180 9884 41780 0 0 0 0 1073 1960 0 0 100 0
> 0 0 0 6765180 9884 41780 0 0 0 0 1079 1976 0 0 100 0
> 0 0 0 6765172 9884 41780 0 0 0 0 1120 1980 0 0 100 0
> 0 0 0 6765180 9892 41772 0 0 0 60 1097 1982 0 0 100 0
> 0 0 0 6765180 9892 41780 0 0 0 0 1067 1969 0 0 100 0
> 0 0 0 6765180 9892 41780 0 0 0 0 1063 1973 0 0 100 0
> 0 0 0 6765180 9892 41780 0 0 0 0 1064 1969 0 1 100 0
>
> Hmmm. I'll watch it and see if it climbs back up again. That would mean
> that for now, echo 2 > /proc/sys/vm/drop_caches ; sync is a workaround
> for the problem we're seeing with system CPU slowly creeping up since
> 2.6.36. I have no idea if this affects the sudden flush process spinlock
> contention and attempted writeback without much progress during normal
> load, but it certainly changes things here.
>
> http://0x.ca/sim/ref/2.6.37/cpu_after_drop_caches_2_and_sync.png

That's pretty painful to have to do though (less painful than the box
becoming unusable, I'll admit). I wonder if when I did a umount on
mine (which must've sync'd at least that mount), it gave something a
big enough kick for the flush-0:xx threads to unstick themselves,
presumably what happened with your sync (though I didn't drop the
cache).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: David Miller: "Re: [stable] [069/152] econet: Do the correct cleanup after anunprivileged SIOCSIFADDR."
Previous message: mahatma: "Re: [PATCH] fix miscompiling with GCC 4.5 -finline-functions"
In reply to: Simon Kirby: "Re: System CPU increasing on idle 2.6.36"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]