R.I.P. pdflush

From: Artem Bityutskiy
Date: Wed Jul 25 2012 - 11:10:43 EST


Now that all file-systems have been modified to not use the '->write_super()'
superblock method, we can kill the last pdflush leftover - the 'sync_supers'
kernel thread.

The sync_supers kernel thread does a very simple thing: wake up every 5
seconds (see [1]), iterate over all superblocks in the system and flush
dirty superblocks by calling their '->write_super()' method.
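
The pass described above can be modelled in user space roughly as follows.
This is only a sketch for illustration, not the kernel code: 'struct
super_model', 'demo_write_super()' and 'sync_supers_pass()' are hypothetical
names, and the 'dirty' flag merely stands in for the superblock's dirty state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a superblock; not the kernel's struct. */
struct super_model {
	int dirty;                                   /* non-zero: needs flushing */
	void (*write_super)(struct super_model *sb); /* may be NULL */
	int flushes;                                 /* times it was flushed */
};

/* Example callback: pretend to write the superblock out and mark it clean. */
void demo_write_super(struct super_model *sb)
{
	sb->flushes++;
	sb->dirty = 0;
}

/*
 * One pass of the periodic thread: walk all superblocks and flush the dirty
 * ones that registered a write_super callback. Returns the number flushed.
 */
int sync_supers_pass(struct super_model *sbs, size_t n)
{
	int flushed = 0;

	assert(sbs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (sbs[i].write_super && sbs[i].dirty) {
			sbs[i].write_super(&sbs[i]);
			flushed++;
		}
	}
	return flushed;
}
```

In this model, a pass over an array with nothing dirty returns 0 after
touching every entry, which corresponds to the thread waking up only to find
that there is nothing to do.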

The problem is that, from a power-efficiency point of view, it is very wasteful
to have a thread which wakes up every 5 seconds in the very core of the
Linux kernel. Indeed, most of the time this thread wakes the CPU from a deep
sleep state just to find out that there are no dirty superblocks. Besides,
modern file-systems like btrfs and ext4 (journalled mode only) do not even
register '->write_super()', so on many modern systems sync_supers is completely
useless.

And, as usually happens when modifying old code like this, removing
sync_supers was a tedious job. It required changing 12 file-systems, including
ancient ones. The changes themselves were not that complex; testing all of them
was the most difficult part. Testing the mainstream file-systems like ext4 was
easy (just run xfstests and wait a few hours), but testing baroque file-systems
was problematic because they simply oopsed or errored even before I modified
them.

For example, reiserfs deadlocked quickly when I tested it using xfstests with
reiserfs quota support enabled. I spent several days trying to fix this, but
reiserfs is quite complex and I'd say its locking is crazy (partially because
of the BKL push-down). I gave up after I realized that the deadlock is
related to the quota support. I disabled quotas and xfstests passed.

I also had some adventures with affs and a few other old file-systems.

The first patch of this patch-set removes the sync_supers thread and it is the
most important one. All the other patches are minor clean-ups: they simply
remove all references to 'write_super' and 'pdflush' from the comments
and the documentation.

I suggest that all patches go in via Al's tree. However, not before the ext4,
exofs and udf changes are merged, which I expect to happen before v3.6-rc1.
The rest of the file-systems are merged already - here is the summary.

1. ext4 - changes sit in Ted Ts'o's tree
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
2. exofs - changes sit in Boaz Harrosh's tree
git://git.open-osd.org/linux-open-osd linux-next
3. udf - changes sit in Jan Kara's tree:
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs for_next
4. sysv - merged upstream
9d46be2 fs/sysv: stop using write_super and s_dirt
5. ufs - merged upstream
9e9ad5f fs/ufs: get rid of write_super
6. affs - merged upstream:
3dd8478 affs: get rid of affs_sync_super
7. hfs - merged upstream:
5687b57 hfs: get rid of hfs_sync_super
8. hfsplus - merged upstream:
9e6c582 hfsplus: get rid of write_super
9. ext2 - merged upstream
f72cf5e ext2: do not register write_super within VFS
10. vfat - merged upstream
7849118 fat: switch to fsinfo_inode
11. jffs2 - merged upstream
208b14e jffs2: get rid of jffs2_sync_super
12. reiserfs - merged upstream
033369d reiserfs: get rid of resierfs_sync_super

These patches are also available here:
git://git.infradead.org/users/dedekind/linux-misc.git sync_supers

And just because this is the final pdflush removal, here is a brief historical
reference.

1. early days...2.6.31 - pdflush is the kernel daemon which periodically
wakes up and flushes all dirty inodes and superblocks.
2. 2.6.32 - Jens Axboe introduces per-block device BDI flusher threads which
are now responsible for flushing dirty inodes [2]. The pdflush thread becomes
very simple, it is renamed to sync_supers and it periodically wakes up
and flushes superblocks. While overall Jens' change was good, it introduced
a regression: instead of one pdflush thread waking up every 5 seconds [3]
we ended up with multiple threads waking up every 5 seconds - sync_supers
and several flusher threads.
3. 2.6.36 - Artem Bityutskiy :-) fixes the wake-ups regression (see commit
6467716) and from now on flusher threads do not wake up unless there is
some dirty data for the corresponding block device.

Attempts are made to similarly optimize sync_supers, but they are vetoed
by Al Viro who wants sync_supers to be killed altogether instead [4].
4. 3.6 - sync_supers is hopefully finally killed. With this the last
piece of pdflush is also gone.

I'd like to thank Intel OTC for supporting this project, Jan Kara for help
with ext[24], and Andrew Morton, Al Viro, Ted Ts'o and Nick Piggin.

[1] 5 seconds is the default setting and major distributions do not change
it. But it is tunable via /proc/sys/vm/dirty_writeback_centisecs
[2] http://lwn.net/Articles/326552/
[3] The pdflush thread forked itself if there was a lot of dirty data, but
that does not matter in this context.
[4] https://lkml.org/lkml/2010/7/22/96

--
Regards,
Artem Bityutskiy