Re: Linux 2.6.29

From: Ingo Molnar
Date: Tue Mar 24 2009 - 09:31:20 EST



* Theodore Tso <tytso@xxxxxxx> wrote:

> More recently (as in this past weekend), I went back to the ext3
> problem, and found a better solution, here:
>
> http://lkml.org/lkml/2009/3/21/304
> http://lkml.org/lkml/2009/3/21/302
> http://lkml.org/lkml/2009/3/21/303
>
> These patches cause the synchronous writes caused by an fsync() to
> be submitted using WRITE_SYNC, instead of WRITE, which definitely
> helps in the case where there is a heavy read workload in the
> background.
>
> They don't solve the problem where there is a *huge* amount of
> writes going on, though --- if something is dirtying pages at a
> rate far greater than the local disk can write it out, say, either
> "dd if=/dev/zero of=/mnt/make-lots-of-writes" or a massive distcc
> cluster driving a huge amount of data towards a single system or a
> wget over a local 100 megabit ethernet from a massive NFS server
> where everything is in cache, then you can have a major delay with
> the fsync().

Nice, thanks for the update! The situation isnt nearly as bleak as i
feared they are :)

> However, what I've found, though, is that if you're just doing a
> local copy from one hard drive to another, or downloading a huge
> iso file from an ftp server over a wide area network, the fsync()
> delays really don't get *that* bad, even with ext3. At least, I
> haven't found a workload that doesn't involve either dd
> if=/dev/zero or a massive amount of data coming in over the
> network that will cause fsync() delays in the > 1-2 second
> category. Ext3 has been around for a long time, and it's only
> been the last couple of years that people have really complained
> about this; my theory is that it was the rise of > 10 megabit
> ethernets and the use of systems like distcc that really made this
> problem really become visible. The only realistic workload I've
> found that triggers this requires a fast network dumping data to a
> local filesystem.

i think the problem became visible via the rise in memory size,
combined with the non-improvement of the performance of rotational
disks.

The disk speed versus RAM size ratio has become dramatically worse -
and our "5% of RAM" dirty ratio on a 32 GB box is 1.6 GB - which
takes an eternity to write out if you happen to sync on that. When
we had 1 GB of RAM 5% meant 51 MB - one or two seconds to flush out
- and worse than that, chances are that it's spread out widely on
the disk, the whole thing becoming seek-limited as well.

That's where the main difference in perception of this problem comes
from i believe. The problem was always there, but only in the last
1-2 years did 4G/8G systems become really common for people to
notice.

SSDs will save us eventually, but they will take up to a decade to
trickle through for us to forget about this problem altogether.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/