soft-update/async write file systems

Larry M. Augustin (lma@varesearch.com)
Tue, 17 Feb 1998 14:27:03 -0800


I've been seeing the reports of Kirk McKusick's soft-update filesystem
for *BSD. Anyone working on anything similar for Linux?

Larry

----- Begin Included Message -----

>From owner-freebsd-hackers@FreeBSD.ORG Wed Feb 4 21:31:29 1998
Date: Wed, 04 Feb 1998 21:34:40 -0800
From: Julian Elischer <julian@whistle.com>
Organization: Whistle Communications
X-Mailer: Mozilla 3.0Gold (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: hackers@FreeBSD.ORG
CC: mckusick@mckusick.com
Subject: soft-updates performance differences.
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.ORG
X-To-Unsubscribe: mail to majordomo@FreeBSD.org "unsubscribe hackers"

These numbers were found today running soft-updates on freebsd-current
(as of 3 days ago).:

The tests are simplistic and do not reflect any known meaningful
work other than hierarchical copies and deletes.

method:
mount a 64MB soft-update filesystem as /mnt
time cp -pr /usr/X11R6 /mnt
time rm -rf /mnt/X11R6

-----------------------------
normal sync mount:
COPY
user time: 0.1 sec
sytem time: 4.9 secs
real time: 2:16
6339 sync writes, 3788 async writes

RM
user time 0 sec
system time: 0.9 sec
real time: 0:59
5884 sync writes, 74 async writes
-------------------------------
normal mount -o async
COPY
user time: 0.1 sec
system time: 4.6 secs
real time: 1:08
325 sync writes, 3873 async writes

RM
user time: 0.0
system time: 0.8 secs
real time: 0:26 secs
2941 sync writes, 163 async writes (why so many sync?)
---------------------------------
soft updates mount (after tunefs -n enable)
COPY
user time: 0.1 secs
system time: 5.1 secs
real time: 1:13 secs
0 sync writes, 4916 async writes

RM
user time: 0 secs
system time: 0.5 secs
real time: 0:01.27 (!!)
0 sync writes, 238 async writes
-------------------------------------

notes.
The the free space for the RM on soft updates is recovered
over the 30 seconds following the rm. It's amusing doing a 'df'
and seeing it come back even though you are not doing anything.
A system failure at any time during the soft-updates mount would leave a
disk that is completely consistant, except for
the fact that some blocks may not be freed (wow)
unlike the -o async case.

Note especially that 'consistant' does not mean 'no data loss'.
Data written to disk in the last 30 seconds and NOT FOLLOWED BY A FSYNC
may not be present after reboot...

If you plan on hitting 'reset', you need 3 syncs to ensure
everything you hope will be on the disk actually is.. (at this stage)
(you shouldn't be doing this anyhow!)

Soft-updates is a step forward in the technology but is not a magic
bullet.

----- End Included Message -----

----- Begin Included Message -----

>From owner-freebsd-hackers@FreeBSD.ORG Wed Feb 4 23:10:19 1998
To: "Jordan K. Hubbard" <jkh@time.cdrom.com>
Subject: Re: kirk's soft-update integration..
cc: Julian Elischer <julian@whistle.com>,
Amancio Hasty <hasty@rah.star-gate.com>, hackers@FreeBSD.ORG
Date: Wed, 04 Feb 1998 23:13:10 -0800
From: Kirk McKusick <mckusick@McKusick.COM>
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.ORG
X-To-Unsubscribe: mail to majordomo@FreeBSD.org "unsubscribe hackers"
Content-Length: 2626

To: Julian Elischer <julian@whistle.com>
cc: Amancio Hasty <hasty@rah.star-gate.com>,
Kirk McKusick <mckusick@McKusick.COM>, hackers@freebsd.org
Subject: Re: kirk's soft-update integration..
In-reply-to: Your message of "Wed, 04 Feb 1998 10:31:39 PST."
<34D8B40B.41C67EA6@whistle.com>
Date: Wed, 04 Feb 1998 15:48:52 -0800
From: "Jordan K. Hubbard" <jkh@time.cdrom.com>

> I am mostly done with the patches.
> and I have changed the new code to work with FreeBSD.
> Whistle owns this work, however the patches will probably
> make their way back into -current at some stage.
> the new code itself will possibly have to be fetched by each user,
> maybe kirk might set up a 'sign-to-get-it web page or something.

Ummm. I still completely fail to see why OpenBSD was able to
integrate all the hooks AND make the two "encumbered" (sorry
to use that word ;) files available on their web site without
any such hoop-jumping.

I really do also get the feeling that the intervention of whistle in
this matter has only vastly overcomplicated the situation for the
average user. As amancio says, why can't we just ftp the files from
someplace? I've asked Kirk about the method that OpenBSD used and he
didn't seem to have any objections to what they're doing, so why will
FreeBSD's support require a signature in blood before we can do the
same thing?

Jordan

I appreciate Julian taking a careful approach to the distribution
of my soft update code in FreeBSD. Having not talked to me about
what I intended to do, he did not want to make promises that I would
be unhappy with. However, he has advanced a more conservative position
than I intend to take. It is my intent that the distribution
policy for soft updates in FreeBSD be identical to that of OpenBSD.
That policy, is that all the hooks and the no-op stub file be in the
standard distribution. There will be absolutely no restrictions on
the distribution or use of any of these changes. The two plug-in
replacement files will be available by ftp from the usual FreeBSD
ftp sites. They may be downloaded and used by anyone that wants to
use them individually (including at your workstation at work). I
ask that companies that want to use them in commercial products
(for example, embedded or turnkey systems, production ISP systems,
etc) talk to me about working out a license (almost always I
negotiate a one time fee and the right but not the obligation to
buy consulting time). It is my hope that I will be able to use
this income to advance my next project: rebooting after a crash
without the need to run fsck.

Kirk McKusick

----- End Included Message -----

----- Begin Included Message -----

>From owner-freebsd-fs@FreeBSD.ORG Tue Feb 10 00:23:07 1998
From: Terry Lambert <tlambert@primenet.com>
Subject: Re: Soft update usage
To: roark_hilomen_at_meridian@meridian-data.com
Date: Tue, 10 Feb 1998 08:29:23 +0000 (GMT)
Cc: freebsd-fs@FreeBSD.ORG
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
X-Loop: FreeBSD.org

> Does anyone know if "Soft Updates" has been in use in the industry
> or is OpenBSD and now FreeBSD the first to implement it?

BSDI has them. The original Ganger/Patt work was done in SVR4
(if you are familiar with SVR4 internals, this is obvious from
looking at the published Appendix A code).

> How extensively has it been tested (outside the current FreeBSD flurry)?

Kirk would be a better person to answer that.

> And for that matter, is anyone testing it on a large multi-client system?

I don't know; they should.

> Has anyone tried crashing it and seeing if FSCK works well and as advertised
> with Soft Updates?

That's Kirk's testbed. He uses a SCSI drive and two machines; he halts
the one from the other, randomly, and fsck's (among other tests). He
claims many, many months of reliable operation.

> I'm trying to see how cutting edge this technology is.

It's "medium". It's a good leap past current implementations, but the
paper was published several years ago, and it does not support general
updates between layered FS's. Effectively, Ganger and Patt did this:

1) Determine all operations that could be done on SVR4 UFS

2) Determine the events that would result from these operations,
as small DAG fragments (directed acylic graphs).

3) Invent a structure called a "dependency" that defined relationships
between these DAG's.

4) Invent a mechanism for ensuring ordering of operations in
dependency order (this mechanism is superior to DOW -- delayed
ordered writes -- a USL patent-pending technology, because
there is, in effect, write gathering and meta-data ordering
without flush -- unlike DOW).

5) Hard-code (rather than parametrically code) these dependency
queueing events. This is the step that degeneralizes the
Ganger/Patt code, and makes it a lot of work to port -- at a
3-5% performance increase (negligible in comparison to the
win of soft updates themselves over using sync writes to
guarantee ordering of operations).

The DOW mechanism is basically the classic "banker's algorithm", which
determines potential cycles in a graph and denies operations based on
the possibility of cycles. The soft updates mechanism (effectively)
precomputes transitive closure over the graph, so any insertion that
would result in a cycle is seen as depending on prior insertions.

The difference in these mechisms can be best characterized by:

DOW: Prevent operations which are dangerous. Also, using back-tracking
via Howard's algortihm, prevent operations which *may* be dangerous,
even if it ends up that they *aren't* dangerous.

SU: Prevent only operations which are dangerous.

I *seriously* recommend the following books:

Algorithms in C++
Robert Sedgewick
Addison-Wesley Publishing Company
ISBN 0-201-51059-6

The Art Of computer Programming
_Volume 1: Fundamental Algorithms_
Donald Knuth
Addison-Wesley Publishing Company
ISBN 0-201-03809-9

UNIX Systems for Modern Architectures
_Symmetric Multiprocessing and Caching for Kernel Programmers_
Curt Schimmel
Addison-Wesley Publishing Company
ISBN 0-201-63338-8

For more information on graphical algorithms. Don't be put off by
the Sedgewick book; it is very light on the code and heavy on the
graph theory (chapters 29 through 34 -- I recommend chapters 24
through 28, on geomentric algorithms, as well).

For example, if you were to implement a lock manager using graphical
representations and cycle detection, you could probably expect to do
about 20,000 transactions a second (this is the measured rate through
the NetWare for UNIX lock manager, which used an intention mode lock
manager based on graphical algorithms).

Of course, you may not want to use this in an SMP kernel, since it's
only two orders of magnitude faster than Tuxedo, which used the same
algorithm as the BSD lockmgr code.

If you used a graphical soloution to locking in an SMP kernel, you'd
have to put up with your kernel being faster, more highly concurrent,
and unfortunately scalable to a larger number of processors than SVR4
or Solaris could every hope to support. Then where would you be?

Oh. Yeah. You'd be at the cutting edge. Couldn't have that...

Heh. 8-).

Terry Lambert
terry@lambert.org

---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

----- End Included Message -----

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu