Re: [RFC] Tux3 for review

From: Daniel Phillips
Date: Mon Jun 23 2014 - 20:19:44 EST


On Saturday, June 21, 2014 12:29:01 PM PDT, James Bottomley wrote:
On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
On Thursday, June 19, 2014 2:26:48 AM PDT, Lukáš Czerner wrote:
...

the concern has always been how page forking interacted with writeback.

More accurately, that is just one of several concerns that Tux3
necessarily addresses in order to benefit from this powerful
optimization. We are pleased that the details continue to be of
general interest.

Direct IO is a spurious issue. To recap: direct IO does not introduce
any new page forking issues. All of the page forking issues already
exist with normal buffered IO and mmap. We have little interest and
scant available time for heading off on a tangent to implement direct
IO at this point just as a precondition for merging.
...

The specific concern is that page forking cannot be made to work
with direct io. Asserting that it doesn't cause any additional
problems isn't an answer to that concern.

Yes it is. We are satisfied that direct IO introduces no new issues
with page forking. If you are concerned about a specific issue then the onus is on you to specify it.

Direct IO isn't actually a huge issue for most filesystems (I mean
even vfat has it).

You might consider asking Hirofumi about that (VFAT maintainer).

...The fact that you think it is such a huge deal...

(Surely you could have found a less disparaging way to express
yourself...)

...to implement for tux3 tends to lend credence to this viewpoint.

It is purely a matter of concentrating on what is actually important, as opposed to imagined or manufactured. We do not wish to spend time on direct IO at this point in time. If you have identified a specific issue then please raise it.

For the record, there is a genuine reason why direct IO requires
extra work for Tux3, which has nothing to do with page forking. Tux3
has an asynchronous backend, unlike any other local Linux filesystem
(but like Matt Dillon's Hammer, from which we took inspiration).
Direct IO thus requires implementing a new synchronization mechanism
to allow frontend direct IO to use the backend allocation and
writeback mechanisms, because direct IO is synchronous. There is
nothing new, magical or particularly challenging about that. It is
just time-consuming work that we do not intend to do right now because
other more important things need to be done.
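
[Illustration only, not Tux3 code. The general shape of such a
synchronization mechanism can be sketched in user space: a single
backend thread commits queued writes asynchronously, and a synchronous
caller attaches a completion event to its request and blocks until the
backend signals it. The names Backend, submit_async and submit_sync
are hypothetical, and the list "committed" merely stands in for
allocation and writeback.]

```python
import queue
import threading


class Backend:
    """Hypothetical asynchronous backend: one thread commits queued writes."""

    def __init__(self):
        self.requests = queue.Queue()
        self.committed = []  # stands in for data that reached disk
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            item = self.requests.get()
            if item is None:  # shutdown sentinel
                break
            data, done = item
            self.committed.append(data)  # "allocate and write back"
            if done is not None:
                done.set()  # wake the synchronous waiter, if any

    def submit_async(self, data):
        """Buffered-style write: queue the data and return immediately."""
        self.requests.put((data, None))

    def submit_sync(self, data):
        """Direct-IO-style write: queue the data and wait for the commit."""
        done = threading.Event()
        self.requests.put((data, done))
        done.wait()

    def stop(self):
        self.requests.put(None)
        self.thread.join()


if __name__ == "__main__":
    backend = Backend()
    backend.submit_async("buffered write")
    backend.submit_sync("direct write")  # returns only after the commit
    print(backend.committed)
    backend.stop()
```

[Because the queue is first-in first-out and there is one backend
thread, the synchronous request also waits behind everything queued
before it. A real filesystem would have to decide whether that
ordering is wanted.]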

In the fullness of time, Tux3 will have direct IO just like VFAT.
However, that work is a good candidate for post-merge development. For
example, it could be a good ramp-up project for a new team member or a
student looking to make their mark on the kernel world.

The bottom line is that direct IO has nothing to do with compiling
the kernel or operating a cell phone efficiently, so it is not interesting to us right now. It will become more interesting when Tux3 is ready to scale to servers running Oracle and the like.

The point is that if page forking won't work with direct IO at
all, then it's a broken design and there's no point merging it.

You can rest assured that direct IO will work with page forking,
given that buffered IO does. We are now discussing details of how to make core Linux a more hospitable environment for page forking, not whether page forking can be made to work at all, a question that was settled by example some time ago.

On the other hand, page forking itself has a number of
interesting issues. Hirofumi is currently preparing a set of core kernel patches for review. These patches explicitly do not attempt to package page forking up into a nice and easy API that other filesystems could patch in tomorrow. That would be an unreasonable research burden on our small development team.
...

OK, can we take a step back and ask why you're so keen to push
this into the tree?

If you mean, why are we keen to merge Tux3, I should not need to
explain that to you.

If you mean, why are we keen to push page forking per se into
mainline, then the answer is, we are by no means keen to push page forking into core kernel. Rather, that request comes from other filesystem developers who recognize it as a plausible way to avoid the pain of stable pages.

Based on our experience, page forking is properly implemented within
the filesystem, not core kernel, and we are keen only to push the requisite hooks into core. If somebody disagrees and feels the need to prove their point by implementing page forking entirely in core, then they should post patches and we will be the first to applaud.

The usual reason is ease of maintenance because in-tree
filesystems get updated as the vfs and mm APIs change. However,
the reciprocal side of that is using standard VFS and MM APIs to make this update and maintenance easy. The reason no-one wants
an in-tree filesystem that implements its own writeback by hacking into the current writeback system is that it's a huge maintenance burden.

Every filesystem is a maintenance burden. Core kernel simply must
provide the mechanisms that are required to make the kernel a good place for filesystems to exist. The fact that some ancient core hackery needs to be tweaked to better accommodate the requirements of a modern filesystem is not unusual in any way. Essentially, that is the entire story of Linux kernel development.

Every time writeback gets tweaked, tux3 will break, meaning either we
double the burden on people updating writeback (to try to figure out
how to replicate the change in tux3) or we just accept that tux3 gets
broken.

No. Tux3 will be less of a burden for writeback maintenance than
other filesystems because it hooks in above the messy writepages machinery and therefore is not sensitive to subtle changes in that creaky code.

The former is unacceptable to the filesystem and mm people and the
latter would mean there's not really much point merging tux3 if we
just keep breaking it ... it's better to keep it out of tree
where the breakages can be fixed by people who understand them on their own timescales.

On the face of it you are arguing the case that Tux3 should be blocked from merging forever, as should every new filesystem, as Pavel succinctly pointed out. That is less than helpful. But if your goal is to buttress the public perception that LKML has
become a toxic forum for contributors then you do an admirable
job.

By the way, after reading your polemic an observer might draw the conclusion that I am not one of the "filesystem and mm people". When did that change?

...
That was already fixed as noted above, and all the relevant
changes were already posted as an independent patch set. After
that, some developers weighed in with half formed ideas about how the same thing could be done better, but without concrete suggestions. There is nothing wrong with half formed ideas, except when they turn into a way of blocking forward progress
...

Could you post the url to the new series, please, I must have missed it; seeing the patches that implement the API for insertion into the writeback code would certainly help frame
this discussion.

We think that our most recently posted patch is the best approach at this time. Which is to say that it relies on exactly the existing writeback scheduling heuristics. We think that Dave Chinner and others are wrong to advocate experimental development of a new writeback mechanism at this juncture while the current scheme already works perfectly well for Tux3, either with our writeback hack or with the new hook.

We further suggest that the new hook is easy to understand and
imposes insignificant new maintenance burden. In any case we will be happy to assume whatever maintenance burden might arise. Obviously, that is entirely academic while we are the only user.

It is worth noting that we (the kernel community) have been
thrashing away at the writeback problem for more than twenty years, and the current solution still leaves much to be desired. It is unfair to expect us, the Tux3 team, to fix that mess in a week or two, just to merge our filesystem. We prefer to adapt the existing infrastructure for now, as expressed in the currently proposed patch set. With that, we allow core to mark our inodes dirty just as it has always done, and we continue to use the usual inode writeback lists for writeback
scheduling, which work just fine.

So that's a misunderstanding of expectations...

I did not misunderstand. It is clear from the context you deleted
that we are being pushed to engineer a new core writeback mechanism instead of adapting the existing one.

...the actual expectation is that you won't make the writeback
problem more difficult to tackle.

We do not make the writeback problem more difficult, which is obvious from the patch.

Reimplementing writeback within your code in a way that's hacked
into the system is fragile and burdensome ... it becomes double the
code to maintain ... and tux3 breaks if it's not updated.

You are preaching to the converted. As you know, we posted a patch
set that eliminates this particular instance of core duplication. Upcoming patches will eliminate the remaining core duplication. It is unnecessary to belabor that point further.

Regards,

Daniel
