Re: Nanosecond fs timestamp support: sad

From: Paul E. McKenney
Date: Mon Jul 25 2011 - 11:09:47 EST


On Sat, Jul 23, 2011 at 08:59:15AM +1000, NeilBrown wrote:
> On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@xxxxxxxxxxxx>
> wrote:
>
> > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > >
> > > > > > Not sure what you mean? It's in stat(2), just like the timestamps.
> > > > >
> > > > > I don't see anything that looks like a version or generation number in
> > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > Pointer?
> > > >
> > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > I think it should be added there though. We still have some unused
> > > > fields.
> > >
> > > But last I checked I thought it was only ext4 that actually incremented
> > > the i_version on IO, and even then only when given a (non-default) mount
> > > option.
> > >
> > > My notes on what needs to be done there:
> > >
> > > - collect data to determine whether turning on i_version causes
> > > any significant performance regressions.
> > > - Last I talked to him, Ted Tso recommended running
> > > Bonnie on a local disk, since it does a lot of little
> > > writes, which is somewhat of a worst case, as it will
> > > generate extra metadata updates for each write.
> > > Compare total wall-clock time, number of iops, and
> > > number of bytes (using some kind of block tracing).
> > > - If there aren't any problems, turn it on by default, and we're
> > > done.
> >
> > (Well, and talk the other filesystem implementors into doing it.)
> >
>
> But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> more-generally-useful precise timestamps?
>
> If not, we probably should tell NFSv4 to use timestamps and focus on making
> them work well.
> ??
>
> The timestamp used doesn't need to update every nanosecond. I think if it
> were just updated on every userspace->kernel transition (or effective
> equivalents inside kernel threads) that would be enough to capture all
> causality. I wonder how that would be achieved... I wonder if RCU machinery
> could help - doesn't it keep track of when threads schedule ... or something?

RCU does track thread scheduling, but currently only pays attention to
it if there is an RCU grace period in progress. It would be easy to
make it track more precisely, though, if that would help something.

That said, I suspect that Peter Zijlstra would be extremely unhappy with
any proposed change that (say) acquired a global lock on every thread
schedule. And I don't believe that he would be all that happy even with a
change that added a non-global lock acquisition to each context switch...

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/