Re: Detecting changed files for JIT/metadata cacheing

Jamie Lokier (lkd@tantalophile.demon.co.uk)
Mon, 28 Jun 1999 17:10:51 +0200


Rich Paul wrote:
> Sounds rather like some things that I've been interested in. Especially
> the fact that changes would flow back up to /, so that you could check
> if any file on the system had changed by stating one directory. Not too
> useful in and of itself, but imagine a make system that could skip a
> whole directory tree because it KNOWS that nothing there has changed.

Quite. I've already done that with `find -newer' and a timestamp in the
base directory, but on my huge tree that was still the dominant time!

(BTW it's not as reliable as you'd think to just trim directories like
that -- many build processes have subdirectories that depend on their
parents, or sibling directories).

> 1) How many bits do you want for this thing. /, /usr, and /home might
> fly by faster than the eye can see.
>
> 2) If a program has a file open, is this number updated with each write
> ( which would really be each flush, close enough ) or only when the file
> is closed?

No no.

- Initial inode state is Modified.
- The serial number is
updated once on transition from Not-Modified to Modified -- this
only happens when the file is modified and is in the Not-Modified state.
I.e. normally this never happens.
- The Not-Modified->Modified transition causes the parent directory
to be checked -- if that's marked Not-Modified do the same thing
there and so on. Again this does not usually happen, so no
overhead in the usual case.
- Reading the serial number sets the flag to Not-Modified.

So unless you're actively reading the serial number, the file stays in
the Modified state, the number stays the same, and there's no there's no
disk overhead.

This means if you don't read the number too often, it increments only
slowly.

> 3) Does a serial number or a date make more sense? A date would have
> limited resolution ( maybe 1 second or 1/100 second, or 1/1000 second,
> with a 64 bit field there would be PLEANTY or room ) a serial number
> would be more atomic ... but a make system would have to know the change
> count of all the files that went into the file that it's trying to
> build.

A date would be nicer in some ways but it'd need a high resolution.
64 bits is more than there is available. OTOH 32 bits are already
available as a serial number.

> 4) What about publish/subscribe: Imagine this: program creates a
> named
> pipe or unix socket, and says "Any time somebody writes to this file, or
> this directory, or this directory and any subdirectory, write a couple
> of bytes to this file ... possibly an integer handle that maps back to
> my internal map of files. This would be quite useful in something like
> a file manager ... somebody creates a new file and it appears
> immediately
> in the browser, without the browser polling.

I think that's a different issue entirely.

> 7) One more random thought ... wouldn't it be interesting if one could
> set a property on a file, and the kernel would then create AND MAINTAIN
> a md5 hash of that file ... so that you could do a system
> call const char*md5stat(const char *), and get back a checksum, and
> whenever the file has data written to it, the hacksum would be changed.
> It would be slower than an integer counter, but if you copied a file
> from one place to another, the checksum would be the same in the new
> place. It would also slow down writing to a file, since the md5sum
> would have to be updated with each flush ( that's fflush() not sync()

Ugh. My proposal lets you do this efficiently in user space.

-- Jamie

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/