Re: Suggestion for O_DIRECT and friends

Jeremy Fitzhardinge (jeremy@goop.org)
Mon, 21 Dec 1998 09:41:36 -0800 (PST)


On 21-Dec-98 Jamie Lokier wrote:
> A good read-ahead implementation will catch on to this very quickly by
> itself. Discarding the pages immediately after use is trickier... Then
> again, how can an app be absolutely sure you're not about to stop it and
> restart it again at the beginning?

Any reasonably-sized media stream is going to be way bigger than physical
memory, and even if it isn't, not worth evicting everything else for. By
definition, if you're streaming off disk, then you have enough IO bandwidth for
the media stream.

After all, caching and LRU replacement are only useful if you do in fact use the
data again. The wins are large when you get a cache hit, which covers the fact
that caching needlessly is surprisingly expensive.

Perhaps what you need to deal with this case is caching control: something to
say "unless this page is otherwise reused, consider it to be the oldest thing
in the cache, and therefore replaceable at the first opportunity. On the other
hand, it is in the cache, and you can get a cache hit on it". This would
prevent the cache from getting polluted with mostly useless data which is
unlikely to be re-read, and prevent useful things from being evicted.
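
To make the idea concrete, here is a toy user-space model of it, not kernel
code. It is only a sketch: the names HintedLRU, access, cold and run are made
up for illustration, and a real page cache is not a single LRU list. A page
read with the hint is still cached, but goes in at the eviction end; any later
hit promotes it like a normal page.

```python
from collections import OrderedDict


class HintedLRU:
    """Toy page cache: plain LRU plus a hypothetical 'cold insert' hint."""

    def __init__(self, capacity):
        self.capacity = capacity
        # First key is the next eviction victim, last key is most recent.
        self.pages = OrderedDict()

    def access(self, page, cold=False):
        """Touch a page; return True on a cache hit, False on a miss."""
        if page in self.pages:
            # Any reuse, hinted or not, promotes the page to most recent.
            self.pages.move_to_end(page)
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the oldest page
        self.pages[page] = None
        if cold:
            # Hinted page: it is in the cache, but first in line to go.
            self.pages.move_to_end(page, last=False)
        return False


def run(cold):
    """Warm three 'useful' pages, stream 100 pages past them, then
    count how many of the useful pages are still cache hits."""
    cache = HintedLRU(capacity=4)
    for page in ("a", "b", "c"):
        cache.access(page)
    for i in range(100):
        cache.access(("stream", i), cold=cold)
    return sum(cache.access(page) for page in ("a", "b", "c"))
```

In this model run(False) loses all three useful pages to the stream, while
run(True) keeps them, because each streamed page only ever displaces the
previous streamed page.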

>> Pages that have been sent to the appl. can be freed immediately.
>
> Both the application and the kernel can guess at this, but neither can
> be sure in most cases. No media streaming program can be sure you don't
> want to use the data again, though you might trust the hint a bit more.

Most applications can't or won't guess at this, but there's a large useful
class which can and will, accurately.

Having the kernel guess is somewhat hard. I guess if you clear the ref bits on
pages which have recently been read into, you can tell whether the application
seems to be rereading stuff or not. Of course, mostly the application will be
reading from a large file into a moderate buffer, and rereading the buffer.
This means the problem is one of getting the kernel to guess which parts of the
page cache are likely to be used again in the future (small, often used files
which are worth caching) and which aren't (files larger than physical
memory which are only ever read serially).

I suppose if you keep stats on how successful read-ahead is, and the number of
times a single process breaks a linear read with a seek, you can guess at
future access of the file. Turning that into a robust self-tuning heuristic is
probably not trivial, however.
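
As a rough illustration of the bookkeeping involved, the following sketch
counts, per open file, how often a read continues where the previous one ended
versus how often it jumps elsewhere. Everything in it is an assumption made for
the example: the AccessStats name, the thresholds min_reads and seq_ratio, and
the use of total bytes read as a stand-in for "file larger than physical
memory". It shows the counting, not the robust self-tuning part.

```python
class AccessStats:
    """Hypothetical per-file read statistics for guessing future access."""

    def __init__(self):
        self.next_offset = None  # where a purely linear reader would read next
        self.sequential = 0      # reads that continued the previous read
        self.seeks = 0           # reads that broke the linear pattern
        self.bytes_read = 0

    def record_read(self, offset, length):
        # The first read has nothing to compare against, so it is not counted
        # as either sequential or a seek.
        if self.next_offset is not None:
            if offset == self.next_offset:
                self.sequential += 1
            else:
                self.seeks += 1
        self.next_offset = offset + length
        self.bytes_read += length

    def looks_like_stream(self, mem_bytes, min_reads=8, seq_ratio=0.9):
        """Guess that the file is being streamed: almost all reads are
        linear and more data has gone past than fits in memory."""
        total = self.sequential + self.seeks
        if total < min_reads:
            return False  # not enough history to guess
        mostly_linear = self.sequential / total >= seq_ratio
        return mostly_linear and self.bytes_read > mem_bytes
```

A caller would feed every read through record_read() and only start treating
pages as throwaway once looks_like_stream() returns True; picking thresholds
that hold up across workloads is exactly the hard part.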

J
