Re: mmap on a device returns ENODEV

Andrea Arcangeli (andrea@suse.de)
Fri, 10 Dec 1999 17:42:39 +0100 (CET)


On Fri, 10 Dec 1999, Ingo Molnar wrote:

>the attached small patch against 2.3.32-pre2 adds all pagecache blocks
>that have established mappings to the buffer-cache hashlists (and can thus

Your patch is buggy as you forgot to hash the buffers inside the
block_*write* stuff.

For brw_page (in create_page_buffers()) we probably don't care to take the
buffer hashed and I think you can safely remove such change from your
patch. 2.2.x is not doing that either. This mean you can't swapout on
raid5 of course, but that is a minor issue IMHO.

>Is it bad to synchronize access through the buffer-cache? I dont think so

I remeber well the argument: the new pagecache will run fsync reducing the
complexity of an order of magnitude for reaching the metadata with an
almost always empty buffer hash so no need of per-inode dirty buffer-queue
as we are just fast enough compared to 2.2.x. Also the buffer hash size is
been shrunk in 2.3.x for this reason. With your proposal we return to
2.2.x regime and we have the buffer hash filled with both data and
metadata.

>can you (or anyone else) see any problems with this approach?

The problem I see is that today you say:

>[..] It works fine here and there
>is no noticeable slowdown anywhere. [..]

and yesterday there was the argument that an almost empty buffer hash
(with only the metadata buffers in it) was one of the main feature/huge
improvements of the new page cache design (mainly for fsync).

So there are two possibility: or you are wrong now or the previous
argument was a red herring.

I quote here one Linus's email of some month ago that I remeber well. I
agree with Linus and I just think you are wrong assuming it make an not
noticeable slowdown anywhere.

On Fri, 18 Jun 1999, Linus Torvalds wrote:
>
>Date: Fri, 18 Jun 1999 16:59:30 -0700 (PDT)
>From: Linus Torvalds <torvalds@transmeta.com>
>To: Stephen C. Tweedie <sct@redhat.com>
>Cc: Andrea Arcangeli <andrea@suse.de>, Jaroslav Kysela <perex@suse.cz>,
> linux-kernel@vger.rutgers.edu, Pavel Machek <pavel@bug.ucw.cz>
>Subject: Re: [patch] `cp /dev/zero /tmp' (patch against 2.2.9)
>
>
>
>On Sat, 19 Jun 1999, Stephen C. Tweedie wrote:
>>
>> Yes, we do. The penalty of having to walk the entire inode indirection
>> tree to find dirty indirection buffers is still there otherwise, and
>> that is really expensive for large files (although not so expensive as
>> it used to be when we needed to do a buffer lookup for each entry).
>
>It's cut down by a factor of thousand on a 4k filesystem.
>
>AND on top of that the lookup is faster anyway, because there are much
>fewer buffers on the hashes.
>
>> The
>> per-inode dirty buffer list eliminates that scan entirely, and allows us
>> to unify the O_SYNC and fdatasync code.
>
>Why do you want that unified? Have you looked at my stuff - it does
>fdatasync() very cleanly.
>
>> The other thing I'll fix is the wait-on-page ordering: we want to wait
>> in the reverse order that we submit the IOs in, to try to minimise the
>> number of unnecessary wakeups we encounter.
>
>I doubt that really matters, but that can be timed.
>
> Linus
>
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.rutgers.edu
>Please read the FAQ at http://www.tux.org/lkml/
>

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/