Re: The Central Mystery

Martin von Loewis (martin@mira.isdn.cs.tu-berlin.de)
Fri, 25 Jul 1997 00:46:32 +0200


> I guess in order of preference, either
> - Someone has already written a good explanation, or

Böhme et al., "Linux Kernel Programming", has about 40 pages on
memory management. Maybe this helps.

> What I want to figure out is how a file system brings data into memory
> in such a way that it's properly aligned and packed into MMU-sized
> (e.g. 4K) pages despite the huge variability in the number of ways that
> it can get there. How do permissions get checked and used? How
> do writes get propagated back to the file system?

There are several ways a file system can support mmap:
a) implement a file mmap operation
b) use the generic mmap, but implement the inode's readpage
c) use the generic readpage, and support the inode's bmap
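
To make the fallback order concrete, here is a small userspace model in C. It is only a sketch: struct fs_ops, enum mmap_path, choose_mmap_path and contiguous_bmap are hypothetical names, not the kernel's real file_operations/inode_operations layout. The model also assumes that a NULL readpage means "use the generic readpage", which then depends on bmap.

```c
#include <stddef.h>

/* Hypothetical, simplified hooks a file system may provide. In this model a
   NULL readpage means "use the generic readpage", which then needs bmap. */
struct fs_ops {
    int (*mmap)(void);           /* a) the file system's own mmap operation */
    int (*readpage)(void);       /* b) its own readpage for the generic mmap */
    int (*bmap)(int file_block); /* c) block map used by the generic readpage */
};

enum mmap_path {
    MMAP_OWN,              /* a) fs sets up the vm_area_struct itself */
    MMAP_GENERIC_READPAGE, /* b) generic mmap + fs-specific readpage */
    MMAP_GENERIC_BMAP,     /* c) generic mmap + generic readpage + fs bmap */
    MMAP_UNSUPPORTED       /* none of the above: mmap cannot be offered */
};

/* Return the first option the file system supports, in the order a), b), c). */
enum mmap_path choose_mmap_path(const struct fs_ops *ops)
{
    if (ops->mmap != NULL)
        return MMAP_OWN;
    if (ops->readpage != NULL)
        return MMAP_GENERIC_READPAGE;
    if (ops->bmap != NULL)
        return MMAP_GENERIC_BMAP;
    return MMAP_UNSUPPORTED;
}

/* Hypothetical example: a file system that stores each file contiguously
   starting at device block 100 and only provides bmap, i.e. option c). */
int contiguous_bmap(int file_block)
{
    return 100 + file_block;
}

const struct fs_ops bmap_only_fs = { NULL, NULL, contiguous_bmap };
```

For bmap_only_fs, choose_mmap_path returns MMAP_GENERIC_BMAP, i.e. option c); a table with all three hooks NULL yields MMAP_UNSUPPORTED.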

Upon sys_mmap, the system calls the file's mmap operation. In case a),
the file system's own mmap operation needs to fill in the vm_area_struct;
in particular, it needs to set the operations pointer (vm_ops). On a page
fault, swap-out, write-back and so on, the system will then call these
operations of the vm area. In particular, when a page has to be read in,
the nopage operation is called.
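
The lazy, fault-driven use of nopage can be modelled in user space as below. This is a sketch under simplifying assumptions: struct vm_area, struct vm_ops, vma_read_byte and demo_nopage are hypothetical stand-ins for vm_area_struct and its operations, a NULL entry in pages[] plays the role of a missing page table entry, and demo_nopage just fills the page with its index instead of doing file I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NPAGES 4

struct vm_area;

/* Hypothetical, cut-down analogue of the vm area operations table. */
struct vm_ops {
    /* Called when a not-yet-present page is touched; returns the filled page. */
    unsigned char *(*nopage)(struct vm_area *vma, size_t page_index);
};

/* Hypothetical, cut-down analogue of vm_area_struct. */
struct vm_area {
    const struct vm_ops *ops;          /* set up by the mmap operation */
    unsigned char *pages[NPAGES];      /* NULL = not present (no PTE yet) */
    unsigned char storage[NPAGES][PAGE_SIZE];
    int faults;                        /* how many times nopage was called */
};

/* Read one byte of the mapping, "faulting" the page in on first access. */
unsigned char vma_read_byte(struct vm_area *vma, size_t offset)
{
    size_t idx = offset / PAGE_SIZE;

    assert(idx < NPAGES);              /* the real kernel would signal SIGSEGV */
    if (vma->pages[idx] == NULL) {     /* page fault: ask the vm area to fill it */
        vma->pages[idx] = vma->ops->nopage(vma, idx);
        vma->faults++;
    }
    return vma->pages[idx][offset % PAGE_SIZE];
}

/* Example nopage: fills the page with its own index (stands in for file I/O). */
static unsigned char *demo_nopage(struct vm_area *vma, size_t page_index)
{
    unsigned char *page = vma->storage[page_index];
    memset(page, (int)page_index, PAGE_SIZE);
    return page;
}

const struct vm_ops demo_vm_ops = { demo_nopage };
```

Touching two different pages triggers two nopage calls; touching an already present page again does not call nopage at all.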

The generic nopage then calls the inode's readpage function to fill the
page. It is entirely up to the file system how it achieves this. If the
file ends within the page (for example, because the file is smaller than
a page), the file system should fill only the beginning of the page, up
to the end of the file.
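
A readpage-style fill of a partial last page might look like this. The sketch assumes the file contents are already in memory and a 4K page; fill_page is a hypothetical helper, and zeroing the tail follows what user space sees past end-of-file in a mapped page rather than any particular file system's code.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical readpage-style helper: fill page number page_index of a file
   held in memory as (file, file_size). Only the part of the page that lies
   inside the file is copied; the rest of the page is zeroed.
   Returns the number of bytes taken from the file. */
size_t fill_page(const unsigned char *file, size_t file_size,
                 size_t page_index, unsigned char page[PAGE_SIZE])
{
    size_t start = page_index * PAGE_SIZE;
    size_t n = 0;

    if (start < file_size) {
        n = file_size - start;
        if (n > PAGE_SIZE)
            n = PAGE_SIZE;
        memcpy(page, file + start, n);
    }
    memset(page + n, 0, PAGE_SIZE - n);   /* nothing beyond end-of-file */
    return n;
}
```

For a 5000-byte file, page 0 is filled completely, while page 1 receives 904 bytes from the file followed by zeros.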

Most file systems use the generic readpage (generic_readpage) here, which
in turn calls bmap. bmap translates a block index within the file (the
file offset divided by the block size) into a block number on the device
where the inode resides. The page data must therefore start on a block
boundary; otherwise the file system cannot use generic_readpage.
generic_readpage then sets up a temporary buffer that shares its data
area with the page to be filled, and performs the actual I/O.
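
The block arithmetic behind generic_readpage and bmap can be sketched as follows. Everything here is hypothetical: table_bmap stands in for a file system's bmap using a plain lookup table, and page_to_device_blocks only computes which device blocks back a page; it does no I/O. The divisibility check expresses the block-boundary requirement.

```c
#include <stddef.h>

#define PAGE_SIZE 4096

/* Hypothetical bmap signature: file block index -> device block number. */
typedef long (*bmap_fn)(const long *block_table, long file_block);

/* Example bmap backed by a lookup table (stands in for the inode's block list). */
long table_bmap(const long *block_table, long file_block)
{
    return block_table[file_block];
}

/* Compute the device blocks backing page page_index, as a generic_readpage-
   style routine would before issuing I/O. Requires block_size to divide
   PAGE_SIZE, so the page starts on a block boundary. Writes
   PAGE_SIZE / block_size entries to out and returns that count, or -1 if
   block_size is unsuitable. */
int page_to_device_blocks(long page_index, long block_size,
                          bmap_fn bmap, const long *block_table, long *out)
{
    if (block_size <= 0 || block_size > PAGE_SIZE || PAGE_SIZE % block_size != 0)
        return -1;

    long blocks_per_page = PAGE_SIZE / block_size;
    long first = page_index * blocks_per_page;   /* = page offset / block_size */

    for (long i = 0; i < blocks_per_page; i++)
        out[i] = bmap(block_table, first + i);
    return (int)blocks_per_page;
}
```

With 1K blocks a 4K page covers four file blocks, so page 1 covers file blocks 4 to 7 and the result is whatever bmap reports for those; a block size that does not divide the page size is rejected.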

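Permissions are first checked when the file is opened. On the mmap
call, do_mmap performs several further checks to verify that the
requested mapping is compatible with the mode in which the file was
opened (for example, a shared writable mapping requires that the file
was opened for writing).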
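
The mmap-time check can be observed from user space with the standard POSIX calls. This demo (try_map_readonly_fd is a hypothetical helper name, and it assumes /tmp is writable) opens a file read-only and asks for a writable mapping: a shared writable mapping fails with EACCES, while a private copy-on-write mapping of the same descriptor is allowed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a small temporary file, reopen it read-only, and try to map it
   with PROT_READ | PROT_WRITE and the given flags (MAP_SHARED or MAP_PRIVATE).
   Returns 0 if mmap succeeded, otherwise the errno value it reported. */
int try_map_readonly_fd(int map_flags)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return errno;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        unlink(path);
        return EIO;
    }
    close(fd);

    fd = open(path, O_RDONLY);          /* file opened for reading only */
    int open_err = errno;
    unlink(path);                       /* drop the name; the fd keeps the file */
    if (fd < 0)
        return open_err;

    void *p = mmap(NULL, 5, PROT_READ | PROT_WRITE, map_flags, fd, 0);
    int err = (p == MAP_FAILED) ? errno : 0;   /* save errno before close() */
    if (p != MAP_FAILED)
        munmap(p, 5);
    close(fd);
    return err;
}
```

try_map_readonly_fd(MAP_SHARED) returns EACCES, because changes would have to be written back to a file the process may not write; try_map_readonly_fd(MAP_PRIVATE) returns 0, since private changes never reach the file.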

Since Larry asked about read/write: in Linux, read and write are still
separate file operations, because not every file system needs to support
mmap (and some files have no file system behind them at all). File
systems that do support readpage (directly, or indirectly via bmap) need
not implement read/write themselves; they can use the generic file
read/write routines instead.
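
A generic read needs nothing but readpage, as this sketch shows. It is a simplified model, not the kernel's generic file read routine: readpage_fn, generic_read_sketch and pattern_readpage are hypothetical, a single local buffer replaces the page cache, and there is no read-ahead or locking.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical readpage callback: fills page page_index of the file. */
typedef void (*readpage_fn)(void *file, size_t page_index,
                            unsigned char page[PAGE_SIZE]);

/* Copy up to count bytes starting at file position pos into buf, one page
   at a time, using only readpage. file_size acts as end-of-file.
   Returns the number of bytes copied. */
size_t generic_read_sketch(void *file, size_t file_size, readpage_fn readpage,
                           size_t pos, unsigned char *buf, size_t count)
{
    unsigned char page[PAGE_SIZE];    /* stands in for a page-cache page */
    size_t done = 0;

    if (pos >= file_size)
        return 0;
    if (count > file_size - pos)
        count = file_size - pos;      /* short read at end-of-file */

    while (done < count) {
        size_t index = (pos + done) / PAGE_SIZE;
        size_t offset = (pos + done) % PAGE_SIZE;
        size_t chunk = PAGE_SIZE - offset;
        if (chunk > count - done)
            chunk = count - done;

        readpage(file, index, page);  /* the file system fills the whole page */
        memcpy(buf + done, page + offset, chunk);
        done += chunk;
    }
    return done;
}

/* Example readpage for a synthetic file whose byte at position i is i % 251. */
void pattern_readpage(void *file, size_t page_index, unsigned char page[PAGE_SIZE])
{
    (void)file;
    for (size_t i = 0; i < PAGE_SIZE; i++)
        page[i] = (unsigned char)((page_index * PAGE_SIZE + i) % 251);
}
```

Reading 10 bytes at offset 4090 spans two pages, so readpage is called for page 0 and page 1 and the bytes are stitched together; a read starting 5 bytes before end-of-file is shortened to 5 bytes.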

Please note that this is a Linux 2.1 description; in Linux 2.0, things
are slightly different.

Regards,
Martin