file ID redesign proposal

Ingo Molnar (mingo@pc7537.hil.siemens.at)
Wed, 26 Nov 1997 12:28:25 +0100 (MET)


On Tue, 25 Nov 1997, Dean Gaudet wrote:

> Please keep fcntl(F_DUPFD) in mind ... [...]

ok, we have to support file sharing, to support:

- fcntl(F_DUPFD)
- fork()
- all the in-kernel fs daemons: NFS, SMBFS, NCPFS
- file descriptor passing over unix domain sockets
- if we want Wine to implement Win32 processes and files properly

The following proposal might sound a bit exterme, but dont give up too
early ... i think we Really Have A Problem, and i tried to make the design
complete, so far i think it covers all the cases.

the scheme is based on the following concept: we split the file structure
into two parts, a 'connection part', and a 'core part'. The 'connection
part' is a per-process thing, and it gets added to the 'fd hash table',
and can be duplicated at will. The connection part is also responsible for
'free file ID' management and exit()/exec() handling. Note that there are
no bitmaps and bitmap scans in this scheme.

The 'core part' contains all the usual file info, dentry, operations,
mode, flags, offset, readahead context, etc. Core part is only present
once, and is not part of any global hash table. This core part is not
bound to any process, but usually only one connection part points to it.

Say if a file that is shared amongst 3 processes, 3 connection parts are
pointing to a single core part:

[conn 1/A] ------->
[conn 1/B] -------> [core 1]
[conn 1/C] ------->

access to 'file fields' is identical to former access methods, but we have
to 'get' the core part first:

instead of:

filp = current->files->fd[fd]; (*)

we will do:

filp = get_file_core(fd,current->files);

'get_file_core()' is O(1) and very fast. Such 'get' operations (*) are
already kindof separated and are present in very highlevel VFS levels, so
the 'impact' of this design does not seem to be _that_ large. So far _all_
file related operations are O(1) with this scheme. (and their constant
offset cost seems to be low as well)

NOTE: the only problem i see with this scheme (so far), are we required to
guarantee first-free-file-ID-allocation of file IDs?

if we can allocate file IDs in any order, then we can get rid of the
bitmap, and can make the whole thing completely dynamic, from
the kernel's point of view. Allocating a unique file ID can be done
in an O(1) manner by extending the 'connection part' with a ringlist, and
adding a 'last allocated file ID' special connection buffer, and
recirculating freed file IDs.

- if we have to guarantee forward-allocation, then we have a big fat
scalability problem... IMO the first-free-ID constraint unfortunately
mathematically adds hidden ordering to the data structure of files, and
destroys O(1), we have to use nasty/complex O(log(N)) algorithms a'la
vmas.

is there anything i've missed?

-- mingo

ps. if you are unhappy with the 'split', then think about it, it's the
only way, really. our current tables and bitmaps, and those proposed
self-extending tables really just hide this problem in an
ugly/nonscalable way.