Re: kdev_t

Andries.Brouwer@cwi.nl
Thu, 7 Oct 1999 14:30:59 +0200 (MET DST)


> Do you still have a [really old] patch, which makes kdev_t a struct?

Hmm. The real work was done around 1.3.20, and most patches
are very obsolete today. I'll quote some fragments below.

Project 1: compile the kernel with kdev_t a struct,
so that gcc will help catch places where kdev_t is used improperly.
This is trivial:

typedef struct { int major, minor; } kdevtt, *kdev_t;

static inline int
MAJOR(kdev_t dev) {
return dev->major;
}

static inline int
MINOR(kdev_t dev) {
return dev->minor;
}

extern kdevtt mamis[10000];
extern int mamimax;

static inline kdev_t
MKDEV(unsigned int ma, unsigned int mi) {
int i;

for (i=0; i<mamimax; i++)
if (mamis[i].major == ma && mamis[i].minor == mi)
return &(mamis[i]);
if (mamimax < 10000) {
mamimax++;
mamis[i].major = ma;
mamis[i].minor = mi;
return &(mamis[i]);
}
panic("out of mamis...");
}

(this in kdev_t.h, and the actual declaration of mamis and mamimax
in init/main.c or so). The resulting kernel also runs, as long as
you don't need more than 10000 devices.

When Project 1 is completed, and the kernel source is kdev_t clean
(my memory is bad, but I recall that most places are OK. but
there were nontrivial problems with NFS. I once submitted a patch
that fixed these using a layer of defines but it was not used,
maybe because Linus considered it ugly, maybe because he never
looked at it - this kind of uncertainty makes it a bit difficult
to work on this), you start Project 2.

At present MKDEV is just a simple arithmetical operation,
but the new MKDEV must return the struct that belongs
to the given major and minor.
But there are two of them! A block device and a character device.
Presently this last bit of information is implicitly present -
in drivers/block you talk about block devices, in drivers/char
about character devices. But we'll need it explicitly.
Thus, I have patches like

--- ../linux-1.3.84/linux/arch/sparc/kernel/setup.c Mon Mar 4 07:49:56 1996
+++ ./linux/arch/sparc/kernel/setup.c Mon Apr 8 01:17:42 1996
@@ -283,9 +283,9 @@
#if 1
/* XXX ROOT_DEV hack for kgdb - davem XXX */
#if 1
- ROOT_DEV = MKDEV(UNNAMED_MAJOR, 255); /* NFS */
+ ROOT_DEV = MKBDEV(UNNAMED_MAJOR, 255); /* NFS */
#else
- ROOT_DEV = 0x801; /* SCSI DISK */
+ ROOT_DEV = MKBDEV(8, 1); /* SCSI DISK */
#endif

#endif
--- ../linux-1.3.84/linux/drivers/char/tpqic02.c Fri Mar 1 06:50:41 1996
+++ ./linux/drivers/char/tpqic02.c Mon Apr 8 01:38:57 1996
@@ -2887,7 +2887,7 @@

QIC02_TAPE_DEBUG = TPQD_DEFAULT_FLAGS;

- current_tape_dev = MKDEV(QIC02_TAPE_MAJOR, 0);
+ current_tape_dev = MKCDEV(QIC02_TAPE_MAJOR, 0);

#ifndef CONFIG_QIC02_DYNCONF
printk(TPQIC02_NAME ": IRQ %d, DMA %d, IO 0x%x, IFC %s, %s, %s\n",

that change all calls of MKDEV() into either MKBDEV() or MKCDEV()
or MKXDEV(), where the latter occurs in a fs context, reading
special device nodes.
The entire patch is uninteresting - you would have to redo it
on a current kernel.

Then what do MKBDEV() and MKCDEV() do?
They find the majorstruct by hashtable lookup
and it contains a pointer to a fixed array of minors
in case the number of minors is known beforehand,
and also a pointer to a routine that will return
the minorstruct in case things are dynamic.

Clearly, this is an order of magnitude more work
than the simple OR that MKDEV is today, and it becomes
interesting to avoid calling MKDEV as much as possible.
Many present uses can be eliminated: usually one has the device,
extracts some information, and later assembles things again
with MKDEV. Never forgetting the device one started with
usually makes the MKDEV() superfluous.
Sometimes this changes the setup a little:

--- ../linux-1.3.84/linux/drivers/char/n_tty.c Sat Oct 28 14:19:02 1995
+++ ./linux/drivers/char/n_tty.c Mon Apr 8 00:18:44 1996
@@ -38,7 +38,7 @@
#include <asm/system.h>
#include <asm/bitops.h>

-#define CONSOLE_DEV MKDEV(TTY_MAJOR,0)
+extern kdev_t CONSOLE_DEV;

#ifndef MIN
#define MIN(a,b) ((a) < (b) ? (a) : (b))

--- ../linux-1.3.84/linux/drivers/char/tty_io.c Mon May 6 20:11:29 1996
+++ ./linux/drivers/char/tty_io.c Mon Apr 8 01:37:49 1996
@@ -77,8 +77,8 @@
#include <linux/kerneld.h>
#endif

-#define CONSOLE_DEV MKDEV(TTY_MAJOR,0)
-#define TTY_DEV MKDEV(TTYAUX_MAJOR,0)
+kdev_t CONSOLE_DEV = 0; /* also referred in n_tty.c */
+static kdev_t TTY_DEV = 0;

#undef TTY_DEBUG_HANGUP

@@ -1791,6 +1791,10 @@
*/
long console_init(long kmem_start, long kmem_end)
{
+ /* Create the kdev_t structures */
+ CONSOLE_DEV = MKCDEV(TTY_MAJOR,0);
+ TTY_DEV = MKCDEV(TTYAUX_MAJOR,0);
+
/* Setup the default TTY line discipline. */
memset(ldiscs, 0, sizeof(ldiscs));
(void) tty_register_ldisc(N_TTY, &tty_ldisc_N_TTY);

Project 2, the separation of MKDEV into MKBDEV, MKCDEV and MKXDEV
can be done today, and the corresponding patch could be applied,
as long as one puts
#define MKBDEV MKDEV
#define MKCDEV MKDEV
#define MKXDEV MKDEV
in kdev_t.h.

Project 3 is about inventing the device structure.
The non-creative approach takes all the arrays
that today are indexed by major, and creates corresponding
fields in the device struct. This yields fields like
blk_dev
blk_size
blksize_size
hardsect_size
read_ahead
max_readahead
max_sectors
max_segments
and more (search for MAX_BLKDEV in the current source;
each array foo_t bar[MAX_BLKDEV] becomes a field foo_t bar
in the major struct; each array like blk_size[MAJOR][MINOR]
becomes a field in the minor struct and possibly a default field
in the major struct).
I have used several versions, but there was also always a
routine that returns the name of the device, to be used
for error messages.

Here in some very old source I see

+typedef struct kdev_struct {
+ unsigned int minor;
+ unsigned long size;
+ unsigned long blksize;
+ struct driver_struct * driver;
+} *kdev_t;
+
+struct driver_struct {
+ unsigned int major;
+ unsigned int xor_mask, not_mask;
+ char * (* namefn)(kdev_t dev);
+ kdev_t (* getkdev)(struct driver_struct *major, unsigned int minor);
+ struct driver_struct * next;
+ struct driver_struct * prev;
+};
+
+static inline kdev_t MKDEV(unsigned int ma, unsigned int mi) {
+ struct driver_struct *m = get_driver(ma);
+ return m->getkdev(m, mi);
+}

where it looks like kdev_struct is what I now call minor_struct
and driver_struct is what I now call major_struct.
Here most of the driver_structs are allocated statically
in the corresponding drivers.

[Note new nomenclature: blk_size became size, and blksize_size
became blksize. I forget what these masks were used for, but two
things come to mind: (i) one needs to go from a given disk device
to the corresponding whole disk device, and (ii) the floppy disk
device has a strange sparse set of minors.]

In other patches I also see dev->blksize_bits used.
If I am not mistaken, the resulting code is shorter and faster
than what we use today.

In other patches I see in fs/super.c:put_unnamed_dev() a call
to putkdev(). If getkdev allocates memory in case the minor
did not exist yet, then it is reasonable to have putkdev
free it again. But except in this case it is not clear
when precisely we can forget about a device.

My approach has always been to invent driver_struct and minor_struct
and do a completely mechanical translation towards the new
data representation, so that it is clear that the new code
is correct if and only if the old code was correct.
Later one can play and reorganize things again.

Then there is Project 4. The stat system call.
Here I also submitted patches once or twice.
There are three new system calls versioned_stat,
versioned_lstat, versioned_fstat, and code like

+typedef void (*cp_stat_fn)(struct inode *, void *);
+
+static struct {
+ unsigned int statbuf_size;
+ cp_stat_fn cp_stat;
+} stat_versions[] = {
+ { sizeof(struct old_stat), (cp_stat_fn) cp_old_stat },
+ { sizeof(struct new_stat), (cp_stat_fn) cp_new_stat },
+ { sizeof(struct longdev_stat), (cp_stat_fn) cp_longdev_stat }
+};
+#define SIZE(a) (sizeof(a) / sizeof((a)[0]))
+
+asmlinkage int sys_versioned_stat(unsigned int version,
+ char * filename, void * statbuf)
{
struct inode * inode;
int error;

- error = verify_area(VERIFY_WRITE,statbuf,sizeof (*statbuf));
+ if (version >= SIZE(stat_versions))
+ return -EINVAL;
+ error = verify_area(VERIFY_WRITE, statbuf,
+ stat_versions[version].statbuf_size);
if (error)
return error;
error = namei(filename,&inode);
if (error)
return error;
- cp_new_stat(inode,statbuf);
+ (* stat_versions[version].cp_stat)(inode,statbuf);
iput(inode);
return 0;
}

in fs/stat.c, and of course

+struct longdev_stat {
+ unsigned long st_dev;
+ unsigned long st_ino;
+ unsigned short st_mode;
+ unsigned short st_nlink;
+ unsigned short st_uid;
+ unsigned short st_gid;
+ unsigned long st_rdev;
+ unsigned long st_size;
+ unsigned long st_blksize;
+ unsigned long st_blocks;
+ unsigned long st_atime;
+ unsigned long __unused1;
+ unsigned long st_mtime;
+ unsigned long __unused2;
+ unsigned long st_ctime;
+ unsigned long __unused3;
+ unsigned long __unused4;
+ unsigned long __unused5;
+};

These days several more needs for a different stat structure
have arisen. Maybe st_mode needs more bits. Certainly st_size
needs more bits.

It is easy to tweak libc/glibc to use these new system calls.
There are provisions already for a versioned stat.

So far about the kdev_t stuff from the 1.3.* ages.
It is still a relatively small project, and something
that can be done in several more or less independent
steps.

Now that I spent the effort of writing the above, let
me cc to linux-kernel; maybe there are people who want
to reimplement some version of this and look whether
Linus likes it.

Best regards, Andries

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/