Re: new dentry feature in 2.1.78

Colin Plumb (colin@nyx.net)
Sat, 10 Jan 1998 02:35:17 -0700 (MST)


Bill Hawes wrote:
> Just wanted to mention that there's a new feature in the 2.1.78 dcache
> that may be useful for some of the filesystems. It's a void * pointer to
> a private area, initialized to NULL when the dentry is created but
> otherwise unmolested for the life of the dentry.

> Typically you would create your private data structure and install it in
> d_fsdata in the fs lookup routine, or possibly in a d_hash routine. To
> free it, you need to have a dentry_ops d_release operation, which is
> then called at d_free time.

> For examples take a look in nfs/dir.c. The d_fsdata area is now being
> used to store the NFS filehandle in the dentry instead of in the inode.

You know, this, and the way that Linux does inodes, always seemed silly
to me. Why make all dentries and all inodes the same size? Why not
do it like C++, where there is a common leading part (the base class)
and a type-dependent following part (for the derived class)?

Doing this requires that they be allocated and freed by type-dependent code,
but that doesn't seem hard at all. You just need a separate slab cache per
type (or per size, at least).

The inode_info structure sizes are as follows, on 32-bit machines:
affs_inode_info: 10 words
ext2_inode_info: 28 words
hfs_inode_info: 9 words
hpfs_inode_info: 8 words
iso_inode_info: 2 words
minix_inode_info: 16 words
msdos_inode_info: 17 words (pipe_inode_info + 9)
nfs_inode_info: 13 words (pipe_inode_info + 5)
ntfs_inode_info: 10 words
pipe_inode_info: 8 words
romfs_inode_info: 2 words
smb_inode_info: 6 words
sysv_inode_info: 13 words
ufs_inode_info: 23 words
umsdos_inode_info: 21 words (msdos_inode_info + 4)

H'm... you know, I think I'm wrong. ext2_inode_info is going to be by
far the most common type, and it's the largest. (The Unix-style file
systems all have 13 to 15 direct and indirect block pointers in a data[]
array, plus miscellaneous flags, and the ext2 inode has some extra info
about block groups, allocation, and preallocation as well.)

A struct inode's common part is 33 words. (Since list_head is 2 words,
a struct semaphore is 3, and MAXQUOTAS is 2.) This could be reduced to 32
words with a little rearranging: there are a bunch of 16-bit words near
the front:
kdev_t i_dev;
unsigned short i_count;
umode_t i_mode;
nlink_t i_nlink;
uid_t i_uid;
gid_t i_gid;
kdev_t i_rdev;
which you'll note is an odd number, leaving a 16-bit hole, and then later comes
unsigned int i_flags;
unsigned char i_pipe;
unsigned char i_sock;

which also leaves a 16-bit hole.

Anyway, while the extra space reclaimed might be nice (an ISO inode would
then be 35/34 words rather than 61/60), it hardly seems worth it given that
ext2 is both the largest and the most common.

Minor performance hack: in fs/ext2/balloc.c:ext2_new_block there is some
code that accesses *prealloc_count a lot, when I think that (k-1) would be
a suitable synonym. Would the following diff be an improvement?

--- balloc.c Sat Jan 10 01:12:10 1998
+++ balloc2.c Sat Jan 10 01:13:07 1998
@@ -452,34 +452,30 @@

/*
* Do block preallocation now if required.
*/
#ifdef EXT2_PREALLOCATE
if (prealloc_block) {
- *prealloc_count = 0;
*prealloc_block = tmp + 1;
for (k = 1;
k < 8 && (j + k) < EXT2_BLOCKS_PER_GROUP(sb); k++) {
if (sb->dq_op)
if (sb->dq_op->alloc_block(inode, fs_to_dq_blocks(1, sb->s_blocksize)))
break;
if (ext2_set_bit (j + k, bh->b_data)) {
if (sb->dq_op)
sb->dq_op->free_block(inode, fs_to_dq_blocks(1, sb->s_blocksize));
break;
}
- (*prealloc_count)++;
}
+ *prealloc_count = --k;
gdp->bg_free_blocks_count =
- cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) -
- *prealloc_count);
+ cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) - k);
es->s_free_blocks_count =
- cpu_to_le32(le32_to_cpu(es->s_free_blocks_count) -
- *prealloc_count);
- ext2_debug ("Preallocated a further %lu bits.\n",
- *prealloc_count);
+ cpu_to_le32(le32_to_cpu(es->s_free_blocks_count) - k);
+ ext2_debug ("Preallocated a further %lu bits.\n", k);
}
#endif

j = tmp;

mark_buffer_dirty(bh, 1);

You could also economize on calls to dq_op operations and do:
--- balloc.c Sat Jan 10 01:12:10 1998
+++ balloc2.c Sat Jan 10 01:29:54 1998
@@ -448,38 +448,35 @@
goto repeat;
}

ext2_debug ("found bit %d\n", j);

/*
- * Do block preallocation now if required.
+ * Do block preallocation now if required. (Up to 8 blocks.)
*/
#ifdef EXT2_PREALLOCATE
- if (prealloc_block) {
- *prealloc_count = 0;
+#define MAX_PREALLOC 8
+
+ if (prealloc_block &&
+ (!sb->dq_op ||
+ !sb->dq_op->alloc_block(inode, fs_to_dq_blocks(MAX_PREALLOC, sb->s_blocksize)))) {
*prealloc_block = tmp + 1;
for (k = 1;
- k < 8 && (j + k) < EXT2_BLOCKS_PER_GROUP(sb); k++) {
- if (sb->dq_op)
- if (sb->dq_op->alloc_block(inode, fs_to_dq_blocks(1, sb->s_blocksize)))
- break;
- if (ext2_set_bit (j + k, bh->b_data)) {
- if (sb->dq_op)
- sb->dq_op->free_block(inode, fs_to_dq_blocks(1, sb->s_blocksize));
+ k < MAX_PREALLOC && (j + k) < EXT2_BLOCKS_PER_GROUP(sb);
+ k++) {
+ if (ext2_set_bit (j + k, bh->b_data))
break;
- }
- (*prealloc_count)++;
}
+ *prealloc_count = --k;
+ if (k < MAX_PREALLOC && sb->dq_op)
+ sb->dq_op->free_block(inode, fs_to_dq_blocks(MAX_PREALLOC-k, sb->s_blocksize));
gdp->bg_free_blocks_count =
- cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) -
- *prealloc_count);
+ cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) - k);
es->s_free_blocks_count =
- cpu_to_le32(le32_to_cpu(es->s_free_blocks_count) -
- *prealloc_count);
- ext2_debug ("Preallocated a further %lu bits.\n",
- *prealloc_count);
+ cpu_to_le32(le32_to_cpu(es->s_free_blocks_count) - k);
+ ext2_debug ("Preallocated a further %lu bits.\n", k);
}
#endif

j = tmp;

mark_buffer_dirty(bh, 1);

I.e. try to dq_op->alloc_block 8 blocks and then free back what is unused.

Anyway, I think I've rambled enough. But amazingly, I've stayed on the
subject of the Linux kernel.

-- 
	-Colin