fs/inode.c reimplementation

Thomas Schoebel-Theuer (schoebel@informatik.uni-stuttgart.de)
28 Mar 1997 17:10:31 GMT

Hi folks,

I needed to reimplement fs/inode.c, for reasons I will explain below.
Here are the main features of the new implementation:

- dramatically faster
- allows more concurrency
- intended to be MP-safe (not tested)

Here are some numbers:

pristine linux-2.0.27 as delivered from RedHat, freshly booted:
[root@zeder /]# time du -s /usr
1000392 /usr
0.76user 20.32system 1:41.69elapsed 20%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (88major+43minor)pagefaults 0swaps
[root@zeder /]# time du -s /usr
1000392 /usr
0.90user 20.32system 0:23.88elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (81major+43minor)pagefaults 0swaps
[root@zeder /]#

patched 2.0.27 with transname enabled, also freshly booted:
[root@zeder /]# time du -s /usr
1000392 /usr
0.61user 12.06system 1:32.48elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (88major+43minor)pagefaults 0swaps
[root@zeder /]# time du -s /usr
1000392 /usr
0.62user 11.16system 0:16.52elapsed 71%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (81major+43minor)pagefaults 0swaps
[root@zeder /]#

System time drops from about 20 s to 11-12 s, i.e. it is nearly halved. The
reimplementation does everything with constant overhead; there is no O(n)
list search any more. I had the new transname enabled when doing the test;
disabling it should speed things up further. Note that I have not included
the new transname in this posting; an old version is on sunsite for those
who want to find out what it is (grep for filenames including
the substring "transname").
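
The constant-overhead claim rests on two things: a hash table whose buckets
are cyclic doubly-linked lists, and removal of an element given only its
pointer. The following is a minimal user-space sketch of that idea, not the
kernel code itself; struct node, table, bucket_of(), ring_insert(),
ring_remove() and lookup() are hypothetical names chosen for illustration,
and only the hash mixing and the list manipulation mirror the posting.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIZE 1024              /* must be a power of 2, as in the posting */

/* Hypothetical stand-in for struct inode: only the fields the sketch needs. */
struct node {
    unsigned long ino;
    int dev;
    struct node *next, *prev;       /* ring links, like i_hash_{next,prev} */
};

static struct node *table[HASH_SIZE];

/* Mirrors hash() in the posting: fold dev into ino, mask to the table size. */
static unsigned bucket_of(int dev, unsigned long ino)
{
    return ((unsigned)ino ^ ((unsigned)dev << 6)) & (HASH_SIZE - 1);
}

/* Insert at the head of a cyclic doubly-linked list: O(1). */
static void ring_insert(struct node **anchor, struct node *n)
{
    struct node *first = *anchor;
    if (!first) {
        n->next = n->prev = n;
    } else {
        n->prev = first->prev;
        n->next = first;
        first->prev->next = n;
        first->prev = n;
    }
    *anchor = n;
}

/* Remove any element given only its pointer: O(1), no list walk. */
static void ring_remove(struct node **anchor, struct node *n)
{
    if (n->next == n) {
        *anchor = NULL;
    } else {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        if (*anchor == n)
            *anchor = n->next;
    }
    n->next = n->prev = NULL;
}

/* Lookup walks a single bucket; its cost depends on that bucket's length only. */
static struct node *lookup(int dev, unsigned long ino)
{
    struct node *head = table[bucket_of(dev, ino)];
    struct node *p = head;
    if (p) do {
        if (p->ino == ino && p->dev == dev)
            return p;
        p = p->next;
    } while (p != head);
    return NULL;
}
```

With a table that is large relative to the number of cached entries, each
bucket stays short, so lookup, insert and remove all run in roughly constant
time; that is the sense in which the overhead no longer grows with the total
number of inodes.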

Now for the reason why I needed to reimplement it: Andreas Luik has
produced omirrd, a daemon that allows symmetric online mirroring over
the internet. Any changes written to the local disk are mirrored to the other
machines as soon as the file is closed; in case of concurrent updates
the most recent timestamp wins (as with local files modified by concurrent
processes). An early alpha version can be found at
ftp://ftp.isa.de/pub/home/luik/omirr, but the stuff there is rather
outdated. The problem is that at the moment of a file close, the
pathname is not known any more (but needs to be transferred to the
remote site). Andreas implemented a kludge to let the daemon keep track
of all names, and I want to remove that. Another problem is that the
user-supplied names may be very different from the real locations
due to symlink redirection (and symlinks may be different on different
machines, if you do not mirror everything), and absolute paths may be
chroot()ed, and so on.

So I need a mechanism that remembers the *real* path names at kernel
level. To implement this, I need to reuse the inode cache for my new
purpose. Then I noticed that dcache also remembers names, but not all
of them. So my name remembering mechanism will also replace the current
dcache.

As a preparation for the new dcaching, I started the new fs/inode.c.
It is now in alpha state: it works on my machine with ext2, but is not
thoroughly tested. Be sure not to use it on valuable data.

First comes the new fs/inode.c, then a diff of include/linux/fs.h,
both based on version 2.0.27 (but should work with later versions).
Output messages not starting with "VFS:" (such as "_io grab") are debugging
messages that should be ignored; not all messages starting with "VFS:"
indicate serious problems, but please let me know about any you see.

I'm particularly interested in comments on the code, on the algorithms,
and in bug reports (or success stories if it works for you). Please
do not swamp me with questions on omirrd, the new dcaching etc.;
keep them to the code in this posting.

-- Thomas

--------------------------------------------------------------------------------
/*
* fs/inode.c
*
* Complete reimplementation
* (C) 1997 Thomas Schoebel-Theuer
*/

/* Everything here is intended to be MP-safe. However, other parts
* of the kernel are not yet MP-safe, in particular the inode->i_count++
* statements that are spread all over the place. These should be
* replaced by iinc() as soon as possible. Since I have no MP machine,
* I could not test it.
*/
#include <linux/config.h>
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/kernel.h>

#define HASH_SIZE 1024 /* must be a power of 2 */
#define NR_LEVELS 4

#define ST_AGED 1
#define ST_HASHED 2
#define ST_EMPTY 4
#define ST_TO_READ 8
#define ST_TO_WRITE 16
#define ST_TO_PUT 32
#define ST_TO_DROP 64
#define ST_IO (ST_TO_READ|ST_TO_WRITE|ST_TO_PUT|ST_TO_DROP)
#define ST_WAITING 128

/* The idea is to keep empty inodes in a separate list, so no search
* is required as long as empty inodes exist.
* All reusable inodes occurring in the hash table with i_count==0
* are also registered in the ringlist aged_i[level], but in LRU order.
* Used inodes with i_count>0 are kept solely in the hashtable and in
* all_i, but in no other list.
* The level is used for multilevel-aging to avoid thrashing; each
* time i_count decreases to 0, the inode is inserted into the next level
* ringlist. Cache reuse is simply by taking the _last_ element from the
* lowest-level ringlist that contains inodes.
* In contrast to the old code, there isn't any O(n) search overhead now
* in iget/iput (if you make HASH_SIZE large enough).
*/
static struct inode * hashtable[HASH_SIZE];/* linked with i_hash_{next,prev} */
static struct inode * all_i = NULL; /* linked with i_{next,prev} */
static struct inode * empty_i = NULL; /* linked with i_{next,prev} */
static struct inode * aged_i[NR_LEVELS+1]; /* linked with i_lru_{next,prev} */
static int aged_count[NR_LEVELS+1]; /* # in each level */
static int aged_reused[NR_LEVELS+1]; /* # removals from aged_i[level] */
static int age_table[NR_LEVELS+1] = { /* You may tune this */
1, 4, 10, 100, 1000
}; /* after which # of uses to increase to the next level */

/* Keep the next contiguous in memory for kernel/sysctl.c */
int nr_inodes = 0;
int nr_free_inodes = 0;
int max_inodes = NR_INODE;
unsigned long last_inode = 0;

unsigned long inode_init(unsigned long start, unsigned long end)
{
memset(hashtable, 0, sizeof(hashtable));
memset(aged_i, 0, sizeof(aged_i));
memset(aged_count, 0, sizeof(aged_count));
memset(aged_reused, 0, sizeof(aged_reused));
printk("Size of an inode: %d\n", (int)sizeof(struct inode));
return start;
}

/* Intended for short locks of the above global data structures.
* Could be replaced with spinlocks completely, since there is
* no blocking during manipulation of the static data; however the
* lock in invalidate_inodes() may last relatively long.
*/
#ifdef __SMP__
struct semaphore vfs_sem = { 1, };
#endif

/* All lists are cyclic ringlists, so the last element cannot be tested
* for NULL. Use the following construct for traversing cyclic lists:
* ptr = anchor;
* if(ptr) do {
* ...
* ptr = ptr->i_{something}_{next,prev};
* } while(ptr != anchor);
* The effort here is paid off with much simpler inserts/removes.
*/

/* insert is always at the first position */
#define INSERT(NAME,NEXT,PREV) \
static inline void insert_##NAME(struct inode ** anchor, struct inode * inode)\
{\
	struct inode * oldfirst = *anchor;\
	if(!oldfirst) {\
		inode->NEXT = inode->PREV = inode;\
	} else { /* don't change the order, it's intended MP-safe */ \
		inode->PREV = oldfirst->PREV;\
		inode->NEXT = oldfirst;\
		oldfirst->PREV->NEXT = inode;\
		oldfirst->PREV = inode;\
	}\
	*anchor = inode;\
}

/* remove can be done with any element in the list */
#define REMOVE(NAME,NEXT,PREV) \
static inline void remove_##NAME(struct inode ** anchor, struct inode * inode)\
{\
	struct inode * next = inode->NEXT;\
	if(next == inode) {\
		*anchor = NULL;\
	} else {\
		struct inode * prev = inode->PREV;\
		prev->NEXT = next;\
		next->PREV = prev;\
		inode->NEXT = inode->PREV = NULL;\
		if(*anchor == inode)\
			*anchor = next;\
	}\
}

INSERT(all,i_next,i_prev)
REMOVE(all,i_next,i_prev)

INSERT(lru,i_lru_next,i_lru_prev)
REMOVE(lru,i_lru_next,i_lru_prev)

INSERT(hash,i_hash_next,i_hash_prev)
REMOVE(hash,i_hash_next,i_hash_prev)

/* I could not put that in fs.h, because of cyclic header includes.
* Should be revised. */
void _inode_wake_up(struct inode * inode)
{
wake_up_interruptible(&PIPE_WAIT(*inode));
}

static inline struct inode * grow_inodes(void)
{
struct inode * res;
struct inode * inode = res = (struct inode*)__get_free_page(GFP_KERNEL);
int size = PAGE_SIZE;
if(!inode)
return NULL;

size -= sizeof(struct inode);
inode++;
nr_inodes++;
while(size >= sizeof(struct inode)) {
nr_inodes++;
nr_free_inodes++;
insert_all(&empty_i, inode);
inode->i_status = ST_EMPTY;
inode++;
size -= sizeof(struct inode);
}
return res;
}

static inline int hash(dev_t i_dev, unsigned long i_ino)
{
return ((int)i_ino ^ ((int)i_dev << 6)) & (HASH_SIZE-1);
}

void _clear_inode(struct inode * inode, int external, int verbose)
{
if(inode->i_status & ST_HASHED)
remove_hash(&hashtable[hash(inode->i_dev, inode->i_ino)], inode);
if(inode->i_status & ST_AGED) {
/* "cannot happen" when called from an fs because at least
* the caller must use it. Can happen when called from
* invalidate_inodes(). */
if(verbose)
printk("VFS: clearing aged inode\n");
remove_lru(&aged_i[inode->i_level], inode);
aged_count[inode->i_level]--;
}
if(!external && inode->i_status & ST_IO) {
printk("VFS: clearing inode during IO operation\n");
}
if(!(inode->i_status & ST_EMPTY)) {
remove_all(&all_i, inode);
insert_all(&empty_i, inode);
nr_free_inodes++;
} else if(external)
printk("VFS: empty inode is unnecessarily cleared multiple times by an fs\n");
else
printk("VFS: clearing empty inode\n");
inode->i_status = ST_EMPTY;
if(inode->i_pages) {
vfs_unlock(); /* may block, can that be revised? */
truncate_inode_pages(inode, 0);
vfs_lock();
}
/* The inode is not really cleared any more here, but only once
* when taken from empty_i. This saves instructions and processor
* cache pollution.
*/
}

void insert_inode_hash(struct inode * inode)
{
vfs_lock();
if(!(inode->i_status & ST_HASHED)) {
insert_hash(&hashtable[hash(inode->i_dev, inode->i_ino)], inode);
inode->i_status |= ST_HASHED;
} else
printk("VFS: trying to hash an inode again\n");
vfs_unlock();
}

struct inode * _get_empty_inode(void)
{
struct inode * inode;
int retry = 0;

retry:
inode = empty_i;
if(inode) {
remove_all(&empty_i, inode);
nr_free_inodes--;
} else if(nr_inodes < max_inodes || retry > 2) {
inode = grow_inodes();
}
if(!inode) {
int level;
int usable = 0;
for(level = 0; level < NR_LEVELS; level++)
if(aged_i[level]) {
inode = aged_i[level]->i_lru_prev;
/* Here is the picking strategy, tune this */
if(aged_reused[level] < (usable++ ? aged_count[level] : 2))
break;
aged_reused[level] = 0;
}
if(inode) {
if(!(inode->i_status & ST_AGED) || inode->i_level != level)
printk("VFS: Aging inconsistency\n");
if(inode->i_count)
printk("VFS: i_count of aged inode is not zero\n");
if(inode->i_dirt)
printk("VFS: Hey, somebody made my aged inode dirty\n");
_clear_inode(inode, 0, 0);
goto retry;
}
}
if(!inode) {
vfs_unlock();
schedule();
if(retry)
sync_inodes((kdev_t)0);
if(retry > 10)
panic("VFS: cannot repair inode shortage");
if(retry > 2)
printk("VFS: no free inodes\n");
retry++;
vfs_lock();
goto retry;
}
memset(inode, 0, sizeof(struct inode));
inode->i_count = 1;
inode->i_nlink = 1;
inode->i_sem.count = 1;
inode->i_ino = ++last_inode;
inode->i_version = ++event;
insert_all(&all_i, inode);
return inode;
}

static inline struct inode * _get_empty_inode_hashed(dev_t i_dev, unsigned long i_ino)
{
struct inode ** base = &hashtable[hash(i_dev, i_ino)];
struct inode * inode = *base;
if(inode) do {
if(inode->i_ino == i_ino && inode->i_dev == i_dev) {
inode->i_count++;
printk("VFS: inode %ld is already in use\n", i_ino);
return inode;
}
inode = inode->i_hash_next;
} while(inode != *base);
inode = _get_empty_inode();
inode->i_dev = i_dev;
inode->i_ino = i_ino;
insert_hash(base, inode);
inode->i_status |= ST_HASHED;
return inode;
}

/* Please prefer this function in future over a
* get_empty_inode()/insert_inode_hash() combination.
* It allows for better checking and fewer race conditions.
*/
struct inode * get_empty_inode_hashed(dev_t i_dev, unsigned long i_ino)
{
struct inode * inode;

vfs_lock();
inode = _get_empty_inode_hashed(i_dev, i_ino);
vfs_unlock();
return inode;
}

static inline void wait_io(struct inode * inode, unsigned char flags)
{
while(inode->i_status & flags) {
struct wait_queue wait = {current, NULL};
inode->i_status |= ST_WAITING;
vfs_unlock();
add_wait_queue(&inode->i_wait, &wait);
sleep_on(&inode->i_wait);
remove_wait_queue(&inode->i_wait, &wait);
vfs_lock();
}
}

static inline void set_io(struct inode * inode, unsigned char waitflags, unsigned char setflags)
{
wait_io(inode, waitflags);
inode->i_status |= setflags;
vfs_unlock();
}

static inline int release_io(struct inode * inode, unsigned char flags)
{
int res = 0;
vfs_lock();
inode->i_status &= ~flags;
if(inode->i_status & ST_WAITING) {
inode->i_status &= ~ST_WAITING;
vfs_unlock();
wake_up(&inode->i_wait);
res = 1;
}
return res;
}

struct inode * __iget(struct super_block * sb, unsigned long i_ino, int crossmntp)
{
struct inode ** base;
struct inode * inode;
dev_t i_dev;

if(!sb)
panic("VFS: iget with sb == NULL");
i_dev = sb->s_dev;
if(!i_dev)
panic("VFS: sb->s_dev is NULL\n");
base = &hashtable[hash(i_dev, i_ino)];
vfs_lock();
inode = *base;
if(inode) do {
if(inode->i_ino == i_ino && inode->i_dev == i_dev) {
if(inode->i_status & ST_AGED) {
inode->i_status &= ~ST_AGED;
remove_lru(&aged_i[inode->i_level], inode);
aged_count[inode->i_level]--;
aged_reused[inode->i_level]++;
if(inode->i_nlink > 1)
/* keep hardlinks totally separate */
inode->i_level = NR_LEVELS;
else if(++inode->i_usecount >= age_table[inode->i_level]
&& inode->i_level < NR_LEVELS-1)
inode->i_level++;
if(inode->i_count)
printk("VFS: inode count not zero\n");
}
inode->i_count++;
/* Allow concurrent writes/puts. This is in particular
* useful e.g. when syncing large chunks.
* I hope the i_dirt flag is set everywhere as soon
* as _any_ modification is made and _before_
* giving up control, so no harm should occur if data
* is modified during writes, because it will be
* rewritten then (does a short inconsistency on the
* disk harm?) */
wait_io(inode, ST_TO_READ);
vfs_unlock();
goto done;
}
inode = inode->i_hash_next;
} while(inode != *base);
inode = _get_empty_inode_hashed(i_dev, i_ino);
inode->i_sb = sb;
if(sb->s_op && sb->s_op->read_inode) {
set_io(inode, 0, ST_TO_READ); /* do not wait at all */
sb->s_op->read_inode(inode);
if(release_io(inode, ST_TO_READ))
goto done;
}
vfs_unlock();
done:
while(crossmntp && inode->i_mount) {
struct inode * tmp = inode->i_mount;
iinc(tmp);
iput(inode);
inode = tmp;
}
return inode;
}

static inline void _io(void (*op)(struct inode*), struct inode * inode,
unsigned char waitflags, unsigned char setflags)
{
/* Do nothing if the same op is already in progress */
if(op && !(inode->i_status & setflags)) {
set_io(inode, waitflags, setflags);
op(inode);
if(release_io(inode, setflags)) {
/* Somebody grabbed my inode from under me */
printk("_io grab!\n");
vfs_lock();
}
}
}

void _iput(struct inode * inode)
{
struct super_block * sb;
if(inode->i_pipe) {
free_page((unsigned long)PIPE_BASE(*inode));
PIPE_BASE(*inode)= NULL;
}
if((sb = inode->i_sb)) {
if(IS_WRITABLE(inode) && inode->i_sb->dq_op) {
/* can operate in parallel to other ops */
_io(inode->i_sb->dq_op->drop, inode, 0, ST_TO_DROP);
if(inode->i_count)
return;
}
if(inode->i_sb->s_op) {
_io(inode->i_sb->s_op->put_inode, inode,
ST_TO_PUT|ST_TO_WRITE, ST_TO_PUT);
if(inode->i_count)
return;
if(!inode->i_nlink) {
if(!(inode->i_status & ST_EMPTY))
_clear_inode(inode, 0, 1);
return;
}
if(inode->i_dirt) {
inode->i_dirt = 0;
_io(inode->i_sb->s_op->write_inode, inode,
ST_TO_PUT|ST_TO_WRITE, ST_TO_WRITE);
if(inode->i_count)
return;
}
}
}
if(inode->i_mmap)
printk("VFS: inode has mappings\n");
if(inode->i_status & ST_AGED)
return;
if(!(inode->i_status & (ST_HASHED|ST_EMPTY))) {
_clear_inode(inode, 0, 1);
return;
}
if(inode->i_status & ST_EMPTY) {
printk("VFS: Hey, aging an empty inode\n");
}
insert_lru(&aged_i[inode->i_level], inode);
aged_count[inode->i_level]++;
inode->i_status |= ST_AGED;
}

void sync_inodes(kdev_t dev)
{
struct inode * inode;
vfs_lock();
inode = all_i;
if(inode) do {
if(inode->i_dirt && (inode->i_dev == dev || !dev)) {
if(inode->i_sb && inode->i_sb->s_op) {
inode->i_dirt = 0;
_io(inode->i_sb->s_op->write_inode, inode,
ST_IO, ST_TO_WRITE);
}
}
inode = inode->i_next;
} while(inode != all_i);
vfs_unlock();
}

int _check_inodes(kdev_t dev, int complain)
{
struct inode * inode;
int bad = 0;

vfs_lock();
inode = all_i;
if(inode) do {
struct inode * next;
next = inode->i_next;
if(inode->i_dev == dev) {
if(inode->i_count || inode->i_dirt) {
bad++;
} else
_clear_inode(inode, 0, 0);
}
inode = next;
} while(inode != all_i);
vfs_unlock();
if(complain)
printk("VFS: %d inode(s) busy on removed device `%s'\n",
bad, kdevname(dev));
return (bad == 0);
}

/*inline*/ void invalidate_inodes(kdev_t dev)
{
(void)_check_inodes(dev, 1);
}

/*inline*/ int fs_may_mount(kdev_t dev)
{
return _check_inodes(dev, 0);
}

int fs_may_remount_ro(kdev_t dev)
{
(void)dev;
return 1; /* not checked any more */
}

int fs_may_umount(kdev_t dev, struct inode * mount_root)
{
struct inode * inode;
vfs_lock();
inode = all_i;
if(inode) do {
if(inode->i_dev == dev && inode->i_count)
if(inode != mount_root || inode->i_count >
(inode->i_mount == inode ? 2 : 1)) {
vfs_unlock();
return 0;
}
inode = inode->i_next;
} while(inode != all_i);
vfs_unlock();
return 1;
}

extern struct inode_operations pipe_inode_operations;

struct inode * get_pipe_inode(void)
{
struct inode * inode = get_empty_inode();
PIPE_BASE(*inode) = (char*)__get_free_page(GFP_USER);
if(!(PIPE_BASE(*inode))) {
iput(inode);
return NULL;
}
inode->i_blksize = PAGE_SIZE;
inode->i_pipe = 1;
inode->i_mode = S_IFIFO | S_IRUSR | S_IWUSR;
inode->i_count++;
inode->i_uid = current->fsuid;
inode->i_gid = current->fsgid;
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
inode->i_op = &pipe_inode_operations;
PIPE_READERS(*inode) = PIPE_WRITERS(*inode) = 1;
return inode;
}

/*taken over from the old code... */
/* POSIX UID/GID verification for setting inode attributes */
int inode_change_ok(struct inode *inode, struct iattr *attr)
{
/*
* If force is set do it anyway.
*/

if (attr->ia_valid & ATTR_FORCE)
return 0;

/* Make sure a caller can chown */
if ((attr->ia_valid & ATTR_UID) &&
(current->fsuid != inode->i_uid ||
attr->ia_uid != inode->i_uid) && !fsuser())
return -EPERM;

/* Make sure caller can chgrp */
if ((attr->ia_valid & ATTR_GID) &&
(!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid) &&
!fsuser())
return -EPERM;

/* Make sure a caller can chmod */
if (attr->ia_valid & ATTR_MODE) {
if ((current->fsuid != inode->i_uid) && !fsuser())
return -EPERM;
/* Also check the setgid bit! */
if (!fsuser() && !in_group_p((attr->ia_valid & ATTR_GID) ? attr->ia_gid :
inode->i_gid))
attr->ia_mode &= ~S_ISGID;
}

/* Check for setting the inode time */
if ((attr->ia_valid & ATTR_ATIME_SET) &&
((current->fsuid != inode->i_uid) && !fsuser()))
return -EPERM;
if ((attr->ia_valid & ATTR_MTIME_SET) &&
((current->fsuid != inode->i_uid) && !fsuser()))
return -EPERM;
return 0;
}

void inode_setattr(struct inode * inode, struct iattr * attr)
{
	/* ATTR_MODE must be part of the mask; otherwise a chmod-only
	 * change would be skipped, since notify_change() clears ATTR_CTIME. */
	if(attr->ia_valid & (ATTR_UID|ATTR_GID|ATTR_SIZE|ATTR_ATIME|ATTR_MTIME|ATTR_CTIME|ATTR_MODE)) {
		if (attr->ia_valid & ATTR_UID)
			inode->i_uid = attr->ia_uid;
		if (attr->ia_valid & ATTR_GID)
			inode->i_gid = attr->ia_gid;
		if (attr->ia_valid & ATTR_SIZE)
			inode->i_size = attr->ia_size;
		if (attr->ia_valid & ATTR_ATIME)
			inode->i_atime = attr->ia_atime;
		if (attr->ia_valid & ATTR_MTIME)
			inode->i_mtime = attr->ia_mtime;
		if (attr->ia_valid & ATTR_CTIME)
			inode->i_ctime = attr->ia_ctime;
		if (attr->ia_valid & ATTR_MODE) {
			inode->i_mode = attr->ia_mode;
			if (!fsuser() && !in_group_p(inode->i_gid))
				inode->i_mode &= ~S_ISGID;
		}
		inode->i_dirt = 1;
	}
}

int notify_change(struct inode * inode, struct iattr * attr)
{
int error;
time_t now = CURRENT_TIME;

attr->ia_ctime = now;
if ((attr->ia_valid & (ATTR_ATIME | ATTR_ATIME_SET)) == ATTR_ATIME)
attr->ia_atime = now;
if ((attr->ia_valid & (ATTR_MTIME | ATTR_MTIME_SET)) == ATTR_MTIME)
attr->ia_mtime = now;
attr->ia_valid &= ~(ATTR_CTIME);
if (inode->i_sb && inode->i_sb->s_op && inode->i_sb->s_op->notify_change)
return inode->i_sb->s_op->notify_change(inode, attr);
error = inode_change_ok(inode, attr);
if(!error)
inode_setattr(inode, attr);
return error;
}

int bmap(struct inode * inode, int block)
{
if (inode->i_op && inode->i_op->bmap)
return inode->i_op->bmap(inode, block);
return 0;
}
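
The multilevel aging described in the comment near the top of fs/inode.c
can be hard to follow from the full source, so here is a simplified
user-space sketch of it. It is an illustration under assumptions, not the
kernel code: struct aged, aged_ring, age_insert(), age_remove(), reuse(),
release() and pick_victim() are hypothetical names, the separate hardlink
level is left out, and the aged_reused[] picking heuristic of
_get_empty_inode() is replaced by "always take the lowest non-empty level".

```c
#include <assert.h>
#include <stddef.h>

#define NR_LEVELS 4

/* Thresholds copied from age_table[] in the posting. */
static const int age_table[NR_LEVELS + 1] = { 1, 4, 10, 100, 1000 };

/* Hypothetical stand-in for an unused (i_count == 0) inode. */
struct aged {
    int level;                  /* like i_level */
    int usecount;               /* like i_usecount */
    struct aged *next, *prev;   /* ring links, like i_lru_{next,prev} */
};

/* One ring per level; the head is the most recently released entry. */
static struct aged *aged_ring[NR_LEVELS];

static void age_insert(struct aged *n)
{
    struct aged **anchor = &aged_ring[n->level];
    struct aged *first = *anchor;
    if (!first) {
        n->next = n->prev = n;
    } else {
        n->prev = first->prev;
        n->next = first;
        first->prev->next = n;
        first->prev = n;
    }
    *anchor = n;
}

static void age_remove(struct aged *n)
{
    struct aged **anchor = &aged_ring[n->level];
    if (n->next == n) {
        *anchor = NULL;
    } else {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        if (*anchor == n)
            *anchor = n->next;
    }
    n->next = n->prev = NULL;
}

/* An aged entry is looked up again: take it off its ring and promote it
 * once it has been reused often enough for its current level. */
static void reuse(struct aged *n)
{
    age_remove(n);
    if (++n->usecount >= age_table[n->level] && n->level < NR_LEVELS - 1)
        n->level++;
}

/* The last user drops the entry: it becomes the newest element of its ring. */
static void release(struct aged *n)
{
    age_insert(n);
}

/* Victim = tail (least recently released) of the lowest non-empty level. */
static struct aged *pick_victim(void)
{
    for (int level = 0; level < NR_LEVELS; level++)
        if (aged_ring[level])
            return aged_ring[level]->prev;
    return NULL;
}
```

Because entries that were reused move to higher levels, a one-off scan over
many files (such as the du run above) fills and recycles the lowest level
only and does not evict entries that have proven useful before.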

---------------------------------------------------------------------------------
Index: fs.h
===================================================================
RCS file: /usr/src/CVS/include/linux/fs.h,v
retrieving revision 1.8
diff -c -r1.8 fs.h
*** fs.h 1997/03/12 16:00:49 1.8
--- fs.h 1997/03/28 16:16:45
***************
*** 6,11 ****
--- 6,13 ----
* structures etc.
*/

+ #define CONFIG_NEW_INODE /* kludge for those not having my config */
+
#include <linux/config.h>
#include <linux/linkage.h>
#include <linux/limits.h>
***************
*** 291,306 ****
struct dquot *i_dquot[MAXQUOTAS];
struct inode *i_next, *i_prev;
struct inode *i_hash_next, *i_hash_prev;
! struct inode *i_bound_to, *i_bound_by;
struct inode *i_mount;
unsigned short i_count;
unsigned short i_flags;
unsigned char i_lock;
unsigned char i_dirt;
unsigned char i_pipe;
unsigned char i_sock;
unsigned char i_seek;
unsigned char i_update;
unsigned short i_writecount;
union {
struct pipe_inode_info pipe_i;
--- 293,317 ----
struct dquot *i_dquot[MAXQUOTAS];
struct inode *i_next, *i_prev;
struct inode *i_hash_next, *i_hash_prev;
! struct inode *i_lru_next, *i_lru_prev;
struct inode *i_mount;
unsigned short i_count;
unsigned short i_flags;
+ #ifdef CONFIG_NEW_INODE
+ unsigned char i_status;
+ #else
unsigned char i_lock;
+ #endif
unsigned char i_dirt;
unsigned char i_pipe;
unsigned char i_sock;
+ #ifdef CONFIG_NEW_INODE
+ unsigned char i_level;
+ unsigned char i_usecount;
+ #else
unsigned char i_seek;
unsigned char i_update;
+ #endif
unsigned short i_writecount;
union {
struct pipe_inode_info pipe_i;
***************
*** 617,627 ****
struct inode ** res_inode, struct inode * base);
extern int do_mknod(const char * filename, int mode, dev_t dev);
extern int do_pipe(int *);
extern void iput(struct inode * inode);
extern struct inode * __iget(struct super_block * sb,int nr,int crsmnt);
extern struct inode * get_empty_inode(void);
- extern void insert_inode_hash(struct inode *);
extern void clear_inode(struct inode *);
extern struct inode * get_pipe_inode(void);
extern int get_unused_fd(void);
extern void put_unused_fd(int);
--- 628,709 ----
struct inode ** res_inode, struct inode * base);
extern int do_mknod(const char * filename, int mode, dev_t dev);
extern int do_pipe(int *);
+ #ifdef CONFIG_NEW_INODE
+ #include <asm/semaphore.h>
+
+ /* Intended for short locks of the global data structures in inode.c.
+ * Could be replaced with spinlocks completely, since there is
+ * no blocking during manipulation of the static data; however the
+ * lock in invalidate_inodes() may last relatively long.
+ */
+ extern struct semaphore vfs_sem;
+ extern inline void vfs_lock(void)
+ {
+ #ifdef __SMP__
+ down(&vfs_sem);
+ #endif
+ }
+
+ extern inline void vfs_unlock(void)
+ {
+ #ifdef __SMP__
+ up(&vfs_sem);
+ #endif
+ }
+
+ /* This should be reimplemented using either local locks on every inode
+ * or using lock prefixes in assembler (on architectures where possible).
+ * However, be warned that local locks must not lead to deadlocks when
+ * combined with the global lock: if you need both types of locks
+ * simultaneously, always get the global lock first (ensure strict order),
+ * never vice versa.
+ */
+ extern inline void iinc(struct inode * inode)
+ {
+ vfs_lock();
+ inode->i_count++;
+ vfs_unlock();
+ }
+
+ extern void _inode_wake_up(struct inode * inode);
+ extern void _iput(struct inode * inode);
+ extern inline void iput(struct inode * inode)
+ {
+ if(inode) {
+ if(inode->i_pipe)
+ _inode_wake_up(inode);
+ vfs_lock();
+ if(!--inode->i_count)
+ _iput(inode);
+ vfs_unlock();
+ }
+ }
+
+ extern struct inode * __iget(struct super_block * sb, unsigned long nr, int crsmnt);
+ void _clear_inode(struct inode * inode, int external, int verbose);
+ extern inline void clear_inode(struct inode * inode)
+ {
+ vfs_lock();
+ _clear_inode(inode, 1, 1);
+ vfs_unlock();
+ }
+ extern struct inode * _get_empty_inode(void);
+ extern inline struct inode * get_empty_inode(void)
+ {
+ struct inode * inode;
+ vfs_lock();
+ inode = _get_empty_inode();
+ vfs_unlock();
+ return inode;
+ }
+
+ #else
extern void iput(struct inode * inode);
extern struct inode * __iget(struct super_block * sb,int nr,int crsmnt);
extern struct inode * get_empty_inode(void);
extern void clear_inode(struct inode *);
+ #endif
+ extern void insert_inode_hash(struct inode *);
extern struct inode * get_pipe_inode(void);
extern int get_unused_fd(void);
extern void put_unused_fd(int);
***************
*** 683,692 ****
--- 765,782 ----
extern int inode_change_ok(struct inode *, struct iattr *);
extern void inode_setattr(struct inode *, struct iattr *);

+ #ifdef CONFIG_NEW_INODE
+ extern inline struct inode * iget(struct super_block * sb, unsigned long nr)
+ {
+ return __iget(sb, nr, 1);
+ }
+ #else
+
extern inline struct inode * iget(struct super_block * sb,int nr)
{
return __iget(sb, nr, 1);
}
+ #endif

/* kludge to get SCSI modules working */
#include <linux/minix_fs.h>
