Re: inode_lock "decorative"?

Andrea Arcangeli (andrea@suse.de)
Thu, 25 Nov 1999 21:50:32 +0100 (CET)


On Thu, 25 Nov 1999 kuznet@ms2.inr.ac.ru wrote:

> * It smells... In theory we should not do this without holding
> * inode_lock. This synchronization point is no better than the
> * kernel lock. I see no problem now, this inode is our private
> * property.
> */
> sock->inode->i_count++;
>
>
>I wrote "I see no problem", but it is true only if this inode is
>really private, i.e. the VFS never changes i_count for its own service
>purposes.

It's private and the VFS won't race with you (unless you play with the
i_list field, at least ;).

NOTE: the inode is not completely private, as it's queued on the
inode_in_use list, but the VFS won't touch the data of the inode. It's
there only because, by design, an inode is always queued on one list
depending on its state. I updated the description of the state -> list
mapping at the top of fs/inode.c.
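
From memory, the mapping is roughly this (the comment at the top of
fs/inode.c is the authoritative description, so double-check there):

        /*
         * Rough sketch of the state -> list mapping:
         *
         *      i_count > 0, clean      -> inode_in_use
         *      i_count > 0, dirty      -> the dirty list
         *      i_count == 0            -> inode_unused
         */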

Actually, for dummy inodes like the socket ones we could avoid queueing
the inode on the inode_in_use list at all, but then we would have to add
a path in iput() that checks whether the inode is queued on a list before
calling list_del(). This doesn't look very interesting IMHO.
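
Just to show what I mean, such a path could look like this (an untested
sketch, it's not in the patch below; it also assumes the dummy inode got
an INIT_LIST_HEAD(&inode->i_list) at creation time so that list_empty()
is meaningful):

        /* in iput(), tolerate inodes that were never queued */
        if (!list_empty(&inode->i_list))
                list_del(&inode->i_list);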

What I ask you is _why_ you increment i_count and call iput() by hand.
Just remove the i_count++ and the iput() and everything should remain
fine. Basically you always run with an i_count of 2, while you could run
all the time with an i_count of 1 without any difference (unless I am
missing something... hmm).
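
To spell out the refcounting (my reading, so take it with a grain of
salt), the lifecycle is already balanced without the by-hand reference:

        inode = get_empty_inode();      /* returns with i_count == 1 */
        ...
        file->f_dentry = d_alloc_root(sock->inode);
                                        /* the dentry holds that single
                                           reference for you */
        ...
        /* at close time fput() -> dput() -> iput() drops i_count to 0
           and the inode goes away, no explicit iput() needed */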

>Also, is it difficult to make similar thing with d_alloc_root?

It's possible even though the dcache is still single-threaded. The
reason is that d_alloc() remains local to you if the parent is NULL (and
the parent is NULL when it's called by d_alloc_root()). Then there is
d_instantiate(), which will do a list_add() on a list that is also local
to you, as it lives in your dummy inode (and you won't have more than one
dentry per inode, so you don't have to protect it inside your subsystem
either). So basically you can run such a piece of code without the big
kernel lock. Same shared .text, but run with different parallelism,
fun ;).
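
To spell out why it stays local (again my reading of fs/dcache.c, where
`name' stands for the root qstr that d_alloc_root() passes down):

        dentry = d_alloc(NULL, &name);  /* parent == NULL: no list_add
                                           on a shared d_subdirs list */
        d_instantiate(dentry, inode);   /* list_add only on the
                                           inode->i_dentry list, and the
                                           inode is your private dummy
                                           one, nobody else walks it */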

I also noticed an SMP race in an accounting variable (sockets_in_use).
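
It's the classic lost update: sockets_in_use++ is a plain
load/modify/store sequence, so two CPUs can interleave like this:

        /*
         *      CPU0                    CPU1
         *      load  sockets_in_use
         *                              load  sockets_in_use
         *      inc                     inc
         *      store                   store   <- one increment lost
         *
         * atomic_inc()/atomic_dec() close the window.
         */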

Here is a patch against 2.3.30pre1:

--- 2.3.30pre1-socket_VFS/net/socket.c.~1~	Tue Oct 12 18:13:38 1999
+++ 2.3.30pre1-socket_VFS/net/socket.c	Thu Nov 25 21:39:07 1999
@@ -44,6 +44,7 @@
  *	Tigran Aivazian	:	sys_send(args) calls sys_sendto(args, NULL, 0)
  *	Tigran Aivazian	:	Made listen(2) backlog sanity checks
  *				protocol-independent
+ *	Andrea Arcangeli:	VFS interface cleanups.
  *
  *
  *	This program is free software; you can redistribute it and/or
@@ -176,7 +177,7 @@
  *	Statistics counters of the socket lists
  */
 
-static int sockets_in_use = 0;
+static atomic_t sockets_in_use = ATOMIC_INIT(0);
 
 /*
  *	Support routines. Move socket addresses back and forth across the kernel/user
@@ -261,23 +262,14 @@
 		goto out;
 	}
 
-	lock_kernel();
 	file->f_dentry = d_alloc_root(sock->inode);
 	if (!file->f_dentry) {
-		unlock_kernel();
 		put_filp(file);
 		put_unused_fd(fd);
 		fd = -ENOMEM;
 		goto out;
 	}
 
-	/*
-	 * The socket maintains a reference to the inode, so we
-	 * have to increment the count.
-	 */
-	sock->inode->i_count++;
-	unlock_kernel();
-
 	file->f_op = &socket_file_ops;
 	file->f_mode = 3;
 	file->f_flags = O_RDWR;
@@ -340,18 +332,9 @@
 	struct inode * inode;
 	struct socket * sock;
 
-	lock_kernel();
-	/* Damn! get_empty_inode is not SMP safe.
-	   I ask, why does it have decorative spinlock
-	   at the very beginning? Probably, dcache ops should
-	   be lock_kernel'ed inside inode.c
-	 */
 	inode = get_empty_inode();
-	if (!inode) {
-		unlock_kernel();
+	if (!inode)
 		return NULL;
-	}
-	unlock_kernel();
 
 	sock = socki_lookup(inode);
 
@@ -369,7 +352,7 @@
 	sock->sk = NULL;
 	sock->file = NULL;
 
-	sockets_in_use++;
+	atomic_inc(&sockets_in_use);
 	return sock;
 }
 
@@ -392,9 +375,8 @@
 	if (sock->fasync_list)
 		printk(KERN_ERR "sock_release: fasync list not empty!\n");
 
-	--sockets_in_use;	/* Bookkeeping.. */
+	atomic_dec(&sockets_in_use);	/* Bookkeeping.. */
 	sock->file=NULL;
-	iput(sock->inode);
 }
 
 int sock_sendmsg(struct socket *sock, struct msghdr *msg, int size)
@@ -1604,7 +1586,7 @@
 
 int socket_get_info(char *buffer, char **start, off_t offset, int length)
 {
-	int len = sprintf(buffer, "sockets: used %d\n", sockets_in_use);
+	int len = sprintf(buffer, "sockets: used %d\n", atomic_read(&sockets_in_use));
 	if (offset >= len)
 	{
 		*start = buffer;

Andrea
