Re: [PATCH] protect remove_proc_entry

From: Kirill Korotaev
Date: Wed Jan 04 2006 - 06:24:04 EST


Hi Andrew,

I have a full patch for this.


Please don't top-post. It makes things hard...
do you prefer separate mails with patch and with reference to original report? will do so.

I don't remember the details yet, but lock was not god here, we used semaphore. I pointed to this problem long ago when fixed error path in proc with moduleget.

This patch protects proc_dir_entry tree with a proc_tree_sem semaphore. I suppose lock_kernel() can be removed later after checking that no proc handlers require it.
Also this patch remakes de refcounters a bit making it more clear and more similar to dentry scheme - this is required to make sure that everything works correctly.

Patch is against 2.6.15-rcX and was tested for about a week. Also works half a year on 2.6.8 :)

[ patch which uses an rwsem for procfs and somewhat removes lock_kernel() ]



I worry about replacing a spinlock with a sleeping lock. In some
circumstances it can cause a complete scalability collapse and I suspect
this could happen with /proc. Although I guess the only fastpath here is
proc_readdir(), and as the lock is taken there for reading, we'll be OK..

The patch does leave some lock_kernel() calls behind. If we're going to do
this, I think they should all be removed?

Races in /proc have been plentiful and hard to find. The patch worries me,
frankly. I'd like to see quite a bit more description of the locking
schema and some demonstration that it's actually complete before taking the
plunge.

ok, here goes some more descriptions:

1.
proc_readdir is a sleeping ops:
sys_getdents
`- vfs_readdir
`- proc_readdir
`- filldir
`- put_user/copy_to_user
that's why we must use semaphore. of course spinlock can be used too,
but this will case another problem: it must be dropped to call filldir, but
beofre it will be retaken the dentry being filldir-ed may be removed and
we won't see it's siblings in output.

2.
lock_kernel() is needed to handle at least simultaneous sys_read vs
create_proc_entry with sequental de->proc_fops = XXX. this can be
handled by passing fops directly to create_proc_entry.
i.e. there is a 3rd problem I pointed to you before:
create_proc_entry() and setting of de->owner/de->proc_fops is inatomic.
lock_kernel() is not a best way to protect against this, sure...
I would prefer to fix it with a separate patch somehow...

3.
refcounting:
each proc_dir_entry's refcounter is the reference from inode plus
references from children. once this count reaches zero - dentry is freed.
so on each proc_register() parent is get-ed, on each remove_proc_entry
parent is put-ed.

Kirill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/