Re: deferring __fput()

From: Al Viro
Date: Sat Jun 23 2012 - 16:58:03 EST


On Sat, Jun 23, 2012 at 08:45:05PM +0100, Al Viro wrote:
> BTW, I really wonder why do we need to have that void *data in task_work; we can
> always embed the sucker into a bigger struct (if nothing else, task_work +
> void *data) and get to it via container_of(). And in quite a few cases we don't
> want that data thing at all. Moreover, the reasons to use hlist_head instead of
> a single forward pointer are very thin on the ground:
> * task_work_add() adds to the beginning of the list
> * task_work_cancel() walks the list to find the entry to remove
> * trask_work_run() takes the list out, walks it to the end,
> then walks it backwards via pprev. Could as well reverse the pointers
> while walking forward...

You know what... Let's dump that "reverse the list" thing completely. What we ought to
do instead of that is honestly keeping both the head of the (single-linked) list and
pointer to pointer to its last element. Sure, that'll eat one more word in task_struct.
And it doesn't matter, since we'll be able to kill something else in there - namely,
->scm_work_list. The whole reason we have it is that we want to prevent recursion
in __scm_destroy(); with __fput() getting done via task_work_add() this recursion
will be gone automatically.

So here's what I want to do:

1) kill task_work->data; the only user that cares will allocate task_work + struct cred * and
use container_of() to get to it.

2) replace current->task_works with struct task_work *task_works, **last_work and replace
hlist_node in task_work with struct task_work *next; task_work_add() would add to the tail,
task_work_cancel() would do the obvious "scan the single-linked list, remove the element
found" wihtout bothering with hlist and task_work_run() would just walk the list in direct
order.

3) at that point task_work is equal in size (and layout, BTW) to rcu_head. So we can add it
into the same union in struct file where we already have list_head and rcu_head. No space
eaten up. fput() would, once the counter reaches 0, remove the file from list (the only
place walking that list skips the ones with zero refcount anyway) and, if we are in a normal
process, use task_work_add() to have __fput() done to it. If we are in kernel thread or
atomic context, just move the sucker to global list and use schedule_work() to have said
list emptied and everything in it fed to __fput().

4) kill ->scm_work_list, along with all contortions in __scm_destroy(); just have it do all
the fput() (if any) it wants to do for this cookie.

5) in fs/aio.c, replace
if (unlikely(!fput_atomic(req->ki_filp))) {
spin_lock(&fput_lock);
list_add(&req->ki_list, &fput_head);
spin_unlock(&fput_lock);
schedule_work(&fput_work);
} else {
req->ki_filp = NULL;
really_put_req(ctx, req);
}
with
fput(req->ki_filp);
req->ki_filp = NULL;
really_put_req(ctx, req);
Sure, we are not allowed to block there. And we won't. fput_head/fput_work/aio_fput_routine()
goes to hell, and so does the whole "called_fput" mess in kiocb_batch_refill() in there - retry
loop and all.

Objections, anyone? Linus, Oleg, Dave?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/