Re: namei() query

From: kumon@flab.fujitsu.co.jp
Date: Fri Apr 21 2000 - 04:20:22 EST


Linus Torvalds writes:
> It might be interesting to re-write "schedule()" as a macro or inline
> functions, something like
>
> static inline void schedule(void)
> {
> __schedule();
> reacquire_kernel_lock(current);
> }
>
> and see which callers are the worst hit, and try to fix them up.

Measurement with this modification will show that schedule_timeout()
is the major offender.

Actually, we measured this in a different way: by reading the return
address on the stack whenever schedule() is called with the kernel lock held.

The result shows that 90% of the calls to schedule() with the kernel lock
held come from schedule_timeout(). The remaining 10% come from
interruptible_sleep_on().
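
The measurement idea above can be sketched in user space. This is only an
illustration under stated assumptions, not the instrumentation that was
actually used: fake_schedule(), fake_schedule_timeout(), fake_sleep_on(),
lock_held and the sites[] table are hypothetical stand-ins, and
__builtin_return_address() is the GCC/Clang builtin, not a kernel interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SITES 16

/* One entry per distinct caller of fake_schedule(). */
struct call_site {
    void *ret_addr;      /* return address that identifies the call site */
    unsigned long hits;  /* calls from this site made with the lock held */
};

static struct call_site sites[MAX_SITES];
static size_t nr_sites;
static int lock_held;                 /* stand-in for "caller holds the kernel lock" */
static volatile int timeouts_seen;    /* touched after the call to avoid a tail call */
static volatile int sleeps_seen;      /* distinct variable keeps the two callers distinct */

/* Bump the counter for ret_addr, adding a new entry on first sight. */
static void record_caller(void *ret_addr)
{
    for (size_t i = 0; i < nr_sites; i++) {
        if (sites[i].ret_addr == ret_addr) {
            sites[i].hits++;
            return;
        }
    }
    assert(nr_sites < MAX_SITES);
    sites[nr_sites].ret_addr = ret_addr;
    sites[nr_sites].hits = 1;
    nr_sites++;
}

/* Stand-in for schedule(): only calls made with the lock held are counted. */
__attribute__((noinline)) void fake_schedule(void)
{
    if (lock_held)
        record_caller(__builtin_return_address(0));
}

/* Hypothetical caller standing in for schedule_timeout(). */
__attribute__((noinline)) void fake_schedule_timeout(void)
{
    fake_schedule();
    timeouts_seen++;
}

/* Hypothetical caller standing in for interruptible_sleep_on(). */
__attribute__((noinline)) void fake_sleep_on(void)
{
    fake_schedule();
    sleeps_seen++;
}

/* Sum of all recorded calls, used to turn per-site hits into percentages. */
unsigned long total_hits(void)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < nr_sites; i++)
        sum += sites[i].hits;
    return sum;
}
```

Dividing each sites[i].hits by total_hits() gives the share per caller; the
recorded addresses would still have to be mapped back to function names
(for example with a symbol table) to name the offenders.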

> It's almost certainly going to be "poll()" that is the big one
> contributing to schedule(), judging by your other numbers.

In this case, poll() is not guilty.
Apache is written in (perhaps) the BSD manner, so it uses select() instead.

My previous posting shows no spinlock waiting in do_poll(), but there is
waiting in do_select(). Please note that sock_poll(), which is also called
from do_select(), is the second-largest consumer of waiting time.
Additional measurement completely confirmed this argument: all calls to
schedule_timeout() with the kernel lock held come from do_select().

Looking into do_select(), the kernel lock is held across the following code.
Like do_poll(), do_select() calls schedule_timeout() while holding the lock.

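As in Linus's sample code, I can move lock_kernel()/unlock_kernel() inside
the for(;;) loop. The current lock and unlock points are marked by ##on and
##off, and the candidate places are marked by ####.
Or I could even move them into the inner loop: just before fget() and just
after fput().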

But I don't really understand: which portion actually needs the lock?

-----------a part of do_select()
##on lock_kernel();
        if (retval < 0)
                goto out;
        n = retval;
        retval = 0;
        for (;;) {
                set_current_state(TASK_INTERRUPTIBLE);
                #### can move lock_kernel() here
                for (i = 0 ; i < n; i++) {
                        unsigned long bit = BIT(i);
                        unsigned long mask;
                        struct file *file;

                        off = i / __NFDBITS;
                        if (!(bit & BITS(fds, off)))
                                continue;
                        file = fget(i);
                        mask = POLLNVAL;
                        if (file) {
                                mask = DEFAULT_POLLMASK;
                                if (file->f_op && file->f_op->poll)
##off then on mask = file->f_op->poll(file, wait);
                                fput(file);
                        }
                        if ((mask & POLLIN_SET) && ISSET(bit, __IN(fds,off))) {
                                SET(bit, __RES_IN(fds,off));
                                retval++;
                                wait = NULL;
                        }
                        if ((mask & POLLOUT_SET) && ISSET(bit, __OUT(fds,off))) {
                                SET(bit, __RES_OUT(fds,off));
                                retval++;
                                wait = NULL;
                        }
                        if ((mask & POLLEX_SET) && ISSET(bit, __EX(fds,off))) {
                                SET(bit, __RES_EX(fds,off));
                                retval++;
                                wait = NULL;
                        }
                }
                #### can move unlock_kernel() here
                wait = NULL;
                if (retval || !__timeout || signal_pending(current))
                        break;
                __timeout = schedule_timeout(__timeout);
        }
        current->state = TASK_RUNNING;

out:
        if (*timeout)
                free_wait(wait_table);

        /*
         * Up-to-date the caller timeout.
         */
        *timeout = __timeout;
##off unlock_kernel();
-----------
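
The narrower locking marked by #### above can be modelled in user space as
follows. This is a single-threaded sketch that only tracks where the lock
would be held; it is not kernel code and does not answer which portion
really needs the lock. All names here (bkl_held, model_lock(),
model_unlock(), model_schedule_timeout(), model_scan(),
select_narrow_lock(), ready_after[]) are hypothetical.

```c
#include <assert.h>

static int bkl_held;             /* 1 while the modelled kernel lock is held */
static int sleeps_with_lock;     /* sleeps entered with the lock still held */
static int sleeps_without_lock;  /* sleeps entered after the lock was dropped */

static void model_lock(void)
{
    assert(!bkl_held);           /* the model does not nest the lock */
    bkl_held = 1;
}

static void model_unlock(void)
{
    assert(bkl_held);
    bkl_held = 0;
}

/* Stand-in for schedule_timeout(): notes the lock state, uses up one tick. */
static long model_schedule_timeout(long timeout)
{
    if (bkl_held)
        sleeps_with_lock++;
    else
        sleeps_without_lock++;
    return timeout - 1;
}

/*
 * Stand-in for the inner fd loop: descriptor i counts as ready once
 * ready_after[i] has reached 0; otherwise it moves one scan closer.
 */
static int model_scan(int *ready_after, int n)
{
    int ready = 0;

    for (int i = 0; i < n; i++) {
        if (ready_after[i] == 0)
            ready++;
        else
            ready_after[i]--;
    }
    return ready;
}

/* Lock only around the scan, so the sleep happens without the lock. */
int select_narrow_lock(int *ready_after, int n, long timeout)
{
    int retval;

    for (;;) {
        model_lock();            /* first #### position */
        retval = model_scan(ready_after, n);
        model_unlock();          /* second #### position */
        if (retval || !timeout)
            break;
        timeout = model_schedule_timeout(timeout);
    }
    return retval;
}
```

With the lock taken and dropped at the #### positions, every call to the
sleep stand-in is counted in sleeps_without_lock, which is the property the
proposed change is after.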

sock_poll() may be called via file->f_op->poll(); the function is:

static unsigned int sock_poll(struct file *file, poll_table * wait)
{
        struct socket *sock;
        int err;

##off unlock_kernel();
        sock = socki_lookup(file->f_dentry->d_inode);

        /*
         * We can't return errors to poll, so it's either yes or no.
         */

        err = sock->ops->poll(file, sock, wait);
##on lock_kernel();
        return err;
}

--
Computer Systems Laboratory, Fujitsu Labs.
kumon@flab.fujitsu.co.jp



