Re: [PATCH 0 of 4] Generic AIO by scheduling stacks

From: Davide Libenzi
Date: Sat Feb 10 2007 - 13:19:35 EST


On Sat, 10 Feb 2007, bert hubert wrote:

> On Fri, Feb 09, 2007 at 02:33:01PM -0800, Linus Torvalds wrote:
>
> > - IF the system call blocks, we call the architecture-specific
> > "schedule_async()" function before we even get any scheduler locks, and
> > it can just do a fork() at that time, and let the *child* return to the
> > original user space. The process that already started doing the system
> > call will just continue to do the system call.
>
> Ah - cool. The average time we have to wait is probably far greater than the
> fork overhead, microseconds versus milliseconds.
>
> However, and there probably is a good reason for this, why isn't it possible
> to do it the other way around, and have the *child* do the work and the
> original return to userspace?

If the parent is going to schedule(), someone above has already dropped
the parent's task_struct inside a wait queue, so the *parent* will be the
wakeup target [1].
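
That is, by the time we decide to go async, the sleeping side is already
wired into a wait queue. A minimal sketch of the classic in-kernel sleep
pattern (my_waitq and my_event_done are made-up names, error handling
omitted) shows why the eventual wake_up() can only hit that very task_struct:

#include <linux/sched.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(my_waitq);	/* made-up wait queue head */
static int my_event_done;			/* made-up condition */

static void wait_for_my_event(void)
{
	/* The wait queue entry is initialized with "current", i.e. the
	 * submitting (parent) task, so whoever does the wake_up() later
	 * wakes exactly that task_struct. */
	DECLARE_WAITQUEUE(wait, current);

	add_wait_queue(&my_waitq, &wait);
	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (my_event_done)
			break;
		schedule();		/* the submitter sleeps here */
	}
	set_current_state(TASK_RUNNING);
	remove_wait_queue(&my_waitq, &wait);
}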
Linus' take on the generic AIO is a neat one, but IMO continuous fork/exits
are going to be expensive. Even if the task is going to sleep, that does
not mean that the parent (well, in Linus' case, the child actually) does
not have more stuff to feed to async(). IMO the frequency of AIO
submission and retrieval can get pretty high (and hence the frequency of
fork/exit), and there may be a price to pay for it in the end.
IMO one solution, following the non-fibril way, may be:

- Keep a pool of per-process threads (a per-process pool already has stuff
  like "files" correctly set up, just for example - no need to teach the
  rest of the kernel about the "async" special case); a rough sketch of the
  structures follows this list

- When a schedule happens on the submission thread, we grab a thread
  (a task_struct, really) from the available pool

- We set up the return IP of the submission thread (now going to sleep) to
  an async_complete (or whatever name) stub. This will drop a result in a
  queue, and wake the async_wait (or whatever name) wait queue head

- We may want to swap at least the PID (signals, ...?) between the two, so
  that even if we're re-emerging with a new task_struct, the TID will be
  the same

- We make the "returning" thread come back to userspace through some
  special helper a la ret_from_fork (ret_from_async?)

- We also want to keep a record (hash?) of userspace cookies and the
  threads currently servicing them, so that we can implement cancel (send
  a signal)
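
A rough sketch of what the per-process pool and the async_complete stub
could look like (every name here - async_pool, async_result, async_thread -
is made up for illustration, and locking/lifetime details are glossed over):

#include <linux/list.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct async_result {
	struct list_head list;		/* linked on async_pool->done */
	void __user *cookie;		/* userspace cookie of the request */
	long retval;			/* result of the async syscall */
};

struct async_thread {
	struct list_head list;		/* free list / busy list linkage */
	struct task_struct *task;	/* the service thread itself */
	void __user *cookie;		/* cookie being serviced (for cancel) */
};

struct async_pool {			/* one per process */
	spinlock_t lock;
	struct list_head free;		/* idle service threads */
	struct list_head busy;		/* threads stuck under schedule() */
	struct list_head done;		/* completed async_result entries */
	wait_queue_head_t async_wait;	/* the head the stub wakes up */
};

/* Roughly what the async_complete stub would do once the borrowed thread
 * re-emerges from schedule() with the syscall result. */
static void async_complete(struct async_pool *pool, struct async_result *res)
{
	spin_lock(&pool->lock);
	list_add_tail(&res->list, &pool->done);
	spin_unlock(&pool->lock);
	wake_up(&pool->async_wait);
}

The async_wait side would then just sleep on pool->async_wait, dequeue from
pool->done and copy the cookie/retval pairs back to userspace.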

Open issues:

- What if the pool becomes empty because all the threads are stuck under
  schedule()?
  o Grow the pool (and delay-shrink it at quieter times)? (see the sketch
    after this list)
  o Make the caller really sleep?
  o Fall back to queue-request mode?

- Watch out for the Devil hiding in the details and showing up many times
  during the process
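
For the grow-the-pool option, a rough sketch of the grab-or-grow path,
building on the structures sketched above (async_pool_grow is hypothetical
and would have to clone a thread sharing the submitter's mm/files; the
shrink would be deferred to quieter times):

static struct async_thread *async_get_thread(struct async_pool *pool)
{
	struct async_thread *at = NULL;

	spin_lock(&pool->lock);
	if (!list_empty(&pool->free)) {
		at = list_entry(pool->free.next, struct async_thread, list);
		list_del(&at->list);
	}
	spin_unlock(&pool->lock);

	if (!at)
		at = async_pool_grow(pool);	/* may fail -> caller sleeps
						 * or falls back to queueing */
	return at;
}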

Yup, I can see Zach having a lot of fun with it ;)



[1] Well, you could add a list_head to the task_struct, and teach the
add-to-waitqueue path to drop a reference to all the wait queue entries
hosting the task_struct. Then walk&fix them (likely only one entry) when
you swap the submission thread context (thread_info, per-call stuff, ...)
over to a service thread's task_struct. At that point you can re-emerge
with the same task_struct. Pretty nasty, though.
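
A very hand-wavy sketch of that walk&fix (it assumes a new async_wq_entries
list_head in the task_struct and a matching task_node link in the wait queue
entry, neither of which exist today, and it completely ignores racing against
a concurrent wake_up - which is exactly where the nastiness lives):

static void async_retarget_waits(struct task_struct *from,
				 struct task_struct *to)
{
	wait_queue_t *wq;

	/* Re-point every wait queue entry (likely just one) currently
	 * hosting "from" at the service thread "to". */
	list_for_each_entry(wq, &from->async_wq_entries, task_node)
		wq->private = to;	/* default wake functions wake ->private */
	list_splice_init(&from->async_wq_entries, &to->async_wq_entries);
}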


- Davide

