Re: 2.6.11-rc1-mm1

From: Karim Yaghmour
Date: Sun Jan 23 2005 - 02:34:33 EST



Hello Roman,

Roman Zippel wrote:
> Well, let's concentrate for a moment on the last thing and check later
> if and how they fit into relayfs. Since ltt will be first main user, let's
> optimize it for this.
> Also since relayfs is intended for large, fast data transfers, per cpu
> buffers are pretty much always required, so it would make sense to leave
> this to relayfs (less to get wrong for the client).

But how does relayfs organize the namespace then? What if I have
multiple channels per CPU, each for a different type of data, will
all channels for the same CPU be under the same directory or will
each type of data have its own directory with one entry per CPU?
I don't have an answer to that, and I don't know that we should. Why
not just leave it to the client to organize his data as he wishes.
If we must assume that everyone will have at least one channel per
CPU, then why not provide helper functions built on top of very
basic functions instead of fixing the namespace in stone?

> I have to modify it a little (only the if (!buffer) part is new):
>
> cpu = get_cpu();
> buffer = relay_get_buffer(chan, cpu);
> while(1) {
> offset = local_add_return(buffer->offset, length);
> if (likely(offset + length <= buffer->size))
> break;
> buffer = relay_switch_buffer(chan, buffer, offset);
> if (!buffer) {
> put_cpu();
> return;
> }
> }
> memcpy(buffer->data + offset, data, length);
> put_cpu();
>
> This has a very short fast path and I need very good reasons to change/add
> anything here. OTOH the slow path with relay_switch_buffer() is less
> critical and still leaves a lot of flexibility.

This is not good for any client that doesn't know beforehand the exact
size of their data units, as in the case of LTT. If LTT has to use this
code that means we are going to loose performance because we will need to
fill an intermediate data structure which will only be used for relay_write().
Instead of zero-copy, we would have an extra unnecessary copy. There has
got to be a way for clients to directly reserve and write as they wish.
Even Zach Brown recognized this in his tracepipe proposal, here's from
his patch:
+ * - let caller reserve space and get a pointer into buf

>>1) get_cpu() and put_cpu() won't do. You need to outright disable
>>interrupts because you may be called from an interrupt handler.
>
>
> Look closer, it's already interrupt safe, the synchronization for the
> buffer switch is left to relay_switch_buffer().

Sorry, I'm still missing something. What exactly does local_add_return()
do? I assume this code has got to be interrupt safe? Something like:
#define local_add_return(OFFSET, LEN) \
do {\
...
local_irq_save(); \
OFFSET += LEN;
local_irq_restore(); \
...
} while(0);

I'm assuming local_irq_XXX because we were told by quite a few people
in the related thread to avoid atomic ops because they are more expensive
on most CPUs than cli/sti.

Also how does relay_get_buffer() operate? What if I'm writing an event
from within a system call and I'm about to switch buffers and get
an interrupt at the if(likely(...))? Isn't relay_get_buffer() going to
return the same pointer as the one obtained for the syscall, and aren't
both cases now going to effect relay_switch_buffer(), one of which will
be superfluous?

> This adds a conditional and is not really needed. Above shows how to make
> it interrupt safe and if the clients wants to reuse the same buffer, leave
> the locking to the client.

Fine, but how is the client going to be able to reuse the same buffer if
relayfs always assumes per-CPU buffer as you said above? This would be
solved if at its core relayfs' functions worked on single channels and
additional code provided helpers for making the SMP case very simple.

> That's quite a lot of code with at least 14 conditions (or 13 conditions
> too much) and this is just relayfs.

I believe Tom has refactored the code with your comments in mind, and has
something ready for review. I just want to clear up the above before we
make this final. Among other things, he just dropped all modes, and there's
only a basic relay_write() that closely resembles what you have above.

> That's not always true, where perfomance matters we provide different
> functions (e.g. spinlocks), so having an alternative version of
> relay_write is a possibility (although I'd like to see the user first).

Sure, see above in the case of LTT.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@xxxxxxxxxxx || 1-866-677-4546
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/