Re: Make pipe data structure be a circular list of pages, ratherthan

From: Linus Torvalds
Date: Wed Jan 12 2005 - 15:19:57 EST




On Wed, 12 Jan 2005, Davide Libenzi wrote:

> On Sun, 9 Jan 2005, Linus Torvalds wrote:
>
> > On Sun, 9 Jan 2005, Linus Torvalds wrote:
> > >
> > > Since you guys stupidly showed interest, here's a very first-order
> > > approximation of filling the pipe from some other source.
> >
> > Here's a somewhat fixed and tested version, which actually does something
> > on x86.
>
> Question. How do you think to splice() skb pages, or any other non page-based
> format?

That's why there's an "offset/length" pair. Right now the only limitation
is actually
- that at least the start of the area can be described as a "struct page
*" (this means that things like a PCI MMIO mapping cannot be spliced -
but that's true for other reasons anyway, since a "memcpy()" doesn't
even work on such things on most architectures)
- that it be mappable with kmap() (this means that if it's a multi-page
thing, it needs to be in low memory currently).

The second thing would actually going away in a full implementation
anyway, to be replaced by "map this page in" and "unmap this page" buffer
operations. We need those "map" operations in order to handle other
"prepare for copy" situations, notably waiting for disk IO to be finished
(ie I want to be able to splice pages to the pipe without having to wait
for them - you'd wait for the contents only when the data itself is
needed).

My example patch (which I didn't post publicly because it's a
work-in-progress) already made offset/length be "unsigned int", exactly
because I foresee the "page" possibly being part of a hugetlb page, or at
least a bigger multi-page allocation.

[ Side note: even the first issue - PCI MMIO or other descriptor that
doesn't normally have a "struct page *" associated with it - could
in theory be handled by just creating a fake "struct page", and hiding
the information into it. Then the map/unmap functions would just have to
return the virtual address that is usable for a memcpy - since on _some_
architectures you can actually do a memcpy on IO space too. That might
be useful for things like AGP or similar that depend on architecture-
specific issues anyway ]

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/