Re: Direct I/O

From: Alexander S A Kjeldaas (Alexander.Kjeldaas@fast.no)
Date: Mon Mar 13 2000 - 06:54:55 EST


On Fri, Mar 10, 2000 at 05:59:15PM +0000, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, 9 Mar 2000 19:56:46 +0100, Alexander S A Kjeldaas
> <Alexander.Kjeldaas@fast.no> said:
>
> > I'm looking for opinions about how direct I/O can be implemented on
> > Linux. I am interested in a design that does not touch the
> > page-tables of the process, and where the data is DMAed directly to
> > user-space visible pages.
>
> Sorry. You can't access user space from DMA _without_ touching page
> tables at some point. The only two ways of doing it are to create
> memory in kernel space and then mmap() it into user space (and faulting
> those in requires direct page table access), or by following user page
> tables to convert user virtual addresses into physical memory
> addresses (which is what map_user_kiobuf() does in 2.3).
>

"touch page tables" was inaccurate. What I meant was that I do not
want any part of the page-table of the process to be invalidated
during a "common" read. For instance, some "page flipping" direct I/O
implementations do not reuse the pages in the page table of the user
process, requiring parts of the page-table to be invalidated on a
read.

> > My goal is to have an interface that lets me DMA directly from disk
> > into the CPU-cache
>
> You can't. You can only DMA to/from main memory, not cache.
>

You mean there is no hardware that can do this? I hear SGI are
bragging about their advanced bus-snooping capabilities. I thought
they would be able to do this. The data _is_ on a shared bus so it
should theoretically be available. If it really isn't possible, the
advantage of not invalidating page-table entries is less, but it is
still quite measurable.

> > The idea might be ugly, but it might be better than nothing :-)
>
> > o An mmap hint (MAP_BLOCK?) that requests that the mmap should block
> > until the region requested has been read into memory.
>
> Chuck Lever has been doing a lot of work on this sort of thing. He has
> been posting patches for mincore() (to find out the residency of an area
> of memory), and madvise() (to force pagein or discarding of memory
> ranges).

Do you or anyone have a pointer to these patches?

>
> > Could something like the above be an acceptable alternative to direct
> > I/O to user-space-provided pages?
>
> Direct IO is already possible anyway. The raw devices in 2.3 provide
> it (at least unless you have user memory higher in memory than the
> device drivers can access). We are working on a more flexible
> infrastructure for this sort of thing, too: the kiobuf patches I posted
> to l-k this week are a step towards separating out the management of
> filesystem caches internally from the way we pass the data to and from
> user space.
>

What parts of the kernel is going to start using kiobufs? I looked
through the kernel and it looked like kiobufs were basically only used
by the raw devices. How does kiobufs relate to for instance
bufferheads?

One more question: Will the raw devices be able to switch a high-mem
page into a dma-friendly page if the page is used as a buffer for a
direct I/O read? I expect that on a high-mem machine, the kernel will
give out primarily high-mem pages to processes. It would be great if
the kernel could automatically "upgrade" high-mem pages that are used
as read buffers for raw I/O, to low-mem pages (<2G).

astor

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Mar 15 2000 - 21:00:24 EST