[tripperda@nvidia.com: Re: 32-bit dma allocations on 64-bit platforms]

From: Terence Ripperda
Date: Wed Jun 23 2004 - 18:21:17 EST


----- Forwarded message from Terence Ripperda <tripperda@xxxxxxxxxx> -----

Date: Wed, 23 Jun 2004 14:36:43 -0700
From: Terence Ripperda <tripperda@xxxxxxxxxx>
To: Andi Kleen <ak@xxxxxx>
Cc: Terence Ripperda <tripperda@xxxxxxxxxx>, discuss@xxxxxxxxxx,
tiwai@xxxxxxx
Subject: Re: 32-bit dma allocations on 64-bit platforms

On Wed, Jun 23, 2004 at 12:22:10PM -0700, ak@xxxxxx wrote:
> [I would suggest to post x86-64 specific issues to discuss@xxxxxxxxxx
> too]

Thanks, I'll do that.

> I get from this that your hardware cannot DMA to >32bit.

Correct.


> First vmalloc_32 is a rather broken interface and should imho
> just be removed. The function name just gives promises that cannot
> be kept. It was always quite bogus. Please don't use it.

Sure. We haven't used this to allocate DMA memory in a long time; I was
just using it as an example of how difficult it is to allocate 32-bit
memory. vmalloc essentially ends up being a __get_free_page(GFP_*) call.
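
To make that concrete, here's a quick sketch (a hypothetical helper,
not our driver code) of why a plain allocation isn't enough on a 4+ GB
box: the page can land anywhere in physical memory, and GFP_DMA only
guarantees the legacy 16 MB ISA zone, so there's nothing in between:

#include <linux/mm.h>
#include <asm/io.h>

/* hypothetical helper: try to get one page reachable by a 32-bit
 * DMA engine. GFP_KERNEL gives no physical-address guarantee, so
 * on a 4+ GB machine this can simply fail. */
static void *get_page_below_4g(void)
{
        unsigned long addr = __get_free_page(GFP_KERNEL);

        if (!addr)
                return NULL;

        if (virt_to_phys((void *)addr) + PAGE_SIZE > 0x100000000ULL) {
                free_page(addr);
                return NULL;    /* page sits above 4 GB */
        }
        return (void *)addr;
}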

> The situation on EM64T is very unfortunate I agree. There was a reason
> we asked AMD to add an IOMMU and it's quite bad that the Intel chipset
> people ignored that wisdom and put us into this compatibility mess.

> Failing that it would be best if the other PCI DMA hardware
> could just address enough memory, but that's less realistic
> than just fixing the chipset.

Also note that even if PCI Express hardware were fixed, these machines
still have PCI slots, into which any old PCI card can be plugged.

> The x86-64 port had decided early to keep the 16MB GFP_DMA zone
> to get maximum driver compatibility and because the AMD IOMMU gave
> us a nice alternative to bounce buffering.

That was a very understandable decision, and I do agree that using the
AMD IOMMU is a very nice architecture. It is unfortunate to have to
deal with this on EM64T. Will AMD's PCI Express chipsets still include
an IOMMU, even if it's not needed for AGP anymore? (Probably not public
information; I'll check via my channels.)
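
For reference, here's roughly what that looks like from a driver's
point of view with the current PCI DMA API (a minimal sketch with a
placeholder pci_dev, not our actual code). On AMD the IOMMU can remap
high pages to satisfy the mask; on EM64T without one, the core has to
find genuinely low memory or bounce:

#include <linux/pci.h>

static void *setup_dma_buffer(struct pci_dev *pdev, size_t size,
                              dma_addr_t *bus_addr)
{
        /* declare that this device can only address 32 bits */
        if (pci_set_dma_mask(pdev, 0xffffffffULL))
                return NULL;    /* platform can't satisfy the mask */

        /* coherent memory somewhere under the mask, if available */
        return pci_alloc_consistent(pdev, size, bus_addr);
}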

> I must say I'm somewhat reluctant to break a working in-tree driver,
> especially for the sake of an out-of-tree binary driver. Arguably the
> problem is probably not limited to you, but it's quite possible that
> even the in-tree DRI drivers have it, so it would still be worth
> fixing.

Agreed. I completely understand that there is no desire to modify the
core kernel to help our driver. That's one of the reasons I looked
through the other drivers: I suspect this is a problem for many of
them. I only looked through each driver's code briefly, but didn't see
anything that handles this. I suspect it's more that the drivers have
not been stress-tested on an x86_64 machine with 4+ GB of memory.

It was also interesting to hear about some of the PCI devices with
limited addressing capability.

> I see two somewhat realistic ways to handle this:

Either of those approaches sounds good. Keeping compatibility with
older devices and drivers is certainly a good thing.

Can the core kernel handle multiple new zones? I haven't looked at the
code, but the zones always seem to be ZONE_DMA and ZONE_NORMAL, with
some architectures adding ZONE_HIGHMEM at the end. If you add a
ZONE_DMA_32 (or whatever) between ZONE_DMA and ZONE_NORMAL, will the
core VM code be able to handle that? One could argue that if it can't
yet, it should be able to; then each architecture could create as many
zones as it wanted.
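
Roughly something like this, with purely hypothetical names
(ZONE_DMA_32 and GFP_DMA32 are invented here, not in any tree):

/* hypothetical zone layout -- illustration only, not real headers */
enum {
        ZONE_DMA,       /* < 16 MB, legacy ISA devices */
        ZONE_DMA_32,    /* < 4 GB, 32-bit PCI devices on 64-bit boxes */
        ZONE_NORMAL     /* everything above */
};

/* a driver could then ask for 32-bit-reachable pages explicitly,
 * instead of overloading the tiny GFP_DMA zone: */
static unsigned long get_dma32_pages(unsigned int order)
{
        return __get_free_pages(GFP_DMA32, order);  /* invented flag */
}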

Another brainstorm: instead of counting on a coarse-grained zone and a
call to __get_free_pages() returning an allocation within a given bit
range, perhaps there could be coarse-grained zones with a fine-grained
hint used to look for a subset within the zone. For example, there
could be a DMA32 zone, but a mask with 24 or 29 bits enabled could be
used to scan the DMA32 zone for a valid address. (I don't know how well
that fits into the current architecture.)
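
Very roughly, with an invented helper (and hand-waving the allocator
internals; a real implementation would do this inside the zone scan
rather than by grabbing and releasing pages):

#include <linux/mm.h>
#include <asm/io.h>

/* keep allocating until a page falls under 'mask', holding the
 * misses so the allocator doesn't hand the same page straight back */
static unsigned long alloc_page_below_mask(unsigned int gfp_mask,
                                           u64 mask, int max_tries)
{
        unsigned long addr, held[8];
        int tries, nheld = 0, i;

        for (tries = 0; tries < max_tries && nheld < 8; tries++) {
                addr = __get_free_page(gfp_mask);
                if (!addr)
                        break;
                if ((virt_to_phys((void *)addr) & ~mask) == 0)
                        goto out;       /* fits under the device mask */
                held[nheld++] = addr;
        }
        addr = 0;
out:
        for (i = 0; i < nheld; i++)
                free_page(held[i]);
        return addr;
}

e.g. mask = (1ULL << 29) - 1 for a 29-bit device, scanning within
whatever coarse zone the flags select.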

Thanks,
Terence

----- End forwarded message -----