Re: [PATCH] dma-mapping: fix page attributes for dma_mmap_*

From: Russell King - ARM Linux admin
Date: Tue Aug 06 2019 - 12:45:38 EST


On Tue, Aug 06, 2019 at 05:08:54PM +0100, Will Deacon wrote:
> On Sat, Aug 03, 2019 at 08:48:12AM +0200, Christoph Hellwig wrote:
> > On Fri, Aug 02, 2019 at 11:38:03AM +0100, Will Deacon wrote:
> > >
> > > So this boils down to a terminology mismatch. The Arm architecture doesn't have
> > > anything called "write combine", so in Linux we instead provide what the Arm
> > > architecture calls "Normal non-cacheable" memory for pgprot_writecombine().
> > > Amongst other things, this memory type permits speculation, unaligned accesses
> > > and merging of writes. I found something in the architecture spec about
> > > non-cacheable memory, but it's written in Armglish[1].
> > >
> > > pgprot_noncached(), on the other hand, provides what the architecture calls
> > > Strongly Ordered or Device-nGnRnE memory. This is intended for mapping MMIO
> > > (e.g. PCI config space) and therefore forbids speculation, preserves access
> > > size, requires strict alignment and also forces write responses to come from
> > > the endpoint.
> > >
> > > I think the naming mismatch is historical, but on arm64 we wanted to use the
> > > same names as arm32 so that any drivers using these things directly would get
> > > the same behaviour.
> >
> > That all makes sense, but it totally needs a comment. I'll try to draft
> > one based on this. I've also looked at the arm32 code a bit more, and
> > it seems arm always (?) supported Normal non-cacheable attribute, but
> > Linux only optionally uses it for arm v6+ because of fears of drivers
> > missing barriers.
>
> I think it was also to do with aliasing, but I don't recall all of the
> details.

ARMv6+ is where the architecture significantly changed to introduce
the idea of [Normal, Device, Strongly Ordered] where Normal has the
cache attributes.

Before that, we had just "uncached/unbuffered, uncached/buffered,
cached/unbuffered, cached/buffered" modes.
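
As a minimal sketch of those four modes, assuming a made-up two-bit
encoding (one "cacheable" bit, one "bufferable" bit): the enum values and
helper names below are hypothetical and are not the kernel's definitions
or the real page table bit positions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical encoding, for illustration only:
 * bit 0 = B (bufferable), bit 1 = C (cacheable).
 * Real page table entries place these bits elsewhere.
 */
enum legacy_mem_mode {
	LEGACY_UNCACHED_UNBUFFERED = 0x0,
	LEGACY_UNCACHED_BUFFERED   = 0x1,
	LEGACY_CACHED_UNBUFFERED   = 0x2,
	LEGACY_CACHED_BUFFERED     = 0x3,
};

/* True if accesses through this mode may be held in the cache. */
bool legacy_mode_is_cached(enum legacy_mem_mode mode)
{
	return (mode & 0x2) != 0;
}

/* True if writes through this mode may sit in the write buffer. */
bool legacy_mode_uses_write_buffer(enum legacy_mem_mode mode)
{
	return (mode & 0x1) != 0;
}

/* Human-readable name matching the wording used in this mail. */
const char *legacy_mode_name(enum legacy_mem_mode mode)
{
	static const char *const names[] = {
		"uncached/unbuffered",
		"uncached/buffered",
		"cached/unbuffered",
		"cached/buffered",
	};

	assert((unsigned int)mode < 4);
	return names[mode];
}
```

In this toy model, only LEGACY_UNCACHED_UNBUFFERED answers "no" to both
helpers, which is why it is the mode with the fewest surprises.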

The write buffer (enabled by buffered modes) has no architected
guarantees about how long writes will sit in it, and there is only
the "drain write buffer" instruction to push writes out.

Up to and including ARMv5, we took the easy approach of just using
the "uncached/unbuffered" mode since that is (a) the safest, and (b)
avoids write buffers that alias when there are multiple different
mappings.

We could have used a different approach, making all IO writes contain
a "drain write buffer" instruction and mapping DMA memory as "buffered".
However, back in those days there were no Linux barriers defined to
order memory accesses to DMA memory (so that, for example, ring buffers
can be updated in the correct order), so using the uncached/unbuffered
mode was the sanest and most reliable solution.

>
> > The other really weird thing is that in arm32
> > pgprot_dmacoherent includes the L_PTE_XN bit, which from my understanding
> > is the no-execute bit, but pgprot_writecombine does not. This does not
> > seem very intentional. So minus that the whole DMA_ATTR_WRITE_COMBINE
> > seems to be about flagging old arm specific drivers as having the proper
> > barriers in place and otherwise is a no-op.
>
> I think it only matters for Armv7 CPUs, but yes, we should probably be
> setting L_PTE_XN for both of these memory types.

Conventionally, pgprot_writecombine() has only been used to change
the memory type and not the permissions. Since writecombine memory
is still capable of being executed, I don't see any reason to set XN
for it.

If the user wishes to mmap() using PROT_READ|PROT_EXEC, then is there
really a reason for writecombine to set XN overriding the user?

That said, pgprot_writecombine() is mostly used for framebuffers, which
arguably shouldn't be executable anyway - but who'd want to mmap() the
framebuffer with PROT_EXEC?
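
To make the distinction concrete, here is a toy user-space model of the
two helpers. It is only a sketch: the TOY_* bit positions and the toy_*
function names are invented for illustration and do not match the real
L_PTE_* values. The only thing it models is the point made above, that a
helper which rewrites just the memory-type field leaves the execute
permission as the caller set it, whereas one that also ORs in XN
overrides it.

```c
#include <assert.h>
#include <stdint.h>

/* All bit positions below are hypothetical, chosen for illustration. */
#define TOY_PROT_XN        (1u << 0)   /* execute-never permission bit */
#define TOY_MT_MASK        (3u << 1)   /* memory-type field */
#define TOY_MT_UNCACHED    (0u << 1)
#define TOY_MT_BUFFERABLE  (1u << 1)
#define TOY_MT_WRITEBACK   (3u << 1)

/* Clear the fields selected by mask, then set the requested bits. */
uint32_t toy_prot_modify(uint32_t prot, uint32_t mask, uint32_t bits)
{
	/* Only the memory type and XN are modelled here. */
	assert((bits & ~(TOY_MT_MASK | TOY_PROT_XN)) == 0);
	return (prot & ~mask) | bits;
}

/* Changes the memory type only; permissions pass through untouched. */
uint32_t toy_writecombine(uint32_t prot)
{
	return toy_prot_modify(prot, TOY_MT_MASK, TOY_MT_BUFFERABLE);
}

/* Changes the memory type and additionally forces XN on. */
uint32_t toy_dmacoherent(uint32_t prot)
{
	return toy_prot_modify(prot, TOY_MT_MASK,
			       TOY_MT_BUFFERABLE | TOY_PROT_XN);
}
```

With an executable write-back mapping as input, toy_writecombine() yields
a bufferable mapping that is still executable, while toy_dmacoherent()
yields a bufferable mapping with XN set regardless of what was asked for.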

>
> > Here is my tentative plan:
> >
> > - respin this patch with a small fix to handle the
> >   DMA_ATTR_NON_CONSISTENT (as in ignore it unless actually supported),
> >   but keep the name as-is to avoid churn. This should allow 5.3
> >   inclusion and backports
> > - remove DMA_ATTR_WRITE_COMBINE support from mips, probably also 5.3
> >   material.
> > - move all architectures but arm over to just define
> >   pgprot_dmacoherent, including a comment with the above explanation
> >   for arm64.
>
> That would be great, thanks.
>
> > - make DMA_ATTR_WRITE_COMBINE a no-op and schedule it for removal,
> >   thus removing the last instances of arch_dma_mmap_pgprot
>
> All sounds good to me, although I suppose 32-bit Arm platforms without
> CONFIG_ARM_DMA_MEM_BUFFERABLE may run into issues if DMA_ATTR_WRITE_COMBINE
> disappears. Only one way to find out...

Looking at the results of grep, I think only OMAP2+ and Exynos may be
affected.

However, removing writecombine support from the DMA API is going to
have a huge impact for framebuffers on earlier ARMs - that's where we
do expect framebuffers to be mapped "uncached/buffered" for performance
reasons and not "uncached/unbuffered". It's quite literally the
difference between console scrolling being usable and totally unusable.

Given what I've said above, switching to buffered mode for normal DMA
mappings risks data corruption - as in your filesystem could get
fried. I don't think we should play fast and loose with people's data
by randomly changing that "because we'd like to", and I don't see that
screwing the console is really an option either.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up