Re: [PATCH] x86: Add an explicit barrier() to clflushopt()

From: Chris Wilson
Date: Tue Jan 12 2016 - 11:39:12 EST


On Mon, Jan 11, 2016 at 09:05:06PM +0000, Chris Wilson wrote:
> I can narrow down the principal buggy path by doing the clflush(vend-1)
> in the callers at least.

That leads to the suspect path being a read back of a cache line from
main memory that was just written to by the GPU. Writes to memory before
using them on the GPU do not seem to be affected (or at least we have
sufficient flushing in sending the commands to the GPU that we don't
notice anything wrong).

And back to the oddity.

Instead of doing:

clflush_cache_range(vaddr + offset, size);
clflush(vaddr+offset+size-1);
mb();
memcpy(user, vaddr+offset, size);

what also worked was:

clflush_cache_range(vaddr + offset, size);
clflush(vaddr);
mb();
memcpy(user, vaddr+offset, size);

(size is definitely non-zero, offset is offset_in_page(), vaddr is from
kmap_atomic()).

i.e.

void clflush_cache_range(void *vaddr, unsigned int size)
{
const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
void *vend = vaddr + size;

if (p >= vend)
return;

mb();

for (; p < vend; p += clflush_size)
clflushopt(p);

clflushopt(vaddr);

mb();
}

I have also confirmed that this doesn't just happen for single
cachelines (i.e. where the earlier clflush(vend-1) and this clflush(vaddr)
would be equivalent).

At the moment I am more inclined this is serialising the clflush()
(since clflush to the same cacheline is regarded as ordered with respect
to the earlier clflush iirc) as opposed to the writes not landing timely
from the GPU.

Am I completely going mad?
-Chris

--
Chris Wilson, Intel Open Source Technology Centre