possible cache aliasing problem with O_DIRECT?

From: Jun Sun (jsun@mvista.com)
Date: Tue Dec 10 2002 - 21:20:51 EST

I am chasing a problem which might be a cache aliasing problem
when a disk file is opened with the O_DIRECT flag.

I attached the source code of two programs. One generates a binary file
and the other opens the file with O_DIRECT and reads it. It checks
the content of the file while reading it.
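
For readers without the attachments, a generator/checker pair of this kind
might look roughly like the sketch below. This is not the attached code. The
function names, the 4096-byte block size, and the "each 32-bit word holds its
own index" pattern are all assumptions made for illustration; the real
alignment requirement for O_DIRECT depends on the device and filesystem.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0           /* fallback so the sketch compiles elsewhere; no direct I/O then */
#endif

#define BLOCK_BYTES 4096u    /* assumed to satisfy the O_DIRECT alignment rules */
#define BLOCK_WORDS (BLOCK_BYTES / sizeof(uint32_t))

/* Fill buf so that word i holds first + i (hypothetical test pattern). */
void fill_pattern(uint32_t *buf, size_t nwords, uint32_t first)
{
    for (size_t i = 0; i < nwords; i++)
        buf[i] = first + (uint32_t)i;
}

/* Return the index of the first word that breaks the pattern, or -1 if none. */
long first_mismatch(const uint32_t *buf, size_t nwords, uint32_t first)
{
    for (size_t i = 0; i < nwords; i++)
        if (buf[i] != first + (uint32_t)i)
            return (long)i;
    return -1;
}

/* Write nblocks blocks of the pattern with ordinary buffered I/O. 0 on success. */
int generate_file(const char *path, size_t nblocks)
{
    uint32_t buf[BLOCK_WORDS];
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    for (size_t b = 0; b < nblocks; b++) {
        fill_pattern(buf, BLOCK_WORDS, (uint32_t)(b * BLOCK_WORDS));
        if (write(fd, buf, BLOCK_BYTES) != (ssize_t)BLOCK_BYTES) {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) != 0) {    /* push the data to disk before it is read back */
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Read the file with O_DIRECT; return the number of bad blocks, or -1 on error. */
long check_file_direct(const char *path)
{
    void *buf = NULL;
    long bad = 0;
    uint32_t first = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;
    /* O_DIRECT needs an aligned user buffer; one block, block-aligned. */
    if (posix_memalign(&buf, BLOCK_BYTES, BLOCK_BYTES) != 0) {
        close(fd);
        return -1;
    }
    while ((n = read(fd, buf, BLOCK_BYTES)) == (ssize_t)BLOCK_BYTES) {
        long i = first_mismatch(buf, BLOCK_WORDS, first);
        if (i >= 0) {
            fprintf(stderr, "mismatch at word %lu\n",
                    (unsigned long)first + (unsigned long)i);
            bad++;
        }
        first += (uint32_t)BLOCK_WORDS;
    }
    free(buf);
    close(fd);
    return n < 0 ? -1 : bad;
}
```

Calling generate_file() once and then check_file_direct() in a loop is the
intended use; a non-zero return from the checker would indicate the kind of
corruption described here.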

I tested this on a MIPS board with an NEC vr5432 CPU, which has a
virtually indexed, two-way set associative d-cache, and I can easily
reproduce the data corruption problem.

I attached a patch which apparently solves the problem.

I am not an expert in fs and mm, but my guess is:

1) the user process allocates a big buffer
2) the user buffer is mapped into kernel virtual space for doing direct IO
   through map_user_kiobuf()
3) since the virtual address of the buffer in user space is different
   from its address in kernel virtual space, the kernel should flush the
   cache for those pages after doing the IO. That is why my attached patch
   makes it work.
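
To make the aliasing condition in step 3 concrete, here is a small sketch of
the usual "page color" arithmetic for a virtually indexed cache. The helper
names are hypothetical, and the numbers in the usage note (16 KB per cache
way, 4 KB pages) are assumptions for illustration, not values taken from the
vr5432 manual. Two virtual mappings of the same physical page can occupy
different cache lines only if their colors differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of distinct page colors: how many pages fit in one cache way. */
size_t cache_colors(size_t way_bytes, size_t page_bytes)
{
    return way_bytes > page_bytes ? way_bytes / page_bytes : 1;
}

/* Color of the page containing vaddr, i.e. the index bits above the page offset. */
size_t page_color(uintptr_t vaddr, size_t way_bytes, size_t page_bytes)
{
    return (size_t)((vaddr / page_bytes) % cache_colors(way_bytes, page_bytes));
}

/*
 * True if the user and kernel mappings of one physical page can end up in
 * different cache sets, so that data written through one mapping may not be
 * seen through the other without an explicit flush.
 */
bool may_alias(uintptr_t user_va, uintptr_t kernel_va,
               size_t way_bytes, size_t page_bytes)
{
    return page_color(user_va, way_bytes, page_bytes) !=
           page_color(kernel_va, way_bytes, page_bytes);
}
```

With the assumed 16 KB way and 4 KB pages there are four colors, so a user
address such as 0x10003000 (color 3) and a kernel address such as 0x80404000
(color 0) could alias. When a cache way is no larger than a page there is
only one color and may_alias() is always false.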

Does this make sense?

However, I still have some puzzles. For it to work completely, another
cache flush needs to be done for the address range of the buffer in user
space. I thought this should be done somewhere inside map_user_kiobuf()
but could not find it anywhere. Did I miss it? Or does it just happen to
work even without it?

Another puzzling part is that I also tested the program on a couple of
other MIPS boards which *should* suffer from this problem, but failed to
reproduce it there.

Any thoughts?

