Re: Interacting with coherent memory on external devices

From: Christoph Lameter
Date: Wed Apr 22 2015 - 14:18:07 EST


On Wed, 22 Apr 2015, Jerome Glisse wrote:

> Now if you have the exact same address space then structure you have on
> the CPU are exactly view in the same way on the GPU and you can start
> porting library to leverage GPU without having to change a single line of
> code inside many many many applications. It is also lot easier to debug
> things as you do not have to strungly with two distinct address space.

Right. That already works. Note however that GPU programming is a bit
different. Saying that the same code runs on the GPU is strong
simplification. Any effective GPU code still requires a lot of knowlege to
make it work in a high performant way.

The two distinct address spaces can be controlled already via a number of
mechanisms and there are ways from either side to access the other one.
This includes mmapping areas from the other side.

If you really want this then you should even be able to write a shared
library that does this.

> Finaly, leveraging transparently the local GPU memory is the only way to
> reach the full potential of the GPU. GPU are all about bandwidth and GPU
> local memory have bandwidth far greater than any system memory i know
> about. Here again if you can transparently leverage this memory without
> the application ever needing to know about such subtlety.

Well if you do this transparently then the GPU may not have access to its
data when it needs it. You are adding demand paging to the GPUs? The
performance would suffer significantly. AFAICT GPUs are not designed to
work like that and would not have optimal performance with such an
approach.

> But again let me stress that application that want to be in control will
> stay in control. If you want to make the decission yourself about where
> things should end up then nothing in all we are proposing will preclude
> you from doing that. Please just think about others people application,
> not just yours, they are a lot of others thing in the world and they do
> not want to be as close to the metal as you want to be. We just want to
> accomodate the largest number of use case.

What I think you want to do is to automatize something that should not be
automatized and cannot be automatized for performance reasons. Anyone
wanting performance (and that is the prime reason to use a GPU) would
switch this off because the latencies are otherwise not controllable and
those may impact performance severely. There are typically multiple
parallel strands of executing that must execute with similar performance
in order to allow a data exchange at defined intervals. That is no longer
possible if you add variances that come with the "transparency" here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/