Re: Thread-private mappings and graphics (was Re: Per-Processor Data Page)

Raul Miller (moth@magenta.com)
Thu, 16 Dec 1999 17:31:37 -0500


On Thu, Dec 16, 1999 at 11:55:54AM -0800, Jim Gettys wrote:
> > What do benchmarks look like -- comparing "swapping card state" vs.
> > "keep opengl state off-board and card actions atomic"?

> Benchmarks have always been that it is much better to keep the state
> in the card as much as possible, and all the hardware trends are that
> this is more and more important, as processor speeds increase faster
> than I/O speeds.
>
> You can't afford to load a ton of registers to the right state and
> then draw a single line: you are bus limited on graphics in decent
> implementations, even for 2D graphics.

That doesn't really address the issue of swapping card state from
one context to another.

If you're swapping a lot, that would seem to have all the cost of not
putting all the registers in the card, while if you're not swapping
often it doesn't seem like there's a whole lot of advantage to one
methodology over another.

I can see that there's a latency issue if you defer loading the
card with data until you have enough data -- but if you optimistically
load the data and have the capability of aborting and loading a
data from a different context when needed, well... maybe that still
turns out to be slower than paging graphics contexts in and out of
the card. But the question is:

how *much* slower is it? Although it matters to some people, a drop
from 30fps to 29fps isn't completely outrageous.

[Of course, even that isn't really relevant to the issue of thread private
mappings, unless there's some requirement that all threads store their
graphical information at the same virtual address. I suppose there are
people who wouldn't dream of using anything other than absolute addresses,
because of the speed penalty... but perhaps such people should be using
their own hand-built instances of the graphics libraries as well.]

> Even then you have a synchronization problem (among multiple clients
> in the DRI case), or if there were a multithreaded X server (it was
> tried, but was substantially slower than a single threaded X server):
> you have a bunch of registers that have to be correct at the time the
> rendering operation is actually executed.
>
> And yes, the hardware trend is toward decent hardware that you can
> actually start and stop successfully: will be a while before it is
> pervasive. The wheel of invention has a full turn yet in the PC market
> to get there.

Here's where I get to display my ignorance: Are you saying there's no
good way of aborting partially set-up state on current pc hardware
(e.g. voodoo)? Or just that you need to keep track of such state
outside the board -- that you have to go through a whole sequence of
setup operations and that recording the requisite data both off- and
on-board has a measurable speed penalty (5%, 20%, ?)?

Thanks,

-- 
Raul

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/