Re: Thread-private mappings and graphics (was Re: Per-Processor Data

James Simmons (jsimmons@edgeglobal.com)
Fri, 17 Dec 1999 09:29:16 -0500 (EST)


> I don't understand why people are so hung up about page faults.
>
> I think it's ENTIRELY because of historical baggage, and the particular
> implementation under Irix.
>
> What I'm surprised about is that nobody seems to just come out and say:
>
> Page faults are BAD. Playing with the page tables is EXPENSIVE. Page
> faults fundamentally are NOT thread-safe, because page tables are
> fundamentally shared among threads.

The thread sharing under linux is true. But their are advatanges to using
page faults. One on low end hardware you should NEVER allow direct access
to DMA from userland. On good hardware the DMA engine test the state of
the CPU to see if a privileged command will be processed. So these
commands will only be processed by OS whereas the other commands can come
from userland. This is not true for crap hardware. Now I know you can't
say that X hasn't had security holes before. You can have a possible
security hole. You can send commands like copy the 16 Meg framebuffer to
the first 16 megs of system memory. You just hosed the system. Now you
could do all DMA commands by ioctl calls (slooow) so the kernel can parse
them and then send them to the real DMA buffer. Or you can implement ping
pong buffers. I assume you are familar with the concept so I will not
waste my time explaining it. As you know the page fault is how you
impement this. You lose some performace in exchange for system security
and stablility. But this is much much better performace than the the
ioctl method. The GLX group already use a psuedo DMA buffer to handle this
and the performace lose is very small. The same for MMIO regions. Most
card are brain dead in that they have mode setting register with 3D
registers. So if you just mmap the MMIO region a userland app has the
chance to program the wrong registers. You can fry your video card. Now
you can say only X should have access to this region. Well this defeats
the idea of direct rendering. That ordinary apps can bypass the X protocal
to have direct access to video hardware for speed improvements. Again
you can use page fault tricks to protect yourself.

So if you want a system thats fast but unstable and not secure then just
plain mmap the DMA and MMIO regions to userland. Just let the programers
beware that program linix directly is unstable, unsafe and can destory
your hardware. If you want a system that secure and stable then you will
lose some performace. You just have to make it as small as possible. The
other option is not to support low end hardware for the next 5 years until
standard hardware is sane. To make it secure you could use ioctls or page
fault handlers. Which would you prefere? For 3D hardware you wouldn't
want to use a switch statement. You coudl use a table lookup that would
in time grow out of control amd not be standard between cards.

> Ok. Nobody else said it, so now I have.
>
> YOU SHOULD NOT PLAY MM GAMES! They do not scale in SMP, they do not scale
> with threads, and the costs of missing are absolutely huge. The whole
> thing is also extremely hard to debug, and implies a much tighter coupling
> between the kernel and the X server than there should ever be!

First one of the biggest complants about DRI is that it tightly coupled
with X. You need X to use the accel engine with DRI. Linux is a hackers
OS. People like making their own thing. With DRI you force people to
resort to X to use graphics. Their are other projects like OFBis, GGI,
Berlin, libsvga that would like to use the accel engine. You have to find
a way to share the graphics context state and lock state to all possible
graphical systems. You could have a special DRI library that everyone
could use.

Why do I think having a special DRI library is a bad idea. Make sure the
DRI API doesn't change in the future because of a flaw in its design. Else
you break every graphical system on Linux that would depend on it. One of
nice things about internal driver APIs is you can change them without
usually affecting userland APIs. This is especially true for stable
kernels.

For a system thats doesn't SMP scale well and has huge performace cost
then its pretty amazing that this technique produces real time graphics on
SGI machines with 128 CPUs and 16 video cards.

> You can do a _regular_ SMP-safe lock with _real_ thread safety and no
> faulting behaviour in a few instructions. We're talking maybe 50 cycles
> here - about 40 cycles for the actual two locked instructions, and a very
> generous 10 cycles to check whether you are the old owner and going to the
> switch routine if not).

[performace snip ..]

True but its less safe. When a process wants a lock it calls drm_lock_take.
This funtion returns 1 if the lock is taken or 0 if free. Now in theory a
well behave program listen to this and if the lock is taken goes and does
something else. Else it grabs the lock. A rogue program would ignore it
and send DMA commands anyways thus locking the machine on poor hardware.
Unless DRI has away to force a program to obey the lock no matter what
this is not safe.

To truly see which is the better method once I have completed the new
linux SGI DRE I will test it to compare it to DRI for performance,
stablility, and security with higher end cards. Then with the above
methods I described test it on lower end cards. Only code results will
show who is right or wrong. Then that should determine which should be
supported.

James Simmons (o_
fbdev/gfx developer (o_ (o_ //\
http://www.linux-fbdev.org (/)_ (/)_ V_/_
http://linuxgfx.sourceforge.net

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/