GPU driver misbehavior [Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q9]

From: Timothy Miller
Date: Tue Oct 05 2004 - 15:42:44 EST




Lee Revell wrote:


"Misbehaving video card drivers are another source of significant delays
in scheduling user code. A number of video cards manufacturers recently
began employing a hack to save a PCI bus transaction for each display
operation in order to gain a few percentage points on their WinBench
[Ziff-Davis 98] Graphics WinMark performance.

The video cards have a command FIFO that is written to via the PCI bus.
They also have a status register, read via the PCI bus, which says
whether the command FIFO is full or not. The hack is to not check
whether the command FIFO is full before attempting to write to it, thus
saving a PCI bus read.

The problem with this is that the result of attempting to write to the
FIFO when it is full is to stall the CPU waiting on the PCI bus write
until a command has been completed and space becomes available to accept
the new command. In fact, this not only causes the CPU to stall waiting
on the PCI bus, but since the PCI controller chip also controls the ISA
bus and mediates interrupts, ISA traffic and interrupt requests are
stalled as well. Even the clock interrupts stop.

These video cards will stall the machine, for instance, when the user
drags a window. For windows occupying most of a 1024x768 screen on a
333MHz Pentium II with an AccelStar II AGP video board (which is based
on the 3D Labs Permedia 2 chip set) this will stall the machine for
25-30ms at a time!"

I would expect that I'm not the first to think of this, but I haven't seen it mentioned, so it makes me wonder. Therefore, I offer my solution.

Whenever you read the status register, keep a copy of the "number of free fifo entries" field. Whenever you're going to do a group of writes to the fifo, you first must check for enough free entries. The macro that does this checks the copy of the status register to see if there were enough free the last time you checked. If so, deduct the number of free slots you're about to use, and move on. If not, re-read the status register and loop or sleep if you don't have enough free.

The copy of the status register will always be "correct" in that it will always report a number of free entries less than or equal to the actual number, and it will never report a number greater than what is available (barring a hardware glitch of a bug which is bad for other reasons). This is because you're assuming the fifo doesn't drain, when in fact, it does.

This results in nearly optimal performance, because usually you end up reading the status register mostly when the fifo is full (a time when extra bus reads don't hurt anything). If you have a 256-entry fifo, then you end up reading the status register once for ever 256 writes, for a performance loss of only 0.39%, and you ONLY get this performance loss when the fifo drains faster than you can fill it.

One challenge to this is when you have more than one entity trying to access the same resource. But in that case, you'll already have to be using some sort of mutex mechanism anyhow.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/