Re: [PATCH v2 00/25] AMDKFD kernel driver

From: Oded Gabbay
Date: Tue Jul 22 2014 - 04:11:30 EST


On 22/07/14 10:23, Daniel Vetter wrote:
> On Mon, Jul 21, 2014 at 10:23:43PM +0300, Oded Gabbay wrote:
>> But Jerome, the core problem still remains in effect, even with your
>> suggestion. If an application, either via userspace queue or via ioctl,
>> submits a long-running kernel, then the CPU in general can't stop the
>> GPU from running it. And if that kernel does while(1); then that's it,
>> game's over, no matter how you submitted the work. So I don't really
>> see the big advantage in your proposal. Only in CZ we can stop this wave
>> (by CP H/W scheduling only). What you are saying is basically I won't
>> allow people to use compute on Linux KV system because it _may_ get the
>> system stuck.
>>
>> So even if I really wanted to, and I may agree with you theoretically on
>> that, I can't fulfill your desire to make the "kernel being able to
>> preempt at any time and be able to decrease or increase user queue
>> priority so overall kernel is in charge of resources management and it
>> can handle rogue client in proper fashion". Not in KV, and I guess not
>> in CZ either.
>
> At least on intel the execlist stuff which is used for preemption can be
> used by both the cpu and the firmware scheduler. So we can actually
> preempt when doing cpu scheduling.
>
> It sounds like current amd hw doesn't have any preemption at all. And
> without preemption I don't think we should ever consider allowing
> userspace to directly submit stuff to the hw and overload. Imo the kernel
> _must_ sit in between and reject clients that don't behave. Of course you
> can only ever react (worst case with a gpu reset, there's code floating
> around for that on intel-gfx), but at least you can do something.
>
> If userspace has a direct submit path to the hw then this gets really
> tricky, if not impossible.
> -Daniel


Hi Daniel,
See the email I just sent to Jerome regarding preemption. Bottom line: in
KV, we can preempt running queues, except in the case of a stuck GPU
kernel. In CZ, this was solved.

So, in this regard, I don't think there is any difference between userspace queues and ioctl.

Oded