Re: [PATCH] [request for inclusion] Realtime LSM

From: Con Kolivas
Date: Thu Jan 13 2005 - 21:11:40 EST


utz lehmann wrote:
On Thu, 2005-01-13 at 16:25 -0500, Lee Revell wrote:

On Thu, 2005-01-13 at 22:07 +0100, Arjan van de Ven wrote:

On Thu, Jan 13, 2005 at 03:04:26PM -0600, Jack O'Quin wrote:

(Probably, this simplistic analysis misses some other, more subtle,
factors.)

I think you can do nasty things to the locks held by those threads too


RT threads should not do FS writes of their own. But, a badly broken
or malicious one could, I suppose. So, that might provide a mechanism
for losing more data than usual. Is that what you had in mind?

basically yes.
note that "FS writes" can come from various things, including library calls
made and such. But I think you got my point; even though it might seem a bit
theoretical it sure is unpleasant.


I added Con to the cc: because this thread is starting to converge with
an email discussion we've been having.

The basic issue is that the current semantics of SCHED_FIFO seem make
the deadlock/data corruption due to runaway RT thread issue difficult.
The obvious solution is a new scheduling class equivalent to SCHED_FIFO
but with a mechanism for the kernel to demote the offending thread to
SCHED_OTHER in an emergency. The problem can be solved in userspace
with a SCHED_FIFO watchdog thread that runs at a higher RT priority than
all other RT processes.

This all seems to imply that introducing an rlimit for MAX_RT_PRIO is an
excellent solution. The RT watchdog thread could run as root, and the
rlimit would be used to ensure than even nonroot users in the RT group
could never preempt the watchdog thread.


Just an idea. What about throttling runaway RT tasks?
If the system spend more than 98% in RT tasks for 5s consider this as a
_fatal error_. Print an error message and throttle RT tasks by inserting
ticks where only SCHED_OTHER tasks allowed. For a limit of 98% this
means one SCHED_OTHER only tick all 50 ticks.

The limit and timeout should be configurable and of course it can be
disabled.

I know this is against RT task preempt all SCHED_OTHER but this is only
for a fatal system state to be able to recover sanely. A locked up
machine is is the worse alternative.

There is a patch in -mm currently designed to use a sysrq key combination which converts all real time tasks to sched normal to save you if you desire in a lockup situation. We do want to preserve RT scheduling behaviour at all times without caveats for privileged users.

Cheers,
Con

Attachment: signature.asc
Description: OpenPGP digital signature