RT scheduling and a way to make a process hang, unkillable

From: Corey Hickey
Date: Sat Feb 14 2009 - 20:17:16 EST


Hello,

I've encountered a bit of a problem in recent kernels that include
"Group scheduling for SCHED_RR/FIFO": it is possible for a process run
by root to hang itself and become unkillable--even by a 'kill -9'.

The following kernel options must be set:
CONFIG_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_USER_SCHED=y

The procedure is for a program to:
1. run as root
2. set SCHED_FIFO
3. change UID to a user with no realtime CPU share allocated

I'm attaching a test program that does exactly this. Run it with no
arguments or examine the print_usage() function to see detailed
information. Briefly, though, it should be run as root with the path to
a program to exec, like:
# ./hangme /bin/bash

The program hangs in a "running" state, like this:
nobody 4357 0.0 0.0 904 16 pts/1 R+ 16:09 0:00 /bin/bash

The only way to kill the program is to allocate the corresponding user
some realtime CPU share:
echo 10000 > /sys/kernel/uids/65534/cpu_rt_runtime
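
For a program that needs to do the same thing without a shell, the write above could be sketched in C roughly as follows. This is only a sketch: the helper name write_rt_runtime() is made up for illustration, and it assumes the sysfs file accepts a single decimal integer followed by a newline, as the echo suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper (not part of the attached test program):
 * write `value` as a decimal integer to `path`.
 * Returns 0 on success, or a negative errno-style value on failure. */
int write_rt_runtime(const char *path, long value)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return errno ? -errno : -EIO;

    int rc = 0;
    if (fprintf(f, "%ld\n", value) < 0)
        rc = -EIO;

    /* The data is flushed on close, so a rejected value (for example
     * "Invalid argument") typically shows up here. */
    if (fclose(f) != 0 && rc == 0)
        rc = errno ? -errno : -EIO;

    return rc;
}
```

It would be called as write_rt_runtime("/sys/kernel/uids/65534/cpu_rt_runtime", 10000), from a process that still has permission to write the file.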


This may or may not actually be a bug, but I think it's at least
confusing and unexpected. I had a difficult time narrowing this down
from a problem I was having with Debian's slmodemd package. I think it
would be much nicer for setuid() to return an error if the process is
realtime and the target user doesn't have any CPU share allocated (if
that's feasible).
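
Until something like that exists in the kernel, a program could make a similar check itself before calling setuid(). The following is a rough userspace sketch, not a proposed kernel change. All three helper names are hypothetical, and it assumes that the cpu_rt_runtime file holds a single integer and that a value of 0 means "no realtime share".

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: build the sysfs path for a user's realtime runtime.
 * Returns 0 on success, -1 if the buffer is too small. */
int rt_runtime_path(uid_t uid, char *buf, size_t len)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "/sys/kernel/uids/%lu/cpu_rt_runtime",
                     (unsigned long)uid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read one integer from `path` into *out.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_rt_runtime(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Assumption: a realtime task moved to a user whose runtime is 0
 * never gets scheduled again, which is the hang described above. */
int setuid_would_starve(int policy, long runtime)
{
    int realtime = (policy == SCHED_FIFO || policy == SCHED_RR);
    return realtime && runtime == 0;
}
```

A caller would build the path for the target UID, read the value, and refuse to call setuid() when setuid_would_starve(sched_getscheduler(0), value) is nonzero. What to do when the file cannot be read is left to the caller.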


This problem is similar in principle to a bug that Rafael J. Wysocki
reported on 2008-02-01 and that was subsequently fixed:

http://lkml.org/lkml/2008/1/31/490
http://lkml.org/lkml/2008/2/4/332

If I understand correctly, that was a case in which a program would hang
by doing the following:
1. run setuid-root
2. set SCHED_FIFO
3. change effective UID to match real UID

The difference in my case is that the program is running with root's
real UID as well as effective UID, so, at the time SCHED_FIFO is set,
there's no reason to deny realtime priority. My program changes real UID
_after_ setting SCHED_FIFO, and that's what causes the hang.


I've run my test program, with the same results, on the following kernels:
2.6.26
2.6.28
2.6.29-rc5

Warning! Under 2.6.28 it is impossible to allocate CPU share to users, so
the hung program cannot be killed at all:

http://lkml.org/lkml/2009/1/14/113


I'm also attaching my kernel configuration. Please let me know if you'd
like more information or for me to test a patch.

Thank you,
Corey
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sched.h>
#include <errno.h>
#include <limits.h>

#define UID 65534

int adjust_priority() {
    struct sched_param sp;
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != -1) {
        printf("running as SCHED_FIFO, priority %d\n", sp.sched_priority);
        return 1;
    } else {
        puts("could not set SCHED_FIFO");
        return 0;
    }
}

int change_uid() {
    if (setuid(UID)) {
        int err = errno;
        printf("setuid() failed: %s\n", strerror(err));
        return 0;
    }
    printf("UID set to %d\n", UID);
    return 1;
}


void print_usage() {
    printf(
        "This program attempts to illustrate a way by which a process can hang and\n"
        "become unkillable:\n"
        "1. run as root\n"
        "2. set SCHED_FIFO\n"
        "3. change to a user with no realtime CPU share allocated\n"
        "\n"
        "The following kernel options must be set:\n"
        "CONFIG_GROUP_SCHED=y\n"
        "CONFIG_RT_GROUP_SCHED=y\n"
        "CONFIG_USER_SCHED=y\n"
        "\n"
        "Warning: do not try this under Linux 2.6.28. Due to a bug, you will not be\n"
        "able to write to cpu_rt_runtime:\n"
        "http://lkml.org/lkml/2009/1/14/113\n"
        "This appears to have been fixed in 2.6.29-rc2, but not yet in 2.6.28.5\n"
        "\n"
        "If you're running a 2.6.29-rc kernel, you should lower root's\n"
        "cpu_rt_runtime first:\n"
        "# echo 900000 > /sys/kernel/uids/0/cpu_rt_runtime\n"
        "\n"
        "usage: ./hangme </path/to/executable> [arg1 ...]\n"
        "example: ./hangme /bin/bash\n"
        "\n"
        "The executable specified should take some time to run, otherwise it may\n"
        "complete and exit normally within the current time slice (I assume).\n"
        "Running a shell is ideal.\n"
    );
}

void print_msg() {
    printf(
        "\n"
        "I am now going to change my UID. See if I hang.\n"
        "If I do, try to kill me:\n"
        "# kill -9 %d\n"
        "Once you're ready to make me unhang:\n"
        "# echo 10000 > /sys/kernel/uids/%d/cpu_rt_runtime\n"
        "\n"
        "Note: if you get an \"Invalid argument\" error with 2.6.29-rc kernels,\n"
        "try lowering root's runtime like this:\n"
        "# echo 900000 > /sys/kernel/uids/0/cpu_rt_runtime\n"
        "...but it doesn't seem to work once this program is running!\n"
        "Is that another bug or do I misunderstand?\n"
        "\n"
        , getpid(), UID);
}

int main(int argc, char **argv)
{
    if (argc <= 1) {
        print_usage();
        return 2;
    }
    if (getuid()) {
        printf("this test only works when run as root\n");
        return 2;
    }
    if (!adjust_priority()) {
        return 1;
    }
    print_msg();
    if (!change_uid()) {
        return 1;
    }
    ++argv;
    printf("going to exec: %s\n", argv[0]);
    execv(argv[0], argv);
    printf("exec failed\n");
    return 1;
}

Attachment: config-2.6.29-rc5.bz2
Description: Binary data