[RFC PATCH RESEND 0/4] Maintain the FIFO order for same priority RT tasks

From: Xunlei Pang
Date: Mon Apr 27 2015 - 02:50:18 EST


From: Xunlei Pang <pang.xunlei@xxxxxxxxxx>

1. TWO FIFO QUEUES
We know, there are two main queues each cpu for RT SMP scheduler. Let's
name them "run queue" and "pushable queue" respectively.

"run queue" is from the intra-cpu's perspective, while "pushable queue" is
from the inter-cpu's perspective. "pushable queue" also has a very close
relationship with task affinity. The two queues separately maintain their
FIFO order for tasks queued at the same priority.

If we got a wrong FIFO order of any of the two queues, we say it broke FIFO.


2. THE WAY THEY WORK
Currently, when an rt task gets queued, it can be put to the head or
tail of its "run queue" at the same priority according to different
scenarios. Further if it is migratable, it will also and always be put
to the tail of its "pushable queue" at the same priority because "plist"
is used to manage this queue in strict FIFO order.

a) If a task got enqueued or dequeued, it got added to or removed from the
"run queue", and also "pushable queue"(added to tail) if it is migratable.

b) If a runnable task acquires cpu(current task), it stays in the front of
the "run queue", but got removed from the "pushable queue".

c) If a runnable task releases cpu, it stays in the "run queue"(if it
releases cpu involuntarily, stays just behind new current in this queue;
if it releases cpu voluntarily like yield, usually was requeued behind
other tasks queued at the same priority in this queue), and got added to
the tail of its "pushable queue" at the same priority if it is migratable.

d) If a tasks's affinity is changed from one cpu to multiple cpus, it got
added to the tail of "pushable queue" at the same priority.

e) If a tasks's affinity is changed from multiple cpus to one cpu, it got
removed from the "pushable queue".


3. PROBLEMATIC CASES
Unfortunately, in current kernel implementations, as far as I know, c)
described in "2. THE WAY THEY WORK" have some cases that broke FIFO of
one of the two queues for tasks queued at the same priority.

- Case 1: current task gets preempted by a higher priority task
current will be enqueued behind other tasks with the same priority in the
"pushable queue", while it actually stays ahead of these tasks in the "run
queue". Afterwards, if there comes a pull from other cpu or a push from
local cpu, the task behind current in the "run queue" will be removed from
the "pushable queue" and gets running, as the global rt scheduler fetches
tasks from the head of the "pushable queue" to do the pulling or pushing.
The kernel broke FIFO in this case.

current task gets preempted by an equal priority task:
This is done in check_preempt_equal_prio(), and can be divided into 3 sub cases.
- Case 2 : p is the only one has the same priority as current.
p preempts current successfully, but the kernel still had the
same problem as in Case 1 during the preempting.
- Case 3 : p is not the only one has the same priority as current, let's
say q is already queued before p. so, in this case we should
choose q to do the preempting, not p. So the kernel broke FIFO in
this case.
- Case 4 : p is going to preempt current, but when doing the actual preempting,
current isn't pushed away due to the change of system status. so
eventually p becomes the new current and gets running, while
current is queued back waiting in the same "run queue". Obviously,
the kernel broke FIFO in this case.

NOTE: when a task's affinity is changed and it becomes migratable(see d) described
in "2. THE WAY THEY WORK"), in current kernel implementation it will be queued
behind other tasks with the same priority in the "pushable queue". This may seems
contratry to the corresponding order in the "run queue". But from "pushable queue"'s
prospective, when a task's affinity is changed and further got removed from or
added to the "pushable queue" because of that, it means requeuing. In my view, I
don't think this broke FIFO of the "pushable queue". Any discussion?

But for "case 1" and "case 2", the task eventually wasn't got removed from or added
to the "pushable queue" due to affinity changing, so its "pushable queue" FIFO
order should also stays unchanged, and must be the same as its "run queue" order.
So we say "case 1" and "case 2" broke FIFO of its "pushable queue".

4. SUMMARY
case 1 : broke its "pushable queue" FIFO order.
case 2 : broke its "pushable queue" FIFO order.
case 3 : broke its "run queue" FIFO order.
case 4 : broke its "run queue" FIFO order.

5. PATCHSET
Patch 1 : Fix "Case 3".
Patch 2 : Prerequisite for Patch 3 - Provide plist_add_head() for "pushable queue".
Patch 3 : Fix "Case 1" and "Case 2".
Patch 4 : Fix "Case 4".

Xunlei Pang (4):
sched/rt: Modify check_preempt_equal_prio() for multiple tasks queued
at the same priority
lib/plist: Provide plist_add_head() for nodes with the same prio
sched/rt: Fix wrong SMP scheduler behavior for equal prio cases
sched/rt: Requeue p back if the preemption initiated by
check_preempt_equal_prio_common() failed

include/linux/plist.h | 34 ++++++-
include/linux/sched.h | 5 +
include/linux/sched/rt.h | 16 +++
kernel/sched/core.c | 6 +-
kernel/sched/rt.c | 253 ++++++++++++++++++++++++++++++++++++++++-------
lib/plist.c | 28 +++++-
6 files changed, 299 insertions(+), 43 deletions(-)

--
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/