Re: How how latent should non-preemptive scheduling be?

From: Sitsofe Wheeler
Date: Fri Sep 19 2008 - 07:55:24 EST


Ingo Molnar wrote:
here's two quick howtos:

http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt

These two files appear to be identical. Is this intentional?

Anyway after following your instructions a putting together a small script to dice the output, I collated the 10 switches which took the longest:

# Top ten longest switches
# Rel TS Process Abs TS
0.122161 hald-3423 1867.821170 ***
0.039438 <idle>-0 1867.379054
0.036318 hald-3423 1867.669009
0.031362 <idle>-0 1868.002762
0.030000 hald-3423 1867.699009
0.028933 <idle>-0 1867.529238
0.028539 <idle>-0 1867.228861
0.028196 <idle>-0 1867.128731
0.027763 <idle>-0 1868.101449
0.027513 <idle>-0 1867.028606

# tracer: sched_switch from around longest switch
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
<idle>-0 [00] 1867.608017: 0:140:R + 3:115:S
<idle>-0 [00] 1867.608038: 0:140:R + 3423:120:D
<idle>-0 [00] 1867.608045: 0:140:R ==> 3:115:R
ksoftirqd/0-3 [00] 1867.608048: 3:115:S ==> 3423:120:R
hald-3423 [00] 1867.629350: 3423:120:R + 6096:120:S
hald-3423 [00] 1867.632691: 3423:120:R + 3827:120:S
hald-3423 [00] 1867.669009: 3423:120:R + 3998:120:S
hald-3423 [00] 1867.699009: 3423:120:R + 6097:120:S
***hald-3423 [00] 1867.821170: 3423:120:R ==> 6096:120:R
rhythmbox-6096 [00] 1867.821219: 6096:120:S ==> 6097:120:R
rhythmbox-6097 [00] 1867.821262: 6097:120:R + 3826:120:S
rhythmbox-6097 [00] 1867.821289: 6097:120:S ==> 3826:120:R
pulseaudio-3826 [00] 1867.821332: 3826:120:R + 6097:120:S
pulseaudio-3826 [00] 1867.821374: 3826:120:S ==> 6097:120:R
rhythmbox-6097 [00] 1867.821380: 6097:120:S ==> 3998:120:R
rhythmbox-3998 [00] 1867.821709: 3998:120:S ==> 3827:120:R
pulseaudio-3827 [00] 1867.824041: 3827:120:R + 3826:120:S


LatencyTOP version 0.4 (C) 2008 Intel Corporation

Cause Maximum Percentage
Scheduler: waiting for cpu 152.4 msec 13.8 %
Userspace lock contention 5.0 msec 68.0 %
Waiting for event (poll) 5.0 msec 14.3 %
Waiting for event (select) 4.9 msec 3.5 %
msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.1 %
msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re 1.9 msec 0.2 %
msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.1 %
Executing raw SCSI command 1.1 msec 0.0 %
Waiting for TTY to finish sending 0.4 msec 0.0 %


Process rhythmbox (3998) Total: 328.3 msec
Scheduler: waiting for cpu 152.4 msec 80.0 %
Userspace lock contention 4.1 msec 15.7 %
Waiting for event (poll) 2.7 msec 4.1 %


you need to enable:

CONFIG_SCHED_TRACER=y
CONFIG_CONTEXT_SWITCH_TRACER=y

I actually have both of these enabled but there's still no wakeup tracer (as mentioned in the git kernel documentation http://tinyurl.com/4f9s4l ). The good news is that your instructions don't need the wakeup tracer.

it's not particularly well named though. Why doesnt it say LATENCY_TRACER or something?

I agree it would be nicer if it had a better name.

Additionally I think I found a trigger - unplugging the power cable from the EeePC and having it run on battery seems to then set off this periodic stall every 30 seconds... There's no CPU frequency scaling enabled either (Celeron M's seemingly don't have P states and support for cpufreq is configured out).

sounds like potential SMM triggered latencies.

I have just gone away and read about the SMM ( http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If you're right there is pretty much nothing that can be done about the problem : (
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/