stop_machine question

From: xtarke
Date: Tue Jun 14 2011 - 17:17:22 EST


Hi guys,

I've been studying System Management Interrupts (SMIs) under Linux. Looking on lwn.net, I found a module called "Hardware Latency Detector (formerly SMI detector)", written by Jon Masters a few years ago. As Jon says (http://lwn.net/Articles/337018/):

"This is a loadable module that grabs the CPU for
configurable periods of time (all under stop_machine()) and samples the TSC
looking for discontinuity. If observed latencies exceed a threshold (for
example caused by an System Management Interrupt or similar) then the
event is recorded in a global ring_buffer, readable via debugfs."

I modified the module to grab the TSC directly as you can see in the "get_sample" code below:

static int get_sample(void *unused)
{

	(...)

	do {
		t2 = ktime_get();

		rdtscll(tsc[1]);

		total = ktime_to_us(ktime_sub(t2, start)); /* sample width */
		//diff = ktime_to_us(ktime_sub(t2, t1)); /* current diff */

		diff = (s64)(tsc[1] - tsc[0]); /* tsc diff */

		/* This shouldn't happen */
		if (diff < 0) {
			printk(KERN_ERR BANNER "time running backwards\n");
			goto out;
		}

		if (diff > data.threshold) {
			overthrc++;
			sample = diff; /* only want highest value */
			s.timestamp = tsc[1];
		}

		tsc[0] = tsc[1];

		count++;

	} while (total <= data.sample_width);

	(...)
}
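
The comment in the loop says only the highest value is wanted, so the bookkeeping can be checked in isolation. Below is a minimal userspace sketch, not the module's code: it scans an array of already-collected counter readings and reports the largest delta and how many deltas exceed a threshold. The names `gap_stats` and `scan_gaps` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct gap_stats {
	uint64_t max_gap;        /* largest delta between consecutive readings */
	size_t   over_threshold; /* number of deltas strictly above threshold */
};

/*
 * Scan n counter readings (for example, TSC values collected elsewhere)
 * and summarize the deltas between neighbours. Assumes the readings are
 * in collection order.
 */
static struct gap_stats scan_gaps(const uint64_t *ts, size_t n,
				  uint64_t threshold)
{
	struct gap_stats st = { 0, 0 };
	size_t i;

	assert(ts != NULL || n == 0);

	for (i = 1; i < n; i++) {
		uint64_t diff = ts[i] - ts[i - 1];

		if (diff > threshold)
			st.over_threshold++;
		if (diff > st.max_gap)
			st.max_gap = diff; /* keep only the highest delta */
	}
	return st;
}
```

Unlike the `sample = diff` assignment in the excerpt, which keeps the most recent over-threshold delta, this sketch compares against the running maximum before storing.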

The function get_sample is invoked from a kernel thread, shown below:

static int kthread_fn(void *unused)
{
	(...)

	while (!kthread_should_stop()) {

		mutex_lock(&data.lock);

		err = stop_machine(get_sample, unused, &cpus);
		if (err) {
			/* Houston, we have a problem */
			mutex_unlock(&data.lock);
			goto err_out;
		}

		interval = data.sample_window - data.sample_width;
		do_div(interval, USEC_PER_MSEC); /* modifies interval value */

		mutex_unlock(&data.lock);

		if (data.count > data.max_count) {
			enabled = 0;
			stop_kthread();
			wake_up(&data.wq);
			data.count = 0;
		}

		if (msleep_interruptible(interval))
			goto out;
	(...)

}

Running this code on kernel 2.6.37.6 (Slackware 13.37), I see a huge discontinuity in diff = (s64)(tsc[1] - tsc[0]), about 25000 cycles, but only when the kernel thread calls msleep_interruptible(). When I remove the sleep, the problem disappears. Moreover, the discontinuity always appears at the same point in time within get_sample's while loop, and never during the first iteration of kthread_fn's while loop. The kernel trace doesn't show get_sample being interrupted.
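
For scale, the cycle count can be converted to wall-clock time. This is a rough sketch that assumes the TSC ticks at a constant rate equal to the "cpu MHz" value reported below (2133.336 MHz, i.e. 2133336 kHz); the helper name `cycles_to_ns` and its `tsc_khz` parameter are made up for this illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a TSC cycle count to nanoseconds, given the TSC rate in kHz.
 * cycles / (tsc_khz * 1000) seconds == cycles * 1000000 / tsc_khz ns.
 * The result is truncated toward zero. The multiplication can overflow
 * for very large cycle counts; that is fine for deltas of this size.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_khz)
{
	assert(tsc_khz > 0);
	return cycles * 1000000ULL / tsc_khz;
}
```

Under that assumption, cycles_to_ns(25000, 2133336) gives 11718 ns, so the observed gap is roughly 11.7 microseconds.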

My question: is it possible for some other kernel code to run inside stop_machine()? In other words, why does get_sample seem to be interrupted when I use a sleep?

Any help will be appreciated.

Thanks a lot,

Renan Augusto Starke

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
stepping : 2
cpu MHz : 2133.336
cache size : 2048 KB

The system is a Dell Optiplex 745.

Xtarke under Slackware 13.1
Linux user ID #354257