Re: CONFIG_PREEMPT and server workloads

From: Takashi Iwai
Date: Thu Mar 18 2004 - 10:31:14 EST


At Thu, 18 Mar 2004 07:03:58 +0100,
Andrea Arcangeli wrote:
>
> On Thu, Mar 18, 2004 at 05:00:01AM +0100, Marinos J. Yannikos wrote:
> > Hi,
> >
> > we upgraded a few production boxes from 2.4.x to 2.6.4 recently and the
> > default .config setting was CONFIG_PREEMPT=y. To get straight to the
> > point: according to our measurements, this results in severe performance
> > degradation with our typical and some artificial workload. By "severe" I
> > mean this:
>
> this is expected (see the below email, I predicted it on Mar 2000), keep
> preempt turned off always, it's useless. Worst of all we're now taking
> spinlocks earlier than needed, and the preempt_count stuff isn't
> optimized away by PREEMPT=n, once those bits will be fixed too it'll go
> even faster.
>
> preempt just wastes cpu with tons of branches in fast paths that should
> take one cycle instead.
>
> Takashi Iwai did lots of research on the preempt vs lowlatency and
> he found that preempt buys nothing and he confirmed my old theories

well, i personally am not against the current preempt mechanism from
the viewpoint of the audio-processing purpose :) the implementation
is relatively clean and easy.

but i agree with Andrea that surely we can achieve almost the same
RT-performance even without preemption, i.e. with less preempt
overhead. preemption doesn't need to be the default.

(snip)
> These fixes from Takashi Iwai brings 2.6 back in line with 2.4, I
> suggested to use EIP dumps from interrupts to get the hotspots, he
> promptly used the RTC for that and he could fixup all the spots, great
> job he did since now we've a very low worst case sched latency in 2.6
> too:
>
> --- linux/fs/mpage.c-dist 2004-03-10 16:26:54.293647478 +0100
> +++ linux/fs/mpage.c 2004-03-10 16:27:07.405673634 +0100
> @@ -695,6 +695,7 @@ mpage_writepages(struct address_space *m
> unlock_page(page);
> }
> page_cache_release(page);
> + cond_resched();
> spin_lock(&mapping->page_lock);
> }
> /*

the above spot is the major source of RT-latency.
this one-liner alone removes more than 90% of the RT-latencies.
in my case with reiserfs, i got 0.4ms RT-latency with my test suite
(on an athlon 2200+).
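
For readers unfamiliar with the pattern in the mpage.c hunk, here is a
rough userspace analogy, not kernel code. A pthread mutex stands in for
the spinlock and sched_yield() stands in for cond_resched(); the names
write_items and page_lock are hypothetical. The point it tries to show is
that the voluntary reschedule happens only while the lock is dropped.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for mapping->page_lock. */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Walk nr_items items. The lock is held while picking the next item and
 * dropped around the per-item work. Returns the number of voluntary
 * reschedule points offered (one per item).
 */
int write_items(int nr_items)
{
    int resched_points = 0;

    pthread_mutex_lock(&page_lock);
    for (int i = 0; i < nr_items; i++) {
        /* the next item would be picked here, under the lock */
        pthread_mutex_unlock(&page_lock);

        /* per-item work would run here, without the lock */

        /*
         * Analogy for cond_resched(). Note the difference: sched_yield()
         * always yields, whereas cond_resched() reschedules only when a
         * reschedule is actually pending.
         */
        sched_yield();
        resched_points++;

        pthread_mutex_lock(&page_lock);
    }
    pthread_mutex_unlock(&page_lock);

    return resched_points;
}
```

Because the lock is never held across the yield, a waiting higher-priority
task gets a chance to run once per item instead of once per whole loop.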

there is another point to be fixed in the reiserfs journal
transaction. then you'll get 0.1ms RT-latency without preemption.

for ext3, these two spots are relevant.

--- linux-2.6.4-8/fs/jbd/commit.c-dist 2004-03-16 23:00:40.000000000 +0100
+++ linux-2.6.4-8/fs/jbd/commit.c 2004-03-18 02:42:41.043448624 +0100
@@ -290,6 +290,9 @@ write_out_data_locked:
commit_transaction->t_sync_datalist = jh;
break;
}
+
+ if (need_resched())
+ break;
} while (jh != last_jh);

if (bufs || need_resched()) {
--- linux-2.6.4-8/fs/ext3/inode.c-dist 2004-03-18 02:33:38.000000000 +0100
+++ linux-2.6.4-8/fs/ext3/inode.c 2004-03-18 02:33:40.000000000 +0100
@@ -1987,6 +1987,7 @@ static void ext3_free_branches(handle_t
if (is_handle_aborted(handle))
return;

+ cond_resched();
if (depth--) {
struct buffer_head *bh;
int addr_per_block = EXT3_ADDR_PER_BLOCK(inode->i_sb);


i think the first one is needed for the preemptive kernel, too.
with these patches, 0.1-0.2ms RT-latency is achieved on ext3 as well.
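
The jbd/commit.c hunk uses a slightly different pattern from a plain
cond_resched(): the inner loop only breaks out when a reschedule is
wanted, and the code after the loop does the actual yielding before the
walk resumes. A minimal userspace sketch of that control flow follows.
Everything in it is an assumption made for illustration: resched_wanted
simulates need_resched(), and struct work, process_batch and process_all
are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the need_resched() flag. */
static bool resched_wanted;

struct work {
    int next;   /* index of the next item to process */
    int total;  /* total number of items */
};

/*
 * Process items until the list is exhausted or a reschedule is wanted.
 * To simulate an external request, the flag is raised after
 * resched_after items in this pass. Returns items handled in this pass.
 */
int process_batch(struct work *w, int resched_after)
{
    int done = 0;

    while (w->next < w->total) {
        w->next++;
        done++;

        if (done == resched_after)
            resched_wanted = true;  /* simulated reschedule request */

        if (resched_wanted)
            break;                  /* same idea as: if (need_resched()) break; */
    }
    return done;
}

/* Repeat batches until all items are done. Returns the number of passes. */
int process_all(struct work *w, int resched_after)
{
    int passes = 0;

    while (w->next < w->total) {
        process_batch(w, resched_after);
        passes++;
        /* real code would drop its lock and reschedule here */
        resched_wanted = false;
    }
    return passes;
}
```

With 10 items and a reschedule request every 3 items, the walk is split
into 4 passes rather than running all 10 items in one go.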


BTW, my measurement tool can be found at

http://www.alsa-project.org/~iwai/latencytest-0.5.2.tar.gz


--
Takashi Iwai <tiwai@xxxxxxx> ALSA Developer - www.alsa-project.org