Re: CONFIG_PREEMPT and server workloads

From: Andrew Morton
Date: Thu Mar 18 2004 - 14:03:34 EST


Takashi Iwai <tiwai@xxxxxxx> wrote:
>
> > These fixes from Takashi Iwai bring 2.6 back in line with 2.4. I
> > suggested using EIP dumps from interrupts to find the hotspots; he
> > promptly used the RTC for that and fixed up all the spots. Great job,
> > since we now have a very low worst-case sched latency in 2.6 too:
> >
> > --- linux/fs/mpage.c-dist 2004-03-10 16:26:54.293647478 +0100
> > +++ linux/fs/mpage.c 2004-03-10 16:27:07.405673634 +0100
> > @@ -695,6 +695,7 @@ mpage_writepages(struct address_space *m
> >  			unlock_page(page);
> >  		}
> >  		page_cache_release(page);
> > +		cond_resched();
> >  		spin_lock(&mapping->page_lock);
> >  	}
> >  	/*
>
> the above one is the major source of RT-latency.

This one's fine.

> for ext3, these two spots are relevant.
>
> --- linux-2.6.4-8/fs/jbd/commit.c-dist 2004-03-16 23:00:40.000000000 +0100
> +++ linux-2.6.4-8/fs/jbd/commit.c 2004-03-18 02:42:41.043448624 +0100
> @@ -290,6 +290,9 @@ write_out_data_locked:
>  			commit_transaction->t_sync_datalist = jh;
>  			break;
>  		}
> +
> +		if (need_resched())
> +			break;
>  	} while (jh != last_jh);
> 
>  	if (bufs || need_resched()) {

This one I need to think about. Perhaps we can remove the yield point a
few lines above.

One needs to be really careful with the lock-dropping trick - there are
weird situations in which the kernel fails to make any forward progress.
I've been meaning to do another round of latency tuneups for ages, so I'll
check this one out, thanks.
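
To illustrate the hazard (a minimal sketch of the pattern only - the list
handling here is made up, it is not code from any of the patches above):
if other tasks can keep adding work while the lock is dropped, the scan
below may never terminate.

	spin_lock(&mapping->page_lock);
	while (!list_empty(&mapping->io_pages)) {
		/* ... take a page off the list and write it out ... */

		if (need_resched()) {
			/* the lock-dropping trick */
			spin_unlock(&mapping->page_lock);
			cond_resched();
			spin_lock(&mapping->page_lock);
			/*
			 * The list may have been refilled while the lock
			 * was dropped, so in the worst case this loop
			 * never finishes the writeout: no forward
			 * progress at all.
			 */
		}
	}
	spin_unlock(&mapping->page_lock);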

There's also the SMP problem: this CPU could be spinning on a lock with
need_resched() true, but the other CPU holds the lock for ages because its
own need_resched() is false. In the 2.4 low-latency patch I solved that via
the scary hack of broadcasting a reschedule to all CPUs whenever an rt-prio
task became runnable. In 2.6-preempt we use preempt_spin_lock().
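
Roughly, the idea there is the following (a simplified sketch, not the
actual preempt_spin_lock() source; the function name is mine): while the
lock is contended we wait with preemption enabled, so a spinning task whose
need_resched() has become true can be scheduled away rather than burning
its timeslice.

	static inline void demo_preempt_spin_lock(spinlock_t *lock)
	{
		while (!spin_trylock(lock)) {
			/*
			 * Contended: wait with preemption enabled so a
			 * reschedule can happen on this CPU, then retry.
			 */
			while (spin_is_locked(lock))
				cpu_relax();
		}
		/* lock acquired; on CONFIG_PREEMPT, spin_trylock()
		   left preemption disabled */
	}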

But in 2.6 non-preempt we have no solution to this, so worst-case
scheduling latencies on 2.6 SMP CONFIG_PREEMPT=n are high.


Last time I looked, the worst-case latency was in fact over in the ext3
checkpoint code. It's under a spinlock and tricky to fix.

> --- linux-2.6.4-8/fs/ext3/inode.c-dist 2004-03-18 02:33:38.000000000 +0100
> +++ linux-2.6.4-8/fs/ext3/inode.c 2004-03-18 02:33:40.000000000 +0100
> @@ -1987,6 +1987,7 @@ static void ext3_free_branches(handle_t
>  	if (is_handle_aborted(handle))
>  		return;
> 
> +	cond_resched();
>  	if (depth--) {
>  		struct buffer_head *bh;
>  		int addr_per_block = EXT3_ADDR_PER_BLOCK(inode->i_sb);
>

This one's OK.

btw, several months ago we discussed the idea of adding a sysctl to the
ALSA drivers which would cause a dump_stack() to be triggered if the audio
ISR detected a sound underrun.

This would be a very useful feature, because it would increase the number
of low-latency developers from O(2) to O(lots). If some user is complaining
of underruns we can just ask them to turn on the sysctl and we get a trace
pointing at the culprit code.

And believe me, we need the coverage. All sorts of weird code paths were
found during the development of the 2.4 low-latency patch: i2c drivers,
fbdev drivers, all sorts of things which you and I don't test.

I know it's a matter of

	if (sysctl_is_set)
		dump_stack();

in snd_pcm_update_hw_ptr_post() somewhere, but my brain burst when I tried
to work out the ALSA sysctl architecture.
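
In case it helps, here's the sort of thing I mean (a hypothetical sketch
only - the knob name and the hookup are made up, and whether it ends up as
a sysctl, a proc file or a module parameter is your call):

	/* hypothetical debug knob, not existing ALSA code */
	static int xrun_debug;
	module_param(xrun_debug, int, 0644);

	/* would be called from the underrun path in
	   snd_pcm_update_hw_ptr_post() */
	static void xrun_debug_trace(void)
	{
		if (xrun_debug) {
			printk(KERN_DEBUG "ALSA: xrun detected\n");
			dump_stack();
		}
	}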

Is this something you could add please?
