Re: OS Masters

From: Sean Hunter (seanh@sportingbet.com)
Date: Thu Apr 20 2000 - 02:08:01 EST


On Thu, Apr 20, 2000 at 02:19:26AM +0000, Andrew Morton wrote:
> - Write nasty torture-test programs, run them.
>
> - Change all the resource allocation layers such as kmalloc(),
> get_free_page(), etc so they return failures under controlled,
> _repeatable_ conditions. Run the kernel. Report lots of bugs.
>
> - Do the same with I/O functions: for example, change a block device
> driver so that it reports errors under controlled conditions.

I fancy having a go at these. Any tips?
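One cheap way to get the "controlled, repeatable" failures is to count allocation calls and fail every Nth one. Because the decision depends only on the call count, a given run fails at exactly the same points every time. The sketch below is a userspace analogue wrapping malloc(). The names failing_malloc(), fault_inject_reset() and fail_every are hypothetical, not existing kernel interfaces. In the kernel, the same counter would presumably sit in a wrapper around kmalloc()/get_free_page().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fault-injection knobs (not a real kernel API).
 * fail_every == 0 disables injection; otherwise every
 * fail_every-th call returns NULL. */
static unsigned long alloc_calls;
static unsigned long fail_every;

static void fault_inject_reset(unsigned long every)
{
    alloc_calls = 0;
    fail_every = every;
}

/* Drop-in replacement for malloc() used by the code under test. */
static void *failing_malloc(size_t size)
{
    alloc_calls++;
    if (fail_every != 0 && alloc_calls % fail_every == 0)
        return NULL;            /* injected failure */
    return malloc(size);
}

/* Example code under test: it must cope with a NULL return. */
static int build_pair(int **a, int **b)
{
    *a = failing_malloc(sizeof **a);
    if (*a == NULL)
        return -1;
    *b = failing_malloc(sizeof **b);
    if (*b == NULL) {
        free(*a);               /* the error path being exercised */
        *a = NULL;
        return -1;
    }
    return 0;
}
```

Sweeping fail_every over a test run walks the allocation failure paths in turn, and logging alloc_calls at the point of failure makes any resulting crash easy to reproduce.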
 
> - The reason timing races are so hard to find is that they are often
> very short. Find a clever way to generically increase the duration
> of timing windows so that bugs are exposed. For example, you could
> use the check-memory-usage hooks (described below) to make one CPU
> run really slowly - make it do a udelay(1000) every few instructions,
> see what happens.

A way I use to catch races in userspace is to step the debugger into
the race window, leave it halted there, and run the program normally
in a separate shell. This way you are guaranteed
to hit the race every time. I wonder if we can't use the same
principle in kernel-space? Have a special macro that we can insert
into a suspected race window that will stop one processor and wait
until another one hits the race window (or am I just talking
bollocks?)
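The rendezvous idea can be sketched in userspace with POSIX threads. A hypothetical race_window_hold() parks the first thread to reach the window until a second thread arrives (or a timeout expires), so a check-then-act race is hit on every run rather than occasionally. All names here are made up for illustration; this is not an existing kernel or library facility.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical one-shot rendezvous marking a suspected race window. */
struct race_window {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int arrived;               /* threads that have reached the window */
};

/* Park the first thread to arrive until a second one reaches the same
 * window. Returns 1 if the two met (race forced), 0 if the timeout
 * expired first, so an unreachable window cannot hang the program. */
static int race_window_hold(struct race_window *rw, int timeout_sec)
{
    struct timespec deadline;
    int met = 1;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&rw->lock);
    rw->arrived++;
    if (rw->arrived >= 2) {
        pthread_cond_broadcast(&rw->cond);   /* second arrival: release the first */
    } else {
        while (rw->arrived < 2) {
            int rc = pthread_cond_timedwait(&rw->cond, &rw->lock, &deadline);
            if (rc == ETIMEDOUT && rw->arrived < 2) {
                rw->arrived--;               /* give up; forget this arrival */
                met = 0;
                break;
            }
        }
    }
    pthread_mutex_unlock(&rw->lock);
    return met;
}

/* Code under test: a check-then-act bug. The balance check and the
 * withdrawal are each locked, but not together. */
static struct race_window rw = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;
static int balance = 10;

static int read_balance(void)
{
    pthread_mutex_lock(&balance_lock);
    int v = balance;
    pthread_mutex_unlock(&balance_lock);
    return v;
}

static void *withdraw(void *arg)
{
    (void)arg;
    if (read_balance() >= 10) {
        race_window_hold(&rw, 5);            /* inserted into the suspected window */
        pthread_mutex_lock(&balance_lock);
        balance -= 10;
        pthread_mutex_unlock(&balance_lock);
    }
    return NULL;
}
```

Run withdraw() from two threads and both pass the check before either withdraws, so balance ends at -10 every time. This version sleeps in the window; a kernel version would presumably have to spin with a bounded wait instead, since many race windows sit in code that is not allowed to sleep.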

> - Make the kernel's core functions self-checking. Step one is to
> work out what all the subtle, secret preconditions are for all the
> functions (the global lock should be held, interrupts should be
> disabled, the sk_buff_head should be locked, etc). Step two is to
> fill the kernel with assertions which check that these rules are
> really being observed.
>
> - Deadlock detection: if a spinlock can be claimed from both process
> and interrupt context then make sure that the process-context claim
> is always done with spin_lock_irqsave() or with interrupts disabled.
> (Something like that...) Lots of different scenarios here.

Something like the two above has interested me for a while, and I see
them as being very similar in implementation (a set of ASSERT macros
that print an oops if the conditions are not met). Not very glamorous
work, but bound to find some bugs. What assertions would we need, and
where do they go? Is there a set of general rules about where and
when certain locks must be held and whether interrupts should be on or
off etc?
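As a starting point, the sketch below shows the shape such checks might take: an ASSERT macro that reports (here it counts and prints rather than oopsing, so it can be tested), a lock that remembers whether it is held, a function with a "caller must hold the lock" precondition, and the irq rule from above: once a lock has been seen taken from interrupt context, any process-context acquire with interrupts enabled is flagged. Interrupt state is simulated with two plain flags; every name here is hypothetical rather than a real kernel interface, and it is single-threaded (no owner tracking).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical checking framework (userspace simulation, not kernel API). */
static int assert_failures;   /* counted instead of crashing, so it can be tested */

#define ASSERT(cond)                                                    \
    do {                                                                \
        if (!(cond)) {                                                  \
            assert_failures++;                                          \
            fprintf(stderr, "ASSERT failed: %s at %s:%d in %s()\n",     \
                    #cond, __FILE__, __LINE__, __func__);               \
        }                                                               \
    } while (0)

/* Simulated CPU state standing in for the real interrupt flag/context. */
static int sim_irqs_off;   /* 1 while "interrupts" are disabled */
static int sim_in_irq;     /* 1 while running an "interrupt handler" */

struct checked_lock {
    int held;
    int used_in_irq;       /* set once the lock has been taken in irq context */
};

static void checked_lock_acquire(struct checked_lock *l)
{
    if (sim_in_irq)
        l->used_in_irq = 1;
    /* Deadlock rule: a lock also taken from irq context must be taken
     * with interrupts off in process context, otherwise the handler can
     * interrupt the holder on the same CPU and spin forever. */
    if (l->used_in_irq && !sim_in_irq)
        ASSERT(sim_irqs_off);
    ASSERT(!l->held);      /* no recursive or double acquire */
    l->held = 1;
}

static void checked_lock_release(struct checked_lock *l)
{
    ASSERT(l->held);
    l->held = 0;
}

/* A function with a "secret precondition": the caller must hold q_lock. */
static struct checked_lock q_lock;
static int queue_len;

static void queue_add_locked(void)
{
    ASSERT(q_lock.held);
    queue_len++;
}
```

One limitation worth noting: the irq rule only fires after the lock has actually been seen in interrupt context during the run, so coverage depends on the workload.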
 
Sean

