Just bringing up a latency issue I've noticed recently.
In or around 2.6.14-rc4, file_free() in the Linux kernel was changed to
defer its call to kmem_cache_free(): the free now runs later from a
tasklet, via the RCU callback file_free_rcu(), instead of being called
directly from file_free().
I've noticed that rcu_process_callbacks() can take quite a while to run
now that it routinely calls file_free_rcu() to run kmem_cache_free().
This can make the CPU unavailable for hundreds of microseconds on 1 GHz
machines, with or without preemption enabled (much of this path is
non-preemptible). The result is unpredictable periods of fairly long CPU
latency, for example when a thread is waiting to be woken by an interrupt
handler on a 'now quiet' CPU. Changing file_free() to call
kmem_cache_free() directly completely eliminates this unexpected latency.
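
To make the pattern concrete, here is a rough userspace model in C of the
two approaches. This is only a sketch and not the kernel code: every name
in it (cb_head, fake_file, fake_file_free_now(), fake_file_free_deferred(),
process_callbacks()) is made up for illustration, malloc()/free() stand in
for the slab allocator, and there is no real grace-period handling. It just
shows how deferred frees pile up on a list and then all run in one pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an RCU callback head: a singly linked node. */
struct cb_head {
    struct cb_head *next;
    void (*func)(struct cb_head *head);
};

/* Hypothetical stand-in for struct file, with an embedded callback head. */
struct fake_file {
    int fd;
    struct cb_head rcu;
};

static struct cb_head *pending_head = NULL;            /* queued callbacks */
static struct cb_head **pending_tail = &pending_head;  /* append position  */
static unsigned long frees_done = 0;                   /* objects freed    */

/* Direct variant: free the object immediately, in the caller's context. */
static void fake_file_free_now(struct fake_file *f)
{
    free(f);
    frees_done++;
}

/* Callback: recover the enclosing object from its head, then free it. */
static void fake_file_free_cb(struct cb_head *head)
{
    struct fake_file *f = (struct fake_file *)
        ((char *)head - offsetof(struct fake_file, rcu));
    fake_file_free_now(f);
}

/* Deferred variant: only queue the callback; nothing is freed yet. */
static void fake_file_free_deferred(struct fake_file *f)
{
    assert(f != NULL);
    f->rcu.next = NULL;
    f->rcu.func = fake_file_free_cb;
    *pending_tail = &f->rcu;
    pending_tail = &f->rcu.next;
}

/*
 * Run every queued callback in one uninterrupted pass and return how many
 * ran. The time spent here grows with the number of queued frees.
 */
static unsigned long process_callbacks(void)
{
    struct cb_head *list = pending_head;
    unsigned long n = 0;

    pending_head = NULL;
    pending_tail = &pending_head;
    while (list != NULL) {
        struct cb_head *next = list->next;  /* read before the node is freed */
        list->func(list);
        list = next;
        n++;
    }
    return n;
}
```

In this model, queueing 1000 objects with fake_file_free_deferred() frees
nothing until process_callbacks() runs, and that single call then does all
1000 frees back to back. With fake_file_free_now() the same work is spread
across the individual callers instead.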