Re: [PATCH] 3 performance tweaks

Date: Thu May 25 2000 - 06:18:30 EST

Mark Hemment writes:
> Some of my goals are;
> o Return L1 cache "hot" objects when possible.
> o Lightweight allocation/release paths, with small L1 footprint.
> o SMP fastpath takes no locks.

My goals mostly overlap with yours.

With a per-CPU slab cache, no locking is needed to allocate memory when
the cache hits, and I hoped it would make better use of the L1/L2 caches.

Though it works well, I found a flaw which nullifies the CPU-cache
benefit.

The problem is that "hot" data doesn't necessarily mean an L1 or L2
cache hit on an SMP system, at least with the current Linux interrupt
handling.

For example:
Suppose data is transferred from the system to the outside over a
network. First the data is copied from user space to a malloc'ed kernel
buffer on CPU-A, then a device is started to send it. When the transfer
completes, an interrupt is delivered to CPU-X and the data is free'd on
CPU-X.

The problem exists because X is not always A.

In this scenario, CPU-X returns the data to its own slab cache. But if
A != X, the contents of the data are not in CPU-X's caches at all.

In an x86 SMP system, the IO-APIC distributes interrupts evenly among
the CPUs, so the amount of cached data is not biased toward a specific
CPU. An allocation is therefore likely to hit the slab cache, but
unfortunately the just-returned "hot" data may not be in the L1/L2
caches. In effect, data circulates among the CPUs with no meaningful
information transfer.

I think this scenario happens frequently on typical servers such as a
web-server or a file-server.

I think there are two ways to break this scenario:

1. run the bh_handler on the initial CPU.
2. return the collected memory to the original slab-cache.

I haven't implemented either of them yet.

Does anybody have ideas?

Computer Systems Laboratory, Fujitsu Labs.


This archive was generated by hypermail 2b29 : Wed May 31 2000 - 21:00:13 EST