Re: [PATCH] 3 performance tweaks

Date: Thu May 25 2000 - 06:18:30 EST

Mark Hemment writes:
> Some of my goals are;
> o Return L1 cache "hot" objects when possible.
> o Lightweight allocation/release paths, with small L1 footprint.
> o SMP fastpath takes no locks.

My goals mostly overlap with yours.

With a per-CPU slab cache, no locking is needed to allocate memory when
the cache hits, and I hoped it would make better use of the L1/L2 caches.

Though it works well, I found a flaw which nullifies the CPU-cache
benefit.

The problem is that "hot" data doesn't necessarily mean an L1 or L2
cache hit on an SMP system, at least with the current Linux interrupt
handling.

For example:
Suppose data is transferred from the system to the outside over a
network. First the data is copied from user space to a malloc'ed kernel
buffer on CPU-A, then a device is started to send it. When the transfer
completes, an interrupt is delivered to CPU-X and the data is free'd on
CPU-X.

The problem exists because X is not always A.

In this scenario, CPU-X returns the data to its own slab cache. But if
A != X, the contents of the data are not in CPU-X's caches at all.

In an x86 SMP system, the IO-APIC distributes interrupts evenly among
the CPUs, so the amount of cached data is not biased toward a specific
CPU. An allocation is therefore likely to hit the slab cache, but
unfortunately the just-returned "hot" data may not be in the L1/L2
caches. In effect, data circulates among the CPUs with no meaningful
information transfer.

I think this scenario happens frequently on typical servers such as a
web-server or a file-server.

I think there are two ways to break this scenario:

1. run the bh_handler on the initial CPU.
2. return the collected memory to the original slab-cache.

I haven't implemented either of them yet.

Does anybody have ideas?

Computer Systems Laboratory, Fujitsu Labs.


This archive was generated by hypermail 2b29 : Wed May 31 2000 - 21:00:13 EST