Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation

From: Tejun Heo
Date: Tue Feb 24 2009 - 08:28:19 EST

Next message: Jiri Slaby: "Re: quirk_usb_disable_ehci takes 2x 1.5s on boot"
Previous message: Bob Copeland: "Re: [Bug #12490] ath5k related kernel panic in 2.6.29-rc1"
In reply to: Ingo Molnar: "Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation"
Next in thread: Ingo Molnar: "Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation"

Hello, Ingo.

Ingo Molnar wrote:
> It's not an optimization, it's a pessimisation :)

Hmmm... big word. Looking up pessimisation... Ah, okay, it's from
pessimistic.

> Please read what i wrote to you. We want the percpu static and
> dynamic areas to be _one and the same thing_. (With just the
> difference that static allocations have a handy compile-time
> offset shortcut - but the access is still the same.)
>
> Right now, with your latest code we still have this:
>
> /*
>  * Use this to get to a cpu's version of the per-cpu object
>  * dynamically allocated. Non-atomic access to the current CPU's
>  * version should probably be combined with get_cpu()/put_cpu().
>  */
> #define per_cpu_ptr(ptr, cpu) SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))
>
> This slows down per_cpu_ptr() and makes the dynamic percpu case
> a second-class citizen because most actual usages, which are for
> the current CPU, still have to go via the per_cpu_offset()
> indirection.

Heh... I suppose this is why you and I keep disagreeing.
Currently, __my_cpu_offset is defined as percpu_read(this_cpu_off) and
__get_cpu_var() is defined as (*SHIFT_PERCPU_PTR(&per_cpu_var(var),
__my_cpu_offset)), so our static access is now basically *per_cpu_ptr().

If per_cpu_ptr() is a second-class citizen, get_cpu_var() is too. :-)
So, there's nothing more indirect about per_cpu_ptr() compared to
get_cpu_var() anymore.
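
To make that concrete, here is a rough user-space model, not kernel
code. Only the names SHIFT_PERCPU_PTR, per_cpu_ptr and per_cpu_offset
come from the definitions quoted above; area, current_cpu, model_init(),
my_cpu_offset(), get_cpu_var_model() and same_slot() are made up for
illustration, and the 64-byte unit size is arbitrary. It relies on the
GCC/Clang __typeof__ extension.

```c
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS   4
#define UNIT_SIZE 64	/* bytes per unit in this toy model (assumption) */

/*
 * Unit 0 stands in for the link-time template where static percpu
 * variables notionally live; units 1..NR_CPUS are the per-CPU copies.
 */
static char *area;
static int current_cpu;	/* stand-in for "the CPU we are running on" */

static int model_init(void)
{
	area = calloc(NR_CPUS + 1, UNIT_SIZE);
	return area ? 0 : -1;
}

/* Distance from the template to the given CPU's copy. */
static size_t per_cpu_offset(int cpu)
{
	return (size_t)(cpu + 1) * UNIT_SIZE;
}

/* Hypothetical stand-in for __my_cpu_offset. */
static size_t my_cpu_offset(void)
{
	return per_cpu_offset(current_cpu);
}

#define SHIFT_PERCPU_PTR(ptr, off) \
	((__typeof__(ptr))((char *)(ptr) + (off)))

/* Dynamic-style accessor, same shape as the quoted macro. */
#define per_cpu_ptr(ptr, cpu) \
	SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))

/* Static-style accessor: template address shifted by this CPU's offset. */
#define get_cpu_var_model(ptr) \
	(*SHIFT_PERCPU_PTR((ptr), my_cpu_offset()))

/* Returns non-zero if both accessors land on the same slot for cpu. */
static int same_slot(int cpu)
{
	long *counter = (long *)(area + 8);	/* arbitrary template address */

	current_cpu = cpu;
	get_cpu_var_model(counter) = 42;
	return *per_cpu_ptr(counter, cpu) == 42 &&
	       &get_cpu_var_model(counter) == per_cpu_ptr(counter, cpu);
}
```

After model_init() succeeds, same_slot(cpu) should be true for every
cpu in range: both paths are one pointer shift by that CPU's offset.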

> We cannot do that optimization due to the NUMA and SMP
> asymmetry. If NUMA and SMP had the same linear structure, as i
> suggested we do, we could do it.

No no no, there's no difference whatsoever. Either I'm grossly
misunderstanding something or you are, because I really cannot see any
difference between static and dynamic ones except for whether the
offset itself is static or not.

What's missing is unification of static and dynamic accessors and thus
the faster accessors - percpu_read() and friends - for dynamic ones.
This will be the next round of patches.
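
As a sketch of what "same accessor, different source of offset" means,
here is a toy user-space model. It is not the upcoming patches and not
kernel API: every toy_* name, the sizes and the bump allocator are
assumptions made up for illustration. The static variable's offset is
a compile-time constant, the dynamic one comes from the allocator at
run time, and both go through the same accessor.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_CPUS    2
#define TOY_UNIT_SIZE  128	/* bytes per CPU unit (assumption) */
#define TOY_STATIC_END 16	/* bytes reserved for "static" variables */

/* Offset of a "static" percpu counter, known at compile time. */
enum { TOY_STATIC_HITS_OFF = 0 };

static char *toy_unit[TOY_NR_CPUS];	/* one zeroed unit per CPU */
static size_t toy_next_free = TOY_STATIC_END;
static int toy_this_cpu;

static int toy_setup(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		toy_unit[cpu] = calloc(1, TOY_UNIT_SIZE);
		if (!toy_unit[cpu])
			return -1;
	}
	return 0;
}

/*
 * Toy bump allocator: hands out an offset that is valid in every
 * unit, or (size_t)-1 when the unit is full.
 */
static size_t toy_alloc_percpu(size_t size)
{
	size_t align = sizeof(long);
	size_t off = (toy_next_free + align - 1) & ~(align - 1);

	if (size > TOY_UNIT_SIZE - off)
		return (size_t)-1;
	toy_next_free = off + size;
	return off;
}

/* One accessor for both kinds of offset. */
static long *toy_cpu_long(size_t off, int cpu)
{
	return (long *)(toy_unit[cpu] + off);
}

static long *toy_this_cpu_long(size_t off)
{
	return toy_cpu_long(off, toy_this_cpu);
}

/* Returns 0 if static and dynamic slots behave identically. */
static int toy_demo(void)
{
	size_t dyn_off;

	if (toy_setup() != 0)
		return -1;
	dyn_off = toy_alloc_percpu(sizeof(long));
	if (dyn_off == (size_t)-1)
		return -1;

	toy_this_cpu = 1;
	*toy_this_cpu_long(TOY_STATIC_HITS_OFF) += 1;	/* static offset */
	*toy_this_cpu_long(dyn_off) += 5;		/* run-time offset */

	/* CPU 1 sees both updates, CPU 0 sees neither. */
	return (*toy_cpu_long(TOY_STATIC_HITS_OFF, 1) == 1 &&
		*toy_cpu_long(dyn_off, 1) == 5 &&
		*toy_cpu_long(TOY_STATIC_HITS_OFF, 0) == 0 &&
		*toy_cpu_long(dyn_off, 0) == 0) ? 0 : -1;
}
```

Nothing in the accessor knows or cares where the offset came from,
which is the only point of the model.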

> Currently you rely on per_cpu_offset() indirection basically as
> a soft-TLB entry covering all dynamic allocations. That sucks.
>
> Ok?

IIUC, the per_cpu_offset() indirection stems from the %gs addressing
restriction. We can't teach gcc about it, hence percpu_read() and
friends. Come on, our static percpu variables use per_cpu_offset()
too.

If my reality seems to be more dissociated from others' than it
usually is, please feel free to enlighten me. :-)

Thanks.

--
tejun