Re: [PATCH 2/2] jump_label: refine placement of static_keys

From: Eric Dumazet
Date: Wed Nov 10 2021 - 12:43:57 EST


On Wed, Nov 10, 2021 at 9:06 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> On Wed, 10 Nov 2021 at 16:22, Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > On Wed, Nov 10, 2021 at 2:24 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > >
> > > On Wed, 10 Nov 2021 at 09:36, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Nov 09, 2021 at 05:09:06PM -0800, Eric Dumazet wrote:
> > > > > From: Eric Dumazet <edumazet@xxxxxxxxxx>
> > > > >
> > > > > With CONFIG_JUMP_LABEL=y, "struct static_key" content is only
> > > > > used for the control path.
> > > > >
> > > > > Marking them __read_mostly is only needed when CONFIG_JUMP_LABEL=n.
> > > > > Otherwise we place them out of the way to increase data locality.
> > > > >
> > > > > This patch adds __static_key to centralize this new policy.
> > > > >
> > > > > Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> > > > > ---
> > > > > arch/x86/kvm/lapic.c | 4 ++--
> > > > > arch/x86/kvm/x86.c | 2 +-
> > > > > include/linux/jump_label.h | 25 +++++++++++++++++--------
> > > > > kernel/events/core.c | 2 +-
> > > > > kernel/sched/fair.c | 2 +-
> > > > > net/core/dev.c | 8 ++++----
> > > > > net/netfilter/core.c | 2 +-
> > > > > net/netfilter/x_tables.c | 2 +-
> > > > > 8 files changed, 28 insertions(+), 19 deletions(-)
> > > > >
> > > >
> > > > Hurmph, it's a bit cumbersome to always have to add this __static_key
> > > > attribute to every definition, and in fact you seem to have missed some.
> > > >
> > > > Would something like:
> > > >
> > > > typedef struct static_key __static_key static_key_t;
> > > >
> > > > work? I forever seem to forget the exact things you can make a typedef
> > > > do :/
> > >
> > > No, that doesn't work. Section placement is an attribute of the symbol
> > > not of its type. So we'll need to macro'ify this.
> >
> > Yes, this is also why I chose a short __static_key (initially I was
> > using something more descriptive but longer)
> >
> > >
> > > But I'm not sure I understand why we need different policies here.
> > > Static keys are inherently __read_mostly (unless they are not writable
> > > to begin with), so keeping them all together in one place in the
> > > binary should be sufficient, no?
> >
> > It is not optimal for CONFIG_JUMP_LABEL=n cases.
> >
> > For instance, networking will prefer having rps_needed / rfs_needed in
> > the same cache lines as other hot read_mostly stuff,
> > instead of being far away in other locations.
> >
> > ffffffff830e0f80 D dev_weight_tx_bias
> > ffffffff830e0f84 D dev_rx_weight
> > ffffffff830e0f88 D dev_tx_weight
> > ffffffff830e0f8c D gro_normal_batch
> > ffffffff830e0f90 D rps_sock_flow_table
> > ffffffff830e0f98 D rps_cpu_mask
> > ffffffff830e0f9c D rps_needed
> > ffffffff830e0fa0 D rfs_needed
> > ffffffff830e0fa4 D netdev_flow_limit_table_len
> > ffffffff830e0fa8 d netif_napi_add.__print_once
> > ffffffff830e0fac D netdev_unregister_timeout_secs
> > ffffffff830e0fb0 D ptype_base
> >
> >
> > When CONFIG_JUMP_LABEL=y, rps_needed/xps_needed being in a remote
> > location is a win because it 'saves' 32 bytes that can be used better
>
> I understand that you want the key out of the way for
> CONFIG_JUMP_LABEL=n, but the question was why we shouldn't do that
> unconditionally. If we put all the keys together in a section, they
> will only share cachelines with each other.
>
> Also, what is the performance impact on a real world use case of this change?

Yes, this matters for low latency stuff, mostly.

For CONFIG_JUMP_LABEL=n, I suggest we do not change the current layout;
there is no need to. I do not want to risk performance regressions for
no good reason.

Unless you have something in mind that _requires_ all these atomic_t to be
grouped together?
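
For readers following the point quoted above (section placement is an
attribute of the symbol, not of its type, hence the need for a macro), here
is a minimal userspace sketch. It is not the kernel code: struct demo_key,
DEMO_KEY_SECTION, DEFINE_DEMO_KEY and the section name "demo_keys" are all
hypothetical names made up for illustration. It assumes GCC or Clang on an
ELF target with a GNU-style linker, which provides __start_<name> and
__stop_<name> symbols for sections whose names are valid C identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct static_key; not the kernel definition. */
struct demo_key {
	int enabled;
};

/*
 * The section attribute is attached to each variable definition.
 * The section name "demo_keys" is an assumption made for this sketch.
 */
#define DEMO_KEY_SECTION __attribute__((__section__("demo_keys")))

/* Wrapping the definition in a macro keeps the placement policy in one spot. */
#define DEFINE_DEMO_KEY(name) struct demo_key name DEMO_KEY_SECTION = { 0 }

DEFINE_DEMO_KEY(rps_demo);
DEFINE_DEMO_KEY(rfs_demo);

/* Same type, no attribute: stays in the default data/bss area. */
struct demo_key plain_demo = { 0 };

/* Section bounds generated by the linker (ELF, GNU ld or lld assumed). */
extern struct demo_key __start_demo_keys[];
extern struct demo_key __stop_demo_keys[];

/* Returns 1 if the key's address lies inside the demo_keys section. */
int in_demo_section(const struct demo_key *k)
{
	uintptr_t p = (uintptr_t)k;

	return p >= (uintptr_t)__start_demo_keys &&
	       p < (uintptr_t)__stop_demo_keys;
}

/* Sanity check: only the macro-defined keys land in the section. */
void demo_selfcheck(void)
{
	assert(in_demo_section(&rps_demo));
	assert(in_demo_section(&rfs_demo));
	assert(!in_demo_section(&plain_demo));
}
```

All three variables have the same type, yet only the two defined through the
macro end up in the dedicated section. A policy switch on CONFIG_JUMP_LABEL
could then live inside the macro definition rather than at every call site.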