Re: [PATCH V3 04/10] x86/pks: Preserve the PKRS MSR on context switch

From: Thomas Gleixner
Date: Thu Dec 17 2020 - 09:51:49 EST


On Fri, Nov 06 2020 at 15:29, ira weiny wrote:
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -43,6 +43,7 @@
> #include <asm/io_bitmap.h>
> #include <asm/proto.h>
> #include <asm/frame.h>
> +#include <asm/pkeys_common.h>
>
> #include "process.h"
>
> @@ -187,6 +188,27 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
> return ret;
> }
>
> +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
> +DECLARE_PER_CPU(u32, pkrs_cache);
> +static inline void pks_init_task(struct task_struct *tsk)

First of all. I asked several times now not to glue stuff onto a
function without a newline inbetween. It's unreadable.

But what's worse is that the declaration of pkrs_cache which is global
is in a C file and not in a header. And pkrs_cache is not even used in
this file. So what?

> +{
> + /* New tasks get the most restrictive PKRS value */
> + tsk->thread.saved_pkrs = INIT_PKRS_VALUE;
> +}
> +static inline void pks_sched_in(void)

Newline between functions. It's fine for stubs, but not for a real implementation.

> diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
> index d1dfe743e79f..76a62419c446 100644
> --- a/arch/x86/mm/pkeys.c
> +++ b/arch/x86/mm/pkeys.c
> @@ -231,3 +231,34 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
>
> return pk_reg;
> }
> +
> +DEFINE_PER_CPU(u32, pkrs_cache);

Again, why is this global?

> +void write_pkrs(u32 new_pkrs)
> +{
> + u32 *pkrs;
> +
> + if (!static_cpu_has(X86_FEATURE_PKS))
> + return;
> +
> + pkrs = get_cpu_ptr(&pkrs_cache);

So this is called from various places including schedule and also from
the low level entry/exit code. Why do we need to have an extra
preempt_disable/enable() there via get/put_cpu_ptr()?

Just because performance in those code paths does not matter?

> + if (*pkrs != new_pkrs) {
> + *pkrs = new_pkrs;
> + wrmsrl(MSR_IA32_PKRS, new_pkrs);
> + }
> + put_cpu_ptr(pkrs);

Now back to the context switch:

> @@ -644,6 +668,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
>
> if ((tifp ^ tifn) & _TIF_SLD)
> switch_to_sld(tifn);
> +
> + pks_sched_in();
> }

How is this supposed to work?

switch_to() {
....
switch_to_extra() {
....
if (unlikely(next_tif & _TIF_WORK_CTXSW_NEXT ||
prev_tif & _TIF_WORK_CTXSW_PREV))
__switch_to_xtra(prev, next);

I.e. __switch_to_xtra() is only invoked when the above condition is
true, which is not guaranteed at all.

While I have to admit that I dropped the ball on the update for the
entry patch, I'm not too sorry about it anymore when looking at this.

Are you still sure that this is ready for merging?

Thanks,

tglx