Re: [RFC][PATCH v2 09/11] context_tracking,livepatch: Dont disturb NOHZ_FULL

From: Peter Zijlstra
Date: Wed Oct 06 2021 - 05:04:51 EST


On Wed, Oct 06, 2021 at 10:12:17AM +0200, Petr Mladek wrote:

> > @@ -180,8 +186,10 @@ void klp_update_patch_state(struct task_
> >  	 * of func->transition, if klp_ftrace_handler() is called later on
> >  	 * the same CPU.  See __klp_disable_patch().
> >  	 */
> > -	if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING))
> > +	if (test_tsk_thread_flag(task, TIF_PATCH_PENDING)) {
>
> This would require smp_rmb() here. It will make sure that we will
> read the right @klp_target_state when TIF_PATCH_PENDING is set.

Bah, yes.
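
(For reference, the read side here pairs with the existing smp_wmb() in
klp_init_transition(), with the TIF bits set later in
klp_start_transition(); an abridged sketch of both sides:)

	/* klp_init_transition() -- write side (existing code, abridged): */
	klp_target_state = state;
	/*
	 * Order the klp_target_state write before the TIF_PATCH_PENDING
	 * writes done in klp_start_transition().
	 */
	smp_wmb();

	/* klp_start_transition() (existing code, abridged): */
	for_each_process_thread(g, task)
		set_tsk_thread_flag(task, TIF_PATCH_PENDING);

	/* klp_update_patch_state() -- read side, with the hunk above: */
	if (test_tsk_thread_flag(task, TIF_PATCH_PENDING)) {
		/* pairs with the smp_wmb() in klp_init_transition() */
		smp_rmb();
		task->patch_state = READ_ONCE(klp_target_state);
		clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
	}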

> >  		task->patch_state = READ_ONCE(klp_target_state);
>
> Note that smp_wmb() is not needed here because
> klp_complete_transition() calls klp_synchronize_transition()
> aka synchronize_rcu() before clearing klp_target_state.
> This is why the original code worked.
>
>
> > +		clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
> > +	}
> >
> >  	preempt_enable_notrace();
> >  }
> > @@ -270,15 +278,30 @@ static int klp_check_and_switch_task(str
> >  {
> >  	int ret;
> >
> > -	if (task_curr(task))
> > +	if (task_curr(task)) {
> > +		/*
> > +		 * This only succeeds when the task is in NOHZ_FULL user
> > +		 * mode; the true return value guarantees any kernel entry
> > +		 * will call klp_update_patch_state().
> > +		 *
> > +		 * XXX: ideally we'd simply return 0 here and leave
> > +		 * TIF_PATCH_PENDING alone, to be fixed up by
> > +		 * klp_update_patch_state(), except livepatching goes wobbly
> > +		 * with 'pending' TIF bits on.
> > +		 */
> > +		if (context_tracking_set_cpu_work(task_cpu(task), CT_WORK_KLP))
> > +			goto clear;
>
> If I get it correctly, this will clear TIF_PATCH_PENDING immediately
> but task->patch_state = READ_ONCE(klp_target_state) will be
> done later by ct_exit_user_work().
>
> This is a bit problematic:
>
> 1. The global @klp_target_state is set to KLP_UNDEFINED when all
>    processes have TIF_PATCH_PENDING cleared. This is actually
>    still fine because func->transition is cleared as well.
>    As a result, current->patch_state is ignored in klp_ftrace_handler().
>
> 2. The real problem happens when another livepatch is enabled.
>    The global @klp_target_state is set to the new value and
>    func->transition is set again. In this case, the delayed
>    ct_exit_user_work() might assign a wrong value that might
>    really be used by klp_ftrace_handler().

Urgghh.. totally missed that.

> IMHO, the original solution from v1 was better. We only needed to

It was also terribly broken in other 'fun' ways. See below.

> be careful when updating task->patch_state and clearing
> TIF_PATCH_PENDING to avoid the race.
>
> The following might work:
>
> static int klp_check_and_switch_task(struct task_struct *task, void *arg)
> {
> 	int ret;
>
> 	/*
> 	 * The stack is reliable only when the task is not running on any
> 	 * CPU, except for the task running this code.
> 	 */
> 	if (task_curr(task) && task != current) {
> 		/*
> 		 * This only succeeds when the task is in NOHZ_FULL user
> 		 * mode. Such a task might be migrated immediately. We
> 		 * only need to be careful to set task->patch_state before
> 		 * clearing TIF_PATCH_PENDING so that the task migrates
> 		 * itself when entering the kernel in the meantime.
> 		 */
> 		if (is_ct_user(task)) {
> 			klp_update_patch_state(task);
> 			return 0;
> 		}
>
> 		return -EBUSY;
> 	}
>
> 	ret = klp_check_stack(task, arg);
> 	if (ret)
> 		return ret;
>
> 	/*
> 	 * The task is neither running on any CPU nor can it start
> 	 * running. As a result, the ordering is not important and
> 	 * no barrier is needed.
> 	 */
> 	task->patch_state = klp_target_state;
> 	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
>
> 	return 0;
> }
>
> , where is_ct_user(task) would return true when the task is running in
> CONTEXT_USER. If I get the context_tracking API correctly then
> it might be implemented the following way:
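
(A minimal sketch of what such a helper might look like -- the per-CPU
context_tracking state accessor below is an assumption for illustration,
not taken from the patch:)

static bool is_ct_user(struct task_struct *task)
{
	/* Hypothetical: true when the task's CPU is currently in CONTEXT_USER. */
	if (!context_tracking_enabled())
		return false;

	return per_cpu(context_tracking.state, task_cpu(task)) == CONTEXT_USER;
}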

That's not sufficient, you need to tag the remote task with a ct_work
item that also runs klp_update_patch_state(), otherwise the remote CPU can
enter kernel space between checking is_ct_user() and doing
klp_update_patch_state():

	CPU0					CPU1

						<user>

	if (is_ct_user()) // true
						<kernel-entry>
						// run some kernel code
	klp_update_patch_state()
	*WHOOPSIE*


So it needs to be something like:


	CPU0					CPU1

						<user>

	if (context_tracking_set_cpu_work(task_cpu(), CT_WORK_KLP))

						<kernel-entry>
	  klp_update_patch_state()		klp_update_patch_state()


So that CPU0 and CPU1 race to complete klp_update_patch_state() *before*
any regular (!noinstr) code gets run.

Which then means it needs to look something like:

noinstr void klp_update_patch_state(struct task_struct *task)
{
	struct thread_info *ti = task_thread_info(task);

	preempt_disable_notrace();
	if (arch_test_bit(TIF_PATCH_PENDING, (unsigned long *)&ti->flags)) {
		/*
		 * Order loads of TIF_PATCH_PENDING vs klp_target_state.
		 * See klp_init_transition().
		 */
		smp_rmb();
		task->patch_state = __READ_ONCE(klp_target_state);
		/*
		 * Concurrent against self; must observe updated
		 * task->patch_state if !TIF_PATCH_PENDING.
		 */
		smp_mb__before_atomic();
		arch_clear_bit(TIF_PATCH_PENDING, (unsigned long *)&ti->flags);
	} else {
		/*
		 * Concurrent against self, see smp_mb__before_atomic()
		 * above.
		 */
		smp_rmb();
	}
	preempt_enable_notrace();
}
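
For completeness, the caller side would then keep the
context_tracking_set_cpu_work() call from the hunk quoted above but do the
update directly instead of 'goto clear', so both CPUs race to finish it
before any !noinstr code runs. A sketch only, combining the v2 hunk with
Petr's ordering fix (it assumes the CT_WORK_KLP bits from this series):

static int klp_check_and_switch_task(struct task_struct *task, void *arg)
{
	int ret;

	if (task_curr(task) && task != current) {
		/*
		 * Only succeeds when the task is in NOHZ_FULL user mode;
		 * any kernel entry will then run klp_update_patch_state()
		 * via CT_WORK_KLP, and we race it from here.
		 */
		if (!context_tracking_set_cpu_work(task_cpu(task), CT_WORK_KLP))
			return -EBUSY;

		klp_update_patch_state(task);
		return 0;
	}

	ret = klp_check_stack(task, arg);
	if (ret)
		return ret;

	/* Not running anywhere and cannot start; no ordering needed. */
	task->patch_state = READ_ONCE(klp_target_state);
	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);

	return 0;
}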