Re: [PATCH v9 4/9] cgroup: cgroup v2 freezer

From: Roman Gushchin
Date: Tue Apr 02 2019 - 18:09:08 EST


Hi Oleg,


Thank you very much for your time! Really appreciate it.

I'll rebase the patchset, address your comments and send v10 soon-ish.

Please, find some notes below.

On Tue, Apr 02, 2019 at 06:02:41PM +0200, Oleg Nesterov wrote:
> Hi Roman,
>
> let me apologize again for the huge delay.
>
> I see nothing really wrong in this version, so no objections from me.
>
> However, 4/9 doesn't apply, so it seems you will need to make v10 anyway
> to adapt these changes to the recent changes in kernel/signal.c ;)
>
> Just a couple of minor nits below...
>
> On 03/16, Roman Gushchin wrote:
> >
> > + * If always_leave is not set, and the cgroup is freezing,
> > + * we're racing with the cgroup freezing. In this case, we don't
> > + * drop the frozen counter to avoid a transient switch to
> > + * the unfrozen state. To make sure that the task won't go
> > + * to the userspace before reaching the signal handler loop,
> > + * let's set TIF_SIGPENDING flag.
> > + */
> > +void cgroup_leave_frozen(bool always_leave)
> > +{
> > + struct cgroup *cgrp;
> > +
> > + spin_lock_irq(&css_set_lock);
> > + cgrp = task_dfl_cgroup(current);
> > + if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
> > + cgroup_dec_frozen_cnt(cgrp);
> > + cgroup_update_frozen(cgrp);
> > + WARN_ON_ONCE(!current->frozen);
> > + current->frozen = false;
> > + } else {
> > + set_tsk_thread_flag(current, TIF_SIGPENDING);
>
> The setting of TIF_SIGPENDING looks unnecessary and even not correct; because
> this flag must not be updated without ->siglock held (even if "set" is more
> or less safe).
>
> If JOBCTL_TRAP_FREEZE is already set, then TIF_SIGPENDING must be set too.
>
> Otherwise set_tsk_thread_flag(TIF_SIGPENDING) can't help because the task can
> do recalc_sigpending() at any moment.
>
> In particular, get_signal() does dequeue_signal()->recalc_sigpending() right
> after cgroup_leave_frozen(), so I fail to understand why do we need to set
> TIF_SIGPENDING.

So this "else" part is only for vfork path. What I'm trying to do, is to
avoid flipping the cgroup state from frozen to unfrozen and back to frozen.

You're right, TIF_SIGPENDING isn't enough, we need proper cgroup_freeze_task()
with some additional synchronization.

>
>
> > @@ -912,6 +912,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> > tsk->fail_nth = 0;
> > #endif
> >
> > +#ifdef CONFIG_CGROUPS
> > + tsk->frozen = 0;
> > +#endif
>
> Hmm, do we really need this? How can a cgroup_task_frozen() task call
> copy_process() ?

We shouldn't, but let me double check it.

>
> > +static void do_freezer_trap(void)
> > + __releases(&current->sighand->siglock)
> > +{
> > + /*
> > + * If a fatal signal is pending, there is no way back for the process,
> > + * so let it escape from the freezer trap and exit.
> > + * If the task has been frozen, cgroup_leave_frozen() will be invoked
> > + * to update the cgroup state, if necessary.
> > + */
> > + if (fatal_signal_pending(current)) {
> > + current->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > + spin_unlock_irq(&current->sighand->siglock);
> > + return;
> > + }
> > +
> > + /*
> > + * If there are other trap bits pending except JOBCTL_TRAP_FREEZE,
> > + * let's make another loop to give it a chance to be handled.
> > + * In any case, we'll return back.
> > + */
> > + if (((current->jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) !=
> > + JOBCTL_TRAP_FREEZE) || fatal_signal_pending(current)) {
> ^^^^^^^^^^^^^^^^^^^^
>
> We have already checked fatal_signal_pending() at the start?

Yeah, I'll remove the second call. Thanks!

>
> And in fact, you can probably remove fatal_signal_pending() altogether...
> Note that with recent changes get_signal() does
>
> if (signal_group_exit(signal)) {
> ksig->info.si_signo = signr = SIGKILL;
> sigdelset(&current->pending.signal, SIGKILL);
> recalc_sigpending();
> goto fatal;
> }
>
> before the main loop, so afaics fatal_signal_pending() == T in do_freezer_trap()
> is simply impossible. This means that you can't clear JOBCTL_TRAP_FREEZE, but
> this is probably fine... if not, you can add jobctl &= ~JOBCTL_TRAP_FREEZE into
> the "if (signal_group_exit(signal))" above.

I'll try.

>
> > @@ -2401,12 +2453,27 @@ bool get_signal(struct ksignal *ksig)
> > do_signal_stop(0))
> > goto relock;
> >
> > - if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
> > - do_jobctl_trap();
> > - spin_unlock_irq(&sighand->siglock);
> > + if (unlikely(current->jobctl &
> > + (JOBCTL_TRAP_MASK | JOBCTL_TRAP_FREEZE))) {
> > + if (current->jobctl & JOBCTL_TRAP_MASK) {
> > + do_jobctl_trap();
> > + spin_unlock_irq(&sighand->siglock);
> > + } else if (current->jobctl & JOBCTL_TRAP_FREEZE)
> > + do_freezer_trap();
> > +
> > goto relock;
> > }
> >
> > + /*
> > + * If the task is leaving the frozen state, let's update
> > + * cgroup counters and reset the frozen bit.
> > + */
> > + if (unlikely(cgroup_task_frozen(current))) {
> > + spin_unlock_irq(&sighand->siglock);
> > + cgroup_leave_frozen(true);
> > + spin_lock_irq(&sighand->siglock);
>
> I'd suggest to do "goto relock" rather than spin_lock_irq(&sighand->siglock).
> To ensure we can't miss SIGKILL which can come right after we drop siglock,
> note again the new signal_group_exit() check above.

Sure, will switch to "goto relock".

Thank you!

Roman