Re: [PATCH] x86, traps: Fix ist_enter from userspace

From: Andy Lutomirski
Date: Sat Jan 31 2015 - 21:19:16 EST


On Jan 31, 2015 12:18 PM, "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Jan 31, 2015 at 05:01:34AM -0800, Andy Lutomirski wrote:
> > context_tracking_user_exit() has no effect if in_interrupt() returns true,
> > so ist_enter() didn't work. Fix it by calling exception_enter(), and thus
> > context_tracking_user_exit(), before incrementing the preempt count.
> >
> > This also adds an assertion that will catch the problem reliably if
> > CONFIG_PROVE_RCU=y to help prevent the bug from being reintroduced.
> >
> > Fixes: 959274753857 x86, traps: Track entry into and exit from IST context
> > Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
> > Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> > ---
> > arch/x86/kernel/traps.c | 25 ++++++++++++++++++-------
> > 1 file changed, 18 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index 7176f84f95a4..c74f2f5652da 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -110,15 +110,11 @@ static inline void preempt_conditional_cli(struct pt_regs *regs)
> >
> > enum ctx_state ist_enter(struct pt_regs *regs)
> > {
> > - /*
> > - * We are atomic because we're on the IST stack (or we're on x86_32,
> > - * in which case we still shouldn't schedule.
> > - */
> > - preempt_count_add(HARDIRQ_OFFSET);
> > + enum ctx_state prev_state;
> >
> > if (user_mode_vm(regs)) {
> > /* Other than that, we're just an exception. */
> > - return exception_enter();
> > + prev_state = exception_enter();
> > } else {
> > /*
> > * We might have interrupted pretty much anything. In
> > @@ -127,12 +123,27 @@ enum ctx_state ist_enter(struct pt_regs *regs)
> > * but we need to notify RCU.
> > */
> > rcu_nmi_enter();
> > - return IN_KERNEL; /* the value is irrelevant. */
> > + prev_state = IN_KERNEL; /* the value is irrelevant. */
> > }
> > +
> > + /*
> > + * We are atomic because we're on the IST stack (or we're on x86_32,
> > + * in which case we still shouldn't schedule).
> > + *
> > + * This must be after exception_enter(), because exception_enter()
> > + * won't do anything if in_interrupt() returns true.
> > + */
> > + preempt_count_add(HARDIRQ_OFFSET);
>
> So the reason we need this here is that ist_exit() does not have
> access to the bits tested by user_mode_vm()? Or is the idea to
> avoid an additional test in ist_exit()? (Wondering why this isn't
> in the above "else" clause.)

It's result of Linus's suggestion here:

https://lkml.org/lkml/2014/11/19/1131

The idea is that scheduling in IST context is generally bad, and,
before this change, we wouldn't have warned. By bumping the preempt
count unconditionally, we make it clear that we can't schedule.

For the one exceptional special case, we have ist_begin_non_atomic.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/