Re: [PATCH v3 5/5] x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots

From: Frederic Weisbecker
Date: Wed Nov 18 2015 - 19:57:26 EST


On Mon, Nov 16, 2015 at 03:57:05PM -0800, Andy Lutomirski wrote:
> On Mon, Nov 16, 2015 at 2:50 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> > On Mon, Nov 16, 2015 at 11:10:55AM -0800, Andy Lutomirski wrote:
> >> On Nov 13, 2015 7:26 AM, "Frederic Weisbecker" <fweisbec@xxxxxxxxx> wrote:
> >> >
> >> > On Thu, Nov 12, 2015 at 12:59:04PM -0800, Andy Lutomirski wrote:
> >> > > On CONFIG_CONTEXT_TRACKING kernels that have context tracking
> >> > > disabled at runtime (which includes most distro kernels), we still
> >> > > have the overhead of a call to enter_from_user_mode in interrupt and
> >> > > exception entries.
> >> > >
> >> > > If jump labels are available, this uses the jump label
> >> > > infrastructure to skip the call.
> >> >
> >> > Looks good. But why are we still calling context tracking code on IRQs at all?
> >>
> >> Same reasons as before:
> >>
> >> 1. This way the IRQ exit path is almost completely shared with all the
> >> other exit paths.
> >
> > I'm all for consolidation in general. Unless it brings bad middle states.
>
> The middle state works fine, though. With these patches, the middle
> state should have essentially no performance hit compared to the
> previous state in default configurations.

The current middle state works but introduced a performance hit on IRQs.
But yeah this patchset doesn't make the situation worse.

>
> >
> > If I knew before that I would have to argue endlessly in order to protest against
> > these context tracking changes, I would have NACK'ed the x86 consolidation rework in
> > the state it was while it got merged.
> >
> >>
> >> 2. It combines the checks for which context we were in with what CPL
> >> we entered from.
> >>
> >> Part 2 should be complete across the whole x86 kernel soon once the
> >> 64-bit syscall code gets fixed up.
> >>
> >> We should get rid of the duplication in the irq entry hooks. Want to
> >> help with that?
> >
> > Which one? The duplication against irq_enter() and irq_exit()?
>
> Yes.

The duplication you introduced then :-)

>
> >
> > I think that irq_exit() should be moved to the IRQ very end and perform the
> > final signal/schedule/preempt_schedule_irq() loop. But it requires a bit of
> > rework on all archs in order to do that. This could be done iteratively though.
> >
> >> Presumably we should do the massive remote polling speedup to the nohz code,
> >
> > Hmm, I don't get what you mean here.
> >
>
> Currently (4.4-rc1), when an IRQ hits user mode, here's roughly what we do:
>
> - Tell context tracking that we're in the kernel
> - Switch ct state
> - Wake up RCU
> - Adjust vtime
> - irq_enter
> - Adjust preempt count
> - Wake up RCU
> - Tell vtime accounting that we're in an IRQ
>
> All of the initial stuff should be, in the long term, just a write to
> some variable and a possible barrier.

I think that eventually we may indeed want to merge context tracking and
preempt count concerning kernel and irq tracking. Eventually we can
indeed just write IRQ & KERNEL to context tracking at once on irq entry
and clear that on exit with a special treatment in case of IRQ exit
rescheduling.

The case if RCU is special though, I'm not sure we can merge it with the rest.

Also it means irq_exit() will have to take care of irq exit rescheduling and signals.
I think this is something we can do.

> Whatever CPU is doing
> housekeeping can poll to keep track of user vs system time.

If you're assuming the case where housekeeping will poll on tickless CPUs cputime,
I'm not sure we are going to take that path. This is going to be a lot of overhead.

> The irq_enter stuff, in turn, could either set some variable telling the
> housekeeper that we're in an IRQ or it could continue to directly
> adjust time accounting.
>
> In any event, all of this should be extremely fast, which it currently isn't.

Definetly.

>
> >> and we should also teach enter_from_user_mode to transition directly to IRQ state as
> >> appropriate. Then irq_enter can be much faster.
> >
> > I don't get what you mean here either. You mean calling irq_enter() from enter_from_user_mode()?
> >
>
> No, I mean teaching irq_enter that, on x86 at least, we promise that
> irq_enter is only ever called from CONTEXT_KERNEL so it can do less
> redundant work.

I don't get how making sure that irq_enter() is only called with CONTEXT_KERNEL
is going to imply less redundant work. But doing the irq tracking from context tracking
is certainly going to factorize things.

> Or, even better, we could fold the irq_enter and user->kernel hooks
> into a single context tracking call.

Yeah that makes sense.

Now this all sounds like a lot of core rework just for a feature (nohz/context tracking)
that is used by few users. Lets hope we can agree on a solution that is an improvement for
everyone. For example moving the irq end work on core code sounds like something that would
be desired for everyone.

If we happen to agree on a solution, I'll definetly help and do the improvements on context
tracking subsystem.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/