Re: [RFC PATCH v3 for 4.15 08/24] Provide cpu_opv system call

From: Thomas Gleixner
Date: Fri Nov 17 2017 - 15:07:58 EST


On Fri, 17 Nov 2017, Andi Kleen wrote:
> > The most straight forward is to have a mechanism which forces everything
> > into the slow path in case of debugging, lack of progress, etc. The slow
>
> That's the abort address, right?

Yes.

> For the generic case the fall back path would require disabling preemption
> unfortunately, for which we don't have a mechanism in user space.
>
> I think that is what Mathieu tried to implement here with this call.

Yes. Preempt-disabled execution of byte code, to make sure that the
transaction succeeds.

But why is disabling preemption mandatory? If stuff fails due to hitting a
breakpoint or because it retried a gazillion times without progress, then
the abort code can detect that and act accordingly. Pseudo code:

abort:
	if (!slowpath_required() &&
	    !breakpoint_caused_abort() &&
	    !stall_detected()) {
		do_the_normal_abort_postprocessing();
		goto retry;
	}

	lock(slowpath_lock[cpu]);

	if (!slowpath_required()) {
		unlock(slowpath_lock[cpu]);
		goto retry;
	}

	if (rseq_supported)
		set_slow_path();

	/* Same code as inside the actual rseq */
	do_transaction();

	if (rseq_supported)
		unset_slow_path();

	unlock(slowpath_lock[cpu]);

The only interesting question is how to make sure that all threads on that
CPU see the slowpath required before they execute the commit so they are
forced into the slow path. The simplest thing would be atomics, but that's
what rseq wants to avoid.

I think that this can be solved cleanly with the help of the membarrier
syscall or some variant of that without all that 'yet another byte code
interpreter' mess.
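
As a rough sketch of the publication side only (not taken from any posted
patch): set a per-CPU flag with a plain store, then use membarrier() so that
all running threads pass through a full barrier before the slow path goes
on. NR_CPUS, slowpath_flag[] and the helper bodies are assumptions made up
for illustration; MEMBARRIER_CMD_SHARED is the existing membarrier command.
Whether that command is sufficient for rseq, or whether a variant is
needed, is exactly the open question.

```c
#include <assert.h>
#include <linux/membarrier.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NR_CPUS 64	/* assumption: illustrative upper bound */

/* Hypothetical per-CPU flag which the fast path checks before its commit. */
static atomic_int slowpath_flag[NR_CPUS];

/* Thin wrapper around the raw syscall. */
static int membarrier(int cmd, unsigned int flags)
{
	return (int)syscall(__NR_membarrier, cmd, flags);
}

/*
 * Publish "slow path required" for @cpu. The flag is set unconditionally.
 * Returns 0 if the kernel executed the barrier, -1 with errno set if
 * membarrier() is not available.
 */
static int set_slow_path(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	/* Plain store, no atomic read-modify-write. */
	atomic_store_explicit(&slowpath_flag[cpu], 1, memory_order_relaxed);

	/*
	 * On return, all running threads have passed through a full
	 * memory barrier, so later loads of the flag observe the store.
	 */
	return membarrier(MEMBARRIER_CMD_SHARED, 0);
}

/* Fast path side: a relaxed load, which is just a normal read. */
static int slowpath_required(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	return atomic_load_explicit(&slowpath_flag[cpu], memory_order_relaxed);
}
```

The cost of the barrier lands entirely on the thread which requests the slow
path; the fast path only pays for one extra load.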

The other question is whether do_transaction() is required to run on that
specific CPU. I don't think so, because that magic interpreter operates even
when the required target CPU is offline, and with locking in place there is
no reason why running on the target CPU would be required.
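
To illustrate, a minimal sketch of a slow path which updates the data of a
target CPU from whatever CPU the caller happens to run on. struct
percpu_slot, slots[] and slowpath_add() are hypothetical names, and the
counter update is only a stand-in for do_transaction(). It assumes that the
fast path users of that CPU have already been forced into the slow path, so
the lock is the only thing serializing access.

```c
#include <assert.h>
#include <pthread.h>

#define NR_CPUS 4	/* assumption: small fixed number for illustration */

/* Hypothetical per-CPU data guarded by a per-CPU slow path lock. */
struct percpu_slot {
	pthread_mutex_t lock;
	long counter;
};

static struct percpu_slot slots[NR_CPUS] = {
	{ PTHREAD_MUTEX_INITIALIZER, 0 },
	{ PTHREAD_MUTEX_INITIALIZER, 0 },
	{ PTHREAD_MUTEX_INITIALIZER, 0 },
	{ PTHREAD_MUTEX_INITIALIZER, 0 },
};

/*
 * Slow path: add @delta to the counter of @target_cpu and return the new
 * value. Nothing here depends on the CPU the caller is running on; the
 * lock alone keeps the update consistent.
 */
static long slowpath_add(int target_cpu, long delta)
{
	struct percpu_slot *slot;
	long val;

	assert(target_cpu >= 0 && target_cpu < NR_CPUS);
	slot = &slots[target_cpu];

	pthread_mutex_lock(&slot->lock);
	slot->counter += delta;		/* stand-in for do_transaction() */
	val = slot->counter;
	pthread_mutex_unlock(&slot->lock);

	return val;
}
```

Calling slowpath_add(1, 5) from a thread running on CPU 3 gives the same
result as calling it on CPU 1, just with the cache line bouncing.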

Sure, that's going to affect performance, but only for two cases:

  1) Debugging. That's completely uninteresting.

  2) No progress at all. Performance is down the drain anyway, so it does
     not matter at all whether you spend a few more cycles or not to
     resolve that.

I might be missing something as usual :)

Thanks

tglx