Re: [PATCH] ring-buffer: Simplify reservation with try_cmpxchg() loop

From: Steven Rostedt
Date: Thu Jan 25 2024 - 17:14:26 EST


On Thu, 25 Jan 2024 16:18:37 -0500
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:

> >
> > This is how you are able to avoid the "before/after" logic I have, as
> > the race is automatically detected. The least significant bits of the
> > timestamp is ignored for the event delta calculation.
>
> Not quite, as I explained at the beginning of this email. All bits from the
> previous timestamp, including its low bits, are useful to know how many
> overflows happened since the last tsc.

Yes, but it still means updating that timestamp you will compare to doesn't
have the race I have. If the timestamp's upper bits are the same, or are
off by one and the lower bits are higher than the current timestamp, you
don't need to inject. But if the lower bits are higher than the timestamp
or the higehr bits are off by more than one then you do.

The lower bits are a delta against "0" of the current timestamp lower bits.
There's no race in updating those bits as long as the upper bits remain the
same or are off by one and the current timestamp lower bits are lower than
the saved time stamp.

In my case, because the delta is off of the entire timestamp, what I write
into the saved timestamp, all bits matter. And to handle that I need the
before/after timestamps to know if the currently saved timestamp didn't
have a race.

>
> > And if a race
> > happens where the interrupting event saves a later timestamp and comes
> > back here, if the interrupted event writes the older timestamp, it just
> > causes that delta calculation to overflow again and you inject another
> > 64bit timestamp into the buffer.
>
> This part is correct: in the race you describe, we end up with the
> possibility of bringing the last_tsc backwards, which can only cause
> the tracer to use the full 64-bit timestamp when in fact it could use
> the compact representation. But it's rare and should not matter in
> practice.
>
> And by the way this algorithm is designed to work with preemption/migration
> enabled as well, not just interrupts. So the race can come from a thread
> running concurrently on another CPU and it should work as well.
>
> [...]
>
> >
> > Going through a transition of changing it could end up being just as
> > complex. I'm not sure the complexity in that transition is better than
> > the complexity of the current code, as this code has been there for 15
> > years, and I know of at least 2 other projects that depend on this
> > format as is.
>
> I agree with you that it's not clear-cut whether introducing this change
> would be a benefit at this stage considering the extra complexity of
> extending the ABI while keeping backward compatibility.
>
> But it's something we can keep in mind if we ever have to do major ABI
> extensions for other reasons.

Yeah, it's something to think about if we want to use a different format
for something else.

Thanks for the review.

-- Steve