Re: [PATCH 2/4] tracing/user_events: Introduce multi-format events

From: Beau Belgrave
Date: Tue Jan 30 2024 - 17:44:37 EST


On Tue, Jan 30, 2024 at 01:52:30PM -0500, Steven Rostedt wrote:
> On Tue, 30 Jan 2024 10:05:15 -0800
> Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Mon, Jan 29, 2024 at 09:24:07PM -0500, Steven Rostedt wrote:
> > > On Mon, 29 Jan 2024 09:29:07 -0800
> > > Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > Thanks, yeah ideally we wouldn't use special characters.
> > > >
> > > > I'm not picky about this. However, I did want something that lets
> > > > recording programs use a simple glob pattern to find all versions of a
> > > > given registered user_events name. The dot notation will pull in
> > > > more than expected if dotted, namespace-style names are used.
> > > >
> > > > An example is "Asserts" and "Asserts.Verbose" from different programs.
> > > > If we tried to find all versions of "Asserts" via a glob of "Asserts.*",
> > > > it would pull in "Asserts.Verbose.1" in addition to "Asserts.0".
> > >
> > > Do you prevent brackets in names?
> > >
> >
> > No. However, since brackets have distinct start and end tokens,
> > finding all versions of your event is trivial compared to a single dot.
> >
> > Imagine two events:
> > Asserts
> > Asserts[MyCoolIndex]
> >
> > Resolves to tracepoints of:
> > Asserts:[0]
> > Asserts[MyCoolIndex]:[1]
> >
> > Regardless of brackets in the names, a simple glob of Asserts:\[*\] only
> > finds Asserts:[0]. This is because the glob requires the full event name
> > to be followed by ":[" and the match to end with "]".
> >
> > If I register another "version" of Asserts, then I'll have:
> > Asserts:[0]
> > Asserts[MyCoolIndex]:[1]
> > Asserts:[2]
> >
> > The glob of Asserts:\[*\] will return both:
> > Asserts:[0]
> > Asserts:[2]
>
> But what if you had registered "Asserts:[MyCoolIndex]:[1]"
>

Good point; the above would still require a regex-style pattern to keep it
from getting pulled in.
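For illustration, the collision can be reproduced with POSIX fnmatch(3),
which implements shell-style globbing. glob_matches() is a hypothetical
helper for this example, not part of any user_events tooling:

```c
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Returns true if a tracepoint name matches a shell-style glob.
 * fnmatch() returns 0 on a match. With flags == 0 a backslash escapes
 * the next character, so "\\[" and "\\]" match literal brackets.
 */
static bool glob_matches(const char *glob, const char *tp_name)
{
	return fnmatch(glob, tp_name, 0) == 0;
}
```

With the pattern Asserts:\[*\] this is true for Asserts:[0] and
Asserts:[2] and false for Asserts[MyCoolIndex]:[1], but it is also true
for Asserts:[MyCoolIndex]:[1], since the * also consumes the embedded "]:[".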

> Do you prevent colons?
>

No, nothing is prevented at this point.

It seems we could either prevent certain characters to make this easier,
or define a good regex and document it.

I'm leaning toward just doing a simple suffix and documenting the regex
well.
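A rough sketch of the "simple suffix plus documented regex" approach,
assuming (hypothetically) that the suffix is a single dot followed by the
ID in lowercase hex. The event name has to be regex-escaped, since names
like "Asserts.Verbose" contain '.', and the pattern has to be anchored at
both ends. is_version_of() is an illustrative name only:

```c
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Assumed suffix format: "<name>.<lowercase hex id>".
 * Builds the POSIX ERE ^<escaped name>\.[0-9a-f]+$ and tests tp_name
 * against it. Returns false on allocation or regex compile failure.
 */
static bool is_version_of(const char *name, const char *tp_name)
{
	/* Characters that are special in a POSIX ERE outside brackets. */
	static const char meta[] = ".[\\()*+?{|^$";
	const char *suffix = "\\.[0-9a-f]+$";
	size_t len = strlen(name);
	/* Worst case every char is escaped; +1 for '^', +1 for NUL. */
	char *pattern = malloc(2 * len + strlen(suffix) + 2);
	char *p;
	regex_t re;
	bool match;

	if (!pattern)
		return false;

	p = pattern;
	*p++ = '^';
	for (size_t i = 0; i < len; i++) {
		if (strchr(meta, name[i]))
			*p++ = '\\';
		*p++ = name[i];
	}
	strcpy(p, suffix);

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
		free(pattern);
		return false;
	}
	match = regexec(&re, tp_name, 0, NULL, 0) == 0;
	regfree(&re);
	free(pattern);
	return match;
}
```

Under that assumption, is_version_of("Asserts", "Asserts.80") matches
(ID 128 in hex) while "Asserts.Verbose.1" does not, because "Verbose.1"
is not a run of hex digits reaching the end of the name. This only holds
if every tracepoint carries exactly one suffix, so the last dot-separated
component is always the ID.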

> >
> > At this point the program can either record all versions or scan further
> > to find which version of Asserts is wanted.
> >
> > > >
> > > > While a glob of "Asserts.[0-9]" works when the unique ID is 0-9, it
> > > > doesn't work if the number is higher, like 128. If we ever decide to
> > > > change the ID from an integer to say hex to save space, these globs
> > > > would break.
> > > >
> > > > Is there some scheme, restricted to characters valid in a C variable
> > > > name, that addresses the above scenarios? Brackets gave me a simple glob
> > > > that seemed to prevent a lot of this ("Asserts.\[*\]" in this case).
> > >
> > > Prevent a lot of what? I'm not sure what your example here is.
> > >
> >
> > I'll try again :)
> >
> > We have 2 events registered via user_events:
> > Asserts
> > Asserts.Verbose
> >
> > Using dot notation these would result in tracepoints of:
> > user_events_multi/Asserts.0
> > user_events_multi/Asserts.Verbose.1
> >
> > Using bracket notation these would result in tracepoints of:
> > user_events_multi/Asserts:[0]
> > user_events_multi/Asserts.Verbose:[1]
> >
> > A recording program only wants to enable the Asserts tracepoint. It does
> > not want to record the Asserts.Verbose tracepoint.
> >
> > The program must find the right tracepoint by scanning tracefs under the
> > user_events_multi system.
> >
> > A single dot suffix does not allow a simple glob to be used. The glob
> > Asserts.* will return both Asserts.0 and Asserts.Verbose.1.
> >
> > A simple glob of Asserts:\[*\] will only find Asserts:[0], it will not
> > find Asserts.Verbose:[1].
> >
> > We could just use brackets and not have the colon (Asserts[0] in this
> > case). But brackets are still special for bash.
>
> Are these shell scripts or programs? I use regex in programs all the time.
> And if you have shell scripts, use awk or something.
>

They could be both. In our case, it is a program.
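Since the consumer here is a program, the tracefs scan could look roughly
like the following. The tracefs mount point (commonly /sys/kernel/tracing)
and the pattern are assumptions, and count_matching_events() is a
hypothetical helper; it matches each entry name under
events/user_events_multi against a POSIX extended regex:

```c
#include <dirent.h>
#include <regex.h>
#include <stddef.h>

/*
 * Counts entries in dir_path whose names match the POSIX ERE 'ere'.
 * Returns -1 if the regex does not compile or the directory cannot
 * be opened.
 */
static int count_matching_events(const char *dir_path, const char *ere)
{
	regex_t re;
	DIR *dir;
	struct dirent *ent;
	int count = 0;

	if (regcomp(&re, ere, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	dir = opendir(dir_path);
	if (!dir) {
		regfree(&re);
		return -1;
	}

	/* One directory per event; "." and ".." simply fail to match. */
	while ((ent = readdir(dir)) != NULL) {
		if (regexec(&re, ent->d_name, 0, NULL, 0) == 0)
			count++;
	}

	closedir(dir);
	regfree(&re);
	return count;
}
```

For example, passing "/sys/kernel/tracing/events/user_events_multi" and
"^Asserts\\.[0-9a-f]+$" would count the Asserts versions under a dot plus
hex suffix while skipping Asserts.Verbose.*. A real recorder would collect
the matching names rather than just count them.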

> Unless you prevent something from being added, I don't see the protection.
>

Yeah, it just makes it way less likely. Given that, I'm starting to lean
toward just documenting the regex well and not trying to get fancy.

> >
> > > >
> > > > Are we confident that we always want to represent the ID as a base-10
> > > > integer vs a base-16 integer? The suffix will be ABI to ensure recording
> > > > programs can find their events easily.
> > >
> > > Is there a difference to what we choose?
> > >
> >
> > If a simple glob of event_name:\[*\] cannot be used, then we must document
> > what the suffix format is, so an appropriate regex can be created. If we
> > start with base-10 then later move to base-16 we will break existing regex
> > patterns on the recording side.
> >
> > I prefer, and have in this series, a base-16 output since it saves on
> > the tracepoint name size.
>
> I honestly don't care which base you use. So if you want to use base 16,
> I'm fine with that.
>
> >
> > Either way we go, we need to define how recording programs should find
> > the events they care about. So we must be very clear, IMHO, about the
> > format of the tracepoint names in our documentation.
> >
> > I personally think recording programs are likely to get this wrong
> > without proper guidance.
> >
>
> Agreed.
>
> -- Steve

Thanks,
-Beau