Re: [PATCH 2/4] tracing/user_events: Introduce multi-format events

From: Beau Belgrave
Date: Tue Jan 30 2024 - 13:14:51 EST


On Tue, Jan 30, 2024 at 11:12:22PM +0900, Masami Hiramatsu wrote:
> On Mon, 29 Jan 2024 09:29:07 -0800
> Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, Jan 26, 2024 at 03:04:45PM -0500, Steven Rostedt wrote:
> > > On Fri, 26 Jan 2024 11:10:07 -0800
> > > Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > > OK, so the each different event has suffixed name. But this will
> > > > > introduce non C-variable name.
> > > > >
> > > > > Steve, do you think your library can handle these symbols? It will
> > > > > be something like "event:[1]" as the event name.
> > > > > Personally I like "event.1" style. (of course we need to ensure the
> > > > > user given event name is NOT including such suffix numbers)
> > > > >
> > > >
> > > > Just to clarify around events including a suffix number. This is why
> > > > multi-events use "user_events_multi" system name and the single-events
> > > > using just "user_events".
> > > >
> > > > Even if a user program did include a suffix, the suffix would still get
> > > > appended. An example is "test" vs "test:[0]" using multi-format would
> > > > result in two tracepoints ("test:[0]" and "test:[0]:[1]" respectively
> > > > (assuming these are the first multi-events on the system).
> > > >
> > > > I'm with you, we really don't want any spoofing or squatting possible.
> > > > By using different system names and always appending the suffix I
> > > > believe covers this.
> > > >
> > > > Looking forward to hearing Steven's thoughts on this as well.
> > >
> > > I'm leaning towards Masami's suggestion to use dots, as that won't conflict
> > > with special characters from bash, as '[' and ']' do.
> > >
> >
> > Thanks, yeah ideally we wouldn't use special characters.
> >
> > I'm not picky about this. However, I did want something that clearly
> > allowed a glob pattern to find all versions of a given register name of
> > user_events by user programs that record. The dot notation will pull in
> > more than expected if dotted namespace style names are used.
> >
> > An example is "Asserts" and "Asserts.Verbose" from different programs.
> > If we tried to find all versions of "Asserts" via glob of "Asserts.*" it
> > will pull in "Asserts.Verbose.1" in addition to "Asserts.0".
>
> If we use dot for the suffix number, we can prohibit user to use it
> for their name. They still can use '_' (or change the group name?)

We could, however, we have user_event integration in OpenTelemetry and
I'm unsure if we should really try to restrict names. We'll also at some
point have libside integration, which might not have the same
restrictions on the user-tracer side as the kernel-tracer side.

I'm trying to restrict the user_event group name from changing outside
of an eventual tracer namespace. I'd like for each container to inherit
a tracer namespace long-term which decides what the actual group name
will be instead of users self-selecting names to prevent squatting or
spoofing of events.

> I just concerned that the name can be parsed by existing tools. Since
> ':' is used as a separator for group and event name in some case (e.g.
> tracefs "set_event" is using, so trace-cmd and perf is using it.)
>

Good point.

What about just event_name[unique_id]? IE: Drop the colon.

Brackets are still special in bash, but it would prevent simple glob
patterns from matching to incorrect tracepoints under user_events_multi.

> > While a glob of "Asserts.[0-9]" works when the unique ID is 0-9, it
> > doesn't work if the number is higher, like 128. If we ever decide to
> > change the ID from an integer to say hex to save space, these globs
> > would break.
>
> Hmm, why can't we use regexp?

We can use regex, but we'll need to agree the suffix format. We won't be
able to change it after that point. I'd prefer a base-16/Hex suffix
either in brackets or a simple dot.

> And if we limits the number of events up to 1000 for each same-name event
> we can use fixed numbers, like Assets.[0-9][0-9][0-9]
>

I'm always wrong when I guess how many events programs will end up
using. Folks always surprise me. I'd rather have a solution that scales
to the max number of tracepoints allowed on the system (currently 16-bit
max value).

Thanks,
-Beau

> Thank you,
>
> >
> > Is there some scheme that fits the C-variable name that addresses the
> > above scenarios? Brackets gave me a simple glob that seemed to prevent a
> > lot of this ("Asserts.\[*\]" in this case).
> >
> > Are we confident that we always want to represent the ID as a base-10
> > integer vs a base-16 integer? The suffix will be ABI to ensure recording
> > programs can find their events easily.
> >
> > Thanks,
> > -Beau
> >
> > > -- Steve
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>