Re: [PATCH v2 00/21] libtracefs: Introducing tracefs_sql() to create synthetic events with an SQL line

From: Ahmed S. Darwish
Date: Wed Aug 04 2021 - 07:57:10 EST


Hi Steven,

On Tue, Aug 03, 2021, Steven Rostedt wrote:
>
> Major update since v1:
>
> It was brought to my attention that the man page did not state that the
> SQL syntax required JOIN .. ON in the statement. That is, they were not
> optional. I decided to fix that. But not by updating the man page, but by
> actually making JOIN .. ON optional. If you leave that out, the synthetic
> event will not be completely created, but it will have enough to create
> a histogram. See the bottom (HISTOGRAMS) for more info!
>
...
>
> HISTOGRAMS
>
> Simple SQL statements without the JOIN ON may also be used, which will
> create a histogram instead. When doing this, the struct tracefs_hist
> descriptor can be retrieved from the returned synthetic event descriptor via
> the tracefs_synth_get_start_hist(3).
>

Thanks a lot! Actually, I meant going even one step further ;)

I was imagining something like the following:

$ trace-cmd sql-shell # OR

$ perf tracefs-sql-shell

Welcome to tracefs SQL shell...

> SELECT PNAME(common_pid),msr,val
FROM write_msr
WHERE msr=72 OR msr=2096

.-----------------------------------------.
| PNAME(common_pid) | msr   | val         |
|-------------------|-------|-------------|
| qemu-system-x86   | 0x48  | 0           |
| qemu-system-x86   | 0x48  | 0           |
| qemu-system-x86   | 0x48  | 0           |
| kworker/u16:2     | 0x830 | 0x1000008fb |
| ....              | ....  | .....       |
+-----------------------------------------+

> SELECT MAX(end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) AS MaxSystemLatency_us,
PNAME(common_pid)
FROM sched_waking AS start JOIN sched_switch AS end
ON start.pid = end.next_pid

.-------------------------------------------.
| MaxSystemLatency_us | PNAME(common_pid)   |
|---------------------|---------------------|
| 350                 | cyclictest          |
+-------------------------------------------+

> SELECT (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) AS latency,
PNAME(common_pid), PRIO(common_pid)
FROM sched_waking AS start JOIN sched_switch AS end
ON start.pid = end.next_pid
ORDER BY latency DESC
LIMIT 5

.-----------------------------------------------------------.
| latency | PNAME(common_pid)           | PRIO(common_pid)  |
|---------|-----------------------------|-------------------|
| 829     | cyclictest                  | SCHED_FIFO:98     |
| 400     | cyclictest                  | SCHED_FIFO:98     |
| 192     | pulseaudio-rt               | SCHED_RR:48       |
| 30      | firefox                     | SCHED_OTHER:0:0   |
| 10      | kworker/0:0H-events_highpri | SCHED_OTHER:0:-20 |
+-----------------------------------------------------------+

> SELECT (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as MaxIRQLatency_us
FROM irq_disable as start JOIN irq_enable as end
ON start.common_pid = end.common_pid AND
start.parent_offs = end.parent_offs
ORDER BY MaxIRQLatency_us DESC
LIMIT 1

.------------------.
| MaxIRQLatency_us |
|------------------|
| 37               |
+------------------+

And so on....
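
The wakeup-latency queries above can be prototyped without any kernel
support. Below is a minimal sketch, assuming Python's sqlite3 module as
a stand-in query engine and a handful of invented events. The table
layout, the ts_usecs column, and the sample timestamps are all
assumptions made for illustration; none of this is something libtracefs
provides. Because END is a reserved word in SQLite, the sketch uses the
aliases s and e instead of start and end, and it pairs each
sched_switch with the most recent earlier sched_waking of the same pid.

```python
import sqlite3

# In-memory database standing in for the trace buffer (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sched_waking (ts_usecs INTEGER, pid INTEGER);
    CREATE TABLE sched_switch (ts_usecs INTEGER, next_pid INTEGER);
""")

# Invented sample events: (timestamp in microseconds, pid).
conn.executemany("INSERT INTO sched_waking VALUES (?, ?)",
                 [(1000, 42), (2000, 43), (5000, 42)])
conn.executemany("INSERT INTO sched_switch VALUES (?, ?)",
                 [(1030, 42), (2400, 43), (5010, 42)])

# For every switch-in, subtract the latest earlier wakeup of the same pid.
rows = conn.execute("""
    SELECT e.ts_usecs - MAX(s.ts_usecs) AS latency, e.next_pid
    FROM sched_switch AS e JOIN sched_waking AS s
      ON s.pid = e.next_pid AND s.ts_usecs <= e.ts_usecs
    GROUP BY e.rowid
    ORDER BY latency DESC
    LIMIT 5
""").fetchall()

for latency, pid in rows:
    print(f"{latency:>7} us  pid {pid}")
```

The extra timestamp condition in the ON clause is a choice of this
sketch: a plain equality join on the pid would pair every wakeup with
every switch of that task, which is not what a latency measurement
wants.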

The idea was that since the community already picked SQL as a
higher-level tracing language, why tie the SQL language only to
synthetic events and histograms?

The language can already offer something *way more generic*, out of the
box, while still covering the desired special cases.

We can support the standard SQL aggregate functions (e.g., MAX(), MIN(),
SUM(), COUNT(), AVG(), etc.) + some kernel-specific functions (e.g.,
PROCESS_NAME(), PROCESS_PRIO(), USECS(), etc.) + the standard SQL
keywords like DISTINCT, ORDER BY, LIMIT, DESC, ASC, etc. This would
offer some nice friendly competition to BPF tracing, while still being a
(relatively) simple *query-only* language.
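
To make the combination of standard SQL and kernel-specific functions
concrete, here is a minimal sketch, again assuming Python's sqlite3
module as a stand-in engine. PNAME() is registered as a user-defined
function over a hypothetical pid-to-name table; the pids, the mapping,
and the sample write_msr rows are invented for illustration, and a real
tool would take them from the trace data.

```python
import sqlite3

# Hypothetical pid -> task name mapping (an assumption of this sketch).
COMM = {1234: "qemu-system-x86", 77: "kworker/u16:2"}

def pname(pid):
    """Resolve a pid to a task name, falling back to a placeholder."""
    return COMM.get(pid, "<unknown>")

conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL as a one-argument function PNAME().
conn.create_function("PNAME", 1, pname)

conn.execute(
    "CREATE TABLE write_msr (common_pid INTEGER, msr INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO write_msr VALUES (?, ?, ?)",
    [(1234, 72, 0), (1234, 72, 0), (77, 2096, 0x1000008FB), (77, 16, 1)])

# Plain filter using the kernel-specific function.
rows = conn.execute("""
    SELECT PNAME(common_pid), msr, val
    FROM write_msr
    WHERE msr = 72 OR msr = 2096
    ORDER BY rowid
""").fetchall()

# Standard aggregate plus ORDER BY on top of the same function.
counts = conn.execute("""
    SELECT PNAME(common_pid) AS comm, COUNT(*) AS writes
    FROM write_msr
    GROUP BY comm
    ORDER BY writes DESC, comm
""").fetchall()

for name, msr, val in rows:
    print(f"{name:<16} {msr:#x} {val:#x}")
for name, writes in counts:
    print(f"{name:<16} {writes}")
```

The point of the sketch is only that the helper functions can sit
beside the stock aggregates in one query; how such functions would be
implemented on top of tracefs is left open here.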

I'm not sure if you would be OK with this, but I thought a proposal
wouldn't hurt :)

I can also write some patches on top of this series if you are OK with
the principle in general.

Kind regards,

--
Ahmed S. Darwish
Linutronix GmbH