Re: [re: PATCH v2 00/15 - 00/11] dyndbg: add support for writing debug logs to trace

From: jim . cromie
Date: Fri Dec 15 2023 - 22:09:55 EST


On Thu, Dec 14, 2023 at 8:20 AM Łukasz Bartosik <lb@xxxxxxxxxxxx> wrote:
>
> sob., 9 gru 2023 o 01:31 Łukasz Bartosik <lb@xxxxxxxxxxxx> napisał(a):
> >
> > pt., 8 gru 2023 o 01:15 Jim Cromie <jim.cromie@xxxxxxxxx> napisał(a):
> > >
> > > hi Lukas,
> > >
> > > Sorry for the delay, I probably should have split this up.
> > > (also cc'g gregkh)
> > >
> > > Ive been banging on your v2 patchset:
> > > https://lore.kernel.org/lkml/20231130234048.157509-1-lb@xxxxxxxxxxxx/
> > >
> >
> > Jim, thank you for your thorough testing and review.
> >
> > > Things are looking pretty good, a few issues follow. And some patches.
> > >
> >
> > ;)
> >
> > > trivial:
> > >
> > > dyndbg: export _print_hex_dump # squash wo comment
> > > dyndbg: tweak pr_info format s/trace dest/trace_dest/ # greppability squash
> > > dyndbg: disambiguate quoting in a debug msg
> > > dyndbg: fix old BUG_ON in >control parser
> > >
> > > Then:
> > >
> > > dyndbg: change +T:name_terminator to dot
> > > dyndbg: treat comma as a token separator
> > >
> > > 1st allows 2nd, 2nd allows simpler arg-passing, boot-args, etc:
> > >
> > > echo module,test_dynamic_debug,class,L2,+p > /proc/dynamic_debug/control
> > > modprobe test_dynamic_debug dyndbg=class,L2,+mfl
> > >
> > > Given theres no legacy wrt comma, swapping it now with dot seems
> > > better semantically than "dot as token/list separator".
> > >
> > > Aside: /proc/dynamic_debug/control is always there (if configd), even
> > > when <debugfs> isn't mounted. Its more universal, and copy-pastable.
> > >
> > > dyndbg: __skip_spaces - and comma
> > > dyndbg: split multi-query strings with %
> > >
> > > % allows escape-free multi-cmd dyndbg args:
> > >
> > > modprobe test_dynamic_debug \
> > > dyndbg=open,selftest%class,D2_CORE,+T:selftest.mf
> > >
> > > dyndbg: reduce verbose/debug clutter
> > > dyndbg: move lock,unlock into ddebug_change, drop goto - revisit
> > >
> > > Ive just pushed it, I will bump my version here.
> > > To github.com:jimc/linux.git
> > > + 20d113eb6f9a...66fa2e4cb989 lukas-v2.1 -> lukas-v2.1 (forced update)
> > >
>
> hi Jim,
>
> I will squash the following commits to their appropriate peers:
> - dyndbg: export _print_hex_dump
> - dyndbg: tweak pr_info format s/trace dest/trace_dest/
> - dyndbg: change +T:name_terminator to dot
>
> I will also drop completely "dyndbg: move lock,unlock into
> ddebug_change, drop goto" and leave the remaining commits as is.

cool. and yes, late ack.
I got distracted by move of flags-handler to the bottom of the fn.
Pls add note about moving the lock to protect open/close too.

>
> > > SELFTEST
> > >
> > > Ive taken the liberty to write an ad-hoc test script (inline below),
> > > to exersize the parser with legacy command input, and with the new
> > > stuff mentioned above: comma-separated-tokens, %-separated-multi-cmds,
> > > while counting changes to builtin,etc modules, to validate against
> > > expectations.
> > >
> > > The change-count tests achieve some closed-loop testing, but checking
> > > syslog for messages written always seemed too hard/unreliable. Your
> > > private trace-instances work promises a solution, by giving an
> > > observable closed system to build upon.
> > >
> > > I made some attempts to write messages to the trace-buf, but ran out
> > > of clues*tuits. And I encountered a couple more definite problems:
> > >
> >
> > Let me dig through test scripts you created and issues you run into.
> >
> >
> > > 1- modprobe/rmmod lifecycle probs
> > >
> > > Ive coded some blue-sky and not-so-proper "modprobe,+T:it,-T:it,rmmod"
> > > life-cycle scenarios, which can wedge a previously created instance.
> > > Once wedged, I cannot recover. See the test_private_trace{,_2,_3}
> > > functions, and the error_log_ref() following each.
> > >
>
> The tests are very useful. I root caused the failures. Please see below.
>
> > > This brittleness begs a question; should we have auto-open (mapping
> > > new names to available 1-63 trc-ids) ? And non-close ? If it just did
> > > the right thing, particularly on rmmod, it would prevent "misuse".
> > >
>
> I would rather not have auto-open. For me personal argument against it is
> that I usually make a lot of typos and with auto-open if I don't
> verify what I wrote
> I won't know whether logs will be written to the "right" instance or a
> newly created instance
> with typo name. In case of open/close commands error pops up when I
> try to write logs
> to a trace instance which was not opened.

yes. that is a value. auto-open was 1/2 baked speculation

>
> > > I don't think auto-open obsoletes the open (& esp) close commands, but
> > > Id like to see scripted test scenarios using close in combo with the
> > > right /sys/kernel/tracing/* manipulations, to prove its not all just a
> > > fever dream.
> > >
>
> Would you please elaborate what you mean by close in combo with right
> /sys/kernel/tracing/* manipulations ?
>

you detected the furious flapping, didnt you :-)

lemme start with the global-events trace-buf, for analogy/comparison.
to use it one must do both:
echo 1 > /sys/kernel/tracing/events/dyndbg/enable
echo $some_selection +T > /proc/dynamic_debug/control

using +T:private_buffer no longer requires the enable (im guessing)
but are there any other/corresponding setup actions / manipulations ?

if nothing else is needed, when does the instance open ?
after test-dynamic-debug-doprints on +T:private callsites ?
is it detectable from a script ?
probably: if [ -e /sys/kernel/tracing/instances/foo ]

if the instance is already open, I presume it is not cleared ?

any other play-nice-with-other-users stuff ?

conversely, to write a closed-loop test-script,
should the script just delete the instance/$name
after verifying its contents from the last doprints ?



> > > Your expertise in opening, writing to, manipulating & destroying
> > > private (and the global) tracebufs, distilled into new shell funcs,
> > > would be enormously helpful towards demonstrating the private-buffer
> > > use case and its value.
> > >
>
> I will work on adding new tests for trace.

nice!

>
> > > 2- some new flags parse error:
> > >
> > > [ 1273.074535] dyndbg: 32 debug prints in module test_dynamic_debug
> > > [ 1273.075365] dyndbg: module: test_dynamic_debug dyndbg="class,D2_CORE,+T:foo%class,D2_KMS,+T:foo"
> > > [ 1273.076307] dyndbg: query 0: <class,D2_CORE,+T:foo> on module: <test_dynamic_debug>
> > > [ 1273.077068] dyndbg: split into words: "class" "D2_CORE" "+T:foo"
> > > [ 1273.077673] dyndbg: unknown flag 'c'
> > > [ 1273.078033] dyndbg: flags parse failed on 2: +T:foo
> > > [ 1273.078519] dyndbg: query 1: <class,D2_KMS,+T:foo> on module: <test_dynamic_debug>
> > > [ 1273.079266] dyndbg: split into words: "class" "D2_KMS" "+T:foo"
> > > [ 1273.079872] dyndbg: unknown flag '�'
> > > [ 1273.080228] dyndbg: flags parse failed on 2: +T:foo
> > > [ 1273.080716] dyndbg: processed 2 queries, with 0 matches, 2 errs
> > > : 0 matches on =Tl
> > >
> > > I have a suspicion this relates to moving the parse_flags call in
> > > ddebug_query, but I havent dug in.
> > >
> > >
> > > I also have some reservations about the default dest; 1st that it is a
> > > global state var, as noted at bottom of control:
> > >
> > > [root@v6 lx]# ddgrep \\btrace\\b # a better/narrower search-term ?
> > > ...
> > > Default trace destination: foo # add a '#:' prefix to these lines ?
>
> Do you mean

This. it hides the info behind a # so any parser expecting just
callsites is safe.
and the following colon marks it as info for the clever ones.

> #: Default trace destination:
> #: Opened trace instances: foo bar buz

But I would lalso ike the labels optimized for a tight grep, that wont
hit anything else,
and mostly thats easy to use. "trace_dest" almost does it, is a
pretty good mark on the bikeshed.

> > > Opened trace instances: # all values should be on this line
> > >
>
> I will put all opened trace instance names on the same line
>
> > > Then theres the "preset" value, ie each site's dest_id (sorry I forgot
> > > your fieldname). I presume the default would override such a "preset"
> > > (otherwise it would have no effect).
> > >
> > > Is the default set on last open ? or on next use of +T:<foo> ?
> > >
>
> Default trace destination is set on [+-]T:<foo>

unless foo hasnt been opened, Im guessing
does it matter if the last destination used resulted in a match ?

>
> > > In the no-default world, a user/tester would decide how many
> > > trace-instances are needed, and map sets of callsites to them.
> > >
> > > # configure drm-debug streams for this test scenario
> > > cat<<EOF >/proc/dynamic_debug/control
> > > open drm_core
> > > open drm_mix
> > > open driver_1 # we can separate by modname but not drvr-number
> > > open driver_2
> > > class DRM_UT_CORE -T:drm_core # set dest_id, disabled state
>
> This sets default trace destination to drm_core
>
> > > class DRM_UT_CORE +mf # traces dont do prefixing (should it?)
>
> In this case logs will be neither written to syslog nor trace because
> both pT flags are not set.

I think my example fails to illuminate the corner I seek.

> I verified that both output to trace instance and trace events honors
> mf flags (prefixing) when they are set for a given callsite.

Ok. that is good to know. that sounds reducible to a test-fn.

>
> > > # mix KMS & ATOMIC into 1 stream
> > > class DRM_UT_KMS -T:drm_mix
> > > class DRM_UT_ATOMIC -T:drm_mix
> > > EOF
> > >
> > > Then perhaps to turn on/off when needed: (legacy analogue version)
> > >
> > > echo 0x15 > /sys/module/drm/parameters/debug_trace
> > >
> > > With those trace-dest presets honored, the configured drm_core &
> > > drm_mix streams persist. If a later enablement applies the
> > > then-current default trace-dest, it would spoil this scheme.
> > >

so each callsite has a dest, which is set individually when the
callsite is selected
and -T:name'd in the EOF block.

I would *not* want enablements of those callsites to alter the dest
just because
the default-dest were altered in the interim.

I think it would break the way drm.debug is supported, by only
touching the T bit.


> > > Could you elaborate a scenario or 2 where the default behavior does
> > > something specific, and that its the right thing ? I dont understand
> > > it yet.
> > >
>
> I'm a bit lost here. Help me out please. What do you mean by default behavior ?

default behavior is what happens when +T (wo a dest) is done,
after a default dest has been set. w/o a default-dest, it would be just :0

So having a default dest means implying the :default_dest_name.
whatever it is set to,
on whatever callsites are selected, unless the ddcmd gives a name, ie
+T:this_buf

As of last writing, I dont see how the extra implication helps.
It feels like hidden state.

I can squint at it and see a local / loop var, fwiw.

>
> > > OTOH
> > >
> > > One limitation of the above: the drm.debug_trace (posited above) would
> > > turn on all Ts in all those class'd callsites across the whole
> > > subsystem, irrespective of their preconfigured destination. That was
> > > always inherent in drm.debug, but its less than perfect.
> > >
> > > It sort-of defeats the point of doing +T only on the useful callsites.
> > >
> > > And global event buf is also enabled, it might be flood-prone.
> > >
> > > echo 1 > /sys/kernel/tracing/events/dyndbg/enable
> > >
> > > It would help if we could filter on trace-instance names:
> > > (this sounds familiar :-)
> > >
> > > ddcmd module '*' trace_dest drm_mix +T
> > >

I think we need to implement this new keyword selector,
I think it addresses the flooding possible above,
when using both private traces and the global-trace.

but we dont need it yet.

> > > In reality, the dest-id is not even dependent on tracing per-se, it is
> > > a user classification system (in contrast to class <subsys-names>).
> > > It just happens to be tied by +T:name syntax to tracefs.
> > >
> > > No promise about +p:_alt_log_.mflt having meaning, or working.

or not complaining, pr_notice ?


> > >
> > >

I look forward to v3
Jim