Re: [PATCH v3 00/22] dyndbg: add support for writing debug logs to trace

From: jim . cromie
Date: Tue Jan 02 2024 - 14:14:00 EST


ok, to catch up here,
Lukas & I have been nudging the test script forward,
some of this is settled..

On Thu, Dec 28, 2023 at 5:00 PM Łukasz Bartosik <lb@xxxxxxxxxxxx> wrote:
>
> śr., 27 gru 2023 o 05:23 <jim.cromie@xxxxxxxxx> napisał(a):
> >
> > re-adding gregkh, who fell off the cc's again.
> > he is one of our upstreams.
> >
>
> hi Jim,
>
> It's good to hear from you.
>
> Thanks. I missed to add Greg.
>
> > On Fri, Dec 22, 2023 at 6:51 PM Łukasz Bartosik <lb@xxxxxxxxxxxx> wrote:
> > >
> > > Add support for writing debug logs to trace events and trace instances.
> > > The rationale behing this feature is to be able to redirect debug logs
> > > (per each callsite individually) to trace to aid in debugging. The debug
> > > logs output to trace can be enabled with T flag. Additionally the T flag
> > > accepts trace destination which can be provided to the T flag after ":".
> > > The trace destination field is used to determine where debug logs will be
> > > written.
> > >
> > > In order to redirect callsite debug log to a trace instance it has to be
> > > initialized first with a newly introduced "open" command. For example,
> > > "usb" trace instance in <debugfs>/tracing/instances/ can be initialized
> > > by issuing the command:
> > >
> > > localhost ~ # echo "open usb" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > If the trace instance <debugfs>/tracing/instances/usb already exists
> > > then it will be reused otherwise new usbcore instance will be created.
> > > Maximum 63 trace instances can be "opened". The trace instance name can
> > > include any of ^\w+ and underscore characters. There is also a special
> > > name "0" reserved for writing debug logs to trace prdbg and devdbg events.
> > > The reserved name "0" does not require to be "opened" before use.
> >
> > theres implication here that p only works when :0.
> > Is that intended ?
> > it seems wrong, or at least unnecessary.
> >
>
> I don't see clearly the implication that p only works when :0.
> Which statement implies this to you ? Please keep in mind that English
> is not my native.

heh - Im not sure now - its mostly me probing for unseen inconsistencies.

> > In fact, theres no specific reason to exclude +p:trcbuf.mf,
> > esp if we keep default-trace-buf
> > since +pTfml could have that meaning implicitly.
> > We can call it allowed and "unimplemented until proven useful" on +p
> >
> > +p completely independent of +T is a simplicity,
> > lets keep that until proven useful otherwise.
> >
> > >
> > > To enable writing usbcore module debug logs to the "usb" trace instance
> > > opened above a user can issue the command:
> > >
> > > localhost ~ # echo "module usbcore =pT:usb" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > Please note that output of debug logs to syslog (p flag) and trace (T flag)
> > > can be independently enabled/disabled for each callsite. Therefore the
> > > above command will enable writing of debug logs not only to the trace
> > > instance "usb" but also to syslog.
> > >
> > > When trace destination is followed by another flag the next flag has to
> > > be preeceded with ".", for example
> > >
> > > localhost ~ # echo "module usbcore =pT:usb.l" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > To simplify processing trace destination can be omitted when default
> > > trace destination is set, for example the command
> > >
> > > localhost ~ # echo "module usbcore =pTl" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > will use default trace destination. If default trace destination is not
> > > set the above command will fail. The default trace destination is set
> > > when a command with any of the flags [-+=] and explicitly provided trace
> > > destination matches at least one callsite, for example the command
> > >
> > > localhost ~ # echo "module usbcore -T:usb" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > will disable trace for all usbcore module callsites (if any was enabled)
> > > and will set "usb" instance as default trace destination. To enable writing
> > > thunderbolt module debug logs to the "usb" trace instance as well a user
> > > can issue the command:
> > >
> > > localhost ~ # echo "module thunderbolt =T" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > Since default trace destination was previously set therefore "usb" instance
> > > name does not have to be explicitly provided.
> > >
> > > When no callsite writes debug logs to a trace instance then it can be
> > > released (its reference counter decrased) with the "close" command.
> > decreased
> >
>
> Ack
>
> > > Closing a trace instance make it unavailable for dynamic debug and also
> > makes
>
> Ack
>
> > > allows a user to delete such a trace instance at convenient time later
> > > with rmdir command. For example when "usb" instance is not used anymore
> > > a user can issue the command:
> > >
> > > localhost ~ # echo "close usb" >
> > > <debugfs>/dynamic_debug/control
> > > and then to delete it with:
> > >
> > > localhost ~ # rmdir <debugfs>/tracing/instances/
> > >
> > > To enable writing usbcore module debug logs to trace dyndbg:prdbg and
> > > dyndbg:devdbg events user can issue the command:
> > >
> > > localhost ~ # echo "module usbcore =T:0" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > Then dyndbg trace events can be for example captured with the command:
> > >
> > > localhost ~ # trace-cmd start -e dyndbg
> > >
> > > And displayed with the command:
> > >
> > > localhost ~ # trace-cmd show
> > >
> >
> > is there a recipe to show private traces ? prolly worth a mention here.
> >
>
> Trace-cmd stat shows among other things open trace instances. I will add it.
>
> >
> > >
> > >
> > > Jim, I made a few changes to the test script dd-selftest.rc you created
> > > and I also added new test scenarios. You can find the entire patchset
> > > with the test script changes on top here
> > > https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/refs/sandbox/ukaszb/dyndbg_trace_v3
> > > I wonder where would be the proper place to store the test script
> > > (upstream, your github.com:jimc/linux.git)
> > > Do you have a suggestion for that ?
> > >
> >
> > I got it (your HEAD), renamed and put it in:
> > tools/testing/selftests/dynamic_debug/
> > and copied & modified Makefile and config from ../users/
> > I added your SOB, that seems proper.
> >
>
> Great, thanks ;)
>
> > nice additions. and I like the colors. maybe they belong in lib.mk,
> > but thats 4 later.
> >
> > and tweaked the modules / DUTs to *hopefully* get stable callsite counts
> > from the tests on them, in virtually any usable config.
> > DUTs: file init/main, module mm_init, and module test_dynamic_debug (a
> > config constraint)
> > The ref-counts are taken from a default virtme-ng -k config on my x86-64 box
> >
> > I now pass tests up to:
> >
> > [root@v6 wk-proof]# ./tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh
> > ...
> > # TEST_PRIVATE_TRACE_6
> > [ 1009.296557] dyndbg: read 3 bytes from userspace
> > [ 1009.296876] dyndbg: query 0: <=_> on module: <*>
> > [ 1009.297171] dyndbg: processed 1 queries, with 1559 matches, 0 errs
> > [ 1009.311621] dyndbg: read 39 bytes from userspace
> > [ 1009.312032] dyndbg: query 0: <open
> > A_bit_lengthy_trace_instance_name> on module: <*>
> > [ 1009.312569] dyndbg: instance is already opened
> > name:A_bit_lengthy_trace_instance_name
> > [ 1009.313063] dyndbg: processed 1 queries, with 0 matches, 1 errs
> > : ./tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh:454
> > ddcmd() expected to exit with code 0
> > Error: 'File exists'
> >
> > it seems invocation dependent
> > I suspect 2 possibilities:
> > 1- invoking as shell-script, repeatedly vs vng -e script-name
> > (one-shot per boot)
> > 2- virtme-ng's minimal-default config has no CONFIG_USB.
> > this changes post-conditions of the script, affecting preconditions
> > of the next run.
> >
>
> I guess you must have enabled CONFIG_USB in your setup otherwise
> dyndbg_selftest.sh should fail earlier in TEST_PRIVATE_TRACE_5 ?
>

Its usually on, but I have at least one config,
IIRC from default vng -vk, that left it off.

whats the shell syntax to check CONFIG_MODULES ?

ATM, basic-tests uses "module main" (maybe worth narrowing)
comma-terminator-tests uses "module params"
test-percent-splitting uses test_dynamic_debug to demo a useful case,
so it needs MODULES
so does the self-start-various-end tests

the private-trace tests appear to use a mix of
test_dynamic-debug, & module params,
maybe that can be materially improved, I dont know.

IIRC for builtins I think we're now using 'file kernel/
> > 2 means that we should replace usbcore with some other always *builtin* module.
> >
>
> Sounds good. Do you have a module candidate ?



>
> I'm also concerned that we might need to validate dyndbg_selftest.sh
> against specific kernel version because
> I took the dyndbg_selftest.sh from lukas-v3.0 branch and I had to do
> the following changes to make it work on kernel v6.6:
>

Yes, I hit this one too.
I decided to avoid using mm_init for this reason.
(without actually looking whether it was a config-based variance,
which I assumed, or a commit-based one)
I chose 'module params' instead,
despite the need to escape the brackets in the search:

check_match_ct '\[params\]' 4

> --- a/tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh
> +++ b/tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh
> @@ -166,12 +166,12 @@ EOF
> function comma_terminator_tests {
> echo -e "${GREEN}# COMMA_TERMINATOR_TESTS ${NC}"
> # try combos of space & comma
> - check_match_ct mm_init 5
> + check_match_ct mm_init 4
> ddcmd module,mm_init,=_ # commas as spaces
> ddcmd module,mm_init,+mpf # turn on non-classed
> - check_match_ct =pmf 5
> + check_match_ct =pmf 4
> ddcmd ,module ,, , mm_init, -p # extra commas & spaces
> - check_match_ct =mf 5
> + check_match_ct =mf 4
> ddcmd =_
> }
> function test_percent_splitting {
> @@ -350,7 +350,7 @@ function test_private_trace_3 {
> ddcmd class D2_CORE +T:foo \; class D2_KMS +T:foo
> ddcmd "class,D2_CORE,+T:foo;,class,D2_KMS,+T:foo"
> ddcmd class,D2_CORE,+T:foo\;class,D2_KMS,+T:foo
> - check_match_ct =T 2 -v
> + check_match_ct =T 3 -v
> check_match_ct =Tl 0
> check_match_ct =Tmf 0
> echo 1 >/sys/kernel/tracing/tracing_on
> @@ -379,7 +379,7 @@ function test_private_trace_4 {
> modprobe test_dynamic_debug dyndbg=class,D2_CORE,+T:foo%class,D2_KMS,+T:foo
> check_match_ct =Tl 0
> check_match_ct =Tmf 0
> - check_match_ct =T 2
> + check_match_ct =T 3
> echo 1 >/sys/kernel/tracing/tracing_on
>
> What kernel version did you use for the testing ?

turns out Im on top of -rc7
(I suppose I should pay closer attention :-)


> > I chose module params, on the idea that its builtin, but also invoked later,
> > when modules are modprobed with params, so its callsites can be enabled &
> > then invoked via do_prints, to test for writes to the private trace-buf.
> >
> > I've pushed it to lukas-v3.0 at jimc.github.com/linux
> >
> > Would you look at err on _6, above,
> > and figure out if its insufficiently robust to test preconditions, or
> > something else ?
>
> The following change fixes failure in TEST_PRIVATE_TRACE_6:
> @@ -455,7 +455,7 @@ function test_private_trace_6 {
> ddcmd open $name
> is_trace_instance_opened $name
> check_trace_instance_dir $name 1
> - ddcmd "module module =T:$name.l" -v
> + ddcmd "module usbcore =T:$name.l"
> check_match_ct =T:A_bit_lengthy_trace....l 164
>
> When dyndbg_selftest.sh completes with success which is announced by the line
> # Done on: Thu Dec 28 23:04:17 CET 2023
> then it leaves dynamic debug and trace in a state where next
> dyndbg_selftest.sh run should also complete with success.

yes .this works well for vng -e <script-path> runs

> But when any of the tests fails manual intervention or reboot is
> required at the moment.
> Do you think we should have a cleanup function and call it whenever
> dyndbg_selftest.sh fails in order
> to prepare dynamic debug and trace for the next run of dyndbg_selftest.sh ?
>

yes, I think so.
In part to prove that we can "un-wedge" whatever's wrong w/o a reboot.
and/or a rmmod, modprobe cycle.
self_end_* are doing various unwinds,
but a full-powered cleanup fn, with diagnostics and hints
would be a good addition, with demonstration value.



> > and also:
> > rename test_private_trace_N() to better names
> > - and/or comments to intent & pattern of use
> > more tweaks invited in its commit-msg
> >
>
> I will rename the functions and add description of the tests.
>
> > since cycle_test_problem() is not actually a problem, whats a better name ?
> > _not_best_practices ?
> >
>
> Ack
>
> > I think the final test script enhancement needed is (maybe separately):
> > private-buffer-expected-write-verification after do_prints,
> > and around modprobe test_dynamic_debug dyndbg=$multicmd
> > subject to 2 below.
> >
>
> Ack

we definitely have this in part, spread around the dozen or so functions

>
> > > TODOs:
> > > - update dynamic debug documentation,
> > > - finalize discussion related to default trace destination
> > > (whether we want it at all, if yes then conclude on the details)
> > >
> >
> > good list. before docs;
> >
> > wrt default-trace, at least in some details, it seems extra rules to know.
> > for example, if the default were set on successful open,
> > rather than successful rule application (iirc), then it would read a lot
> > like "with" or "using" blocks, esp with <<EOX form
> >
> > with that new form, it might be pretty expressive:
> >
> > echo <<YMMV > /proc/dynamic_debug/control
> > open sesame # ie with
> > module magic_carpet function forward +Tfml
> > module magic_lamp function rub_vigorously +Tfmt
> > open voicetrace
> > module voice2ai function text2cmdprompt +T
> > module undo function oh_crap +T
> > YMMV
> >

I was hoping this example was stupid enough to goad someone
into responding (with a better one). Alas no.

How might DRM want to use this ?

one practical example is to segregate hi-rate drm_dbg events to a
separate buffer,
so theyre not cluttering your trace-cmd view until you want them.

But thats a little contrived, cuz vblank events are already trace-events
into the global buffer, they dont have drm_dbg's for this.


echo <<DRM_TRACE_BLK > /proc/dynamic_debug/control

open vblank_stream # segregate noisy/hi-vol stuff

module amdgpu format *_vblank +:vblank_stream # one
way to select them
module amdgpu format "[DML]:DML_" +:vblank_strem # pretty
much same set ?

open drm_generic
class DRM_CORE +:drm_generic.mf
class DRM_DRIVER_ +:drm_generic.tmf

DRM_TRACE_BLK



> > I think set-default-on-good-rule-applied is flawed:
> > it relies on the most complex expression in the grammar to be correct
> > and without spelling errors, where module unknown is
> > OK grammatically, but nonsense.
>
> in case of "unknown" module default trace destination will not be
> updated because
> the code checks for nfound before updating default destination:
> if (nfound)
> update_tr_default_dst(modifiers);
>
> > set-default-on-open is simpler to explain and test separately.
> >
>
> Actually I think we discussed set-default-on-open some time ago (in
> the beginning).

we could have - I miss a lot :-}
anyway youve named our choices:

> But to sum up the default trace destination there were 3 directions
> proposed/discussed:
> - set-default-on-open,
> - set-default-on-good-rule-applied,

note in my DRM_TRC_BLK example above, I made a subtle error.
DRM_DRIVER_ doesnt exist cuz of the trailing _.
non-matching keywords are silently skipped. (not just class)
the rule wont apply.
If Id misspelled it as DRM_DRIVERS it would be just as quiet.

So now the good/bad status creates an order-dependence;
if the rule is 1st in a set, good/bad chooses the new/old default-dest.
and this affects any following rule, unless that has an explicit dest.
This sounds complicated.

set-default-on-open doesnt have these problems:
each rule gets the same setting

> - preset-default-per-each-callsite with -T:trace_dest_name,

this doesnt have a default-setting, it keeps what it had

>
> All 3 pass information explicitly but IMHO
> preset-default-per-each-callsite is most confusing as user would have
> to remember what was preset for a given callsite when later passing a
> T flag for the same callsite.
>

thats true, but for:
We could display the current destination in control, independent of the flags
(with the :0. default elided)

init/main.c:1117 [main]initcall_blacklist =pT:startup_trc.mf
"blacklisting initcall %s\n"
init/main.c:1156 [main]initcall_blacklisted =T:startup_trc.mf
"initcall %s blacklisted\n"
init/main.c:1351 [main]run_init_process =:startup_trc.mf " with arguments:\n"

1st line writes to syslog and to private startup_trc, both with
"module:function:" dynamic-prefix
2nd only to the private trace, with same prefix
3rd is not enabled, but has its dest preset to the private trace,
again with the prefix.

And the part I glossed over:
a new "trace_name" keyword would allow selection on the composed set
of sites with that preset.

ddcmd trace_name drm_generic +Ts # enable the set which is
pointing at drm_generic




> Then if it comes to set-default-on-open vs
> set-default-on-good-rule-applied I don't have strong opinion but I
> would
> lean towards set-default-on-open because in this case it is pretty
> straightforward, I mean last successful open sets default trace
> destination.

of the 1st 2 options, I lean strongly to on-open.

To test the limits, what if we renamed "open" to "using" ?
using could:
a) force a dest update, from any previous state.
b) only upgrade from :0. dont change :foo.

a is unnecessary, since +T:dest. explicitly does that.
b is more subtle, but more useful.

>
> If you also hesitate which one to choose maybe we should start with
> very simple approach like that +T always defaults to trace events (:0)
> ?

that makes sense.
keep-presets-per-callsite does need keyword to be complete.

>
> > We should probably consider modifying the continue-looping behavior
> > of ddebug_exec_queries, depending upon the return-code.
> > open, close errors could reasonably warrant different treatment,
> > like terminating the mult-cmd loop prematurely on open/close errs.
> > Esp if we keep the set-default-on-good-rule-applied
> > What do you think ?

this is still open, and worth a review, but minor.

> Let's first conclude on default destination.
>
> > I'm inclined to reword a commit-msg or 2, drop RFC, obsolete comments etc.
> >
> > I can do that in a lukas-v3.1, or as in-thread responses,
> > do you have a preference ?
>
> Please go ahead and make changes in the lukas-v3.1 branch. I find it
> very convenient. I will just grab your changes from the branch.

excellent, it does feel pretty smooth.
is your repo also public, or did I just miss the clonable url?

I'll write some trace-thing for howto in the next few days,
(in particular, I'll stay out of dyndbg_selftest.sh, I think its your turn :-)

theres also s/coma/comma/ in the comma-separator patch.


thats all I got now,
Thanks!!
Jim