Beta-test of sysklogd with loadable module debug support.

G.W. Wettstein (greg@wind.rmcc.com)
Wed, 1 May 1996 23:45:47 CDT


Good day to everyone. Hopefully this note finds the week going well
for everyone following Linux development in general and Linux Kernel
development in particular.

We now seem to have concluded Winter 95/96. The development of
sysklogd has been closely paralleled with an onslaught of troublesome
metereological phenomenon in the upper Midwest. The blizards which I
recounted during earlier development releases of sysklogd gave rise to
some rather troublesome flooding. I have finally been able to turn
off the pumps which were diligently keeping water from consuming my
house, shop, barn and home Linux network.

To celebrate this event I would like to extend an invitation to all
kernel/linux developers to begin testing the newest beta version of
sysklogd. The sources are currently labeled v1.3-pl2. My goal is to
release a new version of sysklogd in conjunction with 1.4/2.0 or
whatever Linus decides to call the final release of the current
development kernel.

In terms of bug fixes my current development tree contains all the
fixes which have been contributed to the currently released version of
sysklogd. I am calling for beta testers primarily to test some major
new functionality in klogd. The klogd daemon now provides support for
resolution of addresses which occur in loadable kernel modules.

The addition of this functionality was spurred by a discussion about a
month and a half ago of how to to properly debug faults in loadable
modules. Since none of the solutions seemed to provide really clean
support I decided to try and do this through klogd. Hopefully I have
produced a reasonably clean solution to the problem.

There are bound to be plenty of bugs and feature changes so I wanted
to get the sources out for a general beta-test before I move forward
with further development. Anyone wishing to test this should drop me
an e-mail note to that effect. In return I will forward two source
patches which should drop in cleanly on top of the sysklogd 1.3 source
tree which is available on sunsite, tsx-11 and everyplace else that
mirrors these sites. The total size of the patches is approximately
60K.

I am most interested in having sysklogd tested at sites which make
extensive use of loadable modules. Particularly interesting are sites
which use kerneld and which have significant load/unload activity.

With the impending release of 1.4/2.0 I think the loadable module
support in klogd is extremely important. I tend to be a fan of
statically compiled kernels but this is strictly a personal prejudice
and heavily influenced by the support role that our Linux workstations
subsume in the Cancer Center. The emphasis on modularization and the
popularity of kerneld indicate to me that there will be intensive use
of modules in the next kernel release.

The downside of modules is that they do produce a much more dynamic
kernel environment than is presented by a statically compiled kernel.
I am not sure that we have fully seen the potential for bugs and
subtle interactions in kernels which are heavily dependent on modules
for normal functional operations. My hope is that klogd with module
support will be able to provide much more effective debugging and
troubleshooting of these extremely dynamic kernel environments.

Before closing this note I just wanted to discuss a few important
points regarding support for debugging of loadable modules.

The first caveat that is important is influenced by the
Dr. G.W. Wettstein Theorem on debugging which is roughly summarized by
the following:

"Operating system theory suggests that the optimum time to be
doing exotic things is not when the kernel is coming unwound
like a 4 dollar pocketwatch."

Quite seriously the support for address resolution with loadable
modules requires some substantial mucking with the kernel. The very
nature of the loadable module environment implies that we must in
effect take an instantaneous snapshot of the kernel at the time a
protection fault occurs. If things are really coming unglued the
reliability of this is always a continual concern. As has been stated
before there is no really good solution for this.

Another troublesome problem is deciding when to build/update the
kernel symbol table. At execution time the klogd daemon reads the
static system map and employs a crude heuristic to deal with symbols
from loadable modules. Each kernel message line is scanned for the
Oops: string and the presence of this string triggers the load/reload
of information from the internal kernel symbol tables. Not elegant by
any means but useful as an initial starting point for insuring that we
are generating logs consistent with the state of the kernel at
exception time.

What really needs to be done to make this more effective and/or
foolproof is to incorporate some type of callback so that klogd can
register with the kernel and be notified when a module load/unload has
occurred. I suspect that this is readily doable with the messaging
facility provided to support kerneld. Since I haven't had time to
figure out how all this works I would be very receptive to patches
from someone who knows more about this end of the kernel. The only
disadvantage to this approach is that it would require kerneld support
to be compiled into the kernel.

Since testing all this can be somewhat troublesome I have written a
small loadable module which I have added to the sysklogd source tree.
This module allows the generation of kernel protection faults at will
and has been useful in the development and debugging of the current
klogd sources. The following is an example of a protection fault
which I generated using this module and which has been processed by
klogd:

---------------------------------------------------------------------------
May 1 12:40:59 blizard kernel: klogd 1.3-2, log source = ksyslog started.
May 1 12:41:01 blizard kernel: Loaded 3548 symbols from /boot/System.map.
May 1 12:41:01 blizard kernel: Symbols match kernel version.
May 1 15:31:52 blizard kernel: oops: Module initilization.
May 1 15:31:52 blizard kernel: oops: Registering symbols.
May 1 15:33:28 blizard kernel: Called oops_open.
May 1 15:33:28 blizard kernel: Called oops_ioctl.
May 1 15:33:28 blizard kernel: Cmd: 1, Arg: 0
May 1 15:33:28 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
May 1 15:33:28 blizard kernel: current->tss.cr3 = 0051b000, xr3 = 0051b000
May 1 15:33:28 blizard kernel: *pde = 00000000
May 1 15:33:28 blizard kernel: Oops: 0002
May 1 15:33:28 blizard kernel: CPU: 0
May 1 15:33:28 blizard kernel: EIP: 0010:[oops:_oops+16/3868]
May 1 15:33:28 blizard kernel: EFLAGS: 00010212
May 1 15:33:28 blizard kernel: eax: 315e97cc ebx: 00083680 ecx: 001be13b edx: 007b8414
May 1 15:33:28 blizard kernel: esi: 00000000 edi: bffffdad ebp: 0052cf90 esp: 0052cf8c
May 1 15:33:28 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
May 1 15:33:28 blizard kernel: Process oops_test (pid: 234, process nr: 24, stackpage=0052c000)
May 1 15:33:28 blizard kernel: Stack: 315e97cc 0052cf98 010070b4 bffffeb4 0012e0ca 00366690 00083680 00000001
May 1 15:33:28 blizard kernel: 00000000 002fd810 bffffee0 0010a791 00000003 00000001 00000000 bffffee0
May 1 15:33:28 blizard kernel: bffffdad bffffeb4 ffffffda ffff002b 0010002b 0000002b 0000002b 00000036
May 1 15:33:28 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/268] [_system_call+89/160]
May 1 15:33:28 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
---------------------------------------------------------------------------

Thats about it. If anyone is interested in test-firing this stuff
please drop me a note. I would also be interested in any comments or
suggestions that the group may have. Have a pleasant remainder of the
week.

As always,
Dr. G.W. Wettstein Oncology Research Div. Computing Facility
Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com
820 4th St. N.
Fargo, ND 58122
Phone: 701-234-7556
------------------------------------------------------------------------------
"Prioritization is a favorite management buzzword. What it really means
is stuff that isn't going to get done."
-- C.J. Peters, MD
Chief, Special Pathogens Branch, CDC.