Re: core files

Kenneth Albanowski (kjahds@kjahds.com)
Thu, 31 Dec 1998 12:30:09 -0500 (EST)


On Thu, 31 Dec 1998, Albert D. Cahalan wrote:

> Goal: start a debugger on a live process, with all the process
> relationships intact and file descriptors still open. The process
> should be as intact as if you had attached a debugger yourself
> right before the crash.
>
> The general plan is to let users have this feature effective for all
> processes, not just ones expected to crash. That means zero overhead
> is a requirement. The OS must pass standards compliance tests with the
> feature operating. Nothing can interfere with process relationships
> or signal handling.
>
> There is a clear need for kernel support here.

Thank you, yes, this states the points I wanted to make.

> Proposed implementation:
>
> Provide debug info via a world-accessible character device.
> On a per-UID basis, a crash monitor watches for trouble.
> If something crashes:
> 1. dying process sleeps
> 2. message passed to crash monitor
> 3. crash monitor can say "dump /tmp/joe-1.core" or start ddd or ...

I'd add a mechanism to ptrace to trigger an in-kernel core dump, perhaps
with a specified file name (to get back to that), or perhaps even add a
new core dump approach that would return the core file to a user buffer,
allowing the core file to be stored remotely. (I'm very interested in this
idea for an embedded design.)

Making the character device world accessible might not be quite right,
though, unless the kernel allows multiple connections and automatically
gives each connected daemon a chance at taking the uid, in turn, from the
one with the most authority (root perms) to the least (user perms).

It's probably simpler to have a root daemon that uses /etc/debugconf,
~user/debugconf, etc., to figure out what processes get which treatment.
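
To make the lookup concrete, here is a rough sketch of how such a daemon
might pick a treatment for a crashed program. Everything in it is an
assumption for illustration only: the "<program> <action>" line format, the
action keywords, and the names crash_action, lookup_action and
choose_action are all made up, and the config text is passed in as strings
rather than read from the files. The only idea taken from above is that the
per-user settings are consulted before the system-wide ones.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical actions a crash daemon might take; names are illustrative. */
enum crash_action { ACT_DEFAULT, ACT_CORE, ACT_DEBUGGER, ACT_IGNORE };

/* Map an action keyword to an enum value; unknown words give ACT_DEFAULT. */
enum crash_action parse_action(const char *word)
{
    if (strcmp(word, "core") == 0)     return ACT_CORE;
    if (strcmp(word, "debugger") == 0) return ACT_DEBUGGER;
    if (strcmp(word, "ignore") == 0)   return ACT_IGNORE;
    return ACT_DEFAULT;
}

/*
 * Scan config text made of "<program> <action>" lines (assumed format).
 * A program name of "*" matches anything; the first matching line wins.
 * Lines starting with '#' and malformed lines are skipped.
 * Returns 1 and stores the action if a line matched, 0 otherwise.
 */
int lookup_action(const char *conf, const char *prog, enum crash_action *out)
{
    const char *p = conf;

    while (*p) {
        char line[128], name[64], word[32];
        size_t len = strcspn(p, "\n");
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;

        /* Copy one line so sscanf cannot run on into the next one. */
        memcpy(line, p, n);
        line[n] = '\0';

        if (line[0] != '#' &&
            sscanf(line, "%63s %31s", name, word) == 2 &&
            (strcmp(name, "*") == 0 || strcmp(name, prog) == 0)) {
            *out = parse_action(word);
            return 1;
        }

        p += len;
        if (*p == '\n')
            p++;
    }
    return 0;
}

/* Per-user settings take precedence over the system-wide ones. */
enum crash_action choose_action(const char *user_conf,
                                const char *system_conf,
                                const char *prog)
{
    enum crash_action act;

    if (user_conf && lookup_action(user_conf, prog, &act))
        return act;
    if (system_conf && lookup_action(system_conf, prog, &act))
        return act;
    return ACT_DEFAULT;
}
```

With a system config of "* core" and a user config of "myeditor debugger",
a crash in myeditor would select the debugger and anything else would fall
through to a core dump. Whether a user should be allowed to override the
system file at all is a policy question this sketch does not try to answer.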

> The crash monitor is totally the user's choice. It could have
> debugging ability, but most versions would rely on a separate
> debugger. It could look up the program in an RPM or DEB database,
> then offer to email an accurate bug report. It could offer to
> unpack or download source code for you. It could offer to hide
> the bug for a moment so that you might be able to save your work.

Yes. Once the daemon gets the uid (and presumably has sufficient
permissions), it can be responsible for triggering any sort of debugger,
ranging from gdb or a core dump, to something resembling Dr. Watson, or
even to something resembling a gdb remote stub, allowing the process to be
debugged over a network. (The daemon could even keep just one process "in
limbo" like this -- if another crashes, it lets the first one die
completely, perhaps generating a core -- again, useful for an embedded
design.)
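
The "one process in limbo" bookkeeping could be as small as the sketch
below. It only tracks which pid is being held; actually suspending the
process, letting it die, or writing a core is left out, since that depends
on kernel support that does not exist yet. The names limbo_slot, limbo_hold
and limbo_release are hypothetical.

```c
#include <sys/types.h>

/* Hypothetical state for a daemon that holds at most one crashed process. */
struct limbo_slot {
    pid_t held;   /* pid currently kept around for debugging, 0 if none */
};

/*
 * Record a newly crashed process. Returns the pid that was previously held
 * and should now be allowed to die completely (perhaps after generating a
 * core), or 0 if the slot was empty.
 */
pid_t limbo_hold(struct limbo_slot *slot, pid_t crashed)
{
    pid_t evicted = slot->held;

    slot->held = crashed;
    return evicted;
}

/*
 * Called when a debugger is finished with the held process.
 * Returns 1 if `pid` was the held process and the slot is now free,
 * 0 if some other process (or none) is being held.
 */
int limbo_release(struct limbo_slot *slot, pid_t pid)
{
    if (slot->held != pid)
        return 0;
    slot->held = 0;
    return 1;
}
```

On a small embedded box this keeps the memory cost bounded to a single
suspended process, at the price of losing the older crash when a second
one arrives.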

> BTW, I think the plain core name patch belongs in 2.2.xx while
> the rest is obviously a big 2.3.xx project.

Agreed on 2.3.x.

-- 
Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)
