Re: [PATCH] kdb: Get rid of custom debug heap allocator

From: Daniel Thompson
Date: Thu Feb 25 2021 - 07:21:06 EST


On Thu, Feb 25, 2021 at 04:52:58PM +0530, Sumit Garg wrote:
> Currently the only user of the debug heap is kdbnearsym(), which can
> be modified to instead ask the caller to supply a buffer for the
> symbol name. So do that and modify kdbnearsym() callers to pass a
> symbol name buffer allocated on the stack, and hence remove the custom
> debug heap allocator.

Is it really a good idea to increase stack usage this much? I thought
several architectures take the debug exception on existing stacks
(and that these can nest with other exceptions).

The reason I'm concerned is that AFAICT the *purpose* of the current
heap is to minimise stack usage... and that this has the effect of
improving debugger robustness when we take exceptions on small shared
stacks.
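
To put a rough number on it (assuming KSYM_NAME_LEN is 128 here): each
of the three functions touched by this patch grows a 128 byte on-stack
buffer, and if any two of them ever end up on the same call path that
is over a quarter of a kilobyte added to whatever (possibly small,
possibly shared) stack we took the exception on.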

The reason I called the heap redundant is that currently it also allows
us to have nested calls to kdbnearsym() whilst not consuming stack. In
this case, when I say nested I mean new calls to kdbnearsym() before
the previous caller has consumed the output, rather than truly
recursive calls.

This is why I think the heap is pointless. In "normal" usage I don't
think there will ever be a nested call to kdbnearsym(), so I think a
single static buffer will suffice.

Technically speaking there is one way that kdbnearsym() can nest but I
think it is OK for that to be considered out-of-scope.

To explain...

It can nest if we recursively enter the debugger! Recursive entry
should never happen, is pretty much untestable and, even if we tested
it, it is not a bug for an architecture to choose not to support it.
Nevertheless kgdb/kdb does include logic to handle this if an
architecture does make it as far as executing the trap. Note that
even if the architecture does somehow land in the debug trap there's
a strong chance the system is too broken to resume (since we just
took an impossible trap). Therefore kdb will inhibit resuming unless
the operator first admits that what they are doing won't work.

Therefore I think it is OK for namebuf to be statically allocated, and
the only thing we need do for stability is ensure that kdbnearsym()
guarantees namebuf[sizeof(namebuf)-1] == '\0' regardless of the
symbol length. Thus if by some miracle the system can resume after the
operator has ignored the warning then kdb can't take a bad memory
access when it tries to print an overwritten symbol name. They may see
a few garbage characters... but since they just told us to do something
crazy they should be expecting that.
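
Concretely I mean something like the sketch below (untested and only to
show the shape; the debug prints are omitted and the symtab bookkeeping
is lifted straight from your patch):

    static char namebuf[KSYM_NAME_LEN];

    int kdbnearsym(unsigned long addr, kdb_symtab_t *symtab)
    {
            int ret = 0;
            unsigned long symbolsize = 0;
            unsigned long offset = 0;

            memset(symtab, 0, sizeof(*symtab));

            if (addr < 4096)
                    goto out;

            symtab->sym_name = kallsyms_lookup(addr, &symbolsize, &offset,
                                               (char **)(&symtab->mod_name),
                                               namebuf);
            /*
             * namebuf is static, so a nested call (only reachable by
             * recursively entering the debugger) can rewrite it behind
             * a previous caller's back. Forcing a NUL into the final
             * byte means a stale sym_name still terminates inside the
             * buffer: the worst case is printing a few garbage
             * characters, never reading past the end of the buffer.
             */
            namebuf[KSYM_NAME_LEN - 1] = '\0';

            if (offset > 8*1024*1024) {
                    symtab->sym_name = NULL;
                    addr = offset = symbolsize = 0;
            }
            symtab->sym_start = addr - offset;
            symtab->sym_end = symtab->sym_start + symbolsize;
            ret = symtab->sym_name != NULL && *(symtab->sym_name) != '\0';

            if (symtab->mod_name == NULL)
                    symtab->mod_name = "kernel";
    out:
            return ret;
    }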


Daniel.


PS The code to guarantee that, if we read past the end of the string,
we will still see a '\0' before making an invalid memory access
should be well commented though... because it's pretty nasty.


>
> This change has been tested using kgdbtest on arm64 which doesn't show
> any regressions.
>
> Suggested-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx>
> Signed-off-by: Sumit Garg <sumit.garg@xxxxxxxxxx>
> ---
> kernel/debug/kdb/kdb_debugger.c | 1 -
> kernel/debug/kdb/kdb_main.c | 6 +-
> kernel/debug/kdb/kdb_private.h | 7 +-
> kernel/debug/kdb/kdb_support.c | 294 +-------------------------------
> 4 files changed, 11 insertions(+), 297 deletions(-)
>
> diff --git a/kernel/debug/kdb/kdb_debugger.c b/kernel/debug/kdb/kdb_debugger.c
> index 0220afda3200..e91fc3e4edd5 100644
> --- a/kernel/debug/kdb/kdb_debugger.c
> +++ b/kernel/debug/kdb/kdb_debugger.c
> @@ -140,7 +140,6 @@ int kdb_stub(struct kgdb_state *ks)
> */
> kdb_common_deinit_state();
> KDB_STATE_CLEAR(PAGER);
> - kdbnearsym_cleanup();
> if (error == KDB_CMD_KGDB) {
> if (KDB_STATE(DOING_KGDB))
> KDB_STATE_CLEAR(DOING_KGDB);
> diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> index 9d69169582c6..ca525a3e0032 100644
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -526,6 +526,7 @@ int kdbgetaddrarg(int argc, const char **argv, int *nextarg,
> char symbol = '\0';
> char *cp;
> kdb_symtab_t symtab;
> + char namebuf[KSYM_NAME_LEN];
>
> /*
> * If the enable flags prohibit both arbitrary memory access
> @@ -585,7 +586,7 @@ int kdbgetaddrarg(int argc, const char **argv, int *nextarg,
> }
>
> if (!found)
> - found = kdbnearsym(addr, &symtab);
> + found = kdbnearsym(addr, &symtab, namebuf);
>
> (*nextarg)++;
>
> @@ -1503,6 +1504,7 @@ static void kdb_md_line(const char *fmtstr, unsigned long addr,
> int i;
> int j;
> unsigned long word;
> + char namebuf[KSYM_NAME_LEN];
>
> memset(cbuf, '\0', sizeof(cbuf));
> if (phys)
> @@ -1518,7 +1520,7 @@ static void kdb_md_line(const char *fmtstr, unsigned long addr,
> break;
> kdb_printf(fmtstr, word);
> if (symbolic)
> - kdbnearsym(word, &symtab);
> + kdbnearsym(word, &symtab, namebuf);
> else
> memset(&symtab, 0, sizeof(symtab));
> if (symtab.sym_name) {
> diff --git a/kernel/debug/kdb/kdb_private.h b/kernel/debug/kdb/kdb_private.h
> index b857a84de3b5..1707eeebc59a 100644
> --- a/kernel/debug/kdb/kdb_private.h
> +++ b/kernel/debug/kdb/kdb_private.h
> @@ -108,8 +108,7 @@ extern char *kdbgetenv(const char *);
> extern int kdbgetaddrarg(int, const char **, int*, unsigned long *,
> long *, char **);
> extern int kdbgetsymval(const char *, kdb_symtab_t *);
> -extern int kdbnearsym(unsigned long, kdb_symtab_t *);
> -extern void kdbnearsym_cleanup(void);
> +extern int kdbnearsym(unsigned long addr, kdb_symtab_t *symtab, char *namebuf);
> extern char *kdb_strdup(const char *str, gfp_t type);
> extern void kdb_symbol_print(unsigned long, const kdb_symtab_t *, unsigned int);
>
> @@ -233,10 +232,6 @@ extern struct task_struct *kdb_curr_task(int);
>
> #define GFP_KDB (in_dbg_master() ? GFP_ATOMIC : GFP_KERNEL)
>
> -extern void *debug_kmalloc(size_t size, gfp_t flags);
> -extern void debug_kfree(void *);
> -extern void debug_kusage(void);
> -
> extern struct task_struct *kdb_current_task;
> extern struct pt_regs *kdb_current_regs;
>
> diff --git a/kernel/debug/kdb/kdb_support.c b/kernel/debug/kdb/kdb_support.c
> index b59aad1f0b55..319b36ceedf9 100644
> --- a/kernel/debug/kdb/kdb_support.c
> +++ b/kernel/debug/kdb/kdb_support.c
> @@ -57,8 +57,6 @@ int kdbgetsymval(const char *symname, kdb_symtab_t *symtab)
> }
> EXPORT_SYMBOL(kdbgetsymval);
>
> -static char *kdb_name_table[100]; /* arbitrary size */
> -
> /*
> * kdbnearsym - Return the name of the symbol with the nearest address
> * less than 'addr'.
> @@ -79,13 +77,11 @@ static char *kdb_name_table[100]; /* arbitrary size */
> * hold active strings, no kdb caller of kdbnearsym makes more
> * than ~20 later calls before using a saved value.
> */
> -int kdbnearsym(unsigned long addr, kdb_symtab_t *symtab)
> +int kdbnearsym(unsigned long addr, kdb_symtab_t *symtab, char *namebuf)
> {
> int ret = 0;
> unsigned long symbolsize = 0;
> unsigned long offset = 0;
> -#define knt1_size 128 /* must be >= kallsyms table size */
> - char *knt1 = NULL;
>
> if (KDB_DEBUG(AR))
> kdb_printf("kdbnearsym: addr=0x%lx, symtab=%px\n", addr, symtab);
> @@ -93,14 +89,9 @@ int kdbnearsym(unsigned long addr, kdb_symtab_t *symtab)
>
> if (addr < 4096)
> goto out;
> - knt1 = debug_kmalloc(knt1_size, GFP_ATOMIC);
> - if (!knt1) {
> - kdb_printf("kdbnearsym: addr=0x%lx cannot kmalloc knt1\n",
> - addr);
> - goto out;
> - }
> +
> symtab->sym_name = kallsyms_lookup(addr, &symbolsize , &offset,
> - (char **)(&symtab->mod_name), knt1);
> + (char **)(&symtab->mod_name), namebuf);
> if (offset > 8*1024*1024) {
> symtab->sym_name = NULL;
> addr = offset = symbolsize = 0;
> @@ -109,42 +100,6 @@ int kdbnearsym(unsigned long addr, kdb_symtab_t *symtab)
> symtab->sym_end = symtab->sym_start + symbolsize;
> ret = symtab->sym_name != NULL && *(symtab->sym_name) != '\0';
>
> - if (ret) {
> - int i;
> - /* Another 2.6 kallsyms "feature". Sometimes the sym_name is
> - * set but the buffer passed into kallsyms_lookup is not used,
> - * so it contains garbage. The caller has to work out which
> - * buffer needs to be saved.
> - *
> - * What was Rusty smoking when he wrote that code?
> - */
> - if (symtab->sym_name != knt1) {
> - strncpy(knt1, symtab->sym_name, knt1_size);
> - knt1[knt1_size-1] = '\0';
> - }
> - for (i = 0; i < ARRAY_SIZE(kdb_name_table); ++i) {
> - if (kdb_name_table[i] &&
> - strcmp(kdb_name_table[i], knt1) == 0)
> - break;
> - }
> - if (i >= ARRAY_SIZE(kdb_name_table)) {
> - debug_kfree(kdb_name_table[0]);
> - memmove(kdb_name_table, kdb_name_table+1,
> - sizeof(kdb_name_table[0]) *
> - (ARRAY_SIZE(kdb_name_table)-1));
> - } else {
> - debug_kfree(knt1);
> - knt1 = kdb_name_table[i];
> - memmove(kdb_name_table+i, kdb_name_table+i+1,
> - sizeof(kdb_name_table[0]) *
> - (ARRAY_SIZE(kdb_name_table)-i-1));
> - }
> - i = ARRAY_SIZE(kdb_name_table) - 1;
> - kdb_name_table[i] = knt1;
> - symtab->sym_name = kdb_name_table[i];
> - knt1 = NULL;
> - }
> -
> if (symtab->mod_name == NULL)
> symtab->mod_name = "kernel";
> if (KDB_DEBUG(AR))
> @@ -152,23 +107,10 @@ int kdbnearsym(unsigned long addr, kdb_symtab_t *symtab)
> "symtab->mod_name=%px, symtab->sym_name=%px (%s)\n", ret,
> symtab->sym_start, symtab->mod_name, symtab->sym_name,
> symtab->sym_name);
> -
> out:
> - debug_kfree(knt1);
> return ret;
> }
>
> -void kdbnearsym_cleanup(void)
> -{
> - int i;
> - for (i = 0; i < ARRAY_SIZE(kdb_name_table); ++i) {
> - if (kdb_name_table[i]) {
> - debug_kfree(kdb_name_table[i]);
> - kdb_name_table[i] = NULL;
> - }
> - }
> -}
> -
> static char ks_namebuf[KSYM_NAME_LEN+1], ks_namebuf_prev[KSYM_NAME_LEN+1];
>
> /*
> @@ -259,11 +201,13 @@ void kdb_symbol_print(unsigned long addr, const kdb_symtab_t *symtab_p,
> unsigned int punc)
> {
> kdb_symtab_t symtab, *symtab_p2;
> + char namebuf[KSYM_NAME_LEN];
> +
> if (symtab_p) {
> symtab_p2 = (kdb_symtab_t *)symtab_p;
> } else {
> symtab_p2 = &symtab;
> - kdbnearsym(addr, symtab_p2);
> + kdbnearsym(addr, symtab_p2, namebuf);
> }
> if (!(symtab_p2->sym_name || (punc & KDB_SP_VALUE)))
> return;
> @@ -665,232 +609,6 @@ unsigned long kdb_task_state(const struct task_struct *p, unsigned long mask)
> return (mask & kdb_task_state_string(state)) != 0;
> }
>
> -/* Last ditch allocator for debugging, so we can still debug even when
> - * the GFP_ATOMIC pool has been exhausted. The algorithms are tuned
> - * for space usage, not for speed. One smallish memory pool, the free
> - * chain is always in ascending address order to allow coalescing,
> - * allocations are done in brute force best fit.
> - */
> -
> -struct debug_alloc_header {
> - u32 next; /* offset of next header from start of pool */
> - u32 size;
> - void *caller;
> -};
> -
> -/* The memory returned by this allocator must be aligned, which means
> - * so must the header size. Do not assume that sizeof(struct
> - * debug_alloc_header) is a multiple of the alignment, explicitly
> - * calculate the overhead of this header, including the alignment.
> - * The rest of this code must not use sizeof() on any header or
> - * pointer to a header.
> - */
> -#define dah_align 8
> -#define dah_overhead ALIGN(sizeof(struct debug_alloc_header), dah_align)
> -
> -static u64 debug_alloc_pool_aligned[256*1024/dah_align]; /* 256K pool */
> -static char *debug_alloc_pool = (char *)debug_alloc_pool_aligned;
> -static u32 dah_first, dah_first_call = 1, dah_used, dah_used_max;
> -
> -/* Locking is awkward. The debug code is called from all contexts,
> - * including non maskable interrupts. A normal spinlock is not safe
> - * in NMI context. Try to get the debug allocator lock, if it cannot
> - * be obtained after a second then give up. If the lock could not be
> - * previously obtained on this cpu then only try once.
> - *
> - * sparse has no annotation for "this function _sometimes_ acquires a
> - * lock", so fudge the acquire/release notation.
> - */
> -static DEFINE_SPINLOCK(dap_lock);
> -static int get_dap_lock(void)
> - __acquires(dap_lock)
> -{
> - static int dap_locked = -1;
> - int count;
> - if (dap_locked == smp_processor_id())
> - count = 1;
> - else
> - count = 1000;
> - while (1) {
> - if (spin_trylock(&dap_lock)) {
> - dap_locked = -1;
> - return 1;
> - }
> - if (!count--)
> - break;
> - udelay(1000);
> - }
> - dap_locked = smp_processor_id();
> - __acquire(dap_lock);
> - return 0;
> -}
> -
> -void *debug_kmalloc(size_t size, gfp_t flags)
> -{
> - unsigned int rem, h_offset;
> - struct debug_alloc_header *best, *bestprev, *prev, *h;
> - void *p = NULL;
> - if (!get_dap_lock()) {
> - __release(dap_lock); /* we never actually got it */
> - return NULL;
> - }
> - h = (struct debug_alloc_header *)(debug_alloc_pool + dah_first);
> - if (dah_first_call) {
> - h->size = sizeof(debug_alloc_pool_aligned) - dah_overhead;
> - dah_first_call = 0;
> - }
> - size = ALIGN(size, dah_align);
> - prev = best = bestprev = NULL;
> - while (1) {
> - if (h->size >= size && (!best || h->size < best->size)) {
> - best = h;
> - bestprev = prev;
> - if (h->size == size)
> - break;
> - }
> - if (!h->next)
> - break;
> - prev = h;
> - h = (struct debug_alloc_header *)(debug_alloc_pool + h->next);
> - }
> - if (!best)
> - goto out;
> - rem = best->size - size;
> - /* The pool must always contain at least one header */
> - if (best->next == 0 && bestprev == NULL && rem < dah_overhead)
> - goto out;
> - if (rem >= dah_overhead) {
> - best->size = size;
> - h_offset = ((char *)best - debug_alloc_pool) +
> - dah_overhead + best->size;
> - h = (struct debug_alloc_header *)(debug_alloc_pool + h_offset);
> - h->size = rem - dah_overhead;
> - h->next = best->next;
> - } else
> - h_offset = best->next;
> - best->caller = __builtin_return_address(0);
> - dah_used += best->size;
> - dah_used_max = max(dah_used, dah_used_max);
> - if (bestprev)
> - bestprev->next = h_offset;
> - else
> - dah_first = h_offset;
> - p = (char *)best + dah_overhead;
> - memset(p, POISON_INUSE, best->size - 1);
> - *((char *)p + best->size - 1) = POISON_END;
> -out:
> - spin_unlock(&dap_lock);
> - return p;
> -}
> -
> -void debug_kfree(void *p)
> -{
> - struct debug_alloc_header *h;
> - unsigned int h_offset;
> - if (!p)
> - return;
> - if ((char *)p < debug_alloc_pool ||
> - (char *)p >= debug_alloc_pool + sizeof(debug_alloc_pool_aligned)) {
> - kfree(p);
> - return;
> - }
> - if (!get_dap_lock()) {
> - __release(dap_lock); /* we never actually got it */
> - return; /* memory leak, cannot be helped */
> - }
> - h = (struct debug_alloc_header *)((char *)p - dah_overhead);
> - memset(p, POISON_FREE, h->size - 1);
> - *((char *)p + h->size - 1) = POISON_END;
> - h->caller = NULL;
> - dah_used -= h->size;
> - h_offset = (char *)h - debug_alloc_pool;
> - if (h_offset < dah_first) {
> - h->next = dah_first;
> - dah_first = h_offset;
> - } else {
> - struct debug_alloc_header *prev;
> - unsigned int prev_offset;
> - prev = (struct debug_alloc_header *)(debug_alloc_pool +
> - dah_first);
> - while (1) {
> - if (!prev->next || prev->next > h_offset)
> - break;
> - prev = (struct debug_alloc_header *)
> - (debug_alloc_pool + prev->next);
> - }
> - prev_offset = (char *)prev - debug_alloc_pool;
> - if (prev_offset + dah_overhead + prev->size == h_offset) {
> - prev->size += dah_overhead + h->size;
> - memset(h, POISON_FREE, dah_overhead - 1);
> - *((char *)h + dah_overhead - 1) = POISON_END;
> - h = prev;
> - h_offset = prev_offset;
> - } else {
> - h->next = prev->next;
> - prev->next = h_offset;
> - }
> - }
> - if (h_offset + dah_overhead + h->size == h->next) {
> - struct debug_alloc_header *next;
> - next = (struct debug_alloc_header *)
> - (debug_alloc_pool + h->next);
> - h->size += dah_overhead + next->size;
> - h->next = next->next;
> - memset(next, POISON_FREE, dah_overhead - 1);
> - *((char *)next + dah_overhead - 1) = POISON_END;
> - }
> - spin_unlock(&dap_lock);
> -}
> -
> -void debug_kusage(void)
> -{
> - struct debug_alloc_header *h_free, *h_used;
> -#ifdef CONFIG_IA64
> - /* FIXME: using dah for ia64 unwind always results in a memory leak.
> - * Fix that memory leak first, then set debug_kusage_one_time = 1 for
> - * all architectures.
> - */
> - static int debug_kusage_one_time;
> -#else
> - static int debug_kusage_one_time = 1;
> -#endif
> - if (!get_dap_lock()) {
> - __release(dap_lock); /* we never actually got it */
> - return;
> - }
> - h_free = (struct debug_alloc_header *)(debug_alloc_pool + dah_first);
> - if (dah_first == 0 &&
> - (h_free->size == sizeof(debug_alloc_pool_aligned) - dah_overhead ||
> - dah_first_call))
> - goto out;
> - if (!debug_kusage_one_time)
> - goto out;
> - debug_kusage_one_time = 0;
> - kdb_printf("%s: debug_kmalloc memory leak dah_first %d\n",
> - __func__, dah_first);
> - if (dah_first) {
> - h_used = (struct debug_alloc_header *)debug_alloc_pool;
> - kdb_printf("%s: h_used %px size %d\n", __func__, h_used,
> - h_used->size);
> - }
> - do {
> - h_used = (struct debug_alloc_header *)
> - ((char *)h_free + dah_overhead + h_free->size);
> - kdb_printf("%s: h_used %px size %d caller %px\n",
> - __func__, h_used, h_used->size, h_used->caller);
> - h_free = (struct debug_alloc_header *)
> - (debug_alloc_pool + h_free->next);
> - } while (h_free->next);
> - h_used = (struct debug_alloc_header *)
> - ((char *)h_free + dah_overhead + h_free->size);
> - if ((char *)h_used - debug_alloc_pool !=
> - sizeof(debug_alloc_pool_aligned))
> - kdb_printf("%s: h_used %px size %d caller %px\n",
> - __func__, h_used, h_used->size, h_used->caller);
> -out:
> - spin_unlock(&dap_lock);
> -}
> -
> /* Maintain a small stack of kdb_flags to allow recursion without disturbing
> * the global kdb state.
> */
> --
> 2.25.1
>