Re: [PATCH v8 4/6] kallsyms: introduce sections needed to map symbols to built-in modules

From: Masahiro Yamada
Date: Wed Feb 09 2022 - 20:26:10 EST


On Wed, Feb 9, 2022 at 3:44 AM Nick Alcock <nick.alcock@xxxxxxxxxx> wrote:
>
> The mapping consists of three new symbols, computed by integrating the
> information in the (just-added) .tmp_vmlinux.ranges and
> modules_thick.builtin: taken together, they map address ranges
> (corresponding to object files on the input) to the names of zero or
> more modules containing those address ranges.
>
> - kallsyms_module_addresses/kallsyms_module_offsets encodes the
> address/offset of each object file (derived from the linker map), in
> exactly the same way as kallsyms_addresses/kallsyms_offsets does
> for symbols. There is no size: instead, the object files are assumed
> to tile the address space. (This is slightly more space-efficient
> than using a size). Non-text-section addresses are skipped: for now,
> all the users of this interface only need module/non-module
> information for instruction pointer addresses, not absolute-addressed
> symbols and the like. This restriction can easily be lifted in
> future. (Regarding the name: right now the entries correspond pretty
> closely to object files, so we could call the section
> kallsyms_objfiles or something, but the optimizer added in the next
> commit will change this.)
>
> - kallsyms_module_names encodes the name of each module in a modified
> form of strtab: notably, if an object file appears in *multiple*
> modules, all of which are built in, this is encoded via a zero byte,
> a one-byte module count, then a series of that many null-terminated
> strings. As a special case, the table starts with a single zero byte
> which does *not* represent the start of a multi-module list.
>
> - kallsyms_modules connects the two, encoding a table associated 1:1
> with kallsyms_module_addresses / kallsyms_module_offsets, pointing
> at an offset in kallsyms_module_names describing which module (or
> modules, for a multi-module list) the code occupying this address
> range is part of. If an address range is part of no module (always
> built-in) it points at 0 (the null byte at the start of the
> kallsyms_module_names list).
>
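
For reference, here is how I read the encoding described above, as a rough
sketch. It is not code from the patch: decode_module_names() and the sample
table are made-up names for illustration, and it assumes a
kallsyms_modules entry is a plain byte offset into kallsyms_module_names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sample table (not from the patch):
 *   offset 0: single zero byte, meaning "no module"
 *   offset 1: "ext4", a single-module entry
 *   offset 6: zero byte, count 2, then "nfs" and "lockd"
 */
static const char sample_names[] =
	"\0"
	"ext4\0"
	"\0\002" "nfs\0" "lockd";

/*
 * Decode the entry at byte offset 'off' in 'tab'.
 * Stores up to 'max' name pointers in 'out' and returns how many
 * were stored; 0 means the address range belongs to no module.
 */
static int decode_module_names(const char *tab, unsigned int off,
			       const char **out, int max)
{
	const char *p;
	int n, i;

	assert(tab != NULL && out != NULL);

	if (off == 0)		/* the leading zero byte: not in any module */
		return 0;

	p = tab + off;
	if (*p != '\0') {	/* ordinary entry: one null-terminated name */
		if (max < 1)
			return 0;
		out[0] = p;
		return 1;
	}

	/* multi-module entry: zero byte, one-byte count, then the names */
	n = (unsigned char)p[1];
	p += 2;
	for (i = 0; i < n && i < max; i++) {
		out[i] = p;
		p += strlen(p) + 1;
	}
	return i;
}
```

With the sample table, offset 0 decodes to no module, offset 1 to one name,
and offset 6 to two names. The leading zero byte is what lets offset 0 mean
"always built-in" without being mistaken for a multi-module list.
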
> There is no optimization yet: kallsyms_modules and
> kallsyms_module_names will almost certainly contain many duplicate
> entries, and kallsyms_module_{addresses,offsets} may contain
> consecutive entries that point to the same place. The size hit is
> fairly substantial as a result, though still much less than a naive
> implementation mapping each symbol to a module name would be: 50KiB or
> so.
>
> Signed-off-by: Nick Alcock <nick.alcock@xxxxxxxxxx>
> Reviewed-by: Kris Van Hees <kris.van.hees@xxxxxxxxxx>
> ---
> Makefile | 2 +-
> init/Kconfig | 8 +
> scripts/Makefile | 6 +
> scripts/kallsyms.c | 366 +++++++++++++++++++++++++++++++++++++++++++--
> 4 files changed, 371 insertions(+), 11 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 5e823fe8390f..b719244cb571 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1151,7 +1151,7 @@ cmd_link-vmlinux = \
> $(CONFIG_SHELL) $< "$(LD)" "$(KBUILD_LDFLAGS)" "$(LDFLAGS_vmlinux)"; \
> $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
>
> -vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
> +vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) modules_thick.builtin FORCE
> +$(call if_changed_dep,link-vmlinux)
>
> targets := vmlinux
> diff --git a/init/Kconfig b/init/Kconfig
> index e9119bf54b1f..e1ca3d70cb1c 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1530,6 +1530,14 @@ config POSIX_TIMERS
>
> If unsure say y.
>
> +config KALLMODSYMS
> + default y
> + bool "Enable support for /proc/kallmodsyms" if EXPERT
> + depends on KALLSYMS
> + help
> + This option enables the /proc/kallmodsyms file, which maps symbols
> + to addresses and their associated modules.
> +
> config PRINTK
> default y
> bool "Enable support for printk" if EXPERT
> diff --git a/scripts/Makefile b/scripts/Makefile
> index ce5aa9030b74..c5cc4ac3d660 100644
> --- a/scripts/Makefile
> +++ b/scripts/Makefile
> @@ -29,6 +29,12 @@ ifdef CONFIG_BUILDTIME_MCOUNT_SORT
> HOSTCFLAGS_sorttable.o += -DMCOUNT_SORT_ENABLED
> endif
>
> +kallsyms-objs := kallsyms.o
> +
> +ifdef CONFIG_KALLMODSYMS
> +kallsyms-objs += modules_thick.o
> +endif
> +
> # The following programs are only built on demand
> hostprogs += unifdef
>
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index 54ad86d13784..8f87b724d0fa 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -5,7 +5,10 @@
> * This software may be used and distributed according to the terms
> * of the GNU General Public License, incorporated herein by reference.
> *
> - * Usage: nm -n vmlinux | scripts/kallsyms [--all-symbols] > symbols.S
> + * Usage: nm -n vmlinux
> + * | scripts/kallsyms [--all-symbols] [--absolute-percpu]
> + * [--base-relative] [--builtin=modules_thick.builtin]
> + * > symbols.S
> *
> * Table compression uses all the unused char codes on the symbols and
> * maps these to the most used substrings (tokens). For instance, it might
> @@ -24,6 +27,10 @@
> #include <string.h>
> #include <ctype.h>
> #include <limits.h>
> +#include <assert.h>
> +#include "modules_thick.h"
> +
> +#include "../include/generated/autoconf.h"
I do not remember whether I pointed this out before,
but including autoconf.h from a host program is wrong.

Do not use #ifdef CONFIG_... in the hostprog code.
Passing --builtin=modules_thick.builtin on the command line is enough
to tell kallsyms that the module tables are wanted.
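
Something along these lines would do (untested sketch; find_builtin_arg()
and want_module_tables() are hypothetical names, not functions in
scripts/kallsyms.c). The decision is made at run time from argv, so the
host program does not need to see any CONFIG_ symbol:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the path given via --builtin=<path>, or NULL if the option
 * was not passed. Hypothetical helper for illustration only.
 */
static const char *find_builtin_arg(int argc, char **argv)
{
	static const char prefix[] = "--builtin=";
	const size_t len = sizeof(prefix) - 1;
	int i;

	assert(argv != NULL);

	for (i = 1; i < argc; i++) {
		if (strncmp(argv[i], prefix, len) == 0)
			return argv[i] + len;
	}
	return NULL;
}

/* Emit the module tables only when a builtin file was supplied. */
static int want_module_tables(int argc, char **argv)
{
	return find_builtin_arg(argc, argv) != NULL;
}
```

The build scripts would then pass --builtin= only when CONFIG_KALLMODSYMS
is set, which keeps the config dependency on the Kbuild side.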
--
Best Regards
Masahiro Yamada