Re: [RFC PATCH] kasan: Record and report more information

From: Dmitry Vyukov
Date: Thu Nov 23 2023 - 09:59:36 EST


On Thu, 23 Nov 2023 at 11:06, Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:
>
> On 2023/11/20 19:06, Dmitry Vyukov wrote:
> > On Wed, 15 Nov 2023 at 22:53, Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:
> >>
> >> On 2023/11/14 15:27, Dmitry Vyukov wrote:
> >>> On Mon, 13 Nov 2023 at 22:17, Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:
> >>>>
> >>>> Record and report more information to help us find the cause of
> >>>> the bug (for example, bugs caused by subtle race condition).
> >>>>
> >>>> This patch adds recording and showing CPU number and timestamp at
> >>>> allocation and free (controlled by CONFIG_KASAN_EXTRA_INFO), and
> >>>> adds recording and showing timestamp at error occurrence (CPU number
> >>>> is already shown by dump_stack_lvl). The timestamps in the report use
> >>>> the same format and source as printk.
> >>>>
> >>>> In order to record CPU number and timestamp at allocation and free,
> >>>> corresponding members need to be added to the relevant data structures,
> >>>> which may lead to increased memory consumption.
> >>>>
> >>>> In Generic KASAN, members are added to struct kasan_track. Since in
> >>>> most cases, alloc meta is stored in the redzone and free meta is
> >>>> stored in the object or the redzone, memory consumption will not
> >>>> increase much.
> >>>>
> >>>> In SW_TAGS KASAN and HW_TAGS KASAN, members are added to
> >>>> struct kasan_stack_ring_entry. Memory consumption increases as the
> >>>> size of struct kasan_stack_ring_entry increases (this part of the
> >>>> memory is allocated by memblock), but since this is configurable,
> >>>> it is up to the user to choose.
> >>>>
> >>>> Signed-off-by: Juntong Deng <juntong.deng@xxxxxxxxxxx>
> >>>> ---
> >>>> lib/Kconfig.kasan | 11 +++++++++++
> >>>> mm/kasan/common.c | 5 +++++
> >>>> mm/kasan/kasan.h | 9 +++++++++
> >>>> mm/kasan/report.c | 28 ++++++++++++++++++++++------
> >>>> mm/kasan/report_tags.c | 18 ++++++++++++++++++
> >>>> mm/kasan/tags.c | 15 +++++++++++++++
> >>>> 6 files changed, 80 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
> >>>> index fdca89c05745..d9611564b339 100644
> >>>> --- a/lib/Kconfig.kasan
> >>>> +++ b/lib/Kconfig.kasan
> >>>> @@ -207,4 +207,15 @@ config KASAN_MODULE_TEST
> >>>> A part of the KASAN test suite that is not integrated with KUnit.
> >>>> Incompatible with Hardware Tag-Based KASAN.
> >>>>
> >>>> +config KASAN_EXTRA_INFO
> >>>> + bool "Record and report more information"
> >>>> + depends on KASAN
> >>>> + help
> >>>> + Record and report more information to help us find the cause of
> >>>> + the bug. The trade-off is potentially increased memory consumption
> >>>> + (to record more information).
> >>>> +
> >>>> + Currently the CPU number and timestamp are additionally recorded
> >>>> + at allocation and free.
> >>>
> >>> Hi Juntong,
> >>>
> >>> Thanks for working on this.
> >>>
> >>
> >>
> >> Thanks for your reply!
> >>
> >>
> >>> As a KASAN developer I understand what this is doing, but I am trying
> >>> to think from a position of a user that does not know details of KASAN
> >>> implementation. From this position it may be useful to say somewhere
> >>> that information is recorded "per heap allocation". Perhaps something
> >>> like:
> >>>
> >>> "Currently the CPU number and timestamp are additionally recorded for
> >>> each heap block at allocation and free time".
> >>
> >>
> >> Yes, I agree, that is a better expression.
> >>
> >>
> >>>
> >>> Also it's unclear what the memory consumption increase is. You say
> >>> "potentially|, so may it not increase at all? If it increases, by how
> >>> much? I obviously want more information, if I can afford it, but I
> >>> can't understand if I can or not based on this description. I would
> >>> assume that this may be a problem only for small/embedded devices.
> >>> Can we provide some ballpark estimation of the memory consumption
> >>> increase? And somehow say that's probably not an issue for larger
> >>> machines?
> >>>
> >>
> >>
> >> How about this expression?
> >>
> >> Currently, in order to record CPU number and timestamp, the data
> >> structure to record allocation and free information will increase
> >> by 12 bytes.
> >>
> >> In Generic KASAN, this affects all allocations less than 32 bytes.
> >> In SW_TAGS KASAN and HW_TAGS KASAN, depending on the stack_ring_size
> >> boot parameter increases the memory consumption by
> >> 12 * stack_ring_size bytes.
> >
> > Let's go with this version.
> >
> >
> >>>> endif # KASAN
> >>>> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> >>>> index 256930da578a..7a81566d9d66 100644
> >>>> --- a/mm/kasan/common.c
> >>>> +++ b/mm/kasan/common.c
> >>>> @@ -20,6 +20,7 @@
> >>>> #include <linux/module.h>
> >>>> #include <linux/printk.h>
> >>>> #include <linux/sched.h>
> >>>> +#include <linux/sched/clock.h>
> >>>> #include <linux/sched/task_stack.h>
> >>>> #include <linux/slab.h>
> >>>> #include <linux/stacktrace.h>
> >>>> @@ -50,6 +51,10 @@ void kasan_set_track(struct kasan_track *track, gfp_t flags)
> >>>> {
> >>>> track->pid = current->pid;
> >>>> track->stack = kasan_save_stack(flags, true);
> >>>> +#ifdef CONFIG_KASAN_EXTRA_INFO
> >>>> + track->cpu = raw_smp_processor_id();
> >>>> + track->ts_nsec = local_clock();
> >>>
> >>> What does "local_" mean? Is this clock value comparable across CPUs?
> >>>
> >>
> >>
> >> No, local_clock is the local CPU clock and cannot be used for comparison
> >> across CPUs, I made a mistake here.
> >>
> >> I delved into the clock subsystem of the Linux kernel today and I found
> >> that we have two choices.
> >>
> >> - sched_clock(): based on jiffies, high performance, but on some
> >> hardware, it will drift between CPUs.
> >>
> >> - ktime_get_boot_fast_ns(): based on clocksource, highly accurate,
> >> can be compared between CPUs, but performance is worse (seqlock).
> >>
> >> I tested 100000 calls respectively on my laptop, the average of
> >> sched_clock() is 17ns and the average of ktime_get_boot_fast_ns()
> >> is 25ns.
> >>
> >> ktime_get_boot_fast_ns() takes about 1.5 times as long as sched_clock().
> >>
> >> With Generic KASAN enabled, the average of one memory allocation is
> >> 3512ns on my laptop.
> >>
> >> Personally, I prefer ktime_get_boot_fast_ns() because it is more
> >> accurate and the extra time is insignificant for the time required for
> >> one memory allocation with Generic KASAN enabled.
> >>
> >> But maybe using ktime_get_boot_fast_ns() would have a more serious
> >> impact on small/embedded devices.
> >>
> >> Which do you think is the better choice?
> >
> > I don't have a strong preference.
> >
> > Re drift of sched_clock(), do you mean unsynchronized RDTSC on
> > different cores? I had the impression that RDTSC is synchronized
> > across cores on all recent CPUs/systems.
> >
>
>
> After discussions with Marco Elver, I now think that perhaps continuing
> to use local_clock() is a better option.
>
> The full discussion with Marco Elver can be found at
> https://groups.google.com/g/kasan-dev/c/zmxwYv8wZTg
>
> Because local_clock() is the clock source used by printk and is the
> default clock source for ftrace.
>
> Using local_clock() as the clock source for KASAN reports makes it
> easier to correlate the timestamps of allocations, frees, and errors
> with the timestamps of other events in the system.
>
> This is perhaps more important than being able to accurately compare
> across CPUs.
>
> What do you think?


If other similar tools (kfence, ftrace, printk) use local_clock(),
then it's reasonable to use it in kasan as well.