Re: [PATCH] kfence: Replace local_clock() with ktime_get_boot_fast_ns()

From: Juntong Deng
Date: Thu Nov 23 2023 - 04:29:43 EST


On 2023/11/23 6:19, Marco Elver wrote:
On Wed, 22 Nov 2023 at 22:36, Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:

On 2023/11/23 4:35, Marco Elver wrote:
On Wed, 22 Nov 2023 at 21:01, Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:

The time obtained by local_clock() is the local CPU time, which may
drift between CPUs and is not suitable for comparison across CPUs.

It is possible for allocation and free to occur on different CPUs,
and using local_clock() to record timestamps may cause confusion.

The same problem exists with printk logging.

ktime_get_boot_fast_ns() is based on clock sources and can be used
reliably and accurately for comparison across CPUs.

You may be right here, however, the choice of local_clock() was
deliberate: it's the same timestamp source that printk uses.

Also, on systems where there is drift, the arch selects
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK (like on x86) and the drift is
generally bounded.

Signed-off-by: Juntong Deng <juntong.deng@xxxxxxxxxxx>
---
mm/kfence/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 3872528d0963..041c03394193 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -295,7 +295,7 @@ metadata_update_state(struct kfence_metadata *meta, enum kfence_object_state nex
track->num_stack_entries = num_stack_entries;
track->pid = task_pid_nr(current);
track->cpu = raw_smp_processor_id();
- track->ts_nsec = local_clock(); /* Same source as printk timestamps. */
+ track->ts_nsec = ktime_get_boot_fast_ns();

You have ignored the comment placed here - now it's no longer the same
source as printk timestamps. I think not being able to correlate
information from KFENCE reports with timestamps in lines from printk
is worse.

For now, I have to Nack: Unless you can prove that
ktime_get_boot_fast_ns() can still be correlated with timestamps from
printk timestamps, I think this change only trades one problem for
another.

Thanks,
-- Marco

Honestly, the possibility of accurately matching a message in the printk
log by the timestamp in the kfence report is very low, since allocation
and free do not directly correspond to a certain event.

It's about being able to compare the timestamps. I don't want to match
an exact event, but be able to figure out which event happened
before/after an allocation or free, i.e. the logical ordering of
events.

With CONFIG_PRINTK_CALLER we can see the CPU ID in printk lines and
are therefore able to accurately compare printk lines with information
given by KFENCE alloc/free info.



That makes sense.


Since time drifts across CPUs, timestamps may be different even if
allocation and free can correspond to a certain event.

This is not a problem with CONFIG_PRINTK_CALLER.

If we really need to find the relevant printk logs by the timestamps in
the kfence report, all we can do is to look for messages that are within
a certain time range.

If we are looking for messages in a certain time range, there is not
much difference between local_clock() and ktime_get_boot_fast_ns().

Also, this patch is in preparation for my next patch.

My next patch is to show the PID, CPU number, and timestamp when the
error occurred, in this case time drift from different CPUs can
cause confusion.

It's not quite clear how there's a dependency between this patch and a
later patch, but generally it's good practice to send related patches
as a patch series. That way it's easier to see what the overall
changes are and provide feedback as a whole - as is, it's difficult to
provide feedback.

However, from what you say this information is already given.
dump_stack_print_info() shows this - e.g this bit here is printed by
where the error occurred:

| CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2
04/01/2014

And if the printk log has timestamps, then these lines are prefixed
with the timestamp where the error occurred.



Thanks, I found that information.

Since this information is at the bottom of the report, I had previously
ignored them.

I would suggest considering moving this information to the top of
the report, for example

BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234

CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014

Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72):
...

This more clearly correlates this information with the occurrence of
the error.

If the timestamp of the printk is used as the timestamp of when the
error occurred, then my patch is unnecessary.


For example, use-after-free caused by a subtle race condition, in which
the time between the free and the error occur will be very close.

Time drift from different CPUs may cause it to appear in the report that
the error timestamp precedes the free timestamp.

That doesn't matter. I recommend that you go through a hypothetical
debugging scenario:
1. We are not interested in the absolute timings of events, but the
logical ordering between them.

2. The logical ordering of events is inherent from how KFENCE
operates: an error _always_ follows an allocation and/or free. From a
debugging point of view, the timestamps do not have any value here.

3. The timestamps _do_ add value when trying to figure out the logical
ordering between allocation, free, or the erroneous access _with
other_ events in the system. A stream of other events is always shown
in the kernel log (printk). Other streams of events can be obtained
via e.g. ftrace (which also uses local_clock(), see
kernel/trace/trace_clock.c).

So, the timestamp that KFENCE should show is the one that most likely
allows us to deduce the logical ordering with other events in the
system.


Thanks for the detailed explanation.

I now have a better understanding of the purpose of timestamps in
KFENCE reports.