Re: [PATCH v2] panic: Taint kernel if fault injection has been used

From: Steven Rostedt
Date: Tue Dec 06 2022 - 23:40:02 EST


On Tue, 6 Dec 2022 20:01:46 -0800
Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:


> >
> > > At this point crash dump might be necessary to debug.
> >
> > Yes. So the TAINT flag can help. Please consider that the TAINT flag
> > doesn't mean you are guilty, but this is just a hint for debugging.
> > (good for the first triage)
>
> I think you misunderstand the reason behind 'tainted' flags.
> It's 'hint for debugging' only on the surface.
> See Documentation/admin-guide/tainted-kernels.rst
> "... That's why bug reports
> from tainted kernels will often be ignored by developers, hence try to reproduce
> problems with an untainted kernel."

You conveniently left out the first part of that paragraph. Showing just a
portion of a statement can be very misleading. Let me add the whole
paragraph here:

The kernel will mark itself as 'tainted' when something occurs that
might be relevant later when investigating problems. Don't worry too much about this,
most of the time it's not a problem to run a tainted kernel; the information is
mainly of interest once someone wants to investigate some problem, as its real
cause might be the event that got the kernel tainted. That's why bug reports
from tainted kernels will often be ignored by developers, hence try to reproduce
problems with an untainted kernel.

Let me stress the very first sentence above:

The kernel will mark itself as 'tainted' when something occurs that
might be relevant later when investigating problems.

I think you are the one that is misunderstanding what a taint is. It most
definitely is about giving hints for debugging. That's why the very first
sentence of that paragraph, as well as the entire document, explicitly
states "might be relevant later when investigating problems".


>
> When 'error injection' finds a kernel bug the kernel developers need to
> look into it regardless whether it's syzbot error injection
> or whatever other mechanism.
>

And this is a very useful taint. Just like:

2 _/S 4 kernel running on an out of specification system
5 _/B 32 bad page referenced or some unexpected page flags
7 _/D 128 kernel died recently, i.e. there was an OOPS or BUG
10 _/C 1024 staging driver was loaded
11 _/I 2048 workaround for bug in platform firmware applied
14 _/L 16384 soft lockup occurred
17 _/T 131072 kernel was built with the struct randomization plugin

Any of the above should not be ignored by developers, but they are useful
hints for debugging the issue.


> To change the topic to something ... else...
>
> We've just hit this panic using rethook.
> [ 49.235708] ==================================================================
> [ 49.236243] BUG: KASAN: use-after-free in rethook_try_get+0x7e/0x380
> [ 49.236693] Read of size 8 at addr ffff888102e62c88 by task test_progs/1688
> [ 49.240398] kasan_report+0x90/0x190
> [ 49.240934] rethook_try_get+0x7e/0x380
> [ 49.244885] fprobe_handler.part.1+0x119/0x1f0
> [ 49.245505] arch_ftrace_ops_list_func+0x17d/0x1d0
> [ 49.246544] ftrace_regs_call+0x5/0x52
> [ 49.247411] bpf_fentry_test1+0x5/0x10
>
> [ 49.262578] Allocated by task 1692:
> [ 49.262804] kasan_save_stack+0x1c/0x40
> [ 49.263059] kasan_set_track+0x21/0x30
> [ 49.263335] __kasan_kmalloc+0x7a/0x90
> [ 49.263624] rethook_alloc+0x2c/0xa0
> [ 49.263879] fprobe_init_rethook+0x6d/0x170
> [ 49.264154] register_fprobe_ips+0xae/0x130
>
> [ 49.265938] Freed by task 0:
> [ 49.266153] kasan_save_stack+0x1c/0x40
> [ 49.266440] kasan_set_track+0x21/0x30
> [ 49.266705] kasan_save_free_info+0x26/0x40
> [ 49.266995] __kasan_slab_free+0x103/0x190
> [ 49.267282] __kmem_cache_free+0x1b7/0x3a0
> [ 49.267559] rcu_core+0x4d8/0xd50
>
> [ 49.268181] Last potentially related work creation:
> [ 49.268565] kasan_save_stack+0x1c/0x40
> [ 49.268898] __kasan_record_aux_stack+0xa1/0xb0
> [ 49.269260] call_rcu+0x47/0x360
> [ 49.269526] unregister_fprobe+0x47/0x80
>
> [ 49.281382] general protection fault, probably for non-canonical address 0x57e006e00000000: 0000 [#1] PREEMPT SMP KASAN
> [ 49.282226] CPU: 6 PID: 1688 Comm: test_progs Tainted: G B O 6.1.0-rc7-01508-gf0c5a2d9f234 #4343
> [ 49.283751] RIP: 0010:rethook_trampoline_handler+0xff/0x1d0
> [ 49.289900] Call Trace:
> [ 49.290083] <TASK>
> [ 49.290248] arch_rethook_trampoline_callback+0x6c/0xa0
> [ 49.290631] arch_rethook_trampoline+0x2c/0x50
> [ 49.290964] ? lock_release+0xad/0x3f0
> [ 49.291245] ? bpf_prog_test_run_tracing+0x235/0x380
> [ 49.291609] trace_clock_x86_tsc+0x10/0x10
>
> This is just running bpf selftests in parallel mode on 16-cpu VM on bpf-next.
> Notice 'Tained' flags.
> Please take a look.
>

"G - Proprietary module" - "O - out of tree module"

Can you reproduce this without those taints?

-- Steve