Re: [PATCH v2] panic: Taint kernel if fault injection has been used

From: Alexei Starovoitov
Date: Mon Dec 05 2022 - 21:17:10 EST


On Mon, Dec 05, 2022 at 07:59:21AM +0900, Masami Hiramatsu wrote:
> On Sun, 4 Dec 2022 14:30:01 -0800
> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> > On Mon, Dec 05, 2022 at 07:22:44AM +0900, Masami Hiramatsu (Google) wrote:
> > > From: Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
> > >
> > > Since the function error injection framework in the fault injection
> > > subsystem can change the function code flow forcibly, it may cause
> > > unexpected behavior (and that is the purpose of this feature) even
> > > if it is applied to the ALLOW_ERROR_INJECTION functions.
> > > So this feature must be used only for debugging or testing purpose.
> >
> > The whole idea of tainting for kernel debugging is questionable.
> > There are many other *inject* kconfigs and other debug flags
> > for link lists, RCU, sleeping, etc.
> > None of them taint the kernel.
> >
> > > To identify this in the kernel oops message, add a new taint flag
> >
> > Have you ever seen a single oops message because of this particular
> > error injection?
>
> No, but there is no guarantee that the FEI doesn't cause any issue
> in the future too. If it happens, we need to know the precise
> information about what FEI/bpf does.
> FEI is a kind of temporal Livepatch for testing. If Livepatch taints
> the kernel, why doesn't the FEI taint it too?

Live patching can replace an arbitrary function and the kernel has
no visibility into what KLP module is doing.
While 'bpf error injection' is predictable.
The functions marked with [BPF_]ALLOW_ERROR_INJECTION can return errors
in the normal execution. So the callers of these functions have to deal with errors.

If kernel panics on such injected error it potentially would have paniced
on it anyway. At this point crash dump might be necessary to debug.
Whether oops happened because of bpf, kprobe or normal execution
doesn't matter much. The bug is in the caller that wasn't prepared
to deal with that error.

One can still walk all bpf progs from crash dump with tool "drgn"
(it has nice scripts to examine the dumps) or "crash" or other tools.

> >
> > > for the fault injection. This taint flag will be set by either
> > > function error injection is used or the BPF use the kprobe_override
> > > on error injectable functions (identified by ALLOW_ERROR_INJECTION).
> >
> > ...
> >
> > > /* set the new array to event->tp_event and set event->prog */
> > > + if (prog->kprobe_override)
> > > + add_taint(TAINT_FAULT_INJECTED, LOCKDEP_NOW_UNRELIABLE);
> >
> > Nack for bpf bits.
>
> I think this is needed especially for bpf bits. If we see this flag,
> we can ask reporters to share the bpf programs which they used.

You can ask reporters to share bpf progs, but you can repro
the oops just as well without bpf. It's not bpf to blame, but the
bug in the caller that you should worry about.