Re: [PATCH] kasan: support panic_on_warn

From: Dmitry Vyukov
Date: Mon Oct 17 2016 - 05:00:19 EST


On Mon, Oct 17, 2016 at 10:39 AM, Andrey Ryabinin
<aryabinin@xxxxxxxxxxxxx> wrote:
>
>
> On 10/17/2016 11:18 AM, Dmitry Vyukov wrote:
>> On Mon, Oct 17, 2016 at 10:13 AM, Andrey Ryabinin
>> <aryabinin@xxxxxxxxxxxxx> wrote:
>>>
>>>
>>> On 10/14/2016 08:10 PM, Dmitry Vyukov wrote:
>>>> If user sets panic_on_warn, he wants kernel to panic if there is
>>>> anything barely wrong with the kernel. KASAN-detected errors
>>>> are definitely not less benign than an arbitrary kernel WARNING.
>>>>
>>>> Panic after KASAN errors if panic_on_warn is set.
>>>>
>>>> We use this for continuous fuzzing where we want kernel to stop
>>>> and reboot on any error.
>>>>
>>>> Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
>>>> Cc: kasan-dev@xxxxxxxxxxxxxxxx
>>>> Cc: Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>
>>>> Cc: Alexander Potapenko <glider@xxxxxxxxxx>
>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>> Cc: linux-mm@xxxxxxxxx
>>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>>> ---
>>>> mm/kasan/report.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
>>>> index 24c1211..ca0bd48 100644
>>>> --- a/mm/kasan/report.c
>>>> +++ b/mm/kasan/report.c
>>>> @@ -133,6 +133,10 @@ static void kasan_end_report(unsigned long *flags)
>>>> pr_err("==================================================================\n");
>>>> add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>>>> spin_unlock_irqrestore(&report_lock, *flags);
>>>> + if (panic_on_warn) {
>>>> + panic_on_warn = 0;
>>>
>>> Why do we need to reset panic_on_warn?
>>> I assume this was copied from __warn(). AFAIU in __warn() this protects from recursion:
>>> __warn() -> panic() -> __warn() -> panic() -> ...
>>> which is possible if WARN_ON() triggered in panic().
>>> But KASAN is protected from such recursion via kasan_disable_current().
>>
>> But we have recursion into panic via kasan->panic->warning->panic.
>
> We do, like almost every other panic() call in the kernel. But at least it's finite.
> So, if finite recursion is a problem for panic() it should be fixed in panic(), rather than on every panic() call site.


I misunderstood the comment in the warning code:

	/*
	 * This thread may hit another WARN() in the panic path.
	 * Resetting this prevents additional WARN() from panicking the
	 * system on this thread.  Other threads are blocked by the
	 * panic_mutex in panic().
	 */

I read it as saying that recursion into panic() would deadlock on a
recursive mutex acquisition.

But the "mutex" is a custom cmpxchg-based lock that permits re-entry
on the same CPU:

	this_cpu = raw_smp_processor_id();
	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);

	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
		panic_smp_self_stop();
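
For illustration only, here is a rough userspace sketch of the same
ownership check. It assumes C11 atomics in place of the kernel's
atomic_cmpxchg(). The helper name try_enter_panic() and the value
chosen for PANIC_CPU_INVALID are made up for this example and are not
the kernel's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative sentinel meaning "no CPU owns the panic path yet". */
#define PANIC_CPU_INVALID (-1)

static atomic_int panic_cpu = PANIC_CPU_INVALID;

/*
 * Hypothetical helper: returns true if this_cpu may proceed through
 * the panic path, false if another CPU already owns it.
 */
static bool try_enter_panic(int this_cpu)
{
	int expected = PANIC_CPU_INVALID;

	/* The first caller claims ownership of the panic path. */
	if (atomic_compare_exchange_strong(&panic_cpu, &expected, this_cpu))
		return true;

	/*
	 * The exchange failed, so expected now holds the current owner.
	 * Re-entry from the owning CPU is allowed; any other CPU is
	 * refused (the kernel would call panic_smp_self_stop() here).
	 */
	return expected == this_cpu;
}
```

With this sketch, try_enter_panic(0) called twice returns true both
times, and a later try_enter_panic(1) returns false. That is why
recursion on the same CPU does not deadlock.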


Mailed v2.

Thanks!


>>
>>>> + panic("panic_on_warn set ...\n");
>>>> + }
>>>> kasan_enable_current();
>>>> }