Re: [PATCH v4 02/11] x86, kfence: enable KFENCE for x86

From: Marco Elver
Date: Fri Oct 09 2020 - 13:41:08 EST


On Wed, Oct 07, 2020 at 04:41PM +0200, Marco Elver wrote:
> On Wed, 7 Oct 2020 at 16:15, Jann Horn <jannh@xxxxxxxxxx> wrote:
[...]
> > > > > + return false;
> > > > > +
> > > > > + if (protect)
> > > > > + set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> > > > > + else
> > > > > + set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
> > > >
> > > > Hmm... do we have this helper (instead of using the existing helpers
> > > > for modifying memory permissions) to work around the allocation out of
> > > > the data section?
> > >
> > > I just played around with using the set_memory.c functions, to remind
> > > myself why this didn't work. I experimented with using
> > > set_memory_{np,p}() functions; set_memory_p() isn't implemented, but
> > > is easily added (which I did for the experiment below). However, this
> > > didn't quite work:
> > [...]
> > > For one, smp_call_function_many_cond() doesn't want to be called with
> > > interrupts disabled, and we may very well get a KFENCE allocation or
> > > page fault with interrupts disabled / within interrupts.
> > >
> > > Therefore, to be safe, we should avoid IPIs.
> >
> > set_direct_map_invalid_noflush() does that, too, I think? And that's
> > already implemented for both arm64 and x86.
>
> Sure, that works.
>
> We still want the flush_tlb_one_kernel(), at least so the local CPU's
> TLB is flushed.

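Nope, sorry, set_direct_map_invalid_noflush() does not work -- this
results in a potential deadlock. cpa_lock is taken with a plain
spin_lock(), but kfence_guarded_free() can reach it from softirq
context (below, via an RCU callback):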

================================
WARNING: inconsistent lock state
5.9.0-rc4+ #2 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/1/16 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffffffff89fcf9b8 (cpa_lock){+.?.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
ffffffff89fcf9b8 (cpa_lock){+.?.}-{2:2}, at: __change_page_attr_set_clr+0x1b0/0x2510 arch/x86/mm/pat/set_memory.c:1658
{SOFTIRQ-ON-W} state was registered at:
lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:354 [inline]
__change_page_attr_set_clr+0x1b0/0x2510 arch/x86/mm/pat/set_memory.c:1658
change_page_attr_set_clr+0x333/0x500 arch/x86/mm/pat/set_memory.c:1752
change_page_attr_set arch/x86/mm/pat/set_memory.c:1782 [inline]
set_memory_nx+0xb2/0x110 arch/x86/mm/pat/set_memory.c:1930
free_init_pages+0x73/0xc0 arch/x86/mm/init.c:876
alternative_instructions+0x155/0x1a4 arch/x86/kernel/alternative.c:738
check_bugs+0x1bd0/0x1c77 arch/x86/kernel/cpu/bugs.c:140
start_kernel+0x486/0x4b6 init/main.c:1042
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
irq event stamp: 14564
hardirqs last enabled at (14564): [<ffffffff8828cadf>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
hardirqs last enabled at (14564): [<ffffffff8828cadf>] _raw_spin_unlock_irqrestore+0x6f/0x90 kernel/locking/spinlock.c:191
hardirqs last disabled at (14563): [<ffffffff8828d239>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (14563): [<ffffffff8828d239>] _raw_spin_lock_irqsave+0xa9/0xce kernel/locking/spinlock.c:159
softirqs last enabled at (14486): [<ffffffff8147fcff>] run_ksoftirqd kernel/softirq.c:652 [inline]
softirqs last enabled at (14486): [<ffffffff8147fcff>] run_ksoftirqd+0xcf/0x170 kernel/softirq.c:644
softirqs last disabled at (14491): [<ffffffff8147fcff>] run_ksoftirqd kernel/softirq.c:652 [inline]
softirqs last disabled at (14491): [<ffffffff8147fcff>] run_ksoftirqd+0xcf/0x170 kernel/softirq.c:644

other info that might help us debug this:
Possible unsafe locking scenario:

       CPU0
       ----
  lock(cpa_lock);
  <Interrupt>
    lock(cpa_lock);

*** DEADLOCK ***

1 lock held by ksoftirqd/1/16:
#0: ffffffff8a067e20 (rcu_callback){....}-{0:0}, at: rcu_do_batch kernel/rcu/tree.c:2418 [inline]
#0: ffffffff8a067e20 (rcu_callback){....}-{0:0}, at: rcu_core+0x55d/0x1130 kernel/rcu/tree.c:2656

stack backtrace:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.9.0-rc4+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
print_usage_bug kernel/locking/lockdep.c:3350 [inline]
valid_state kernel/locking/lockdep.c:3361 [inline]
mark_lock_irq kernel/locking/lockdep.c:3575 [inline]
mark_lock.cold+0x12/0x17 kernel/locking/lockdep.c:4006
mark_usage kernel/locking/lockdep.c:3905 [inline]
__lock_acquire+0x1159/0x5780 kernel/locking/lockdep.c:4380
lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:354 [inline]
__change_page_attr_set_clr+0x1b0/0x2510 arch/x86/mm/pat/set_memory.c:1658
__set_pages_np arch/x86/mm/pat/set_memory.c:2184 [inline]
set_direct_map_invalid_noflush+0xd2/0x110 arch/x86/mm/pat/set_memory.c:2189
kfence_protect_page arch/x86/include/asm/kfence.h:62 [inline]
kfence_protect+0x10e/0x120 mm/kfence/core.c:124
kfence_guarded_free+0x380/0x880 mm/kfence/core.c:375
rcu_do_batch kernel/rcu/tree.c:2428 [inline]
rcu_core+0x5ca/0x1130 kernel/rcu/tree.c:2656
__do_softirq+0x1f8/0xb23 kernel/softirq.c:298
run_ksoftirqd kernel/softirq.c:652 [inline]
run_ksoftirqd+0xcf/0x170 kernel/softirq.c:644
smpboot_thread_fn+0x655/0x9e0 kernel/smpboot.c:165
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
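
For illustration only, the "Possible unsafe locking scenario" in the
report above can be modelled in user space. This is a minimal sketch,
not kernel code: toy_lock, toy_trylock(), toy_softirq_path() and
toy_process_path() are hypothetical stand-ins, and a failed trylock
marks the point where a real spinlock would spin forever on the same
CPU.

```c
#include <stdbool.h>

/* Toy stand-in for a spinlock that is NOT softirq-safe (like cpa_lock here). */
struct toy_lock {
	bool held;
};

/* Returns false at the point where a real spin_lock() would keep spinning. */
static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

/*
 * Models the softirq side of the report: an RCU callback frees a KFENCE
 * object and ends up trying to take the lock. Returns true on success.
 */
static bool toy_softirq_path(struct toy_lock *l)
{
	if (!toy_trylock(l))
		return false;	/* real code would never make progress here */
	toy_unlock(l);
	return true;
}

/*
 * Models the process-context side (e.g. the set_memory_nx() call chain),
 * which takes the lock with softirqs enabled. If softirq_fires is true,
 * the softirq path runs on the same CPU while the lock is still held.
 * Returns true if that interleaving would deadlock.
 */
static bool toy_process_path(struct toy_lock *l, bool softirq_fires)
{
	bool deadlock = false;

	if (!toy_trylock(l))
		return true;
	if (softirq_fires)
		deadlock = !toy_softirq_path(l);
	toy_unlock(l);
	return deadlock;
}
```

Calling toy_process_path() with softirq_fires set to false completes
normally; with true, it reports the interleaving that lockdep flags.
The model says nothing about how to fix it. It only shows why a lock
taken with softirqs enabled must not also be taken from softirq
context.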