Re: [PATCH] mm: slub: annotate kmem_cache_node->list_lock as raw_spinlock

From: Qi Zheng
Date: Wed Apr 12 2023 - 12:46:49 EST




On 2023/4/12 20:47, Peter Zijlstra wrote:
On Wed, Apr 12, 2023 at 08:50:29AM +0200, Vlastimil Babka wrote:

--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -562,10 +562,10 @@ __debug_object_init(void *addr, const struct debug_obj_descr *descr, int onstack
unsigned long flags;

/*
- * On RT enabled kernels the pool refill must happen in preemptible
+ * The pool refill must happen in preemptible
* context:
*/
- if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible())
+ if (preemptible())
fill_pool();

+CC Peterz

Aha so this is in fact another case where the code is written with
actual differences between PREEMPT_RT and !PREEMPT_RT in mind, but
CONFIG_PROVE_RAW_LOCK_NESTING always assumes PREEMPT_RT?

Ooh, tricky, yes. PROVE_RAW_LOCK_NESTING always follows the PREEMP_RT
rules and does not expect trickery like the above.

Something like the completely untested below might be of help..

---
diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index d22430840b53..f3120d6a7d9e 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -33,6 +33,7 @@ enum lockdep_wait_type {
enum lockdep_lock_type {
LD_LOCK_NORMAL = 0, /* normal, catch all */
LD_LOCK_PERCPU, /* percpu */
+ LD_LOCK_WAIT, /* annotation */
LD_LOCK_MAX,
};
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 50d4863974e7..a4077f5bb75b 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2279,8 +2279,9 @@ static inline bool usage_skip(struct lock_list *entry, void *mask)
* As a result, we will skip local_lock(), when we search for irq
* inversion bugs.
*/
- if (entry->class->lock_type == LD_LOCK_PERCPU) {
- if (DEBUG_LOCKS_WARN_ON(entry->class->wait_type_inner < LD_WAIT_CONFIG))
+ if (entry->class->lock_type != LD_LOCK_NORMAL) {
+ if (entry->class->lock_type == LD_LOCK_PERCPU &&
+ DEBUG_LOCKS_WARN_ON(entry->class->wait_type_inner < LD_WAIT_CONFIG))
return false;
return true;
@@ -4752,7 +4753,8 @@ static int check_wait_context(struct task_struct *curr, struct held_lock *next)
for (; depth < curr->lockdep_depth; depth++) {
struct held_lock *prev = curr->held_locks + depth;
- u8 prev_inner = hlock_class(prev)->wait_type_inner;
+ struct lock_class *class = hlock_class(prev);
+ u8 prev_inner = class->wait_type_inner;
if (prev_inner) {
/*
@@ -4762,6 +4764,12 @@ static int check_wait_context(struct task_struct *curr, struct held_lock *next)
* Also due to trylocks.
*/
curr_inner = min(curr_inner, prev_inner);
+
+ /*
+ * Allow override for annotations.
+ */
+ if (unlikely(class->lock_type == LD_LOCK_WAIT))
+ curr_inner = prev_inner;
}
}
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index df86e649d8be..fae71ef72a16 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -565,8 +565,16 @@ __debug_object_init(void *addr, const struct debug_obj_descr *descr, int onstack
* On RT enabled kernels the pool refill must happen in preemptible
* context:
*/
- if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible())
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
+ static lockdep_map dep_map = {

static struct lockdep_map dep_map = {

+ .name = "wait-type-override",
+ .wait_type_inner = LD_WAIT_SLEEP,
+ .lock_type = LD_LOCK_WAIT,
+ };
+ lock_map_acquire(&dep_map);
fill_pool();
+ lock_map_release(&dep_map);
+ }
db = get_bucket((unsigned long) addr);

I just tested the above code, and then got the following
warning:

[ 0.001000][ T0] =============================
[ 0.001000][ T0] [ BUG: Invalid wait context ]
[ 0.001000][ T0] 6.3.0-rc6-next-20230412+ #21 Not tainted
[ 0.001000][ T0] -----------------------------
[ 0.001000][ T0] swapper/0/0 is trying to lock:
[ 0.001000][ T0] ffffffff825bcb80 (wait-type-override){....}-{4:4}, at: __debug_object_init+0x0/0x590
[ 0.001000][ T0] other info that might help us debug this:
[ 0.001000][ T0] context-{5:5}
[ 0.001000][ T0] 2 locks held by swapper/0/0:
[ 0.001000][ T0] #0: ffffffff824f5178 (timekeeper_lock){....}-{2:2}, at: timekeeping_init+0xf1/0x270
[ 0.001000][ T0] #1: ffffffff824f5008 (tk_core.seq.seqcount){....}-{0:0}, at: start_kernel+0x31a/0x800
[ 0.001000][ T0] stack backtrace:
[ 0.001000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230412+ #21
[ 0.001000][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 0.001000][ T0] Call Trace:
[ 0.001000][ T0] <TASK>
[ 0.001000][ T0] dump_stack_lvl+0x77/0xc0
[ 0.001000][ T0] __lock_acquire+0xa74/0x2960
[ 0.001000][ T0] ? save_trace+0x3f/0x320
[ 0.001000][ T0] ? add_lock_to_list+0x97/0x130
[ 0.001000][ T0] lock_acquire+0xe0/0x300
[ 0.001000][ T0] ? debug_object_active_state+0x180/0x180
[ 0.001000][ T0] __debug_object_init+0x47/0x590
[ 0.001000][ T0] ? debug_object_active_state+0x180/0x180
[ 0.001000][ T0] ? lock_acquire+0x100/0x300
[ 0.001000][ T0] hrtimer_init+0x23/0xc0
[ 0.001000][ T0] ntp_init+0x70/0x80
[ 0.001000][ T0] timekeeping_init+0x12c/0x270
[ 0.001000][ T0] ? start_kernel+0x31a/0x800
[ 0.001000][ T0] ? _printk+0x5c/0x80
[ 0.001000][ T0] start_kernel+0x31a/0x800
[ 0.001000][ T0] secondary_startup_64_no_verify+0xf4/0xfb
[ 0.001000][ T0] </TASK>

It seems that the LD_WAIT_SLEEP we set is already greater than the
LD_WAIT_SPIN of the current context.

--
Thanks,
Qi