Re: [PATCH] debugobjects: turn off debug_objects_enabled from debug_objects_oom()

From: Thomas Gleixner
Date: Tue Jun 06 2023 - 18:37:37 EST


Tetsuo!

On Mon, May 29 2023 at 23:39, Tetsuo Handa wrote:
> syzbot is reporting false positive ODEBUG message immediately after
> ODEBUG was disabled due to OOM.
>
> [ 1062.309646][T22911] ODEBUG: Out of memory. ODEBUG disabled
> [ 1062.886755][ T5171] ------------[ cut here ]------------
> [ 1062.892770][ T5171] ODEBUG: assert_init not available (active state 0) object: ffffc900056afb20 object type: timer_list hint: process_timeout+0x0/0x40
>
> This race happened because debug_objects_oom() emitted OOM message but did
> not turn off debug_objects_enabled, and debug_print_object() did not check
> debug_objects_enabled when calling WARN().
>
> CPU 0 [ T5171] CPU 1 [T22911]
> -------------- --------------
> debug_object_assert_init() {
> if (!debug_objects_enabled)
> return;
> db = get_bucket((unsigned long) addr); // Finds a bucket, but...
> debug_objects_oom() {
> pr_warn("Out of memory. ODEBUG disabled\n");
> // all buckets get emptied here, and...
> hlist_move_list(&db->list, &freelist);
> }
> lookup_object_or_alloc(addr, db, descr, false, true) {
> lookup_object(addr, b) {
> return NULL; // this bucket is already empty.
> }
> if (!descr->is_static_object || !descr->is_static_object(addr))
> return ERR_PTR(-ENOENT);
> }
> if (!obj) { // obj == ERR_PTR(-ENOENT) because non-static object.
> debug_objects_oom();
> return;
> }
> debug_print_object(&o, "assert_init") {
> // False positive due to not checking debug_objects_enabled.
> WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) "
> "object: %p object type: %s hint: %pS\n", ...);
> }
> }

The above is undecodable gibberish.

Something like this is completely sufficient:

CPU 0 [ T5171] CPU 1 [T22911]
-------------- --------------
debug_object_assert_init() {
db = get_bucket(addr);
debug_objects_oom() {
pr_warn("Out of memory. ODEBUG disabled\n");
// all buckets get emptied here
}
lookup_object_or_alloc(addr, db, ...)
// Due to OOM:
return ERR_PTR(-ENOENT);
...

// Emits assert_init message and warning
debug_print_object(&o, "assert_init");
}

And this:

> This race happened because debug_objects_oom() emitted OOM message but did
> not turn off debug_objects_enabled

is completely wrong. Why?

The place where debug_objects_enabled is set to 0 is way before
debug_objects_oom() is invoked. That place _cannot_ invoke
debug_objects_oom() because it holds a hash bucket lock.

There are exactly three places which invoke debug_objects_oom() and for
all three places the pattern is exactly the same:

lock_bucket();
obj = lookup_object_or_alloc();
unlock_bucket();
if (!obj)
debug_objects_oom();

The place which clears debug_objects_enabled is unsurprisingly
lookup_object_or_alloc() itself, which _cannot_ invoke
debug_objects_oom() because it is invoked with the hash bucket lock
held. There is even a comment to that effect:

/* Out of memory. Do the cleanup outside of the locked region */
debug_objects_enabled = 0;
return NULL;

So at the point where debug_objects_oom() is invoked
@debug_objects_enabled is already 0.

But you claim that this is required, right?

> @@ -466,6 +466,7 @@ static void debug_objects_oom(void)
> unsigned long flags;
> int i;
>
> + debug_objects_enabled = 0;
> pr_warn("Out of memory. ODEBUG disabled\n");

Q: What is setting a variable which is already 0 to 0 solving?
A: Absolutely nothing

Now this:

> @@ -502,10 +503,10 @@ static void debug_print_object(struct de
> void *hint = descr->debug_hint ?
> descr->debug_hint(obj->object) : NULL;
> limit++;
> - WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) "
> - "object: %p object type: %s hint: %pS\n",
> - msg, obj_states[obj->state], obj->astate,
> - obj->object, descr->name, hint);
> + WARN(debug_objects_enabled, KERN_ERR
> + "ODEBUG: %s %s (active state %u) object: %p object type: %s hint: %pS\n",
> + msg, obj_states[obj->state], obj->astate,
> + obj->object, descr->name, hint);
> }

Q: Why is this related to the WARN() itself?
A: It's not related at all

The obvious fix is:

--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -498,6 +498,14 @@ static void debug_print_object(struct de
const struct debug_obj_descr *descr = obj->descr;
static int limit;

+ /*
+ * OOM handling is asynchronous for performance reasons. So the
+ * call site might have raced with a concurrent OOM which cleared
+ * the hash buckets.
+ */
+ if (!debug_objects_enabled)
+ return;
+
if (limit < 5 && descr != descr_test) {
void *hint = descr->debug_hint ?
descr->debug_hint(obj->object) : NULL;

Along with a understandable changelog, no?

Thanks,

tglx