[RFC PATCH 62/86] treewide: trace: remove cond_resched()

From: Ankur Arora
Date: Tue Nov 07 2023 - 18:10:11 EST


There are broadly three sets of uses of cond_resched() (the second
and third are sketched below):

1. Calls to cond_resched() out of the goodness of our heart,
otherwise known as avoiding lockup splats.

2. Open coded variants of cond_resched_lock() which call
cond_resched().

3. Retry or error handling loops, where cond_resched() is used as a
quick alternative to spinning in a tight loop.
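
For illustration only (hypothetical code, not taken from this series),
the second and third patterns typically look something like:

  /* Set 2: open coded variant of cond_resched_lock() */
  spin_unlock(&lock);
  cond_resched();
  spin_lock(&lock);

  /* Set 3: cond_resched() standing in for a real wait */
  while (!buffer_ready())         /* buffer_ready() is made up */
          cond_resched();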

When running under a full preemption model, cond_resched() reduces
to a NOP (not even a barrier), so removing it obviously cannot matter.

But considering only voluntary preemption models (say, for code that
has mostly been tested under those), for set-1 and set-2 the
scheduler can now preempt kernel tasks running beyond their time
quanta anywhere they are preemptible() [1], which removes any need
for these explicitly placed scheduling points.
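
(For reference, on kernels with CONFIG_PREEMPT_COUNT the preemptible()
check is roughly:

  /* include/linux/preempt.h, approximately */
  #define preemptible()  (preempt_count() == 0 && !irqs_disabled())

i.e. roughly any code that is not in atomic context and has interrupts
enabled is now a potential preemption point.)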

The cond_resched() calls in set-3 are a little more difficult.
To start with, given its NOP character under full preemption, it
never actually saved us from a tight loop.
With voluntary preemption it is not a NOP, but it might as well be:
for most workloads the scheduler does not have an interminable supply
of runnable tasks on the runqueue.

So, cond_resched() is useful for avoiding softlockup splats, but not
terribly good for error handling. Ideally, these should be replaced
with some kind of timed or event wait.
For now we use cond_resched_stall(), which tries to schedule if
possible, and executes a cpu_relax() if not.
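
Where a set-3 site eventually does get converted, the change would look
roughly like this (hypothetical example; try_claim_page() is made up,
and cond_resched_stall() is the helper proposed earlier in this series):

  /* before: poll, politely yielding the CPU each time around */
  while (!try_claim_page())
          cond_resched();

  /* after: schedule if something is runnable, else cpu_relax() */
  while (!try_claim_page())
          cond_resched_stall();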

All the cond_resched() calls here are from set-1. Remove them.

[1] https://lore.kernel.org/lkml/20231107215742.363031-1-ankur.a.arora@xxxxxxxxxx/

Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
---
 kernel/trace/ftrace.c                |  4 ----
 kernel/trace/ring_buffer.c           |  4 ----
 kernel/trace/ring_buffer_benchmark.c | 13 -------------
 kernel/trace/trace.c                 | 11 -----------
 kernel/trace/trace_events.c          |  1 -
 kernel/trace/trace_selftest.c        |  9 ---------
 6 files changed, 42 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8de8bec5f366..096ebb608610 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2723,7 +2723,6 @@ void __weak ftrace_replace_code(int mod_flags)
struct dyn_ftrace *rec;
struct ftrace_page *pg;
bool enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
- int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL;
int failed;

if (unlikely(ftrace_disabled))
@@ -2740,8 +2739,6 @@ void __weak ftrace_replace_code(int mod_flags)
/* Stop processing */
return;
}
- if (schedulable)
- cond_resched();
} while_for_each_ftrace_rec();
}

@@ -4363,7 +4360,6 @@ match_records(struct ftrace_hash *hash, char *func, int len, char *mod)
}
found = 1;
}
- cond_resched();
} while_for_each_ftrace_rec();
out_unlock:
mutex_unlock(&ftrace_lock);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 515cafdb18d9..5c5eb6a8c7db 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1996,8 +1996,6 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
tmp_iter_page = first_page;

do {
- cond_resched();
-
to_remove_page = tmp_iter_page;
rb_inc_page(&tmp_iter_page);

@@ -2206,8 +2204,6 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
err = -ENOMEM;
goto out_err;
}
-
- cond_resched();
}

cpus_read_lock();
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index aef34673d79d..8d1c23d135cb 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -267,19 +267,6 @@ static void ring_buffer_producer(void)
if (consumer && !(cnt % wakeup_interval))
wake_up_process(consumer);

-#ifndef CONFIG_PREEMPTION
- /*
- * If we are a non preempt kernel, the 10 seconds run will
- * stop everything while it runs. Instead, we will call
- * cond_resched and also add any time that was lost by a
- * reschedule.
- *
- * Do a cond resched at the same frequency we would wake up
- * the reader.
- */
- if (cnt % wakeup_interval)
- cond_resched();
-#endif
} while (ktime_before(end_time, timeout) && !break_test());
trace_printk("End ring buffer hammer\n");

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0776dba32c2d..1efb69423818 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2052,13 +2052,6 @@ static int do_run_tracer_selftest(struct tracer *type)
{
int ret;

- /*
- * Tests can take a long time, especially if they are run one after the
- * other, as does happen during bootup when all the tracers are
- * registered. This could cause the soft lockup watchdog to trigger.
- */
- cond_resched();
-
tracing_selftest_running = true;
ret = run_tracer_selftest(type);
tracing_selftest_running = false;
@@ -2083,10 +2076,6 @@ static __init int init_trace_selftests(void)

tracing_selftest_running = true;
list_for_each_entry_safe(p, n, &postponed_selftests, list) {
- /* This loop can take minutes when sanitizers are enabled, so
- * lets make sure we allow RCU processing.
- */
- cond_resched();
ret = run_tracer_selftest(p->type);
/* If the test fails, then warn and remove from available_tracers */
if (ret < 0) {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f49d6ddb6342..91951d038ba4 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2770,7 +2770,6 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
update_event_fields(call, map[i]);
}
}
- cond_resched();
}
up_write(&trace_event_sem);
}
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 529590499b1f..07cfad8ce16f 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -848,11 +848,6 @@ trace_selftest_startup_function_graph(struct tracer *trace,
}

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
- /*
- * These tests can take some time to run. Make sure on non PREEMPT
- * kernels, we do not trigger the softlockup detector.
- */
- cond_resched();

tracing_reset_online_cpus(&tr->array_buffer);
set_graph_array(tr);
@@ -875,8 +870,6 @@ trace_selftest_startup_function_graph(struct tracer *trace,
if (ret)
goto out;

- cond_resched();
-
ret = register_ftrace_graph(&fgraph_ops);
if (ret) {
warn_failed_init_tracer(trace, ret);
@@ -899,8 +892,6 @@ trace_selftest_startup_function_graph(struct tracer *trace,
if (ret)
goto out;

- cond_resched();
-
tracing_start();

if (!ret && !count) {
--
2.31.1