Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

From: Matt Redfearn
Date: Mon Aug 07 2017 - 06:26:17 EST


Hi Luis,


On 03/08/17 00:23, Luis R. Rodriguez wrote:
On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
no longer caught, it just ends up waiting indefinitely for the kmod_wq
wait queue. Hence the kernel appears to hang silently when starting
userspace.
Indeed, the recursive issue were no longer expected to exist.
Errr, yeah, recursive binfmt loads can still happen.

The *old* implementation would also prevent a set of binaries to daisy chain
a set of 50 different binaries which require different binfmt loaders. The
current implementation enables this and we'd just wait. There's a bound to
the number of binfmd loaders though, so this would be bounded. If however
a 2nd loader loaded the first binary we'd run into the same issue I think.

If we can't think of a good way to resolve this we'll just have to revert
6d7964a722af for now.
The weird but "normal" recursive case is usually a script calling a
script calling a misc format. Getting a chain of modprobes running,
though, seems unlikely. I *think* Matt's patch is okay, but I agree,
it'd be better for the request_module() to fail.
In that case how about we just have each waiter only wait max X seconds,
if the number of concurrent ongoing modprobe calls hasn't reduced by
a single digit in X seconds we give up on request_module() for the
module and clearly indicate what happened.

Matt, can you test?

Sure - I've tested patch this on Cavium Octeon under the same conditions as before (64 bit kernel with 32bit userspace & no binfmt handler builtin).

The failing modprobe is now caught and rather than silence we get the expected kernel panic, albeit all the warnings look quite noisy.

VFS: Mounted root (ext3 filesystem) readonly on device 8:5.
devtmpfs: mounted
Freeing unused kernel memory: 372K
This architecture does not have kernel memory protection.
request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module binfmt-4c46, throttling...
request_module: modprobe binfmt-4c46 cannot be processed, kmod busy with 0 threads for more than 5 seconds now
request_module: recursive modprobe call very likely!
Starting init: /sbin/init exists but couldn't execute it (error -8)
request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module binfmt-4c46, throttling...
request_module: modprobe binfmt-4c46 cannot be processed, kmod busy with 0 threads for more than 5 seconds now
request_module: recursive modprobe call very likely!
Starting init: /bin/sh exists but couldn't execute it (error -8)
Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
---[ end Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.


In any case, this is better than the 4.13-rc1 behavior, so

Tested-by: Matt Redfearn <matt.redfearn@xxxxxxxxxx>

Thanks,
Matt


Note I've used wait_event_killable_timeout() to only accept SIGKILL
for now. I've seen issues wit SIGCHILD and at modprobe this could
even be a bigger issue, so this would restrict the signals received
*only* to SIGKILL.

It would be good to come up with a simple test case for this in
tools/testing/selftests/kmod/kmod.sh

Luis

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 5b74e36c0ca8..dc19880c02f5 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -757,6 +757,43 @@ extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *);
__ret; \
})
+#define __wait_event_killable_timeout(wq_head, condition, timeout) \
+ ___wait_event(wq_head, ___wait_cond_timeout(condition), \
+ TASK_KILLABLE, 0, timeout, \
+ __ret = schedule_timeout(__ret))
+
+/**
+ * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses
+ * @wq_head: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ * @timeout: timeout, in jiffies
+ *
+ * The process is put to sleep (TASK_KILLABLE) until the
+ * @condition evaluates to true or a kill signal is received.
+ * The @condition is checked each time the waitqueue @wq_head is woken up.
+ *
+ * wake_up() has to be called after changing any variable that could
+ * change the result of the wait condition.
+ *
+ * Returns:
+ * 0 if the @condition evaluated to %false after the @timeout elapsed,
+ * 1 if the @condition evaluated to %true after the @timeout elapsed,
+ * the remaining jiffies (at least 1) if the @condition evaluated
+ * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was
+ * interrupted by a kill signal.
+ *
+ * Only kill signals interrupt this process.
+ */
+#define wait_event_killable_timeout(wq_head, condition, timeout) \
+({ \
+ long __ret = timeout; \
+ might_sleep(); \
+ if (!___wait_cond_timeout(condition)) \
+ __ret = __wait_event_killable_timeout(wq_head, \
+ condition, timeout); \
+ __ret; \
+})
+
#define __wait_event_lock_irq(wq_head, condition, lock, cmd) \
(void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0, \
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 6d016c5d97c8..1b5f7bada8d2 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -71,6 +71,13 @@ static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
/*
+ * If modprobe can't be called after this time we assume its very likely
+ * your userspace has created a recursive dependency, and we'll have no
+ * option but to fail.
+ */
+#define MAX_KMOD_TIMEOUT 5
+
+/*
modprobe_path is set via /proc/sys.
*/
char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
@@ -167,8 +174,18 @@ int __request_module(bool wait, const char *fmt, ...)
pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) close to 0 (max_modprobes: %u), for module %s, throttling...",
atomic_read(&kmod_concurrent_max),
MAX_KMOD_CONCURRENT, module_name);
- wait_event_interruptible(kmod_wq,
- atomic_dec_if_positive(&kmod_concurrent_max) >= 0);
+ ret = wait_event_killable_timeout(kmod_wq,
+ atomic_dec_if_positive(&kmod_concurrent_max) >= 0,
+ MAX_KMOD_TIMEOUT * HZ);
+ if (!ret) {
+ pr_warn_ratelimited("request_module: modprobe %s cannot be processed, kmod busy with %d threads for more than %d seconds now",
+ module_name, atomic_read(&kmod_concurrent_max), MAX_KMOD_TIMEOUT);
+ pr_warn_ratelimited("request_module: recursive modprobe call very likely!");
+ return -ETIME;
+ } else if (ret == -ERESTARTSYS) {
+ pr_warn_ratelimited("request_module: sigkill sent for modprobe %s, giving up", module_name);
+ return ret;
+ }
}
trace_module_request(module_name, wait, _RET_IP_);