Re: Bug with paravirt ops and livepatches

From: Jessica Yu
Date: Mon Apr 04 2016 - 13:58:54 EST


+++ Josh Poimboeuf [04/04/16 11:14 -0500]:
On Fri, Apr 01, 2016 at 09:35:34PM +0200, Jiri Kosina wrote:
On Fri, 1 Apr 2016, Chris J Arges wrote:

> Loading, please wait...
> starting version 229
> [ 1.182869] random: udevadm urandom read with 2 bits of entropy available
> [ 1.241404] BUG: unable to handle kernel paging request at ffffffffc000f35f

Gah, we surely can't change pages with RO PTE. Thanks for such prompt
testing. You do have CONFIG_DEBUG_SET_MODULE_RONX set, don't you?

The patch below should fix that by marking the module RO (and relevant
parts NX) only when it's guaranteed that .text is not going to be modified
any more (and includes the error handling fix Miroslav spotted as well).

Thanks.



diff --git a/kernel/module.c b/kernel/module.c
index 5f71aa6..430606d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
return 0;
}

-static int post_relocation(struct module *mod, const struct load_info *info)
+static void post_relocation(struct module *mod, const struct load_info *info)
{
/* Sort exception table now relocations are done. */
sort_extable(mod->extable, mod->extable + mod->num_exentries);
@@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const struct load_info *info)

/* Setup kallsyms-specific fields. */
add_kallsyms(mod, info);
-
- /* Arch-specific module finalizing. */
- return module_finalize(info->hdr, info->sechdrs, mod);
}

/* Is this module of this name done loading? No locks held. */
@@ -3441,10 +3438,6 @@ static int complete_formation(struct module *mod, struct load_info *info)
/* This relies on module_mutex for list integrity. */
module_bug_finalize(info->hdr, info->sechdrs, mod);

- /* Set RO and NX regions */
- module_enable_ro(mod);
- module_enable_nx(mod);
-
/* Mark state as coming so strong_try_module_get() ignores us,
* but kallsyms etc. can see us. */
mod->state = MODULE_STATE_COMING;
@@ -3562,9 +3555,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
if (err < 0)
goto free_modinfo;

- err = post_relocation(mod, info);
- if (err < 0)
- goto free_modinfo;
+ post_relocation(mod, info);

flush_module_icache(mod);

@@ -3589,6 +3580,15 @@ static int load_module(struct load_info *info, const char __user *uargs,
if (err)
goto bug_cleanup;

+ /* Arch-specific module finalizing. */
+ err = module_finalize(info->hdr, info->sechdrs, mod);
+ if (err)
+ goto coming_cleanup;
+
+ /* Set RO and NX regions */
+ module_enable_ro(mod);
+ module_enable_nx(mod);
+
/* Module is ready to execute: parsing args may do that. */
after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
-32768, 32767, mod,

So I think this doesn't fix the problem. Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module". This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example. Imagine you have
a patch module P which applies a patch to module M. P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M. P's new function F' has an
instruction which is patched by apply_paravirt(), even though the
patch hasn't been applied yet.

2) Module M is loaded. Before applying the patch, livepatch tries to
apply a klp_reloc to the instruction in F' which was already patched
by apply_paravirt() in step 1. This results in undefined behavior
because it tries to patch the original instruction but instead
patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.
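
The two steps in the example can be modelled in userspace. The following is only a toy sketch, not kernel code: the 5-byte site, the byte values, and every toy_* name are made up for illustration. It assumes apply_paravirt() rewrites the whole instruction, while a klp_reloc blindly writes into the operand bytes of whatever instruction is currently there.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: 5 bytes standing in for one instruction in F'.
 * The encodings are invented for illustration, not real x86. */
enum { TOY_SITE_LEN = 5 };

struct toy_site {
	uint8_t bytes[TOY_SITE_LEN];
};

/* What the site is assumed to look like once paravirt patching is done. */
static const uint8_t toy_direct_call[TOY_SITE_LEN] = {
	0xe8, 0x11, 0x22, 0x33, 0x44
};

/* As built: an "indirect call" whose operand is not yet relocated. */
void toy_site_init(struct toy_site *s)
{
	static const uint8_t unpatched[TOY_SITE_LEN] = { 0xff, 0, 0, 0, 0 };

	memcpy(s->bytes, unpatched, sizeof(unpatched));
}

/* Stand-in for a klp_reloc: writes 4 bytes into the operand, assuming
 * the original instruction layout is still in place. */
void toy_apply_reloc(struct toy_site *s, uint32_t value)
{
	memcpy(&s->bytes[1], &value, sizeof(value));
}

/* Stand-in for apply_paravirt(): replaces the whole instruction. */
void toy_apply_paravirt(struct toy_site *s)
{
	memcpy(s->bytes, toy_direct_call, sizeof(toy_direct_call));
}

/* The site is sane only if it holds exactly the paravirt-patched bytes. */
bool toy_site_is_sane(const struct toy_site *s)
{
	return memcmp(s->bytes, toy_direct_call, sizeof(toy_direct_call)) == 0;
}
```

Calling toy_apply_reloc() and then toy_apply_paravirt() on a freshly initialised site leaves it sane in this model. Calling them in the opposite order, which corresponds to steps 1 and 2 above, leaves the new opcode followed by the relocation value, so toy_site_is_sane() returns false.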

Yup, exactly, the crux of the issue is the fact that we're hacking the
module load order and delaying certain operations (e.g. applying
relocations) in order to allow patching of a not-yet-loaded module.

In this case, the problem can be briefly summarized as follows:
For any patch module, apply_paravirt/alternatives needs to come after
apply_relocate_add (recall that these operations affect the
replacement functions in the patch module). Right now we have it the
other way around: apply_paravirt() is patching our replacement
functions in module_finalize() before the to-be-patched module is
loaded and apply_relocate_add() can be called.

The hack fix I had involved delaying the apply_paravirt() call
(basically calling is_livepatch_module() in the x86 code, and skipping
the paravirt/alternative calls if true), and when the to-be-patched
module loads, having livepatch finish up those apply_{paravirt,alternative}
calls just after apply_relocate_add(). Unfortunately this involved
adding arch-specific code :-\, but I'm not sure there are really any
elegant solutions out there to this problem.
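
For illustration, the control flow of such a deferral might look roughly like the userspace sketch below. This is not the actual patch: struct toy_module, the toy_* functions, and the step log are hypothetical stand-ins, and the is_livepatch flag only mimics what an is_livepatch_module() check would report. The sketch follows the ordering stated as the requirement above, with relocations first and paravirt/alternatives second.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical markers for the two operations whose order matters. */
enum toy_step { TOY_RELOC = 1, TOY_PARAVIRT = 2 };

/* Hypothetical stand-in for struct module, with a log of applied steps. */
struct toy_module {
	bool is_livepatch;	/* mimics an is_livepatch_module() result */
	enum toy_step log[4];
	size_t nlog;
};

static void toy_record(struct toy_module *mod, enum toy_step step)
{
	if (mod->nlog < sizeof(mod->log) / sizeof(mod->log[0]))
		mod->log[mod->nlog++] = step;
}

/* Stand-in for the arch module_finalize(): for a patch module, skip
 * the paravirt/alternatives patching and leave it for later. */
void toy_module_finalize(struct toy_module *mod)
{
	if (mod->is_livepatch)
		return;		/* deferred until the target module loads */
	toy_record(mod, TOY_PARAVIRT);
}

/* Stand-in for the livepatch hook that runs when the to-be-patched
 * module loads: relocations first, then the deferred patching. */
void toy_klp_target_loaded(struct toy_module *patch)
{
	toy_record(patch, TOY_RELOC);		/* apply_relocate_add() */
	toy_record(patch, TOY_PARAVIRT);	/* apply_paravirt() etc. */
}
```

In this model, a patch module that goes through toy_module_finalize() records nothing, and a later toy_klp_target_loaded() records TOY_RELOC followed by TOY_PARAVIRT. An ordinary module still gets TOY_PARAVIRT at finalize time, as before.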

Jessica