Re: Bug with paravirt ops and livepatches

From: Jessica Yu
Date: Tue Apr 05 2016 - 15:19:17 EST


+++ Miroslav Benes [05/04/16 15:07 +0200]:
On Mon, 4 Apr 2016, Josh Poimboeuf wrote:

So I think this doesn't fix the problem. Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module". This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example. Imagine you have
a patch module P which applies a patch to module M. P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M. P's new function F' has an
instruction which is patched by apply_paravirt(), even though the
patch hasn't been applied yet.

2) Module M is loaded. Before applying the patch, livepatch tries to
apply a klp_reloc to the instruction in F' which was already patched
by apply_paravirt() in step 1. This results in undefined behavior
because it tries to patch the original instruction but instead
patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.

Hi,

we are trying really hard to understand the actual culprit here and as it
is quite confusing I have several questions/comments...

I don't have a 100% clear understanding of the whole picture either,
but I'll try to help clarify up some things..

1. can you provide dynrela sections of the patch module from
https://github.com/dynup/kpatch/issues/580? What is interesting is that
kvm_arch_vm_ioctl() function contains rela records only for trivial (==
exported) symbols from the first look. The problem should be there only if
you want to patch a function which reference some paravirt_ops unexported
symbol. For that symbol dynrela should be created.

Just to dispel some confusion over this, kpatch isn't "smart" enough
yet to differentiate between exported and non-exported symbols, as
Evgenii already mentioned. Just global and local, and whether the
symbol belongs to a module or vmlinux. So that means dynrelas are
indeed being created for the pv_*_ops symbols, despite the fact they
are exported.

So this is part of the problem, since apply_paravirt() runs first (as
part of module_finalize()), and dynrelas are written second,
livepatch is clobbering/overwriting those paravirt patch sites that
apply_paravirt had fixed up already.

If you skip generating dynrelas that affect those paravirt patch
sites, I can verify that the crash disappears, since in that case
we're not stepping over those paravirt patch sites with our dynrelas
(just to test and verify the problem, I kind of cheated and forced
kpatch to not generate dynrelas for pv_*_ops symbols, and the panics
disappear). Another temporary solution was to not apply the dynrela
if the target memory is not all zero's. These approaches aren't
reliable enough to serve as a permanent solution but they do give us
a better idea of what's happening. Here are some more relevant
comments -
https://github.com/dynup/kpatch/issues/580#issuecomment-199314636
https://github.com/dynup/kpatch/issues/580#issuecomment-202395452

Jessica