Re: TLB flushes on fixmap changes

From: Andy Lutomirski
Date: Sun Aug 26 2018 - 00:21:50 EST


On Sat, Aug 25, 2018 at 7:23 PM, Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> On Fri, 24 Aug 2018 21:23:26 -0700
> Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>> Couldn't text_poke() use kmap_atomic()? Or, even better, just change CR3?
>
> No, since kmap_atomic() is only meaningful for x86_32 kernels with highmem support.
> On x86-64, it seems to just return the page's address. That is not
> good for text_poke(), since it needs to make a writable alias for a read-only
> code page. Hmm, maybe we can mimic copy_oldmem_page(), which uses ioremap_cache()?
>

I just re-read text_poke(). It's, um, horrible. Not only is the
implementation overcomplicated and probably buggy, but it's SLOOOOOW.
It's totally the wrong API -- poking one instruction at a time
basically can't be efficient on x86, because every call has to make the
text writable, clean up afterwards and serialize the CPU. The API should
either poke lots of instructions at once or look like text_poke_begin();
...; text_poke_end();.

Anyway, the attached patch seems to boot. Linus, Kees, etc.: is this
too scary of an approach? With the patch applied, text_poke() is a
fantastic exploit target, since it briefly runs with CR0.WP clear. On
the other hand, even without the patch applied, text_poke() is every
bit as juicy.

--Andy
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 014f214da581..811c8735b129 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -690,40 +690,15 @@ void *__init_or_module text_poke_early(void *addr, const void *opcode,
 void *text_poke(void *addr, const void *opcode, size_t len)
 {
 	unsigned long flags;
-	char *vaddr;
-	struct page *pages[2];
-	int i;
-
-	/*
-	 * While boot memory allocator is runnig we cannot use struct
-	 * pages as they are not yet initialized.
-	 */
-	BUG_ON(!after_bootmem);
+	unsigned long old_cr0;
 
-	if (!core_kernel_text((unsigned long)addr)) {
-		pages[0] = vmalloc_to_page(addr);
-		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
-	} else {
-		pages[0] = virt_to_page(addr);
-		WARN_ON(!PageReserved(pages[0]));
-		pages[1] = virt_to_page(addr + PAGE_SIZE);
-	}
-	BUG_ON(!pages[0]);
 	local_irq_save(flags);
-	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
-	if (pages[1])
-		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
-	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
-	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	clear_fixmap(FIX_TEXT_POKE0);
-	if (pages[1])
-		clear_fixmap(FIX_TEXT_POKE1);
-	local_flush_tlb();
-	sync_core();
-	/* Could also do a CLFLUSH here to speed up CPU recovery; but
-	   that causes hangs on some VIA CPUs. */
-	for (i = 0; i < len; i++)
-		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
+	old_cr0 = read_cr0();
+	write_cr0(old_cr0 & ~X86_CR0_WP);
+
+	memcpy(addr, opcode, len);
+
+	write_cr0(old_cr0); /* also serializes */
 	local_irq_restore(flags);
 	return addr;
 }