Re: [RFC KVM 18/27] kvm/isolation: function to copy page table entries for percpu buffer

From: Andy Lutomirski
Date: Tue May 14 2019 - 04:35:45 EST




> On May 14, 2019, at 1:25 AM, Alexandre Chartre <alexandre.chartre@xxxxxxxxxx> wrote:
>
>
>> On 5/14/19 9:09 AM, Peter Zijlstra wrote:
>>> On Mon, May 13, 2019 at 11:18:41AM -0700, Andy Lutomirski wrote:
>>> On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
>>> <alexandre.chartre@xxxxxxxxxx> wrote:
>>>>
>>>> pcpu_base_addr is already mapped to the KVM address space, but this
>>>> represents the first percpu chunk. To access a per-cpu buffer not
>>>> allocated in the first chunk, add a function which maps all cpu
>>>> buffers corresponding to that per-cpu buffer.
>>>>
>>>> Also add function to clear page table entries for a percpu buffer.
>>>>
>>>
>>> This needs some kind of clarification so that readers can tell whether
>>> you're trying to map all percpu memory or just map a specific
>>> variable. In either case, you're making a dubious assumption that
>>> percpu memory contains no secrets.
>> I'm thinking the per-cpu random pool is a secrit. IOW, it demonstrably
>> does contain secrits, invalidating that premise.
>
> The current code unconditionally maps the entire first percpu chunk
> (pcpu_base_addr). So it assumes it doesn't contain any secret. That is
> mainly a simplification for the POC because a lot of core information
> that we need, for example just to switch mm, are stored there (like
> cpu_tlbstate, current_task...).

I donât think you should need any of this.

>
> If the entire first percpu chunk effectively has secret then we will
> need to individually map only buffers we need. The kvm_copy_percpu_mapping()
> function is added to copy mapping for a specified percpu buffer, so
> this used to map percpu buffers which are not in the first percpu chunk.
>
> Also note that mapping is constrained by PTE (4K), so mapped buffers
> (percpu or not) which do not fill a whole set of pages can leak adjacent
> data store on the same pages.
>
>

I would take a different approach: figure out what you need and put it in its own dedicated area, kind of like cpu_entry_area.

One nasty issue youâll have is vmalloc: the kernel stack is in the vmap range, and, if you allow access to vmap memory at all, youâll need some way to ensure that *unmap* gets propagated. I suspect the right choice is to see if you can avoid using the kernel stack at all in isolated mode. Maybe you could run on the IRQ stack instead.