RE: [PATCH 3/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()

From: Dexuan Cui
Date: Sun Nov 27 2022 - 15:28:00 EST


> From: Kirill A. Shutemov <kirill@xxxxxxxxxxxxx>
> Sent: Wednesday, November 23, 2022 11:51 PM
> > [...]
> > > Will you also adjust direct mapping to have shared bit set?
> > >
> > > If not, we will have problems with load_unaligned_zeropad() when it will
> > > access shared pages via non-shared mapping.

It looks like this is also an issue for AMD SNP?

> > > If direct mapping is adjusted, we will get direct mapping fragmentation.
> > [...]
>
> __get_free_pages() and kmalloc() return a pointer to a page in the direct
> mapping. set_memory_decrypted() adjusts the direct mapping to have the shared
> bit set. Everything is fine.

You're correct. Now I understand the issue.

> > BTW, I'll drop tdx_enc_status_changed_for_vmalloc() and try to enhance the
> > existing tdx_enc_status_changed() to support both direct mapping and vmalloc().

So it looks like I should not drop tdx_enc_status_changed_for_vmalloc() after
all: I have to detect whether the addr is a vmalloc address and, if it is,
adjust the direct mapping as well?

> > > Maybe tap into swiotlb buffer using DMA API?
> >
> > I doubt the Hyper-V vNIC driver here can call dma_alloc_coherent() to
> > get a 16MB buffer from swiotlb buffers. I'm looking at
> > dma_alloc_coherent() -> dma_alloc_attrs() -> dma_direct_alloc(), which
> > typically calls __dma_direct_alloc_pages() to allocate contiguous memory
> > pages (which can't exceed the 4MB limit; note there is no virtual IOMMU
> > in a guest on Hyper-V).
> >
> > It looks like we can't force dma_direct_alloc() to call
> > dma_direct_alloc_no_mapping(), because even if we set the
> > DMA_ATTR_NO_KERNEL_MAPPING flag, force_dma_unencrypted() is still always
> > true for a TDX guest.
>
> The point is not in reaching dma_direct_alloc_no_mapping(). The idea is
> to allocate from the existing swiotlb, which already has the shared bit set
> in the direct mapping.
>
> vmap area that maps pages allocated from swiotlb also should work fine.

My understanding is that swiotlb is mainly for bounce buffering, and the
only non-static function in kernel/dma/swiotlb.c for allocating memory
is swiotlb_alloc(), which is defined only if CONFIG_DMA_RESTRICTED_POOL=y.
That option is =n on x86-64 because CONFIG_OF=n.

If we don't adjust the direct mapping, IMO we'll have to do one of:
1) force the kernel to not use load_unaligned_zeropad() in a coco VM, or
2) make swiotlb_alloc()/swiotlb_free() available on x86-64 and export them
for drivers, or
3) implement a custom memory pool that's pre-allocated using
__get_free_pages() and set_memory_decrypted(), and use the pool in drivers.

BTW, ideas 1) and 3) are from Michael Kelley who discussed the issue with me.
Michael can share more details and thoughts.
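
To make idea 3) a bit more concrete, here is a rough userspace model of such a
pool, not kernel code. Every name in it (struct shared_pool, pool_init(),
pool_alloc(), pool_free()) is made up for illustration. The static array
stands in for memory from __get_free_pages(), and the "shared" flag stands in
for a single set_memory_decrypted() call at init time. The point it shows is
that the conversion happens once, so drivers allocating from the pool never
touch the direct mapping again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_PAGE_SIZE 4096u
#define POOL_NR_PAGES  8u

/* Hypothetical pool: all names here are illustrative, not kernel APIs. */
struct shared_pool {
	uint8_t mem[POOL_NR_PAGES * POOL_PAGE_SIZE]; /* models __get_free_pages() memory */
	bool in_use[POOL_NR_PAGES];                  /* one flag per page */
	bool shared;                                 /* models set_memory_decrypted() done once */
};

/* Mark every page free and "convert" the whole pool to shared exactly once. */
void pool_init(struct shared_pool *p)
{
	for (size_t i = 0; i < POOL_NR_PAGES; i++)
		p->in_use[i] = false;
	p->shared = true;
}

/* First-fit allocation of nr contiguous pages; NULL if nothing fits. */
void *pool_alloc(struct shared_pool *p, size_t nr)
{
	if (!p->shared || nr == 0 || nr > POOL_NR_PAGES)
		return NULL;

	for (size_t start = 0; start + nr <= POOL_NR_PAGES; start++) {
		size_t i;

		/* Count how many pages starting at 'start' are free. */
		for (i = 0; i < nr && !p->in_use[start + i]; i++)
			;
		if (i < nr)
			continue;

		for (i = 0; i < nr; i++)
			p->in_use[start + i] = true;
		return &p->mem[start * POOL_PAGE_SIZE];
	}
	return NULL;
}

/* Return nr pages starting at addr to the pool; no attribute change needed. */
void pool_free(struct shared_pool *p, void *addr, size_t nr)
{
	uint8_t *a = addr;

	assert(a >= p->mem);
	size_t start = (size_t)(a - p->mem) / POOL_PAGE_SIZE;

	assert(start + nr <= POOL_NR_PAGES);
	for (size_t i = 0; i < nr; i++)
		p->in_use[start + i] = false;
}
```

The obvious downside is that the pool has to be sized up front, which is
awkward for something like a 16MB buffer per vNIC.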

I'm inclined to detect a vmalloc address and adjust the direct mapping:

1) IMO drivers typically don't use many shared buffers from vmalloc(), so
direct mapping fragmentation should not be a severe issue.

2) When a driver does use shared buffers from vmalloc(), typically it only
allocates the buffers once, when the device is being initialized. For a Linux
coco VM on Hyper-V, only the hv_netvsc driver uses shared buffers from
vmalloc(). While we do need to support vNIC hot add/remove, in practice
AFAIK vNIC hot add/remove happens very infrequently.
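
As a rough illustration of "detect a vmalloc address and adjust the direct
mapping", below is a toy userspace model, not the actual patch. All the model_*
names are hypothetical. In the kernel I'd expect the real thing to be built
from is_vmalloc_addr(), vmalloc_to_page(), page_address() and
set_memory_decrypted(), but that part is an assumption here. The model only
shows the invariant we need: each page backing the vmalloc area must have the
same shared/private attribute in both the vmalloc alias and the direct map
alias, otherwise load_unaligned_zeropad() can touch a shared page through a
non-shared mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PFNS 16u /* size of the toy "physical memory" */

/* Toy direct mapping: one shared flag per physical page frame. */
struct model_direct_map {
	bool shared[MODEL_NR_PFNS];
};

/* Toy vmalloc area: virtually contiguous, physically scattered. */
struct model_vmalloc_area {
	size_t nr_pages;
	unsigned int pfn[MODEL_NR_PFNS]; /* backing frame of each virtual page */
	bool shared[MODEL_NR_PFNS];      /* shared flag in the vmalloc alias */
};

/*
 * Flip the attribute in BOTH aliases, page by page.
 * Returns the number of pages converted, or -1 if the area is invalid
 * (validated up front so a failure leaves nothing half-converted).
 */
int model_set_shared(struct model_direct_map *dm,
		     struct model_vmalloc_area *va, bool shared)
{
	size_t i;

	assert(dm && va);
	if (va->nr_pages > MODEL_NR_PFNS)
		return -1;
	for (i = 0; i < va->nr_pages; i++)
		if (va->pfn[i] >= MODEL_NR_PFNS)
			return -1;

	for (i = 0; i < va->nr_pages; i++) {
		va->shared[i] = shared;          /* vmalloc alias */
		dm->shared[va->pfn[i]] = shared; /* direct map alias */
	}
	return (int)va->nr_pages;
}

/* True if no page is shared in one alias but private in the other. */
bool model_aliases_consistent(const struct model_direct_map *dm,
			      const struct model_vmalloc_area *va)
{
	for (size_t i = 0; i < va->nr_pages; i++) {
		if (va->pfn[i] >= MODEL_NR_PFNS)
			return false;
		if (va->shared[i] != dm->shared[va->pfn[i]])
			return false;
	}
	return true;
}
```

Because the backing pages are scattered, each one is a separate hole in the
direct mapping, which is where the fragmentation concern comes from.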

Thoughts?