Re: [PATCH RFC v9 09/51] x86/sev: Add RMP entry lookup helpers

From: Dave Hansen
Date: Fri Jun 30 2023 - 18:30:20 EST


On 6/30/23 14:57, Michael Roth wrote:
> On Mon, Jun 12, 2023 at 09:08:58AM -0700, Dave Hansen wrote:
>> On 6/11/23 21:25, Michael Roth wrote:
>>> +/*
>>> + * The RMP entry format is not architectural. The format is defined in PPR
>>> + * Family 19h Model 01h, Rev B1 processor.
>>> + */
>>> +struct rmpentry {
>>> + union {
>>> + struct {
>>> + u64 assigned : 1,
>>> + pagesize : 1,
>>> + immutable : 1,
>>> + rsvd1 : 9,
>>> + gpa : 39,
>>> + asid : 10,
>>> + vmsa : 1,
>>> + validated : 1,
>>> + rsvd2 : 1;
>>> + } info;
>>> + u64 low;
>>> + };
>>> + u64 high;
>>> +} __packed;
>>
>> What's 'high' used for? The PPR says it's reserved. Why not call it
>> reserved?
>>
>> It _looks_ like it's only used for a debugging pr_info(). It makes the
>> struct look kinda goofy. I'd much rather limit the goofiness to the
>> "dumping" code, like:
>>
>> u64 *__e = (void *)e;
>> ....
>> pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
>> pfn << PAGE_SHIFT, __e[0], __e[1]);
>>
>> BTW, why does it do any good to dump all these reserved fields?
>>
>
> The reserved bits sometimes contain information that can be useful to
> pass along to folks on the firmware side, so would definitely be helpful
> to provide the full raw contents of the RMP entry.

Ahh, OK. Could you include a comment to that effect, please?

> So maybe something like this better captures the intended usage:
>
> struct rmpentry {
> union {
> struct {
> u64 assigned : 1,
> pagesize : 1,
> immutable : 1,
> rsvd1 : 9,
> gpa : 39,
> asid : 10,
> vmsa : 1,
> validated : 1,
> rsvd2 : 1;
> u64 rsvd3;
> } info;
> u64 data[2];
> };
> } __packed;
>
> But dropping the union and casting to u64[] locally in the debug/dumping
> routine should work fine as well.

Yeah, I'd suggest doing the nasty casting in the debug function. That
makes it much more clear what the hardware is doing with the entries.
The hardware doesn't treat the struct as 2*u64's at all.

...
>>> + ret = rmptable_entry(paddr, entry);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + /* Read a large RMP entry to get the correct page level used in RMP entry. */
>>> + ret = rmptable_entry(paddr & PMD_MASK, &large_entry);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + *level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(&large_entry));
>>> +
>>> + return 0;
>>> +}
>>
>> This is a bit weird. Should it say something like this?
>>
>> To do an 4k RMP lookup the hardware looks at two places in the RMP:
>
> I'd word this as:
>
> "To query all the relevant bit of an 4k RMP entry, the kernel must access
> 2 entries in the RMP table:"
>
> Because it's possible hardware only looks at the 2M entry for
> hardware-based lookups, depending on where the access is coming from, or
> how the memory at the PFN range is mapped.
>
> But otherwise it seems like an accurate description.

The wording you suggest is a bit imprecise. For a 2M-aligned 4k page,
there is only *one* location, *one* entry.

Also, we're not doing a lookup for an RMP entry. We're doing it for a
_pfn_ that results in an RMP entry.

How about this:

/*
* Find the authoritative RMP entry for a PFN. This can be either a 4k
* RMP entry or a special large RMP entry that is authoritative for a
* whole 2M area.
*/
...
>>> +#ifdef CONFIG_KVM_AMD_SEV
>>> +int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
>>> +#else
>>> +static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
>>> +#endif
>>
>> Above, -ENXIO was returned when SEV-SNP was not supported. Here, 0 is
>> returned when it is compiled out. That inconsistent.
>>
>> Is snp_lookup_rmpentry() acceptable when SEV-SNP is in play? I'd like
>> to see consistency between when it is compiled out and when it is
>> compiled in but unsupported on the CPU.
>
> I really don't think anything in the kernel should be calling
> snp_lookup_rmpentry(), so I think it makes sense to adoption the -ENXIO
> convention here and in any other stubs where that applies.

Sounds good to me. Just please make them consistent.