Re: [PATCH 07/32] mm: Bring back vmalloc_exec

From: Andy Lutomirski
Date: Tue Jun 20 2023 - 16:43:12 EST


Hi all-

On Tue, Jun 20, 2023, at 11:48 AM, Dave Hansen wrote:
>>> No, I'm saying your concerns are baseless and too vague to
>>> address.
>> If you don't address them, the NAK will stand forever, or at least
>> until a different group of people take over x86 maintainership.
>> That's fine with me.
>
> I've got a specific concern: I don't see vmalloc_exec() used in this
> series anywhere. I also don't see any of the actual assembly that's
> being generated, or the glue code that's calling into the generated
> assembly.
>
> I grepped around a bit in your git trees, but I also couldn't find it in
> there. Any chance you could help a guy out and point us to some of the
> specifics of this new, tiny JIT?
>

So I had a nice discussion with Kent on IRC, and, for the benefit of everyone else reading along, I *think* the JITted code can be replaced by a table-driven approach like this:

typedef unsigned int u32;
typedef unsigned long u64;

struct uncompressed
{
    u32 a;
    u32 b;
    u64 c;
    u64 d;
    u64 e;
    u64 f;
};

struct bitblock
{
    u64 source;
    u64 target;
    u64 mask;
    int shift;
};

// out needs to be zeroed first
void unpack(struct uncompressed *out, const u64 *in, const struct bitblock *blocks, int nblocks)
{
    u64 *out_as_words = (u64*)out;
    for (int i = 0; i < nblocks; i++) {
        // one table entry per extracted bit field
        const struct bitblock *b = &blocks[i];
        out_as_words[b->target] |= (in[b->source] & b->mask) << b->shift;
    }
}

void apply_offsets(struct uncompressed *out, const struct uncompressed *offsets)
{
    out->a += offsets->a;
    out->b += offsets->b;
    out->c += offsets->c;
    out->d += offsets->d;
    out->e += offsets->e;
    out->f += offsets->f;
}

Which generates nice code: https://godbolt.org/z/3fEq37hf5
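To make the table format concrete, here is a minimal sketch of what a 'blocks' table might look like. The packed layout is invented purely for illustration (it is not the real on-disk format), and the names unpack_words, example_blocks and unpack_example are hypothetical. It assumes a little-endian machine, so that a and b share output word 0 with b in the high half. It also unpacks into a plain u64 array and memcpy()s the result into the struct, which sidesteps the pointer cast for anyone trying this outside the kernel's -fno-strict-aliasing build:

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
typedef uint64_t u64;

struct uncompressed {
    u32 a;
    u32 b;
    u64 c;
    u64 d;
    u64 e;
    u64 f;
};

struct bitblock {
    u64 source;   /* index into the packed input words */
    u64 target;   /* index into the output, viewed as u64 words */
    u64 mask;     /* bits to keep from in[source] */
    int shift;    /* left shift applied after masking */
};

#define NWORDS (sizeof(struct uncompressed) / sizeof(u64))

/* Same loop as unpack(), but writing into a caller-zeroed u64 array. */
static void unpack_words(u64 *words, const u64 *in,
                         const struct bitblock *blocks, int nblocks)
{
    for (int i = 0; i < nblocks; i++) {
        const struct bitblock *b = &blocks[i];
        words[b->target] |= (in[b->source] & b->mask) << b->shift;
    }
}

/*
 * Hypothetical packed format, made up for this example:
 *   in[0] bits  0..11 -> a  (output word 0, bits  0..31)
 *   in[0] bits 12..19 -> b  (output word 0, bits 32..63)
 *   in[1] bits  0..39 -> c  (output word 1)
 */
static const struct bitblock example_blocks[] = {
    { .source = 0, .target = 0, .mask = 0xfffULL,        .shift = 0  },
    { .source = 0, .target = 0, .mask = 0xff000ULL,      .shift = 20 },
    { .source = 1, .target = 1, .mask = 0xffffffffffULL, .shift = 0  },
};

struct uncompressed unpack_example(const u64 *in)
{
    u64 words[NWORDS] = { 0 };
    struct uncompressed out;

    unpack_words(words, in, example_blocks, 3);
    memcpy(&out, words, sizeof(out));
    return out;
}
```

Note that since the loop only shifts left, every field in this made-up layout sits at or below its destination bit position. A format that needs to move bits downward would want a right shift (or a signed shift) in the table.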

It would need Spectre protection in two places, I think, because it's almost certainly a great gadget if the attacker can speculatively control the 'blocks' table. This could be mitigated (I think) by hardcoding nblocks as 12 and by masking b->target.
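Roughly, the two mitigations might look like the sketch below. This is untested against any real speculation attack and is only meant to show the shape of the idea. The names unpack_hardened, NBLOCKS, OUT_WORDS and TARGET_MASK are hypothetical, and padding the output to 8 words so that a plain AND can clamp the index is my assumption. In the kernel, array_index_nospec() from <linux/nospec.h> would be the usual way to clamp an index instead:

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
typedef uint64_t u64;

struct uncompressed {
    u32 a;
    u32 b;
    u64 c;
    u64 d;
    u64 e;
    u64 f;
};

struct bitblock {
    u64 source;
    u64 target;
    u64 mask;
    int shift;
};

#define NBLOCKS     12               /* fixed trip count, no nblocks argument */
#define OUT_WORDS   8                /* 5 real words, padded to a power of two */
#define TARGET_MASK (OUT_WORDS - 1)

/*
 * Unused table slots are expected to carry mask == 0, which makes them
 * no-ops. Only the target index is clamped here; b->source is still
 * trusted, as in the original loop.
 */
void unpack_hardened(struct uncompressed *out, const u64 *in,
                     const struct bitblock blocks[NBLOCKS])
{
    u64 words[OUT_WORDS] = { 0 };

    for (int i = 0; i < NBLOCKS; i++) {
        const struct bitblock *b = &blocks[i];

        /* The AND keeps even a bogus (or speculated) target inside words[]. */
        words[b->target & TARGET_MASK] |=
            (in[b->source] & b->mask) << b->shift;
    }

    /* Copy only the 5 real words; the padding words are dropped. */
    memcpy(out, words, sizeof(*out));
}
```

With a fixed trip count there is no loop bound to mispredict, and an out-of-range target can only land in the padding words, which never reach the caller.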

In contrast, the JIT approach needs a retpoline on each call, which could be more expensive than my entire function :) I haven't benchmarked them lately.