Re: [PATCH 07/32] mm: Bring back vmalloc_exec

From: Andy Lutomirski
Date: Tue Jun 20 2023 - 18:33:13 EST




On Tue, Jun 20, 2023, at 1:42 PM, Andy Lutomirski wrote:
> Hi all-
>
> On Tue, Jun 20, 2023, at 11:48 AM, Dave Hansen wrote:
>>>> No, I'm saying your concerns are baseless and too vague to
>>>> address.
>>> If you don't address them, the NAK will stand forever, or at least
>>> until a different group of people take over x86 maintainership.
>>> That's fine with me.
>>
>> I've got a specific concern: I don't see vmalloc_exec() used in this
>> series anywhere. I also don't see any of the actual assembly that's
>> being generated, or the glue code that's calling into the generated
>> assembly.
>>
>> I grepped around a bit in your git trees, but I also couldn't find it in
>> there. Any chance you could help a guy out and point us to some of the
>> specifics of this new, tiny JIT?
>>
>
> So I had a nice discussion with Kent on IRC, and, for the benefit of
> everyone else reading along, I *think* the JITted code can be replaced
> by a table-driven approach like this:
>
> typedef unsigned int u32;
> typedef unsigned long u64;
>
> struct uncompressed
> {
> u32 a;
> u32 b;
> u64 c;
> u64 d;
> u64 e;
> u64 f;
> };
>
> struct bitblock
> {
> u64 source;
> u64 target;
> u64 mask;
> int shift;
> };
>
> // out needs to be zeroed first
> void unpack(struct uncompressed *out, const u64 *in, const struct
> bitblock *blocks, int nblocks)
> {
> u64 *out_as_words = (u64*)out;
> for (int i = 0; i < nblocks; i++) {
> const struct bitblock *b;
> out_as_words[b->target] |= (in[b->source] & b->mask) <<
> b->shift;
> }
> }
>
> void apply_offsets(struct uncompressed *out, const struct uncompressed *offsets)
> {
> out->a += offsets->a;
> out->b += offsets->b;
> out->c += offsets->c;
> out->d += offsets->d;
> out->e += offsets->e;
> out->f += offsets->f;
> }
>
> Which generates nice code: https://godbolt.org/z/3fEq37hf5

Thinking about this a bit more, I think the only real performance issue with my code is that it does 12 read-xor-write operations in memory, which all depend on each other in horrible ways.

If it's reversed so the stores are all in order, then this issue would go away.

typedef unsigned int u32;
typedef unsigned long u64;

struct uncompressed
{
u32 a;
u32 b;
u64 c;
u64 d;
u64 e;
u64 f;
};

struct field_piece {
int source;
int shift;
u64 mask;
};

struct field_pieces {
struct field_piece pieces[2];
u64 offset;
};

u64 unpack_one(const u64 *in, const struct field_pieces *pieces)
{
const struct field_piece *p = pieces->pieces;
return (((in[p[0].source] & p[0].mask) << p[0].shift) |
((in[p[1].source] & p[1].mask) << p[1].shift)) +
pieces->offset;
}

struct encoding {
struct field_pieces a, b, c, d, e, f;
};

void unpack(struct uncompressed *out, const u64 *in, const struct encoding *encoding)
{
out->a = unpack_one(in, &encoding->a);
out->b = unpack_one(in, &encoding->b);
out->c = unpack_one(in, &encoding->c);
out->d = unpack_one(in, &encoding->d);
out->e = unpack_one(in, &encoding->e);
out->f = unpack_one(in, &encoding->f);
}

https://godbolt.org/z/srsfcGK4j

Could be faster. Probably worth testing.