Re: [PATCH bpf-next v4 3/3] bpf, arm64: use bpf_jit_binary_pack_alloc

From: Daniel Borkmann
Date: Wed Aug 16 2023 - 09:27:36 EST


On 8/15/23 4:09 PM, Puranjay Mohan wrote:
Hi Everyone,

[+cc Björn]

On Mon, Jun 26, 2023 at 10:58 AM Puranjay Mohan <puranjay12@xxxxxxxxx> wrote:

Use bpf_jit_binary_pack_alloc for memory management of JIT binaries in
ARM64 BPF JIT. The bpf_jit_binary_pack_alloc creates a pair of RW and RX
buffers. The JIT writes the program into the RW buffer. When the JIT is
done, the program is copied to the final RX buffer
with bpf_jit_binary_pack_finalize.

Implement bpf_arch_text_copy() and bpf_arch_text_invalidate() for ARM64
JIT as these functions are required by bpf_jit_binary_pack allocator.

Signed-off-by: Puranjay Mohan <puranjay12@xxxxxxxxx>
Acked-by: Song Liu <song@xxxxxxxxxx>

[...]

+int bpf_arch_text_invalidate(void *dst, size_t len)
+{
+ __le32 *ptr;
+ int ret;
+
+ for (ptr = dst; len >= sizeof(u32); len -= sizeof(u32)) {
+ ret = aarch64_insn_patch_text_nosync(ptr++, AARCH64_BREAK_FAULT);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+

While testing the same patch for riscv bpf jit, Björn found that
./test_tag is taking a lot of
time to complete. He found that bpf_arch_text_invalidate() is calling
patch_text_nosync() in RISCV
and aarch64_insn_patch_text_nosync() here in ARM64. Both of the
implementations call these functions
in a loop for each word. The problem is that every call to
patch_text_nosync()/aarch64_insn_patch_text_nosync()
would clean the cache. This will make this
function(bpf_arch_text_invalidate) really slow.

I did some testing using the vmtest.sh script on ARM64 with and
without the patches, here are the results:

With Prog Pack patches applied:
=========================

root@(none):/# time ./root/bpf/test_tag
test_tag: OK (40945 tests)

real 3m2.001s
user 0m1.644s
sys 2m40.132s

Without Prog Pack:
===============

root@(none):/# time ./root/bpf/test_tag
test_tag: OK (40945 tests)

real 0m26.809s
user 0m1.591s
sys 0m24.106s

As you can see the current implementation of
bpf_arch_text_invalidate() is really slow. I need to
implement a new function: aarch64_insn_set_text_nosync() and use it in
bpf_arch_text_invalidate()
rather than calling aarch64_insn_patch_text_nosync() in a loop.

Ok, thanks for looking into this. I'm going to toss the current version from
patchwork in that case and will wait for a v5 from your side.

In the longer run, it would be really helpful if we have a standard
text_poke API like x86 in every architecture.

Likely independent of this series, but in case you see potential to consolidate
generic pieces, it might be worth a look after the arm64/riscv bits landed.

Thanks for working on this, Puranjay!