Re: [PATCH v4] kprobes,lib: kretprobe scalability improvement

From: wuqiang
Date: Thu Nov 03 2022 - 12:47:26 EST


On 2022/11/3 10:51, Masami Hiramatsu (Google) wrote:
Hi wuqiang (or Matt?),

Thanks for updating the patch! I have some comments below,

Thanks for your time :)

On Wed, 2 Nov 2022 10:30:12 +0800
wuqiang <wuqiang.matt@xxxxxxxxxxxxx> wrote:

...
Changes since V3:
1) build warning: unused variable in fprobe_init_rethook
Reported-by: kernel test robot <lkp@xxxxxxxxx>

Changes since V2:
1) the percpu-extended version of the freelist replaced by new percpu-
ring-array. freelist has data-contention in freelist_node (refs and
next) even after node is removed from freelist and the node could
be polluted easily (with freelist_node defined in union)
2) routines split to objpool.h and objpool.c according to cold & hot
pathes, and the latter moved to lib, as suggested by Masami
3) test module (test_objpool.ko) added to lib for functional testings

Changes since V1:
1) reformat to a single patch as Masami Hiramatsu suggested
2) use __vmalloc_node to replace vmalloc_node for vmalloc
3) a few minor fixes: typo and coding-style issues

Recording change log is very good. But if it becomes too long,
you can put a URL of the previous series on LKML instead of
appending the change logs.
You can get the URL (permlink) by "lkml.kernel.org/r/<your-message-id>"

Got it.


Signed-off-by: wuqiang <wuqiang.matt@xxxxxxxxxxxxx>
---
include/linux/freelist.h | 129 -----
include/linux/kprobes.h | 9 +-
include/linux/objpool.h | 151 ++++++
include/linux/rethook.h | 15 +-
kernel/kprobes.c | 95 ++--
kernel/trace/fprobe.c | 17 +-
kernel/trace/rethook.c | 80 +--
lib/Kconfig.debug | 11 +
lib/Makefile | 4 +-
lib/objpool.c | 480 ++++++++++++++++++
lib/test_objpool.c | 1031 ++++++++++++++++++++++++++++++++++++++
11 files changed, 1772 insertions(+), 250 deletions(-)

Hmm, this does too much things in 1 patch.
Can you split this in below 5 patches? This also makes clear that who
needs to review which part.

I was ever considering of splitting, but finally decided to mix them in a big one mostly because it's only for kretprobe improvement.

Next version I'll split to smaller patches. Thank you for the advice.


- Add generic scalable objpool
- Add objpool test
- Use objpool in kretprobe
- Use objpool in fprobe and rethook
- Remove unused freelist

delete mode 100644 include/linux/freelist.h
create mode 100644 include/linux/objpool.h
create mode 100644 lib/objpool.c
create mode 100644 lib/test_objpool.c

[...]
+
+struct objpool_slot {
+ uint32_t os_head; /* head of ring array */

If all fields have "os_" prefix, it is meaningless.

+ uint32_t os_tail; /* tail of ring array */
+ uint32_t os_size; /* max item slots, pow of 2 */
+ uint32_t os_mask; /* os_size - 1 */
+/*
+ * uint32_t os_ages[]; // ring epoch id
+ * void *os_ents[]; // objects array

"entries[]"

I'll refine the comments here to better explain the memory layout.


+ */
+};
+
+/* caller-specified object initial callback to setup each object, only called once */
+typedef int (*objpool_init_node_cb)(void *context, void *obj);
+
+/* caller-specified cleanup callback for private objects/pool/context */
+typedef int (*objpool_release_cb)(void *context, void *ptr, uint32_t flags);
+
+/* called for object releasing: ptr points to an object */
+#define OBJPOOL_FLAG_NODE (0x00000001)
+/* for user pool and context releasing, ptr could be NULL */
+#define OBJPOOL_FLAG_POOL (0x00001000)
+/* the object or pool to be released is user-managed */
+#define OBJPOOL_FLAG_USER (0x00008000)
+
+/*
+ * objpool_head: object pooling metadata
+ */
+
+struct objpool_head {
+ uint32_t oh_objsz; /* object & element size */
+ uint32_t oh_nobjs; /* total objs (pre-allocated) */
+ uint32_t oh_nents; /* max objects per cpuslot */
+ uint32_t oh_ncpus; /* num of possible cpus */

If all fields have "oh_" prefix, it is meaningless.
Also, if there is no reason to be 32bit (like align the structure size
for cache, or pack the structure for streaming etc.) use appropriate types.

And please do not align the length of field name unnaturally. e.g.

Kind of obsessive-compulsive symptom, haha :) The struct size of objpool_head doesn't matter. The size of objpool_slot does matter, as managed in a single cache-line.


size_t obj_size; /* size_t or unsigned int, I don't care. */
int nr_objs; /* I think just 'int' is enough because the value should be
checked and limited under INT_MAX */
int max_entries;
unsigned int nr_cpus;

(BTW why we need to limit the nr_cpus here? we have num_possible_cpus())

You are right that nr_cpus is unnecessary. num_possible_cpus() just keeps the same even with new hot-plugged cpus.


+ uint32_t oh_in_user:1; /* user-specified buffer */
+ uint32_t oh_in_slot:1; /* objs alloced with slots */
+ uint32_t oh_vmalloc:1; /* alloc from vmalloc zone */

Please use "bool" or "unsigned long flags" with flag bits.

+ gfp_t oh_gfp; /* k/vmalloc gfp flags */
+ uint32_t oh_sz_pool; /* user pool size in byes */

size_t pool_size

+ void *oh_pool; /* user managed memory pool */
+ struct objpool_slot **oh_slots; /* array of percpu slots */
+ uint32_t *oh_sz_slots; /* size in bytes of slots */

size_t slot_size


Will apply these changes in next version.

+ objpool_release_cb oh_release; /* resource cleanup callback */
+ void *oh_context; /* caller-provided context */
+};

Thank you,


Best regards,