Re: [PATCH v10 1/5] lib: objpool added: ring-array based lockless MPMC

From: wuqiang.matt
Date: Mon Oct 16 2023 - 11:05:04 EST


On 2023/10/16 20:18, Masami Hiramatsu (Google) wrote:
Hi Wuqiang,

On Mon, 16 Oct 2023 10:45:30 +0800
"wuqiang.matt" <wuqiang.matt@xxxxxxxxxxxxx> wrote:

On 2023/10/16 07:26, Masami Hiramatsu (Google) wrote:
On Mon, 16 Oct 2023 00:06:11 +0800
"wuqiang.matt" <wuqiang.matt@xxxxxxxxxxxxx> wrote:

On 2023/10/15 23:43, Masami Hiramatsu (Google) wrote:
On Sun, 15 Oct 2023 13:32:47 +0800
"wuqiang.matt" <wuqiang.matt@xxxxxxxxxxxxx> wrote:

objpool is a scalable implementation of high performance queue for
object allocation and reclamation, such as kretprobe instances.

With leveraging percpu ring-array to mitigate hot spots of memory
contention, it delivers near-linear scalability for high parallel
scenarios. The objpool is best suited for the following cases:
1) Memory allocation or reclamation are prohibited or too expensive
2) Consumers are of different priorities, such as irqs and threads

Limitations:
1) Maximum objects (capacity) is fixed after objpool creation
2) All pre-allocated objects are managed in percpu ring array,
which consumes more memory than linked lists


Thanks for updating! This looks good to me except 2 points.

[...]
+
+/* initialize object pool and pre-allocate objects */
+int objpool_init(struct objpool_head *pool, int nr_objs, int object_size,
+ gfp_t gfp, void *context, objpool_init_obj_cb objinit,
+ objpool_fini_cb release)
+{
+ int rc, capacity, slot_size;
+
+ /* check input parameters */
+ if (nr_objs <= 0 || nr_objs > OBJPOOL_NR_OBJECT_MAX ||
+ object_size <= 0 || object_size > OBJPOOL_OBJECT_SIZE_MAX)
+ return -EINVAL;
+
+ /* align up to unsigned long size */
+ object_size = ALIGN(object_size, sizeof(long));
+
+ /* calculate capacity of percpu objpool_slot */
+ capacity = roundup_pow_of_two(nr_objs);

This must be 'roundup_pow_of_two(nr_objs + 1)' because if nr_objs is power
of 2 and all objects are pushed on the same slot, tail == head. This
means empty and full is the same.

That won't happen. Would tail and head wrap only when >= 2^32. When all
objects are pushed to the same slot, tail will be (head + capacity).

Ah, indeed. OK.



+ if (!capacity)
+ return -EINVAL;
+
+ /* initialize objpool pool */
+ memset(pool, 0, sizeof(struct objpool_head));
+ pool->nr_cpus = nr_cpu_ids;
+ pool->obj_size = object_size;
+ pool->capacity = capacity;
+ pool->gfp = gfp & ~__GFP_ZERO;
+ pool->context = context;
+ pool->release = release;
+ slot_size = pool->nr_cpus * sizeof(struct objpool_slot);
+ pool->cpu_slots = kzalloc(slot_size, pool->gfp);
+ if (!pool->cpu_slots)
+ return -ENOMEM;
+
+ /* initialize per-cpu slots */
+ rc = objpool_init_percpu_slots(pool, nr_objs, context, objinit);
+ if (rc)
+ objpool_fini_percpu_slots(pool);
+ else
+ refcount_set(&pool->ref, pool->nr_objs + 1);
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(objpool_init);
+

[...]

+
+/* drop unused objects and defref objpool for releasing */
+void objpool_fini(struct objpool_head *pool)
+{
+ void *obj;
+
+ do {
+ /* grab object from objpool and drop it */
+ obj = objpool_pop(pool);
+
+ /*
+ * drop reference of objpool anyway even if
+ * the obj is NULL, since one extra ref upon
+ * objpool was already grabbed during pool
+ * initialization in objpool_init()
+ */
+ if (refcount_dec_and_test(&pool->ref))
+ objpool_free(pool);

Nit: you can call objpool_drop() instead of repeating the same thing here.

objpool_drop won't deref objpool if given obj is NULL. But here we need
drop objpool anyway even if obj is NULL.

I guess you decrement for the 'objpool' itself if obj=NULL, but I think
it is a bit hacky (so you added the comment).
e.g. rethook is doing something like below.

---
/* extra count for this pool itself */
count = 1;
/* make the pool empty */
while (objpool_pop(pool))
count++;

if (refcount_sub_and_test(count, &pool->ref))
objpool_free(pool);
---

Right, that's reasonable. Better one single atomic operation than multiple.

I found another comment issue about a small window which this may not work.
This is not a real issue for this series because this doesn't happen on
rethook/kretprobe, but if you apply this to other use-case, it must be
cared.

Since we use reserve-commit on 'push' operation, this 'pop' loop will miss
an object which is under 'push' op. I mean

CPU0 CPU1

objpool_fini() {
do {
objpool_push() {
update slot->tail; // reserve
obj = objpool_pop();
update slot->last; // commit
} while (obj);

In this case, the refcount can not be 0 and we can not release objpool.
To avoid this, we make sure all ongoing 'push()' must be finished.

Actually in the rethook/kretprobe, it already sync the rcu so this doesn't
happen. So you should document it the user must use RCU sync after stop
using the objpool, then call objpool_fini().

E.g.

start_using() {
objpool_init();
active = true;
}

obj_alloc() {
rcu_read_lock();
if (active)
obj = objpool_pop();
else
obj = NULL;
rcu_read_unlock();
}

/* use obj for something, it is OK to change the context */

obj_return() {
rcu_read_lock();
if (active)
objpool_push(obj);
else
objpool_drop(obj);
rcu_read_unlock();
}

/* kretprobe style */
stop_using() {
active = false;
synchronize_rcu();
objpool_fini();
}

/* rethook style */
stop_using() {
active = false;
call_rcu(objpool_fini);
}

Hmm, yeah, if we can add this 'active' flag to objpool, it is good. But
since kretprobe has different design of the interface, it is hard.
Anyway, can you add a comment that user must ensure that any 'push' including
ongoing one does not happen while 'fini'? objpool does not care that so user
must take care of that. For example using rcu_read_lock() for the 'push/pop'
operation and rcu-sync before 'fini' operation.

Sure, I'll refine the comments. I prefer that it's user's duty to make sure
there are no outstanding objpool_push on the fly when calling objpool_fini.
All usecases like kretprobe/rethook/test_objpool are using rcu to handle the
asynchronous releasing of objpool. For synchronous cases, user can just call
object_free to release the whole objpool, which is also acceptable.

Thanks,



Thank you,

+ } while (obj);
+}
+EXPORT_SYMBOL_GPL(objpool_fini);
--
2.40.1


Thanks for your time