Re: [PATCH v2 3/5] tracing/user_events: Add auto cleanup and a flag to persist events

From: Alexei Starovoitov
Date: Tue Jun 06 2023 - 21:27:14 EST


On Mon, Jun 5, 2023 at 4:39 PM Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
> + /*
> + * When the event is not enabled for auto-delete there will always
> + * be at least 1 reference to the event. During the event creation
> + * we initially set the refcnt to 2 to achieve this. In those cases
> + * the caller must acquire event_mutex and after decrement check if
> + * the refcnt is 1, meaning this is the last reference. When auto
> + * delete is enabled, there will only be 1 ref, IE: refcnt will be
> + * only set to 1 during creation to allow the below checks to go
> + * through upon the last put. The last put must always be done with
> + * the event mutex held.
> + */
> + if (!locked) {
> + lockdep_assert_not_held(&event_mutex);
> + delete = refcount_dec_and_mutex_lock(&user->refcnt, &event_mutex);
> + } else {
> + lockdep_assert_held(&event_mutex);
> + delete = refcount_dec_and_test(&user->refcnt);
> + }
> +
> + if (!delete)
> + return;
> +
> + /* We now have the event_mutex in all cases */
> +
> + if (user->reg_flags & USER_EVENT_REG_PERSIST) {
> + /* We should not get here when persist flag is set */
> + pr_alert("BUG: Auto-delete engaged on persistent event\n");
> + goto out;
> + }
> +
> + /*
> + * Unfortunately we have to attempt the actual destroy in a work
> + * queue. This is because not all cases handle a trace_event_call
> + * being removed within the class->reg() operation for unregister.
> + */
> + INIT_WORK(&user->put_work, delayed_destroy_user_event);
> +
> + /*
> + * Since the event is still in the hashtable, we have to re-inc
> + * the ref count to 1. This count will be decremented and checked
> + * in the work queue to ensure it's still the last ref. This is
> + * needed because a user-process could register the same event in
> + * between the time of event_mutex release and the work queue
> + * running the delayed destroy. If we removed the item now from
> + * the hashtable, this would result in a timing window where a
> + * user process would fail a register because the trace_event_call
> + * register would fail in the tracing layers.
> + */
> + refcount_set(&user->refcnt, 1);

The recnt-ing scheme is quite unorthodox.
Atomically decrementing it to zero and then immediately set it back to 1?
Smells racy.
Another process can go through the same code and do another dec and set to 1
and we'll have two work queued?
Will mutex_lock help somehow? If yes, then why atomic refcnt?