Re: [PATCH] New operation for kref to help avoid locks

From: Sergey Vlasov
Date: Sat Feb 26 2005 - 15:24:17 EST


On Sat, 26 Feb 2005 09:55:41 -0600 Corey Minyard wrote:

> Greg,
>
> This is the patch for krefs that we talked about. If you don't like it
> but like the docs, feel free just to take the documentation and cut out
> the stuff at the end about the new operation.

See below for comments to the patch.

>
> Thanks,
>
> -Corey
>
>
> [kref_checked.diff text/plain (10576 bytes)]
> Add a routine to kref that allows the kref_put() routine to be
> unserialized even when the get routine attempts to kref_get()
> an object without first holding a valid reference to it. This is
> useful in situations where this happens multiple times without
> freeing the object, as it will avoid having to do a lock/semaphore
> except on the final kref_put().
>
> This also adds some kref documentation to the Documentation
> directory.
>
> Signed-off-by: Corey Minyard <minyard@xxxxxxx>
>
> Index: linux-2.6.11-rc4/include/linux/kref.h
> ===================================================================
> --- linux-2.6.11-rc4.orig/include/linux/kref.h
> +++ linux-2.6.11-rc4/include/linux/kref.h
> @@ -28,5 +28,11 @@
> void kref_get(struct kref *kref);
> void kref_put(struct kref *kref, void (*release) (struct kref *kref));
>
> +/* Get a kref if the count is not already zero. This can be used to
> + avoid a lock in the routine that calls kref_put(). Returns 1 if
> + successful or zero if the count was already zero. See
> + Documentation/kref.txt for details on how to use this. */
> +int kref_get_with_check(struct kref *kref);
> +
> #endif /* __KERNEL__ */
> #endif /* _KREF_H_ */
> Index: linux-2.6.11-rc4/lib/kref.c
> ===================================================================
> --- linux-2.6.11-rc4.orig/lib/kref.c
> +++ linux-2.6.11-rc4/lib/kref.c
> @@ -52,6 +52,21 @@
> release(kref);
> }
>
> +/**
> + * kref_get_with_check - increment refcount if the refcount is not already 0.
> + * @kref: object.
> + */
> +int kref_get_with_check(struct kref *kref)
> +{
> + atomic_inc(&kref->refcount);
> + if (atomic_read(&kref->refcount) == 1) {
> + atomic_dec(&kref->refcount);
> + return 0;
> + }
> + return 1;
> +}

Consider this situation:

initially refcount is 1

CPU1 does kref_get_with_check() CPU2 does kref_put()

atomic_inc(&kref->refcount)
(increments refcount to 2)

atomic_dec_and_test(&kref->refcount)
(decrements refcount to 1)

(kref_put() does nothing more
because the refcount did not
reach zero)

atomic_read(&kref->refcount)
(finds refcount == 1 and decides
that the entry was being deleted)

atomic_dec(&kref->refcount);
(decrements refcount to 0)

(then proceeds thinking that the
entry was about to be finalized,
thus creating a leak)

Because of this, separate atomic_inc+atomic_read sequence is unsafe -
you need atomic_inc_return() here instead:

int kref_get_with_check(struct kref *kref)
{
if (atomic_inc_return(&kref->refcount) == 1) {
atomic_dec(&kref->refcount);
return 0;
}
return 1;
}

> +
> EXPORT_SYMBOL(kref_init);
> EXPORT_SYMBOL(kref_get);
> EXPORT_SYMBOL(kref_put);
> +EXPORT_SYMBOL(kref_get_with_check);
> Index: linux-2.6.11-rc4/Documentation/kref.txt
> ===================================================================
> --- /dev/null
> +++ linux-2.6.11-rc4/Documentation/kref.txt
> @@ -0,0 +1,274 @@
> +krefs allow you to add reference counters to your objects. If you
> +have objects that are used in multiple places and passed around, and
> +you don't have refcounts, your code is almost certainly broken. If
> +you want refcounts, krefs are the way to go.
> +
> +To use a kref, add a one to your data structures like:
> +
> +struct my_data
> +{
> + .
> + .
> + struct kref refcount;
> + .
> + .
> +};
> +
> +The kref can occur anywhere within the data structure.
> +
> +You must initialize the kref after you allocate it. To do this, call
> +kref init as so:
> +
> + struct my_data *data;
> +
> + data = kmalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> + kref_init(&data->refcount);
> +
> +This sets the refcount in the kref to 1.
> +
> +Once you have a refcount, you must follow the following rules:
> +
> +1) If you make a non-temporary copy of a pointer, especially if
> + it can be passed to another thread of execution, you must
> + increment the refcount with kref_get() before passing it off:
> + kref_get(&data->refcount);
> + If you already have a valid pointer to a kref-ed structure (the
> + refcount cannot go to zero) you may do this without a lock.
> +
> +2) When you are done with a pointer, you must call kref_put():
> + kref_put(&data->refcount, data_release);
> + If this is the last reference to the pointer, the release
> + routine will be called. If the code never tries to get
> + a valid pointer to a kref-ed structure without already
> + holding a valid pointer, it is safe to do this without
> + a lock.
> +
> +3) If the code attempts to gain a reference to a kref-ed structure
> + without already holding a valid pointer, it must serialize access
> + where a kref_put() cannot occur during the kref_get(), and the
> + structure must remain valid during the kref_get().
> +
> +For example, if you allocate some data and then pass it to another
> +thread to process:
> +
> +void data_release(struct kref *ref)
> +{
> + struct my_data *data = container_of(ref, struct my_data, refcount);
> + kfree(data);
> +}
> +
> +void more_data_handling(void *cb_data)
> +{
> + struct my_data *data = cb_data;
> + .
> + . do stuff with data here
> + .
> + kref_put(data, data_release);
> +}
> +
> +void my_data_handler(void)

Should be int - you return something from it below.

> +{
> + int rv = 0;
> + struct my_data *data;
> + data = kmalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> + kref_init(&data->refcount);
> +
> + kref_get(&data->refcount);
> + rv = kthread_run(more_data_handling, data, "more_data_handling");
> + if (rv) {

This code is wrong - kthread_run() returns a "struct task_struct *"
(which might be ERR_PTR(-ENOMEM) if it fails).

> + kref_put(&data->refcount);
> + goto out;
> + }
> +
> + .
> + . do stuff with data here
> + .
> + out:
> + kref_put(data, data_release);
> + return rv;
> +}
> +
> +This way, it doesn't matter what order the two threads handle the
> +data, the put handles knowing when the data is free and releasing it.
> +The kref_get() does not require a lock, since we already have a valid
> +pointer that we own a refcount for. The put needs no lock because
> +nothing tries to get the data without already holding a pointer.
> +
> +Note that the "before" in rule 1 is very important. You should never
> +do something like:
> +
> + rv = kthread_run(more_data_handling, data, "more_data_handling");
> + if (rv)
> + goto out;
> + else
> + /* BAD BAD BAD - get is after the handoff */
> + kref_get(&data->refcount);
> +
> +Don't assume you know what you are doing and use the above construct.
> +First of all, you may not know what you are doing. Second, you may
> +know what you are doing (there are some situations where locking is
> +involved where the above may be legal) but someone else who doesn't
> +know what they are doing may change the code or copy the code. It's
> +bad style. Don't do it.
> +
> +There are some situations where you can optimize the gets and puts.
> +For instance, if you are done with an object and enqueuing it for
> +something else or passing it off to something else, there is no reason
> +to do a get then a put:
> +
> + /* Silly extra get and put */
> + kref_get(&obj->ref);
> + enqueue(obj);
> + kref_put(&obj->ref, obj_cleanup);
> +
> +Just do the enqueue. A comment about this is always welcome:
> +
> + enqueue(obj);
> + /* We are done with obj, so we pass our refcount off
> + to the queue. DON'T TOUCH obj AFTER HERE! */
> +
> +The last rule (rule 3) is the nastiest one to handle. Say, for
> +instance, you have a list of items that are each kref-ed, and you wish
> +to get the first one. You can't just pull the first item off the list
> +and kref_get() it. That violates rule 3 because you are not already
> +holding a valid pointer. You must add locks or semaphores. For
> +instance:
> +
> +static DECLARE_MUTEX(sem);
> +static LIST_HEAD(q);
> +struct my_data
> +{
> + struct kref refcount;
> + struct list_head link;
> +};
> +
> +static struct my_data *get_entry()
> +{
> + struct my_data *entry = NULL;
> + down(&sem);
> + if (!list_empty(&q)) {
> + entry = container_of(q.next, struct my_q_entry, link);
> + kref_get(&entry->refcount);
> + }
> + up(&sem);
> + return entry;
> +}
> +
> +static void put_entry(struct my_data *entry)
> +{
> + down(&sem);
> + kref_put(&entry->refcount);
> + list_del(&entry->link);

1) kref_put(&entry->refcount, release_entry);
2) Wrong order for kref_put() and list_del() - should be opposite.
3) In this case you could safely up(&sem) after list_del(&entry->link),
because get_entry() cannot get to this entry anymore, and kref_put()
will ensure that only one invocation will actually call the release
function. Probably the example where the call to list_del() is inside
the release_entry callback (which was omitted here) will illustrate the
rule 3 better.

static void release_entry(struct kref *ref)
{
struct my_data *entry = container_of(ref, struct my_data, refcount);

list_del(&entry->link);
kfree(entry);
}

static void put_entry(struct my_data *entry)
{
down(&sem);
kref_put(&entry->refcount, release_entry);
up(&sem);
}

> + up(&sem);
> +}
> +
> +
> +There is one exception to rule 3. kref_get_with_check() can be used
> +to avoid a lock/semaphore in your routines that call kref_put(), except
> +in the case that release is called. It is useful in situations where
> +you might be trying to get an object that might have just gone free,
> +as in our list example above.
> +
> +For instance, suppose you have a queue of items to process in your
> +driver. The head of the queue is the current item being worked on.
> +You driver has timers, interrupts, and user context threads that may
> +all attempt to claim the head of the queue.
> +
> +So you define three routines, get_working_entry(), put_entry(), and
> +entry_completed(), one to get the current queue entry that is being
> +worked on, one to tell the system you are done working on that
> +entry for now, and one to tell that the entry is completed and
> +should be dequeued. Here is some example code:
> +
> +static LIST_HEAD(q);
> +DEFINE_SPINLOCK(q_lock);
> +
> +struct my_q_entry
> +{
> + struct list_head link;
> + struct kref refcount;
> + char data[10];
> + void (*done)(struct my_q_entry *);
> +};
> +
> +struct my_q_entry *get_working_entry(void)
> +{
> + unsigned long flags;
> + struct my_q_entry *entry = NULL;
> +
> + spin_lock_irqsave(&q_lock, flags);
> + retry:
> + if (!list_empty(&q)) {
> + entry = container_of(q.next, struct my_q_entry, link);
> + if (!kref_get_with_check(&entry->refcount)) {
> + /* Entry's refcount has been deleted, but it
> + * has not yet been removed from the list. We
> + * raced with put_entry and are between the
> + * kref_put() decrementing the count and the
> + * release routine claiming the lock. Just
> + * remove it and retry. */
> + list_del(&entry->link);

This must be list_del_init(&entry->link) - otherwise release_entry()
will oops on its list_del() call.

> + entry = NULL;
> + goto retry;
> + }
> + }
> + spin_lock_irqrestore(&q_lock, flags);
> + return entry;
> +}
> +
> +void release_entry(struct kref *ref)
> +{
> + unsigned long flags;
> + struct my_q_entry *entry = container_of(ref, struct my_q_entry,
> + refcount);
> +
> + spin_lock_irqsave(&q_lock, flags);
> + list_del(&entry->link);
> + spin_lock_irqrestore(&q_lock, flags);
> + entry->done(entry);
> +
> + entry = get_working_entry();
> + if (entry) {
> + /* Start next entry here. */
> + }
> + put_entry(entry)

Obviously this call must be inside the if branch, because your example
put_entry() implementation does not accept NULL.

> +}
> +
> +void put_entry(struct my_q_entry *entry)
> +{
> + kref_put(&entry->refcount, release_entry);
> +}
> +
> +void entry_completed(struct my_q_entry *entry)
> +{
> + kref_put(&entry->refcount, release_entry);
> +}
> +
> +
> +Note that there is no lock around kref_put(), which should avoid a
> +lock on most put operations, except for the last one. This works
> +because if kref_put() sets the refcount to zero, one and only one
> +thread of execution can be doing the kref_get(). If the kref_get()
> +beats the kref_put(), everything is fine, the get routine has a
> +valid entry. If the kref_put() beats the kref_get(), the kref
> +value will be zero and kref_get_with_check() will fail, but the
> +release_entry() code will block on the spinlock. So the entry will
> +be dequeued and everything still works.
> +
> +This is generally only useful when you expect do have this happen many
> +times for a single object. If 99% of the time you have one get and
> +one put, it's probably not worth the complexity. If you generally
> +have 20 gets and puts, it saves 19 locking operations, which is
> +probably worth it.
> +
> +Other than this, normal kref rules apply. You can do a kref_get()
> +without a lock if you are holding a valid kref pointer.
> +
> +
> +Corey Minyard <minyard@xxxxxxx>
> +
> +A lot of this was lifted from Greg KH's OLS presentation on krefs.

Attachment: pgp00000.pgp
Description: PGP signature