[BUG] cdev_put() race condition

From: Hans Verkuil
Date: Mon Dec 08 2008 - 15:57:15 EST


Hi Greg,

Laurent found a race condition in the uvc driver that we traced to the
way chrdev_open and cdev_put/get work.

You need the following ingredients to reproduce it:

1) a hot-pluggable char device like an USB webcam.
2) a manually created device node for such a webcam instead of relying
on udev.

In order to easily force this situation you would also need to add a
delay to the char device's release() function. For webcams that would
be at the top of v4l2_chardev_release() in
drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge
would have the same effect.

The sequence of events in the case of a webcam is as follows:

1) The USB device is removed, causing a disconnect.

2) The webcam driver unregisters the video device which in turn calls
cdev_del().

3) When the last application using the device is closed, the cdev is
released when the kref of the cdev's kobject goes to 0.

4) If the kref's release() call takes a while due to e.g. extra cleanup
in the case of a webcam, then another application can try to open the
video device. Note that this requires a device node created with mknod,
otherwise the device nodes would already have been removed by udev.

5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this device
node was never accessed before), then all is fine since kobj_lookup
will fail because cdev_del() has been called earlier. However, if this
device node was used earlier, then the else part is called:
cdev_get(p). This 'p' is the cdev that is being released. Since the
kref count is 0 you will get a WARN message from kref_get, but the code
continues on, the f_op->open will (hopefully) return more-or-less
gracefully with an error and the cdev_put at the end will cause the
refcount to go to 0 again, which results in a SECOND call to the kref's
release function!

See this link for the original discussion on the v4l list containing
stack traces an a patch that you need if you want to (and can) test
this with the uvc driver:

http://www.spinics.net/lists/vfl/msg39967.html

Reading Documentation/kref.txt leads me to the conclusion that a mutex
should be introduced to prevent cdev_get from being called while
cdev_put is in progress. It should be a mutex instead of a spinlock
because the kref's release() can do all sorts of cleanups, some of
which might involve waits.

I think that replacing cdev_lock with a mutex, adding it to cdev_put(),
cdev_del() and removing it from cdev_purge() should do the trick (I
hope).

BTW: shouldn't cdev_del() call cdev_put() instead of kobject_put()? If I
understand it correctly then cdev_add calls cdev_get (through
exact_lock()), so cdev_del should do the reverse, right?

Regards,

Hans

--
Hans Verkuil - video4linux developer - sponsored by TANDBERG
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/