Re: [PATCH] fix race in AF_UNIX

From: Miklos Szeredi
Date: Wed Jun 06 2007 - 04:09:08 EST


> > > Holding a global mutex over recvmsg() calls under AF_UNIX is pretty
> > > much a non-starter, this will kill performance for multi-threaded
> > > apps.
> >
> > That's an rwsem held for read. It's held for write in unix_gc() only
> > for a short duration, and unix_gc() should only rarely be called. So
> > I don't think there's any performance problem here.
>
> It pulls a non-local cacheline into the local thread, that's extremely
> expensive on SMP.

OK, here's an updated patch, that uses ->readlock, and passes my
testing.

----
From: Miklos Szeredi <mszeredi@xxxxxxx>

There are races involving the garbage collector, that can throw away
perfectly good packets with AF_UNIX sockets in them.

The problems arise when a socket goes from installed to in-flight or
vice versa during garbage collection. Since gc is done with a
spinlock held, this only shows up on SMP.

Signed-off-by: Miklos Szeredi <mszeredi@xxxxxxx>
---

Index: linux-2.6.22-rc2/net/unix/garbage.c
===================================================================
--- linux-2.6.22-rc2.orig/net/unix/garbage.c 2007-06-03 23:58:11.000000000 +0200
+++ linux-2.6.22-rc2/net/unix/garbage.c 2007-06-06 09:48:36.000000000 +0200
@@ -186,7 +186,21 @@ void unix_gc(void)

forall_unix_sockets(i, s)
{
- unix_sk(s)->gc_tree = GC_ORPHAN;
+ struct unix_sock *u = unix_sk(s);
+
+ u->gc_tree = GC_ORPHAN;
+
+ /*
+ * Hold ->readlock to protect against sockets going from
+ * in-flight to installed
+ *
+ * Can't sleep on this, because
+ * a) we are under spinlock
+ * b) skb_recv_datagram() could be waiting for a packet that
+ * is to be sent by this thread
+ */
+ if (!mutex_trylock(&u->readlock))
+ goto lock_failed;
}
/*
* Everything is now marked
@@ -207,8 +221,6 @@ void unix_gc(void)

forall_unix_sockets(i, s)
{
- int open_count = 0;
-
/*
* If all instances of the descriptor are not
* in flight we are in use.
@@ -218,10 +230,20 @@ void unix_gc(void)
* In this case (see unix_create1()) we set artificial
* negative inflight counter to close race window.
* It is trick of course and dirty one.
+ *
+ * Get the inflight counter first, then the open
+ * counter. This avoids problems if racing with
+ * sendmsg
+ *
+ * If just created socket is not yet attached to
+ * a file descriptor, assume open_count of 1
*/
+ int inflight_count = atomic_read(&unix_sk(s)->inflight);
+ int open_count = 1;
+
if (s->sk_socket && s->sk_socket->file)
open_count = file_count(s->sk_socket->file);
- if (open_count > atomic_read(&unix_sk(s)->inflight))
+ if (open_count > inflight_count)
maybe_unmark_and_push(s);
}

@@ -300,6 +322,7 @@ void unix_gc(void)
spin_unlock(&s->sk_receive_queue.lock);
}
u->gc_tree = GC_ORPHAN;
+ mutex_unlock(&u->readlock);
}
spin_unlock(&unix_table_lock);

@@ -309,4 +332,19 @@ void unix_gc(void)

__skb_queue_purge(&hitlist);
mutex_unlock(&unix_gc_sem);
+ return;
+
+ lock_failed:
+ {
+ struct sock *s1;
+ forall_unix_sockets(i, s1) {
+ if (s1 == s) {
+ spin_unlock(&unix_table_lock);
+ mutex_unlock(&unix_gc_sem);
+ return;
+ }
+ mutex_unlock(&unix_sk(s1)->readlock);
+ }
+ BUG();
+ }
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/