Hello all,
I think there is a race in the RPC code in 2.4.20, related to timeout
and congestion window handling.
The problem code is in net/sunrpc/xprt.c:
static void
xprt_timer(struct rpc_task *task)
{
	struct rpc_rqst *req = task->tk_rqstp;
	struct rpc_xprt *xprt = req->rq_xprt;

	spin_lock(&xprt->sock_lock);
	if (req->rq_received)
		goto out;

	if (!xprt->nocong) {
		if (xprt_expbackoff(task, req)) {
			rpc_add_timer(task, xprt_timer);
			goto out_unlock;
		}
		rpc_inc_timeo(&task->tk_client->cl_rtt);
		xprt_adjust_cwnd(req->rq_xprt, -ETIMEDOUT);
	}
	req->rq_nresend++;
The call to xprt_adjust_cwnd is the problem - I experienced a panic
(null pointer dereference) in
static void
xprt_adjust_cwnd(struct rpc_xprt *xprt, int result)
{
	unsigned long cwnd;

	cwnd = xprt->cwnd;
	if (result >= 0 && cwnd <= xprt->cong) {
Here it is the "cwnd = xprt->cwnd" line that causes the panic: xprt was 0.
This means that, in the xprt_timer code, req->rq_xprt must have been 0
at the time of the call.
I guess this can happen because of the sequence:
	xprt = req->rq_xprt;
	spin_lock(&xprt->sock_lock);
	...
	xprt_adjust_cwnd(req->rq_xprt);
We don't know whether req has been modified between the assignment and
the spin_lock.
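To illustrate the idea outside the kernel, here is a minimal userspace
sketch of "read the pointer once, then only use the local copy". It is
not the attached patch and not kernel code: demo_xprt, demo_rqst,
demo_lock, demo_unlock, demo_adjust_cwnd and demo_timer are made-up
names, the lock is a toy spinlock built on C11 atomic_flag, and halving
cwnd on timeout is an arbitrary stand-in policy.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rpc_xprt and struct rpc_rqst. */
struct demo_xprt {
	atomic_flag lock;	/* toy spinlock, stands in for sock_lock */
	unsigned long cwnd;
};

struct demo_rqst {
	struct demo_xprt *_Atomic xprt;	/* another thread may clear this */
};

static void demo_lock(struct demo_xprt *xprt)
{
	while (atomic_flag_test_and_set(&xprt->lock))
		;	/* spin until the flag was previously clear */
}

static void demo_unlock(struct demo_xprt *xprt)
{
	atomic_flag_clear(&xprt->lock);
}

/* Halving on timeout is an arbitrary policy chosen for the example. */
static void demo_adjust_cwnd(struct demo_xprt *xprt, int result)
{
	assert(xprt != NULL);
	if (result == -ETIMEDOUT && xprt->cwnd > 1)
		xprt->cwnd /= 2;
}

/* Returns 0 if the window was adjusted, -1 if req had no transport. */
static int demo_timer(struct demo_rqst *req)
{
	struct demo_xprt *xprt = atomic_load(&req->xprt);	/* read once */

	if (xprt == NULL)
		return -1;

	demo_lock(xprt);
	/* Pass the local copy; req->xprt is not read a second time. */
	demo_adjust_cwnd(xprt, -ETIMEDOUT);
	demo_unlock(xprt);
	return 0;
}
```

Note that this only removes the second read of the pointer. It says
nothing about whether the object behind the local copy is still valid,
which would need a reference count or similar.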
Attached is a patch to solve the problem - please comment.
It does not solve the other potential (?) problem with:
	xprt = req->rq_xprt;
	spin_lock(&xprt->sock_lock);
	...
	if (xprt_expbackoff(task, req)) {
	...
Any suggestions for that one?
I cannot test whether my patch solves the problem, because this panic has
happened only once, on a *heavily* loaded box which has run 2.4.20 ever
since it came out. The race is extremely rare. It is an SMP box, by the way.
Thanks a lot to "baldrick" on kernel-newbies for the help!
-- 
................................................................
: jakob@unthought.net     : And I see the elder races,         :
:.........................: putrid forms of man                :
: Jakob Østergaard        : See him rise and claim the earth,  :
: OZ9ABN                  : his downfall is at hand.           :
:.........................:............{Konkhra}...............: