Re: [PATCH] 9p: fix crash when transaction killed

From: asmadeus
Date: Wed Nov 30 2022 - 06:07:11 EST


Schspa Shi wrote on Wed, Nov 30, 2022 at 04:14:32PM +0800:
> > - reqs are alloced in a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
> > This means that if we get a req from idr_find, even if it has just been
> > freed, it either is still in the state it was freed at (hence refcount
> > 0, we ignore it) or is another req coming from the same cache (if
>
> If the req was newly alloced(It was at a new page), refcount maybe not
> 0, there will be problem in this case. It seems we can't relay on this.
>
> We need to set the refcount to zero before add it to idr in p9_tag_alloc.

Hmm, if it's reused then it's zero by definition, but if it's a new
allocation (uninitialized) then anything goes; that lookup could find
and increase it before the refcount_set, and we'd have an off by one
leading to use after free. Good catch!

Initializing it to zero will lead to the client busy-looping until after
the refcount is properly set, which should work.
Setting refcount early might have us use an re-used req before the tag
has been changed so that one cannot move.

Could you test with just that changed if syzbot still reproduces this
bug? (perhaps add a comment if you send this)

------
diff --git a/net/9p/client.c b/net/9p/client.c
index aaa37b07e30a..aa64724f6a69 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -297,6 +297,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size,
p9pdu_reset(&req->rc);
req->t_err = 0;
req->status = REQ_STATUS_ALLOC;
+ refcount_set(&req->refcount, 0);
init_waitqueue_head(&req->wq);
INIT_LIST_HEAD(&req->req_list);

-----

> > refcount isn't zero, we can check its tag)
>
> As for the release case, the next request will have the same tag with
> high probability. It's better to make the tag value to be an increase
> sequence, thus will avoid very much possible req reuse.

I'd love to be able to do this, but it would break some servers that
assume tags are small (e.g. using it as an index for a tag array)
... I thought nfs-ganesha was doing this but they properly put in in
buckets, so that's one less server to worry about, but I wouldn't put
it past some simple servers to do that; having a way to lookup a given
tag for flush is an implementation requirement.

That shouldn't be a problem though as that will just lead to either fail
the guard check after lookup (m->rreq->status != REQ_STATUS_SENT) or be
processed as a normal reply if it's already been sent by the other
thread at this point.
OTOH, that m->rreq->status isn't protected by m->req_lock in trans_fd,
and that is probably another bug...

--
Dominique