Re: [PATCH] 9p: fix crash when transaction killed

From: asmadeus
Date: Tue Nov 29 2022 - 17:38:45 EST

Next message: Colin Ian King: "[PATCH][next] firmware: dmi-sysfs: make pr_info messages rate limited"
Previous message: Xiubo Li: "Re: [PATCH v4] ceph: mark directory as non-complete complete after loading key"
In reply to: Christian Schoenebeck: "Re: [PATCH] 9p: fix crash when transaction killed"
Next in thread: Schspa Shi: "Re: [PATCH] 9p: fix crash when transaction killed"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Schspa Shi wrote on Wed, Nov 30, 2022 at 12:22:51AM +0800:
> The transport layer of fs does not fully support the cancel request.
> When the request is in the REQ_STATUS_SENT state, p9_fd_cancelled
> will forcibly delete the request, and at this time p9_[read/write]_work
> may continue to use the request. Therefore, it causes a UAF.
>
> Here are the logs from syzbot.
>
> Corrupted memory at 0xffff88807eade00b [ 0xff 0x07 0x00 0x00 0x00 0x00
> 0x00 0x00 . . . . . . . . ] (in kfence-#110):
> p9_fcall_fini net/9p/client.c:248 [inline]
> p9_req_put net/9p/client.c:396 [inline]
> p9_req_put+0x208/0x250 net/9p/client.c:390
> p9_client_walk+0x247/0x540 net/9p/client.c:1165
> clone_fid fs/9p/fid.h:21 [inline]
> v9fs_fid_xattr_set+0xe4/0x2b0 fs/9p/xattr.c:118
> v9fs_xattr_set fs/9p/xattr.c:100 [inline]
> v9fs_xattr_handler_set+0x6f/0x120 fs/9p/xattr.c:159
> __vfs_setxattr+0x119/0x180 fs/xattr.c:182
> __vfs_setxattr_noperm+0x129/0x5f0 fs/xattr.c:216
> __vfs_setxattr_locked+0x1d3/0x260 fs/xattr.c:277
> vfs_setxattr+0x143/0x340 fs/xattr.c:309
> setxattr+0x146/0x160 fs/xattr.c:617
> path_setxattr+0x197/0x1c0 fs/xattr.c:636
> __do_sys_setxattr fs/xattr.c:652 [inline]
> __se_sys_setxattr fs/xattr.c:648 [inline]
> __ia32_sys_setxattr+0xc0/0x160 fs/xattr.c:648
> do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
> __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
> do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203
> entry_SYSENTER_compat_after_hwframe+0x70/0x82
>
> Below is a similar scenario. The scenario in the syzbot log looks more
> complicated than this one, but the root cause seems to be the same.
>
>      T21124                   p9_write_work        p9 read_work
> ======================== first trans =================================
> p9_client_walk
>   p9_client_rpc
>     p9_client_prepare_req
>     /* req->refcount == 2 */
>     c->trans_mod->request(c, req);
>       p9_fd_request
>         req move to unsent_req_list
>                             req->status = REQ_STATUS_SENT;
>                             req move to req_list
>                             << send to server >>
>     wait_event_killable
>     << get kill signal >>
>     if (c->trans_mod->cancel(c, req))
>        p9_client_flush(c, req);
>          /* send flush request */
>          req = p9_client_rpc(c, P9_TFLUSH, "w", oldtag);
>          if (c->trans_mod->cancelled)
>             c->trans_mod->cancelled(c, oldreq);
>               /* old req was deleted from req_list */
>               /* req->refcount == 1 */
>   p9_req_put
>     /* req->refcount == 0 */
>     << preempted >>
>                                        << get response, UAF here >>
>                                        m->rreq = p9_tag_lookup(m->client, m->rc.tag);
>                                          /* req->refcount == 1 */
>                                        << do response >>
>                                        p9_client_cb(m->client, m->rreq, REQ_STATUS_RCVD);
>                                          /* req->refcount == 0 */
>                                          p9_fcall_fini
>                                          /* request have been freed */
>     p9_fcall_fini
>      /* double free */
>                                        p9_req_put(m->client, m->rreq);
>                                          /* req->refcount == 1 */
>
> To fix it, we can wait for the request with status REQ_STATUS_SENT to
> return.

Christian already replied on this (we cannot wait), and I agree with
him: the scenario you describe is protected by p9_tag_lookup checking
the refcount with refcount_inc_not_zero (p9_req_try_get).
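
To illustrate why an increment-if-not-zero lookup closes the window
described in the diagram, here is a minimal userspace sketch. It is not
the kernel code: struct rc_req, rc_req_try_get() and rc_req_put() are
hypothetical names, C11 atomics stand in for refcount_t, and the sketch
assumes the memory backing the request stays valid while the lookup
runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request; only the refcount matters here. */
struct rc_req {
	atomic_int refcount;
};

/*
 * Take a reference only if the count is still non-zero. This mirrors
 * the idea behind refcount_inc_not_zero, not its kernel implementation.
 */
static bool rc_req_try_get(struct rc_req *req)
{
	int old = atomic_load(&req->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&req->refcount, &old, old + 1))
			return true;
	}
	return false;	/* count already hit zero: the lookup must fail */
}

/*
 * Drop a reference. Returns true when the caller dropped the last one
 * and is therefore the only party allowed to free the request.
 */
static bool rc_req_put(struct rc_req *req)
{
	int old = atomic_fetch_sub(&req->refcount, 1);

	assert(old > 0);	/* catch underflow in the sketch */
	return old == 1;
}
```

With this shape, a reader that loses the race against the final put
sees a failed lookup instead of resurrecting a request whose count is
already zero.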

The normal scenarios for flush are as follows:
- cancel before request is sent: no flush, just free
- flush is ignored and reply comes first: we get reply from original
request then reply from flush
- flush is handled and reply never comes: we only get reply from flush
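
The three cases above can be written down as a small classification,
which makes the expected number of replies per case explicit. This is
only a sketch for discussion: the enum and both helpers are
hypothetical names and do not exist in net/9p.

```c
#include <stdbool.h>

/* Hypothetical labels for the three flush scenarios listed above. */
enum fl_case {
	FL_CANCEL_UNSENT,	/* cancelled before the request was sent */
	FL_FLUSH_IGNORED,	/* original reply arrives, then flush reply */
	FL_FLUSH_HANDLED,	/* original reply never comes, flush reply only */
};

/*
 * Classify a cancellation from two observations: was the request
 * already sent, and did the server still answer the original request?
 */
static enum fl_case fl_classify(bool sent, bool original_answered)
{
	if (!sent)
		return FL_CANCEL_UNSENT;
	return original_answered ? FL_FLUSH_IGNORED : FL_FLUSH_HANDLED;
}

/* Number of replies the client should expect on the wire per case. */
static int fl_replies_expected(enum fl_case c)
{
	switch (c) {
	case FL_CANCEL_UNSENT:
		return 0;	/* no flush sent, just free */
	case FL_FLUSH_IGNORED:
		return 2;	/* original reply + flush reply */
	case FL_FLUSH_HANDLED:
		return 1;	/* flush reply only */
	}
	return -1;		/* unreachable for valid enum values */
}
```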

Protocol-wise, we can safely reuse the tag after the flush reply has
been received; and as far as I can follow the code we only ever free the
tag (last p9_fcall_fini) after flush has returned, so the entry should
be protected.

If we receive a response on the given tag between cancelled and the main
thread going out, the request has been marked as FLSHD and should be
ignored. There is one p9_req_put in p9_read_work() in this case, but
it corresponds to the ref obtained by p9_tag_lookup() so it should be
valid.
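
A rough single-threaded model of that reference accounting, to show
that the get and put balance out for a late reply. All names here
(late_req, late_handle_reply, the LATE_* states) are hypothetical, the
refcount is a plain int rather than refcount_t, and the logic only
loosely follows the description above rather than the actual
p9_read_work() code.

```c
#include <stdbool.h>

enum late_status { LATE_SENT, LATE_RCVD, LATE_FLSHD };

struct late_req {
	int refcount;		/* plain int: single-threaded sketch only */
	enum late_status status;
};

/* Returns true if the reply was delivered to a waiter, false if ignored. */
static bool late_handle_reply(struct late_req *req)
{
	bool delivered = false;

	if (req->refcount == 0)
		return false;	/* lookup fails: request is already gone */
	req->refcount++;	/* ref taken by the tag lookup */

	if (req->status == LATE_SENT) {
		req->status = LATE_RCVD;
		req->refcount--;	/* transport's ref, dropped on completion */
		delivered = true;
	}
	/* LATE_FLSHD: reply for a cancelled request, deliberately ignored. */

	req->refcount--;	/* matches the lookup ref taken above */
	return delivered;
}
```

In the flushed case the count is the same before and after the call, so
the only reference left is the one still held by the thread that issued
the request.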

I'm happy to believe we have a race somewhere (even if no sane server
would produce it), but right now I don't see it looking at the code.. :/

--
Dominique
