Re: [PATCH] SUNRPC: Fix svc_flush_dcache()

From: He Zhe
Date: Tue Sep 22 2020 - 23:31:05 EST




On 9/22/20 10:14 PM, Chuck Lever wrote:
>
>> On Sep 22, 2020, at 3:13 AM, He Zhe <zhe.he@xxxxxxxxxxxxx> wrote:
>>
>>
>>
>> On 9/21/20 3:51 AM, Chuck Lever wrote:
>>> On platforms that implement flush_dcache_page(), a large NFS WRITE
>>> triggers the WARN_ONCE in bvec_iter_advance():
>>>
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: Attempted to advance past end of bvec iter
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: WARNING: CPU: 0 PID: 1032 at include/linux/bvec.h:101 bvec_iter_advance.isra.0+0xa7/0x158 [sunrpc]
>>>
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: Call Trace:
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: svc_tcp_recvfrom+0x60c/0x12c7 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? bvec_iter_advance.isra.0+0x158/0x158 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? del_timer_sync+0x4b/0x55
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? test_bit+0x1d/0x27 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: svc_recv+0x1193/0x15e4 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? try_to_freeze.isra.0+0x6f/0x6f [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? refcount_sub_and_test.constprop.0+0x13/0x40 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? svc_xprt_put+0x1e/0x29f [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? svc_send+0x39f/0x3c1 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: nfsd+0x282/0x345 [nfsd]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? __kthread_parkme+0x74/0xba
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: kthread+0x2ad/0x2bc
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? nfsd_destroy+0x124/0x124 [nfsd]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? test_bit+0x1d/0x27
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? kthread_mod_delayed_work+0x115/0x115
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: ret_from_fork+0x22/0x30
>>>
>>> Reported-by: He Zhe <zhe.he@xxxxxxxxxxxxx>
>>> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()")
>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>> ---
>>> net/sunrpc/svcsock.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Hi Zhe-
>>>
>>> If you confirm this fixes your issue and there are no other
>>> objections or regressions, I can submit this for v5.9-rc.
>> I don't quite get why we add "seek" to "size". It seems this action does not
>> reflect the actual scenario and forcedly neutralizes the WARN_ONCE check in
>> bvec_iter_advance, so that it may "advance past end of bvec iter" and thus
>> introduces overflow.
>> Why don't we avoid this problem at the very begginning like my v1? That is, call
>> svc_flush_bvec only when we have received more than we want to seek.
>>
>> len = sock_recvmsg(svsk->sk_sock, &msg, MSG_DONTWAIT);
>> - if (len > 0)
>> + if (len > 0 && (size_t)len > (seek & PAGE_MASK))
>> svc_flush_bvec(bvec, len, seek);
> Because this doesn't fix the underlying bug that triggered the
> WARN_ONCE.
>
> svc_tcp_recvfrom() attempts to assemble a possibly large RPC Call
> from a sequence of sock_recvmsg's.
>
> @seek is the running number of bytes that has been received so
> far for the RPC Call we are assembling. @size is the number of
> bytes that was just received in the most recent sock_recvmsg.
>
> We want svc_flush_bvec to flush just the area of @bvec that
> hasn't been flushed yet.
>
> Thus: the current size of the partial Call message in @bvec is
> @seek + @size. The starting location of the flush is
> @seek & PAGE_MASK. This aligns the flush so it starts on a page
> boundary.
>
> This:
>
> 230 struct bvec_iter bi = {
> 231 .bi_size = size + seek,
> 232 };
>
> 235 bvec_iter_advance(bvec, &bi, seek & PAGE_MASK);
>
> advances the bvec_iter to the part of @bvec that hasn't been
> flushed yet.
>
> This loop:
>
> 236 for_each_bvec(bv, bvec, bi, bi)
> 237 flush_dcache_page(bv.bv_page);
>
> flushes each page starting at that point to the end of the bytes
> that have been received so far
>
> In other words, ca07eda33e01 was wrong because it always flushed
> the first section of @bvec, never the later parts of it.

Thanks for clarification. I just tested the patch. It works well.

Zhe

>
>
>> Regards,
>> Zhe
>>
>>>
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index d5805fa1d066..c2752e2b9ce3 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -228,7 +228,7 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
>>> static void svc_flush_bvec(const struct bio_vec *bvec, size_t size, size_t seek)
>>> {
>>> struct bvec_iter bi = {
>>> - .bi_size = size,
>>> + .bi_size = size + seek,
>>> };
>>> struct bio_vec bv;
> --
> Chuck Lever
>
>
>