Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant

From: Arnd Bergmann
Date: Tue Feb 20 2018 - 16:54:42 EST


On Tue, Feb 20, 2018 at 10:14 PM, Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> Hi Arnd Bergmann,
>
>> -----Original Message-----
>> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-
>> owner@xxxxxxxxxxxxxxx] On Behalf Of Arnd Bergmann
>> Sent: Tuesday, February 20, 2018 2:59 PM
>> To: Doug Ledford <dledford@xxxxxxxxxx>; Jason Gunthorpe <jgg@xxxxxxxx>
>> Cc: Arnd Bergmann <arnd@xxxxxxxx>; Leon Romanovsky
>> <leonro@xxxxxxxxxxxx>; Sagi Grimberg <sagi@xxxxxxxxxxx>; Bart Van Assche
>> <bart.vanassche@xxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; linux-
>> kernel@xxxxxxxxxxxxxxx
>> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
>>
>> The ib_wc structure has grown so much that putting 16 of them on the stack hits
>> the warning limit for dangerous kernel stack consumption:
>>
>> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
>> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger
>> than 1024 bytes [-Werror=frame-larger-than=]
>>
>> Using half that number brings us comfortably below that limit again.
>>
>> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
>> RDMA resources")
>
> It is not clear to me how the above commit 02d8883f520e introduced this stack issue.

My mistake, I misread the git history.

I did a proper bisection now and ended up with the commit that added the
IB_POLL_BATCH sized array on the stack, i.e. commit 246d8b184c10 ("IB/cq:
Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct")

> Bodong and I came across ib_wc size increase in [1] and it was fixed in [2].
> Did you hit this error after/before applying patch [2]?
>
> [1] https://www.spinics.net/lists/linux-rdma/msg50754.html
> [2] https://patchwork.kernel.org/patch/10159623/

I did the analysis a few weeks ago when I first hit the problem but didn't
send it out at the time. Today I saw that the problem still persists on
mainline (4.16-rc2), which does contain the patch from [2].

What I see is that 'ib_wc' is now exactly 59 bytes on 32-bit ARM, plus 5 bytes
of padding, so 16 of them get us exactly to the 1024-byte warning limit, and
then there are a few more bytes for the function itself.
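
To spell out the arithmetic, here is a rough standalone sketch (not kernel
code). The sizes are the ones quoted above for 32-bit ARM. The enum names and
the wc_array_bytes() helper are made up for illustration and do not exist in
drivers/infiniband/core/cq.c:

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in this mail and in the compiler error (32-bit ARM). */
enum {
	WC_PAYLOAD_BYTES = 59,   /* struct ib_wc members, before padding */
	WC_PADDING_BYTES = 5,    /* tail padding, giving 64 bytes per entry */
	OLD_POLL_BATCH   = 16,   /* array length that triggers the warning */
	NEW_POLL_BATCH   = 8,    /* "half that number", as in the patch */
	FRAME_WARN_BYTES = 1024, /* limit named in the -Wframe-larger-than error */
	REPORTED_FRAME   = 1032, /* frame size reported for ib_process_cq_direct */
};

/* Hypothetical helper: stack bytes taken by an on-stack array of entries. */
static inline size_t wc_array_bytes(size_t batch)
{
	return batch * (WC_PAYLOAD_BYTES + WC_PADDING_BYTES);
}

/* Sixteen entries alone already consume the whole budget ... */
static_assert(OLD_POLL_BATCH * (WC_PAYLOAD_BYTES + WC_PADDING_BYTES)
	      == FRAME_WARN_BYTES, "16 entries fill the limit exactly");
/* ... while eight entries leave half of it free for everything else. */
static_assert(NEW_POLL_BATCH * (WC_PAYLOAD_BYTES + WC_PADDING_BYTES)
	      == FRAME_WARN_BYTES / 2, "8 entries use half the limit");
```

With these numbers the array accounts for 1024 of the reported 1032 bytes,
which leaves 8 bytes for the rest of the frame. Other architectures or
configurations may lay out the structure differently.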

Arnd