Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE

From: Jason Gunthorpe
Date: Thu Jan 04 2024 - 09:56:42 EST


On Thu, Dec 07, 2023 at 06:37:13AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> >
> > On 2023/12/5 8:11, Jason Gunthorpe wrote:
> > > On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> > >>
> > >> Daisuke Matsuda (7):
> > >> RDMA/rxe: Always defer tasks on responder and completer to workqueue
> > >> RDMA/rxe: Make MR functions accessible from other rxe source code
> > >> RDMA/rxe: Move resp_states definition to rxe_verbs.h
> > >> RDMA/rxe: Add page invalidation support
> > >> RDMA/rxe: Allow registering MRs for On-Demand Paging
> > >> RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> > >> RDMA/rxe: Add support for the traditional Atomic operations with ODP
> > >
> > > What is the current situation with rxe? I don't recall seeing the bugs
> > > that were reported get fixed?
>
> Well, I suppose Jason is referring to the "blktests srp/002 hang".
> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
>
> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> so the hang does not look specific to rxe.
> cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@xxxxxxx/T/
> I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.

Bob? Is that what we think?

> There is another issue that causes a kernel panic.
> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@xxxxxxxxxxxxxx/T/

This is more understandable, and the fix of matching the MTT size to
the PAGE_SIZE seems reasonable to me.

Jason