Re: [REGRESSION] v6.8 SMC-D issues

From: Wen Gu
Date: Thu Jan 25 2024 - 00:00:05 EST




On 2024/1/24 22:29, Alexandra Winter wrote:
Hello Wen Gu,

our colleague Matthew reported that SMC-D is failing in certain scenarios on
kernel v6.8 (thx Matt!). He bisected it to
b40584d ("net/smc: compatible with 128-bits extended GID of virtual ISM device")
However, the root cause could also be somewhere else in the SMC-Dv2.1 patchset.

I was able to reproduce the issue on a 6.8.0-rc1 kernel.
I tested iperf over smc-d with:
smc_run iperf3 -s
smc_run iperf3 -c <IP@>

1) iperf within a single system, using 127.0.0.1 as IP@
(System A = iperf client = iperf server)
2) iperf to a remote system (System A = iperf client; System B = iperf server)

The second iperf fails, and the server reports an error message like:
"iperf3: error - unable to receive cookie at server: Bad file descriptor"

If I first run 2) (iperf to remote) and then 1) (iperf to local), it is the
iperf to local that fails.

I can run multiple iperfs to the first server without problems.

I ran it on a debug server with KASAN, but got no reports in the log file.

I will try to debug further, but wanted to let you all know.

Kind regards
Alexandra

Reported-by: Matthew Rosato <mjrosato@xxxxxxxxxxxxx>


Hi Alexandra and Matthew,

Thank you very much for the detailed description.

I tried to reproduce this with loopback-ism, removing some checks so that the remote-system
handshake could be completed. After some debugging I found an elementary mistake of mine in
b40584d ("net/smc: compatible with 128-bits extended GID of virtual ISM device").

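The operator precedence in smcd_lgr_match() is not what I intended. Because the conditional
operator ?: binds more loosely than &&, the whole && chain becomes the condition of the ?:
expression, so the function returns 1 whenever the GID or device comparison fails. In
particular, it always returns 'true' in the remote-system case, where the ISM device is not
virtual.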

 static bool smcd_lgr_match(struct smc_link_group *lgr,
-			   struct smcd_dev *smcismdev, u64 peer_gid)
+			   struct smcd_dev *smcismdev,
+			   struct smcd_gid *peer_gid)
 {
-	return lgr->peer_gid == peer_gid && lgr->smcd == smcismdev;
+	return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
+		smc_ism_is_virtual(smcismdev) ?
+		(lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1;
 }

Could you please try again with the patch below, to see whether this is the root cause?
Really sorry for the inconvenience.

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index da6a8d9c81ea..c6a6ba56c9e3 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1896,8 +1896,8 @@ static bool smcd_lgr_match(struct smc_link_group *lgr,
 			   struct smcd_gid *peer_gid)
 {
 	return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
-		smc_ism_is_virtual(smcismdev) ?
-		(lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1;
+		(smc_ism_is_virtual(smcismdev) ?
+		 (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1);
 }


Thanks,
Wen Gu