Re: [PATCH net-next v4 00/18] net/smc: implement virtual ISM extension and loopback-ism

From: Wen Gu
Date: Mon Oct 16 2023 - 23:49:15 EST




On 2023/10/8 15:19, Wen Gu wrote:


On 2023/10/5 16:21, Niklas Schnelle wrote:


Hi Wen Gu,

I've been trying out your series with iperf3, qperf, and uperf on
s390x. I'm using network namespaces with a ConnectX VF from the same
card in each namespace for the initial TCP/IP connection, i.e. initially
it goes out to a real NIC, even if that can switch internally. All of
these look great for streaming workloads, both in terms of performance
and stability. With a Connect-Request-Response workload and uperf,
however, I've run into issues. The test configuration I use is as
follows:

Client Command:

# host=$ip_server ip netns exec client smc_run uperf -m tcp_crr.xml

Server Command:

# ip netns exec server smc_run uperf -s &> /dev/null

Uperf tcp_crr.xml:

<?xml version="1.0"?>
<profile name="TCP_CRR">
         <group nthreads="12">
                 <transaction duration="120">
                         <flowop type="connect" options="remotehost=$host protocol=tcp" />
                         <flowop type="write" options="size=200"/>
                         <flowop type="read" options="size=1000"/>
                         <flowop type="disconnect" />
                 </transaction>
         </group>
</profile>

The workload first runs fine, but after about 4 GB of data transferred
it fails with "Connection refused" and "Connection reset by peer"
errors. The failure is not permanent, however: re-running the streaming
workloads works fine again (with both uperf server and client
restarted). So I suspect something gets stuck in either the client or
server sockets. The same workload runs fine with TCP/IP, of course.

Thanks,
Niklas



Hi Niklas,

Thank you very much for the test. With the test example you provided, I've
reproduced the issue in my VM. Moreover, the test sometimes complains
with 'Error saying goodbye with <ip>'.

I'll figure out what's going on here.

Thanks!
Wen Gu

I think this is an issue common to SMC-R and SMC-D. I can also reproduce
'connection reset by peer' and 'Error saying goodbye with <ip>' when using
SMC-R under the same test conditions. They occur at the end of the test.

When the uperf test time ends, some signals are sent. At this point there
are usually some SMC connections in the middle of the CLC handshake. I
catch some -EINTR (-4) in the client and -ECONNRESET (-104) in the server
returned from smc_clc_wait_msg() (the handshake error counts increase
correspondingly), and TCP RST packets are sent to terminate the CLC TCP
connection (clcsock).
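
For illustration only, here is a rough userspace analogy of the server-side
symptom, using plain TCP rather than SMC. It assumes that aborting a TCP
connection with a RST is comparable to what happens to the clcsock when the
client's handshake is interrupted; that mapping is my assumption, and the
function name abort_during_handshake() is made up for this sketch. One side
closes with SO_LINGER set to zero (which sends a RST instead of a FIN), and
the peer's pending read then fails with ECONNRESET.

```python
import errno
import socket
import struct


def abort_during_handshake():
    """Return the errno seen by the accepting side when the peer sends a RST.

    Hypothetical helper: plain TCP over loopback, not SMC.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    server, _ = listener.accept()
    server.settimeout(5)                 # do not hang if no RST ever arrives

    # l_onoff=1, l_linger=0: close() aborts the connection with a RST
    # instead of the normal FIN sequence.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()

    try:
        server.recv(1)                   # stands in for waiting on a CLC message
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        server.close()
        listener.close()


if __name__ == "__main__":
    err = abort_during_handshake()
    print(err, errno.errorcode.get(err))
```

The waiting side has no way to tell a deliberate abort from a failure: it
only sees the reset, which would be consistent with the -ECONNRESET observed
in the server.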

I am not sure whether this should be considered by design or a bug in SMC.
From an application perspective, the connection reset behavior only happens
when using SMC.

@Wenjia, could you please take a look at this?

Thanks,
Wen Gu