Re: [PATCH net-next 14/15] net/smc: introduce loopback-ism DMB data copy control

From: Wenjia Zhang
Date: Fri Feb 23 2024 - 09:44:38 EST

On 20.02.24 04:36, Wen Gu wrote:


On 2024/2/16 22:25, Wenjia Zhang wrote:


On 11.01.24 13:00, Wen Gu wrote:
This provides a way to {get|set} whether the loopback-ism device supports
merging the sndbuf with the peer DMB to eliminate data copies between them.

echo 0 > /sys/devices/virtual/smc/loopback-ism/dmb_copy # support
echo 1 > /sys/devices/virtual/smc/loopback-ism/dmb_copy # not support

Besides the same confusion that Niklas already mentioned, the name of the option is not clear enough about what it means. What about:
echo 1 > /sys/devices/virtual/smc/loopback-ism/nocopy_support # merge mode
echo 0 > /sys/devices/virtual/smc/loopback-ism/nocopy_support # copy mode


OK, if we decide to keep the knobs, I will improve the name. Thanks!

The settings take effect after re-activating loopback-ism by:

echo 0 > /sys/devices/virtual/smc/loopback-ism/active
echo 1 > /sys/devices/virtual/smc/loopback-ism/active

After this, the link group related to loopback-ism will be flushed, and
the sndbufs of subsequent connections will (or will not) be merged with
the peer DMB.
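
(For illustration, and assuming the dmb_copy semantics quoted above, where 0
means merging is supported: the complete switch to merge mode would then be
the sequence below. The sysfs paths and values are the ones quoted in this
thread and may differ in the final version of the series.)

echo 0 > /sys/devices/virtual/smc/loopback-ism/dmb_copy
echo 0 > /sys/devices/virtual/smc/loopback-ism/active
echo 1 > /sys/devices/virtual/smc/loopback-ism/active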

The motivation for this control is that bandwidth improves greatly when
the sndbuf and DMB are merged. However, when a virtually contiguous DMB
is provided and merged with the sndbuf, it is accessed concurrently on
Tx and Rx, so there is a bottleneck caused by lock contention in
find_vmap_area when there are many CPUs and CONFIG_HARDENED_USERCOPY
is set (see link below). So an option is provided.
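
For readers without the linked thread handy: with CONFIG_HARDENED_USERCOPY,
copies to or from a vmalloc'ed buffer are validated through find_vmap_area(),
which serializes on a global vmap lock. The snippet below is a purely
illustrative userspace sketch, not kernel code; every name, thread count and
buffer size in it is made up, and it only models how one shared lock taken on
every copy becomes the bottleneck as the CPU count grows.

/*
 * Analogy only: NTHREADS workers each perform copies, and in "virtual DMB"
 * mode every copy first takes one global lock, standing in for the vmap
 * lookup done by the hardened-usercopy check. Run with argument 0 to model
 * the physically contiguous case, where no shared lock is taken per copy.
 *
 * Build: gcc -O2 -pthread contention_demo.c -o contention_demo
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS 8
#define ITERS    200000
#define COPY_SZ  4096

static pthread_spinlock_t global_lock;  /* stands in for the global vmap lock */
static int lock_per_copy = 1;           /* 1: "virtual DMB", 0: "physical DMB" */

static void *worker(void *arg)
{
	char src[COPY_SZ], dst[COPY_SZ];

	memset(src, 0xab, sizeof(src));
	for (long i = 0; i < ITERS; i++) {
		if (lock_per_copy) {
			/* every copy pays for the shared lock */
			pthread_spin_lock(&global_lock);
			pthread_spin_unlock(&global_lock);
		}
		memcpy(dst, src, sizeof(dst));  /* the actual data movement */
	}
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tids[NTHREADS];
	int i;

	if (argc > 1)
		lock_per_copy = atoi(argv[1]);
	pthread_spin_init(&global_lock, PTHREAD_PROCESS_PRIVATE);

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, worker, NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);

	printf("done (lock per copy: %d)\n", lock_per_copy);
	return 0;
}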

Link: https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@xxxxxxxxxxxxxxxxx/
Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>
---
We tried some simple workloads, and the performance of the no-copy case was remarkable. Thus, we're wondering whether the tunable setting is really necessary in this loopback case. Or rather, why do we need the copy option? Is it because of the bottleneck caused by combining no-copy with a virtually contiguous DMB? Or, at least, could no-copy be the default?

Yes, it is because of the bottleneck caused by combining no-copy with virtual-DMB.
If we have to use virtual-DMB and CONFIG_HARDENED_USERCOPY is set, then in an
environment with many CPUs we may be forced to use copy mode to get good latency
(while bandwidth still drops because of copy mode).

But if we agree that physical-DMB is acceptable (it costs one physical buffer per
connection side in loopback-ism no-copy mode, the same as what the sndbuf costs
when using s390 ISM), then there is no such performance issue and the two knobs
can be removed (see also the reply to patch 13/15 [1]).

[1] https://lore.kernel.org/netdev/442061eb-107a-421d-bc2e-13c8defb0f7b@xxxxxxxxxxxxxxxxx/

Thanks!
Thank you, Wen, for the elaboration! As I said, although we did see somewhat better performance using the virtually contiguous memory in a simple test, the improvement was not really significant. Additionally, our environment is very different from your 48-CPU qemu environment, and it also depends on the workload. I think I can understand why you see better performance with physically contiguous memory. Anyway, I have no objection to using physical-DMB only, but I would still like to hear whether there are other opinions.