Re: [RFC PATCH net-next v4 0/9] net/smc: Introduce SMC-D-based OS internal communication acceleration

From: Wen Gu
Date: Mon Apr 10 2023 - 10:31:22 EST


Hi Niklas,

On 2023/4/6 01:04, Niklas Schnelle wrote:


Let me just spell out some details here to make sure we're all on the
same page.

You're assuming that GIDs are generated randomly at cryptographic
quality. In the code I can see that you use get_random_bytes() which, as
its comment explains, supplies the same quality of randomness as
/dev/urandom, so on modern kernels that should provide cryptographic
quality randomness and be fine. Might be something to keep in mind for
backports though.

The fixed CHID of 0xFFFF makes sure this system identity confusion can
only occur between SMC-D loopback devices (and possibly virtio-ism?),
never with ISM-based SMC-D or SMC-R, as these never use this CHID value.
Correct?

Yes, the CHID of 0xFFFF used for SMC-D loopback ensures that a GID
collision won't involve ISM-based SMC-D or SMC-R.


Now for the collision scenario above. As I understand it the
probability of the case where fallback does *not* occur is equivalent
to a 128 bit hash collision. Basically the random 64 bit GID_A
concatenated with the 64 bit DMB Token_A needs to just happen to match
the concatenation of the random 64 bit GID_B with DMB Token_B.

Yes, almost like this.

One small correction: Token_A happens to match a DMB token in B's
kernel (not necessarily Token_B) and Token_B happens to match a DMB token
in A's kernel (not necessarily Token_A).

With that interpretation we can consult Wikipedia[0] for a nice table of
how many random GID+DMB Token choices are needed for a certain collision
probability. For 128 bits at least 8.2×10^11 tries would be needed just
to reach a 10^-15 collision probability. The collision must not only
exist between two systems; these systems also need to try to communicate
with each other and happen to use the colliding DMBs for things to get
into the broken fallback case. So from a theoretical point of view this
sounds like negligible risk to me.

Thanks for the reference data.
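
As a rough cross-check of that figure, the usual birthday approximation
n ~= sqrt(2 * 2^bits * ln(1 / (1 - p))) can be evaluated directly. The
snippet below is only an illustrative sketch and not part of the patch
set. The function names are made up for this example, and it assumes the
GID and the DMB token are independent and uniformly random, so that
together they behave like one 128 bit value.

```python
import math


def tries_for_collision_probability(bits: int, p: float) -> float:
    """Approximate number of uniformly random `bits`-wide values that must
    be drawn before any two of them collide with probability `p`
    (birthday approximation)."""
    space = 2.0 ** bits
    # log1p keeps precision for very small p, where 1 - p rounds to 1.0.
    return math.sqrt(2.0 * space * -math.log1p(-p))


def collision_probability(bits: int, n: float) -> float:
    """Approximate probability of at least one collision among `n`
    uniformly random `bits`-wide values (same approximation, inverted)."""
    space = 2.0 ** bits
    # expm1 keeps precision when the exponent is close to zero.
    return -math.expm1(-n * (n - 1.0) / (2.0 * space))


if __name__ == "__main__":
    # 64 bit GID + 64 bit DMB token treated as one 128 bit random value.
    n = tries_for_collision_probability(128, 1e-15)
    print(f"tries for p = 1e-15: {n:.2e}")
    print(f"p for 8.2e11 tries:  {collision_probability(128, 8.2e11):.2e}")
```

Under these assumptions the result is on the order of 8×10^11 tries,
which is consistent with the value quoted from the table.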

That said, I'm more worried about the fallback to TCP being broken due
to a code bug once the GIDs do match, which is already extremely
unlikely and thus not naturally tested in the wild. Do we have a plan
for how to keep testing that fallback scenario? Maybe with a selftest
or something?


IIUC, you are worried about the implementation of the fallback path when
the GIDs collide but the DMB token check works? If so, I think we can
provide a way to set the loopback device's GID manually, so that we can
inject a GID collision fault to test the code.

If we can solve the testing part then I'm personally in favor of this
approach of going with a cryptographically random GID and DMB token. It's
simple, doesn't depend on external factors, and doesn't need a
protocol extension except for possibly reserving CHID 0xFFFF.

One more question though: what about the SEID? Why does that have to be
fixed and at least partially match what ISM devices use? I think I'm
missing some SMC protocol/design detail here. I'm guessing this would
require a protocol change?

I will reply to the SEID-related topic in the next e-mail.

Thanks,
Niklas

[0] https://en.wikipedia.org/wiki/Birthday_attack


Thanks!
Wen Gu