Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed

From: Devid Antonio Filoni
Date: Fri Nov 18 2022 - 10:13:05 EST


On Fri, 2022-11-18 at 14:44 +0100, Oleksij Rempel wrote:
> On Fri, Nov 18, 2022 at 01:41:05PM +0100, Devid Antonio Filoni wrote:
> > On Fri, 2022-11-18 at 13:30 +0100, Oleksij Rempel wrote:
> > > On Fri, Nov 18, 2022 at 11:25:04AM +0100, Devid Antonio Filoni wrote:
> > > > On Fri, 2022-11-18 at 07:06 +0100, Oleksij Rempel wrote:
> > > > > On Thu, Nov 17, 2022 at 04:22:51PM +0100, David Jander wrote:
> > > > > > On Thu, 17 Nov 2022 15:08:20 +0100
> > > > > > Devid Antonio Filoni <devid.filoni@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > On Fri, 2022-05-13 at 11:46 +0200, Devid Antonio Filoni wrote:
> > > > > > > > Hi David,
> > > > > > > >
> > > > > > > > On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:
> > > > > > > > > Hi Devid,
> > > > > > > > >
> > > > > > > > > On Wed, 11 May 2022 14:55:04 +0200
> > > > > > > > > Devid Antonio Filoni <devid.filoni@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > On Wed, 11 May 2022 10:47:28 +0200
> > > > > > > > > > > Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx> wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > i'll CC more J1939 users to the discussion.
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the CC.
> > > > > > > > > > >
> > > > > > > > > > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:
> > > > > > > > > > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:
> > > > > > > > > > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > > > > > > > > > ISO-11783 certification do not expect this wait.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It would be interesting to know which certification tools do not expect it
> > > > > > > > > > > > > > and what explanation is given when the test fails.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > IMHO, the current behaviour is not explicitly stated, but neither is the opposite.
> > > > > > > > > > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. If you want to avoid the 250msec gap, you should avoid contesting the same address.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > > > > > > > > > as your patch suggests, there is a slight time window until the current owner responds,
> > > > > > > > > > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > > > > > > > > > you have collected on the bus.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > > > > > > > > > but bigger internal queues also increase this window.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Would decreasing the dead time to 50msec help in such a case?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Kind regards,
> > > > > > > > > > > > > > > Kurt
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > > > > > > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > > > > > > > > > (#2) sends a message (other than address-claim) using the same address
> > > > > > > > > > > > > claimed by CF #1.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > > > > > > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > > > > > > > > > - shall send the address-claim message to the Global address;
> > > > > > > > > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > > > > > > > > > 31
> > > > > > > > > > > > >
> > > > > > > > > > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > > > > > > > > > standard:
> > > > > > > > > > > > > - If the name of the CF #1 has a lower priority than the one of the CF
> > > > > > > > > > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > > > > > > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > > > > > > > > > the claim procedure with a new address
> > > > > > > > > > > > > - If the name of the CF #1 has a higher priority than the one of the CF
> > > > > > > > > > > > > #2, then the CF #2 shall send the cannot-claim-address message or shall
> > > > > > > > > > > > > execute the claim procedure with a new address
> > > > > > > > > > > > >
> > > > > > > > > > > > > Above conflict management is OK with current J1939 driver
> > > > > > > > > > > > > implementation, however, since the driver always waits 250ms after
> > > > > > > > > > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > > > > > > > > > message which is expected to be sent each second (as per J1939-73
> > > > > > > > > > > > > standard) may not be sent.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > > > > > > > > tests on our products and which tool they use as it was chosen by our
> > > > > > > > > > > > > customer, however they did send us some CAN traces of previously
> > > > > > > > > > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > > > > > > > > > the address-claim message (but it may also be less than that), and this
> > > > > > > > > > > > > is something that we cannot do because the driver blocks the application
> > > > > > > > > > > > > from sending it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 28401.127146 1 18E6FFF0x Tx d 8 FE 26 FF FF FF FF FF FF //Message with other CF's address
> > > > > > > > > > > > > 28401.167414 1 18EEFFF0x Rx d 8 15 76 D1 0B 00 86 00 A0 //Address Claim - SA = F0
> > > > > > > > > > > > > 28401.349214 1 18FECAF0x Rx d 8 FF FF C0 08 1F 01 FF FF //DM1
> > > > > > > > > > > > > 28402.155774 1 18E6FFF0x Tx d 8 FE 26 FF FF FF FF FF FF //Message with other CF's address
> > > > > > > > > > > > > 28402.169455 1 18EEFFF0x Rx d 8 15 76 D1 0B 00 86 00 A0 //Address Claim - SA = F0
> > > > > > > > > > > > > 28402.348226 1 18FECAF0x Rx d 8 FF FF C0 08 1F 02 FF FF //DM1
> > > > > > > > > > > > > 28403.182753 1 18E6FFF0x Tx d 8 FE 26 FF FF FF FF FF FF //Message with other CF's address
> > > > > > > > > > > > > 28403.188648 1 18EEFFF0x Rx d 8 15 76 D1 0B 00 86 00 A0 //Address Claim - SA = F0
> > > > > > > > > > > > > 28403.349328 1 18FECAF0x Rx d 8 FF FF C0 08 1F 03 FF FF //DM1
> > > > > > > > > > > > > 28404.349406 1 18FECAF0x Rx d 8 FF FF C0 08 1F 03 FF FF //DM1
> > > > > > > > > > > > > 28405.349740 1 18FECAF0x Rx d 8 FF FF C0 08 1F 03 FF FF //DM1
> > > > > > > > > > > > >
> > > > > > > > > > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > > > > > > > > > the user-space implementation to decide how to manage it.
> > > > > > > > > > >
> > > > > > > > > > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > > > > > > > > > explicitly stated.
> > > > > > > > > > > The following is taken from ISO 11783-5:
> > > > > > > > > > >
> > > > > > > > > > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > > > > > > > > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > > > > > > > > > CF [...] shall send the address-claim message to the Global address."
> > > > > > > > > > >
> > > > > > > > > > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > > > > > > > > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > > > > > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > > > > > > address".
> > > > > > > > > > >
> > > > > > > > > > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > > > > > > > > > send any other messages until this dispute is resolved, and the standard
> > > > > > > > > > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > > > > > > > > > all participants time to respond and eventually dispute the address claim.
> > > > > > > > > > >
> > > > > > > > > > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > > > > > > > > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > > > > > > > > > resume normal operation because its address is still disputed.
> > > > > > > > > > >
> > > > > > > > > > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > > > > > > > > > allowed to send the DM1 message unless the address dispute is resolved.
> > > > > > > > > > >
> > > > > > > > > > > This effectively leads to the offending CF to DoS the affected CF if it keeps
> > > > > > > > > > > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > > > > > > > > > > account adversarial behavior on the CAN network, so we cannot do anything
> > > > > > > > > > > about this.
> > > > > > > > > > >
> > > > > > > > > > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > > > > > > > > > compliance tool should be challenged and fixed, since it is broken.
> > > > > > > > > > >
> > > > > > > > > > > The networking layer is prohibiting the DM1 message to be sent, and the
> > > > > > > > > > > networking layer has precedence above all superior protocol layers, so the
> > > > > > > > > > > diagnostics layer is not able to operate at this moment.
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi David,
> > > > > > > > > >
> > > > > > > > > > I get your point but I'm not sure that it is the correct interpretation
> > > > > > > > > > that should be applied in this particular case for the following
> > > > > > > > > > reasons:
> > > > > > > > > >
> > > > > > > > > > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > > > > > > > > > "The CF shall claim its own address when initializing and when
> > > > > > > > > > responding to a command to change its NAME or address" and this seems to
> > > > > > > > >
> > > > > > > > > The standard unfortunately has a track record of ignoring a lot of scenarios
> > > > > > > > > and corner cases, like in this instance the fact that there can appear new
> > > > > > > > > participants on the bus _after_ initialization has long finished, and it would
> > > > > > > > > need to claim its address again in that case.
> > > > > > > > >
> > > > > > > > > But look at point d) of that same section: "No CF shall begin, or resume,
> > > > > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > > > > address (Figure 4). This does not apply when responding to a request for
> > > > > > > > > address claimed."
> > > > > > > > >
> > > > > > > > > So we basically have two situations when this will apply after the network is
> > > > > > > > > up and running and a new node suddenly appears:
> > > > > > > > >
> > > > > > > > > 1. The new node starts with a "Request for address claimed" message, to
> > > > > > > > > which your CF should respond with an "Address Claimed" message and NOT wait
> > > > > > > > > 250ms.
> > > > > > > > >
> > > > > > > > > or
> > > > > > > > >
> > > > > > > > > 2. The new node creates an addressing conflict either by claiming its address
> > > > > > > > > without first sending a "request for address claimed" message or (and this is
> > > > > > > > > your case) simply using its address without claiming it first.
> > > > > > > > >
> > > > > > > > > It is this second possibility where there is a conflict that must be resolved,
> > > > > > > > > and then you must wait 250ms after claiming the conflicting address for
> > > > > > > > > yourself.
> > > > > > > > >
> > > > > > > > > > completely ignore the "4.4.4.3 Address violation" that states that the
> > > > > > > > > > address-claimed message shall be sent also when "the CF receives a
> > > > > > > > > > message, other than the address-claimed message, which uses the CF's own
> > > > > > > > > > SA".
> > > > > > > > > > Please note that the address was already claimed by the CF, so I think
> > > > > > > > > > that the initialization requirements should not apply in this case since
> > > > > > > > > > all disputes were already resolved.
> > > > > > > > >
> > > > > > > > > Well, yes and no. The address was claimed before, yes, but then a new node came
> > > > > > > > > onto the bus and disputed that address. In that case the dispute needs to be
> > > > > > > > > resolved first. Imagine you would NOT wait 250ms, but the other CF did
> > > > > > > > > correctly claim its address, but it was you who did not receive that message
> > > > > > > > > for some reason. Now also assume that your own NAME has a lower priority than
> > > > > > > > > the other CF. In this case you can send a "claimed address" message to claim
> > > > > > > > > your address again, but it will be contested. If you don't wait for the
> > > > > > > > > contestant, it is you who will be in violation of the protocol, because you
> > > > > > > > > should have changed your own address but failed to do so.
> > > > > > > > >
> > > > > > > > > > - If the offending CF ignores the dispute, as you said, then the other
> > > > > > > > > > CF has no chance to ever resume normal operation and so the network
> > > > > > > > > > cannot be aware that the other CF is not working correctly because the
> > > > > > > > > > offending CF is spoofing its own address.
> > > > > > > > >
> > > > > > > > > Correct. And like I said in my previous reply, this is unfortunately how CAN,
> > > > > > > > > J1939 and ISObus work. The whole network must cooperate and there is no
> > > > > > > > > consideration for malign or adversarial actors.
> > > > > > > > > There are also a lot of possible corner cases that these standards
> > > > > > > > > unfortunately do not take into account. Conformance test tools seem to be even
> > > > > > > > > more problematic and tend to have bugs quite often. I am still inclined to
> > > > > > > > > think this is the case with your test tool.
> > > > > > > > >
> > > > > > > > > > This seems to make useless the
> > > > > > > > > > requirement that states to activate the DTC in "4.4.4.3 Address
> > > > > > > > > > violation".
> > > > > > > > >
> > > > > > > > > The requirement is not useless. You can still set and store the DTC, just not
> > > > > > > > > broadcast it to the network at that moment.
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > Thank you for your feedback and explanation.
> > > > > > > > I asked the customer to contact the compliance company so that we can
> > > > > > > > verify with them this particular use-case. I want to understand if there
> > > > > > > > is an application note or exception that states how to manage it or if
> > > > > > > > they implemented the test basing it on their own interpretation and how
> > > > > > > > it really works: supposing that the test does not check the DM1
> > > > > > > > presence, then the test could be passed even without sending the DM1
> > > > > > > > message during the 250ms after the address-claimed message.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Devid
> > > > > > >
> > > > > > > Hi David, all,
> > > > > > >
> > > > > > > I'm sorry for resuming this discussion after a long time but I noticed
> > > > > > > that the driver forces the 250 ms wait even when responding to a request
> > > > > > > for address-claimed which is against point d) of ISO 11783-5 "4.5.2
> > > > > > > Address claim requirements":
> > > > > > >
> > > > > > > No CF shall begin, or resume, transmission on the network until 250 ms
> > > > > > > after it has successfully claimed an address (see Figure 4), except
> > > > > > > when responding to a request for address-claimed.
> > > > > > >
> > > > > > > IMHO the driver shall be able to detect above condition or shall not
> > > > > > > force the 250 ms wait which should then be implemented, depending on the
> > > > > > > case, on user-space application side.
> > > > > >
> > > > > > I am a bit out of the loop with this driver, but I think what you say is
> > > > > > correct. The J1939 stack should NOT unconditionally stay silent for 250ms
> > > > > > after sending an Address Claimed message. It should specifically NOT do so if
> > > > > > it is just responding to a Request for Address Claimed message.
> > > > > >
> > > > > > So if it is indeed so, that the J1939 stack will hold off sending messages
> > > > > > forcibly after sending an Address Claimed message as a reply to a Request for
> > > > > > Address Claimed, then I'd say this is a bug.
> > > > > >
> > > > > > @Oleksij, can you confirm this?
> > > > >
> > > > > I do not see any code path inside the j1939 stack that prevents you
> > > > > from sending anything by address. The only part which cares about
> > > > > address claiming is net/can/j1939/address-claim.c, and it will just not
> > > > > be able to resolve a name to an address, because address claiming has
> > > > > not finished yet. In other words, if you need to respond to a request
> > > > > for address-claimed, then just send it by using the address instead of
> > > > > the name.
> > > > >
> > > > > Regards,
> > > > > Oleksij
> > > >
> > > > Hi Oleksij,
> > > > I'm sorry but I think I don't understand your proposal.
> > > >
> > > > If I send an address-claimed message binding the socket without the name
> > > > (can_addr.j1939.name = J1939_NO_NAME), then the driver returns error
> > > > EPROTO.
> > > > If I send the address-claimed message binding the socket with the name,
> > > > then the address-claimed message is sent successfully but other messages
> > > > sent within 250 ms are not sent (error EADDRNOTAVAIL).
> > >
> > > What kind of other messages are you trying to send?
> > >
> > > Regards,
> > > Oleksij
> >
> > Hi,
> > the application sends the DM1 (0xFECA) each second; meanwhile, it
> > receives a request for address-claimed message and answers with the
> > address-claimed message.
> > If the DM1 is sent within 250 ms after the address-claimed message, then
> > it is rejected with error EADDRNOTAVAIL.
> > Since the driver is performing the claim each time the address-claimed
> > message is sent (even if it is a response to a request for address-
> > claimed), the EADDRNOTAVAIL error is expected in the 250 ms time window.
> > So, when a request for address-claimed message is received:
> > - You cannot send an address-claimed message with the socket bound with
> > J1939_NO_NAME because it is rejected with error EPROTO
> > - You can send an address-claimed message with the socket bound with the
> > name but you won't be able to send other messages within 250 ms because
> > they are rejected with error EADDRNOTAVAIL and this is against point d)
> > of ISO 11783-5 "4.5.2 Address claim requirements".
>
> Ok, finally I understood it.
>
> If I see it correctly, it is hard to fix the second part of "ISO 11783-5
> 4.5.2 d)" without breaking the first part of the same point.
>
> How can I see the difference between an AC and an AC sent as a response
> to an RfAC? Wait 250ms? What if some system starts just at that time and
> sends a plain AC?
>
> Regards,
> Oleksij

Hi Oleksij,

Honestly, I would apply the proposed patch because it is the easiest
solution and makes the driver compliant with the standard, for the
following reasons:
- on the first claim, the kernel will wait 250 ms as stated by the
  standard
- on successive claims with the same name, the kernel will not wait
  250 ms, which implies:
  - it will not wait after sending the address-claimed message when the
    claimed address has been spoofed, but the standard does not
    explicitly state what to do in this case (see previous emails in
    this thread), so it would be up to the application developer to
    decide how to manage the conflict
  - it will not wait after sending the address-claimed message when a
    request for address-claimed message has been received, as stated by
    the standard
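
To make the rule above concrete, here is a minimal sketch of the
decision only. It is not the actual patch and not kernel code: struct
claim_state and claim_holdoff_ms() are hypothetical names invented for
illustration, and the sketch assumes the driver remembers the NAME and
address of the last completed claim.

```c
#include <stdbool.h>
#include <stdint.h>

#define HOLDOFF_MS 250u /* wait required by ISO 11783-5 4.5.2 d) */

/* Hypothetical record of the last completed claim (not a kernel struct). */
struct claim_state {
	bool     claimed; /* an earlier claim has already completed */
	uint64_t name;    /* 64-bit NAME used in that claim */
	uint8_t  sa;      /* source address that was claimed */
};

/*
 * Hold-off, in milliseconds, to apply after sending an address-claimed
 * message: none when the same NAME re-claims the address it already
 * holds, the full 250 ms in every other case.
 */
static unsigned int claim_holdoff_ms(const struct claim_state *st,
				     uint64_t name, uint8_t sa)
{
	if (st->claimed && st->name == name && st->sa == sa)
		return 0;
	return HOLDOFF_MS;
}
```

With this rule a first claim, a claim with a different NAME and a claim
for a different address all keep the 250 ms wait; only a repeated claim
of the same address with the same NAME skips it.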

Otherwise you will have to keep track of the above cases and decide
whether the wait is needed, but this is hard to accomplish because it is
the application that is in charge of sending the address-claimed message,
so you would have to decide how long to keep track of the request for
address-claimed message, thus adding more complexity to the driver code.
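
As a rough illustration of that extra bookkeeping, the tracking could
look like the sketch below. Everything in it is an assumption: struct
rfac_tracker, rfac_note_request() and ac_is_reply() are hypothetical
names, and window_ms is exactly the arbitrary value someone would have
to choose.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker for received requests for address-claimed. */
struct rfac_tracker {
	bool     seen;       /* at least one request has been received */
	uint64_t last_rx_ms; /* monotonic time of the latest request */
	uint64_t window_ms;  /* how long a request is still considered pending */
};

/* Record that a request for address-claimed arrived at now_ms. */
static void rfac_note_request(struct rfac_tracker *t, uint64_t now_ms)
{
	t->seen = true;
	t->last_rx_ms = now_ms;
}

/*
 * True if an address-claimed message sent at now_ms should be treated
 * as a reply to a recent request, and therefore not followed by the
 * 250 ms hold-off.
 */
static bool ac_is_reply(const struct rfac_tracker *t, uint64_t now_ms)
{
	return t->seen && now_ms >= t->last_rx_ms &&
	       now_ms - t->last_rx_ms <= t->window_ms;
}
```

The driver cannot know how quickly the application will answer, so any
value of window_ms is a guess: too short and a legitimate reply is
delayed, too long and an unrelated claim skips the wait.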

Another solution is to let the driver send the address-claimed message
either with or without the 250 ms wait for subsequent messages,
depending on the case.

Best Regards,
Devid