Re: [PATCH 2/2 update] firewire: insist on successive self ID completeevents

From: Stefan Richter
Date: Fri Apr 18 2008 - 12:46:32 EST


I wrote on 2008-03-19:
The whole topology code only works if the old and new topologies which
are compared come from immediately successive self ID complete events.

If there happened bus resets without self ID complete events in the
meantime, or self ID complete events with invalid selfIDs, the topology
comparison could identify nodes wrongly, or more likely just corrupt
kernel memory or panic right away.

We new discard all nodes of the old topology and treat all current nodes
as new ones if the current self ID generation is not the previous one
plus 1.
[...]
--- linux.orig/drivers/firewire/fw-topology.c
+++ linux/drivers/firewire/fw-topology.c
@@ -513,6 +513,18 @@ fw_core_handle_bus_reset(struct fw_card fw_flush_transactions(card);
+ /*
+ * If the selfID buffer is not the immediate successor of the
+ * previously processed one, we cannot reliably compare the
+ * old and new topologies.
+ */
+ if ((generation & 0xff) != ((card->generation + 1) & 0xff) &&
+ card->local_node != NULL) {
+ fw_notify("skipped bus generations, destroying all nodes\n");
+ fw_destroy_nodes(card);
+ card->bm_retries = 0;
+ }
+
spin_lock_irqsave(&card->lock, flags);
/*


Some a-posteriori thoughts:

Situations like this happen quite regularly when camcorders are plugged in or out or switched on or off, or when bus-powered hubs are plugged in, and similar situations --- depending on the PHYs on the bus.

The conclusion that we have to discard the old topology data in this situation still stands.

However, although we have to discard node data, we should not discard _device_ data. Chances are that some devices remained on the bus. At least the local node's device will obviously still be there.

Destroying the device representations (and recreating them)
- causes unnecessary terminal connection loss for userspace drivers,
- causes unnecessary terminal connection loss to SBP-2 devices with
the possible result of data loss,
- will unnecessarily disturb hypothetical future kernelspace firewire
drivers. (There will be at least one more of those eventually, i.e.
IP over 1394.)
- is a regression relative to the current firewire-core and relative
to ieee1394.

I don't think it will be hard to prevent the premature destruction of device representations. I will work on it soon and am holding off upstream submission of the above quoted patch until I have the device preserving code implemented an tested.

Until then, firewire-core will keep panicking in rare border cases due to bogus topology comparisons.¹ But it will on the other hand not lose connection in the not quite uncommon situations outlined above.

¹) There is no proof yet that it does, but there are some suspicious reports. And I don't remember any panics by hotplugging anymore since I am using the above patch myself.
--
Stefan Richter
-=====-==--- -=-- =--=-
http://arcgraph.de/sr/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/