Re: [PATCH update] firewire: fix "kobject_add failed for fw* with -EEXIST"

From: Jarod Wilson
Date: Mon Jan 28 2008 - 17:24:47 EST


On Monday 28 January 2008 01:54:14 pm Stefan Richter wrote:
> Jarod Wilson wrote:
> > We may have another issue there though, as when this happened to me, the
> > md layer apparently never noticed (after ~6 hours) that one of the array
> > members had disappeared -- not sure if that's firewire's fault or md's
> > though... This will presumably avoid this situation entirely, but worth
> > noting that there may still be somewhere we need to better communicate
> > status to an upper layer.
>
> I don't know how md ticks, so I have no idea what might have happened
> there.

It looks like firewire is doing the right thing, unregistering the fw* device,
and the SCSI layer is subsequently removing the appropriate /dev/sd* nodes,
but for whatever reason, md hasn't a clue this has happened. I can reproduce
this particular part of the problem by bringing the array up, and then simply
pulling the firewire cable on one of the drives in the array...
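
One way to check whether md has noticed the missing member is to look for the
"(F)" marker next to the device in /proc/mdstat. A minimal Python sketch,
assuming the usual "mdN : active raidX dev[slot](F) ..." line layout; the
function names and the sample text are made up for illustration and are not
output from the array discussed here:

```python
import re

# One member token on an md status line, e.g. "sdb1[1]" or "sdb1[1](F)".
MEMBER_RE = re.compile(
    r"^(?P<dev>[^\[\s]+)\[(?P<slot>\d+)\](?P<flags>(?:\([A-Z]\))*)$"
)


def parse_mdstat(text):
    """Return {array_name: {member_device: is_faulty}} from mdstat text."""
    arrays = {}
    for line in text.splitlines():
        m = re.match(r"^(md\S*)\s*:\s*(.*)$", line)
        if not m:
            continue  # header, block counts, "unused devices" and so on
        name, rest = m.groups()
        members = {}
        for token in rest.split():
            mm = MEMBER_RE.match(token)
            if mm:
                # "(F)" marks a member that md considers faulty.
                members[mm.group("dev")] = "F" in mm.group("flags")
        arrays[name] = members
    return arrays


def faulty_members(text):
    """Return {array_name: sorted list of members flagged (F)}."""
    return {
        name: sorted(dev for dev, faulty in members.items() if faulty)
        for name, members in parse_mdstat(text).items()
    }


def read_mdstat(path="/proc/mdstat"):
    """Read the live status file (Linux only)."""
    with open(path) as f:
        return f.read()
```

If md had noticed the pulled drive, faulty_members(read_mdstat()) would list
it; an empty list for the array while the /dev/sd* node is already gone
matches the behaviour described above.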

> Somewhat related: What if
> - we lose connection to disk "A", represented by scsi_device "a",
> - the SCSI core sets "a" offline,
> - we gain connection to disk "A" again (i.e. it only shortly
> disappeared from the bus from firewire-core's and -sbp2's point
> of view),
> - and firewire-sbp2 adds it as scsi_device "b", even before SCSI
> core got rid of "a"?
> No big problem for stand-alone volumes (unless it happens when the
> volume is in use), but maybe trouble for md managed volumes.

That does appear to be the case. If I reconnect the drive I disconnected,
which was originally /dev/sdb, it now comes back up as /dev/sdd. So
apparently the SCSI layer is at least bright enough to see that someone (md)
is still trying to use /dev/sdb, but I'm clueless as to why md doesn't have
any idea that /dev/sdb actually went away. :\

> To smooth such issues out, my longer term goal was to allow brief
> periods of disconnection in (firewire-)sbp2. I.e. the SCSI core
> wouldn't notice that "A"/"a" went away, it would only notice that "a"
> wasn't accessible for a short time. I think the Fibre Channel drivers
> already support this. The ieee1394 driver even has a "limbo" for
> devices which went away, in order to remember them until they come back,
> but sbp2 doesn't use this feature. (Nobody did the work to enhance sbp2
> to utilize the feature.)
>
> BTW, if you unplug and replug a FireWire disk under Mac OS X fairly
> quickly, OS X will pretend that nothing happened and let the user
> continue using the disk if he hadn't "ejected" it before the brief
> connection loss.

Certainly sounds like a feature we'd benefit from having in this particular
case...
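
To make the idea concrete, here is a toy Python model of such a grace period:
a device that drops off the bus is parked in a "limbo" table, and if it comes
back within the window it gets its old handle again, so the layer above never
sees a new device. This is only an illustration of the concept; the class,
its methods, and the time values are hypothetical and do not reflect the
ieee1394 or firewire-sbp2 code.

```python
class LimboTable:
    """Toy model: remember briefly-disconnected devices by a unique ID."""

    def __init__(self, grace_seconds):
        self.grace = grace_seconds
        self.online = {}      # guid -> handle currently exposed upward
        self.limbo = {}       # guid -> (handle, time the device was lost)
        self._next_handle = 0

    def connect(self, guid, now):
        """Device appeared on the bus; return the handle to expose upward."""
        self.expire(now)
        if guid in self.online:
            return self.online[guid]
        if guid in self.limbo:
            # Back within the grace period: reuse the old handle.
            handle, _lost_at = self.limbo.pop(guid)
        else:
            # Unknown or expired device: hand out a fresh handle.
            handle = self._next_handle
            self._next_handle += 1
        self.online[guid] = handle
        return handle

    def disconnect(self, guid, now):
        """Device vanished; park it in limbo instead of removing it."""
        handle = self.online.pop(guid, None)
        if handle is not None:
            self.limbo[guid] = (handle, now)

    def expire(self, now):
        """Drop limbo entries older than the grace period; return handles."""
        expired = [g for g, (_h, t) in self.limbo.items()
                   if now - t > self.grace]
        return [self.limbo.pop(g)[0] for g in expired]
```

In this model the handles returned by expire() are the ones the upper layer
would have to be told about, which is the notification that seems to be
missing in the md case.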

> Anyhow, we have a few more urgent problems to solve in firewire-sbp2's
> reconnection handling before we can think about such extras.

Very true... Perhaps I'll just file this one away a bit down the TODO list for
now... ;)

--
Jarod Wilson
jwilson@xxxxxxxxxx