Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"

From: James Bottomley
Date: Wed Jan 02 2008 - 18:33:23 EST



On Wed, 2008-01-02 at 12:45 -0800, Linus Torvalds wrote:
>
> On Wed, 2 Jan 2008, James Bottomley wrote:
> >
> > OK ... I'll revert it. However, I still think it's the wrong course of
> > action, because as far as my analysis goes, this code is functionally
> > equivalent to what went before with the exception that we now rely on
> > the request->cmd_type information in the post processing (previously we
> > just relied on the cmnd->done pointer).
>
> To say that another way:
>
> "the code is functionally equivalent, EXCEPT IT ISN'T, and it's
> known to be broken".
>
> wouldn't you say my version is more honest and correct?

No. Just because a bug appears when a particular piece of code is in
and disappears when it is reverted doesn't automatically equate to the
code in question being buggy. We seem to get a lot of these second
order effect type things; sometimes its just a problem caused by a
particular routine compiling to a longer byte sequence and pushing
something else out.

Do give us credit for thinking "functional equivalency problem" when
this bug report first came in ... I've had myself and several other
people over the code. If there's an inequivalency somewhere I'm damned
if I can spot it. The most promising other failure mode we tried was
request type changes over the lifetime of the command, but we can't make
that one fly either.

Look at the taxonomy of the bug. This is the form of the error:

buffer I/O error on device sr0, logical block 20304
attempt to access beyond end of device
sr0: rw=0, want=81224, limit=40944

The last limit is the most suggestive, that comes straight from
bdev->bd_inode->i_size>>9 and is supposed to be the size of the block
device in 512 byte blocks. For a 4.7GB DVD, it's a little small.
Nothing in the sr code sets this directly (although it does come from
get_blkdev() for the first opener). pktcdvd does set it, though ... and
probably wrongly if the drive in question isn't UDF formatted.

I have also tried on many occasions to reproduce this without success
(there's a simple recipe in the bug report, but it just doesn't work for
me). My setup is with an aic94xxx->expander->SATAPI DVD, whereas the
original reporter is ata_piix -> PATA DVD, so it could be stack
differences---but again, if it is, the bug itself can't be a simple one
in the generic code. The fact that there are no other reporters of
problems like this also indicate to me that it isn't a widespread
problem (again, pointing to something more specific in the setup of the
reporter).

The unreproduceability coupled with the lack of other error reports
leads me to be about 90% confident the problem isn't in the code you
want reverted. However, I grant that we cannot seem to find the root
cause, and reverting the code will cause our bug metrics to go down by
one (at least until something else causes it to reappear), so it is the
corporate thing to do, I suppose. I'll send in a reversion with the
sr_mod removal bug fix.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/