Re: [PATCH] Re: Corrupted XFS log replay oops.

From: Dave Chinner
Date: Thu Jan 22 2009 - 20:32:54 EST


On Fri, Jan 23, 2009 at 10:37:17AM +1100, Dave Chinner wrote:
> On Thu, Jan 22, 2009 at 11:06:48AM +0100, Eric Sesterhenn wrote:
> > * Christoph Hellwig (hch@xxxxxxxxxxxxx) wrote:
> > > On Thu, Jan 22, 2009 at 03:37:47PM +1100, Dave Chinner wrote:
> > > > xfs_buf_t *
> > > > xlog_get_bp(
> > > > xlog_t *log,
> > > > - int num_bblks)
> > > > + int nbblks)
> > >
> > > Any reason for reanming this variable? That causes quite a bit of
> > > churn.
> > >
> > > > {
> > > > - ASSERT(num_bblks > 0);
> > > > + if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> > > > + xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
> > >
> > > And doesn't prevent this line from needing a linebreak to stay under 80
> > > characters :)
> > >
> > > Except for these nitpicks it looks fine to me.
> >
> > Using the image at http://www.cccmz.de/~snakebyte/xfs.254.img.bz2
> > I was able to produce a pretty similar error with the patch applied
>
> Different problem, obviously. ;)
>
> I'll have a look at this later today....

One word: Ouch.

Basically the corruption introduced adds random feature bits into the
superblock that aren't actually in use. And hence instead of having
valid superblock fields for each of those features, they are zero
and so strange stuff happens.

What is really stupid is that the fields are often checked. By
ASSERT(), not by production code so a debug kernel will pick up the
problem and panic, while a production kernel will continue onwards
until it panics. This is not going to be a small patch.....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/