Re: corruption

From: Andrew Morton (andrewm@uow.edu.au)
Date: Sat Dec 02 2000 - 09:00:46 EST


Andrew Morton wrote:
>
> "Stephen C. Tweedie" wrote:
> >
> > Hi,
> >
> > On Fri, Dec 01, 2000 at 08:35:41AM +1100, Andrew Morton wrote:
> > >
> > > I bet this'll catch it:
> > >
> > > static __inline__ void list_del(struct list_head *entry)
> > > {
> > > __list_del(entry->prev, entry->next);
> > > + entry->next = entry->prev = 0;
> > > }
> >
> > No, because the buffer hash list is never referenced unless
> > buffer->b_inode is non-null, so we can't ever do a double-list_del on
> > the buffer.
> >
> > The patch below should fix it. It has been sent to Linus. The
> > important part is the first hunk of the inode.c diff.
>
> Testing test11-pre3 with your inode.c patch and the above list_del
> patch. x86 dual processor, IDE. Same workload as before, except
> I cut out misc001 (and the machine recovered almost immediately
> when I killed everything! Need more testing to characterise this).
>
> kernel BUG at inode.c:83! The trace is below. Now, this was
> probably triggered by my list_del change. If so it means
> that we're doing a list_empty() test on a list_head which
> has actually been deleted from a list. So it's possibly the
> actual assertion in destroy_inode() which is a little bogus.
>
> But the wierd thing is that this BUG only hit a single time,
> after three hours of intensive testing. If my theory is
> right, the BUG should hit every time. Will investigate further...
>

It appears that this problem is not fixed.

My destroy_inode() is now

static void destroy_inode(struct inode *inode)
{
        if (!list_empty(&inode->i_dirty_buffers)) {
                printk("&inode->i_dirty_buffers=0x%p\n", &inode->i_dirty_buffers);
                printk("next=0x%p\n", inode->i_dirty_buffers.next);
                printk("prev=0x%p\n", inode->i_dirty_buffers.prev);
                BUG();
        }
        kmem_cache_free(inode_cachep, (inode));
}

After 45 minutes of running the previously described tests:

Dec 2 16:14:30 mnm kernel: &inode->i_dirty_buffers=0xcfe16878
Dec 2 16:14:30 mnm kernel: next=0xc16f3678
Dec 2 16:14:30 mnm kernel: prev=0xc16f3678
Dec 2 16:14:30 mnm kernel: kernel BUG at inode.c:86!

We're throwing away an inode which has live data on its
dirty buffers list.

This is 2.4.0-test11-pre3 + list_del patch + sct's inode
patch (buffer.c, inode.c). x86 dual processor. gcc 2.91.66.
I rediffed my tree. No rogue patches.

It's not an SMP thing - the address patterns match up with the
previously reported uniprocessor corruption.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Dec 07 2000 - 21:00:08 EST