Re: [BUG]2.6.27.y some contents lost after writing to mmaped file

From: JiSheng Zhang
Date: Wed Nov 18 2009 - 08:55:48 EST


On Tue, 17 Nov 2009 20:06:35 +0100
Jan Kara <jack@xxxxxxx> wrote:

> On Tue 17-11-09 07:36:22, Chris Mason wrote:
> > On Mon, Nov 16, 2009 at 05:56:55PM -0800, Greg KH wrote:
> > > On Mon, Nov 16, 2009 at 11:38:57AM +0800, JiSheng Zhang wrote:
> > > > Hi,
> > > >
> > > > I triggered a failure in an fs test with fsx-linux from ltp. It seems that
> > > > fsx-linux failed at mmap->write sequence.
> > > >
> > > > Tested kernel is 2.6.27.12 and 2.6.27.39
> > >
> > > Does this work on any kernel you have tested? Or is it a regression?
> > >
> > > > Tested file system: ext3, tmpfs.
> > > > IMHO, it impacts all file systems.
> > > >
> > > > Some fsx-linux log is:
> > > >
> > > > READ BAD DATA: offset = 0x2771b, size = 0xa28e
> > > > OFFSET GOOD BAD RANGE
> > > > 0x287e0 0x35c9 0x15a9 0x80
> > > > operation# (mod 256) for the bad data may be 21
> > > > ...
> > > > 7828: 1257514978.306753 READ 0x23dba thru 0x25699 (0x18e0 bytes)
> > > > 7829: 1257514978.306899 MAPWRITE 0x27eeb thru 0x2a516 (0x262c bytes)
> > > > ******WWWW
> > > > 7830: 1257514978.307504 READ 0x2771b thru 0x319a8 (0xa28e bytes)
> > > > ***RRRR***
> > > > Correct content saved for comparison
> > > > ...
> Hmm, how long does it take to reproduce? I'm running fsx-linux on tmpfs
> for a while on 2.6.27.21 and didn't hit the problem yet.

I forgot to mention that the tests were done on an ARM board with 64 MB of RAM.
I have run fsx-linux again on a PC, and the failure seems to go away there.

>
> > > Are you sure that the LTP is correct? It wouldn't be the first time it
> > > wasn't...
> >
> > I'm afraid fsx usually finds bugs. I thought Jan Kara recently fixed
> > something here in ext3, does 2.6.32-rc work?
> Yeah, fsx usually finds bugs. Note that he sees the problem also on tmpfs
> so it's not ext3 problem. Anyway, trying to reproduce with 2.6.32-rc? would
> be interesting.

The ARM board doesn't currently support 2.6.32-rc. I did test 2.6.32-rc7 on
my PC, though, and there has been no failure so far.

>
> Honza

I found this thread via Google:
http://marc.info/?t=118026315000001&r=1&w=2

I also tried the test code from
http://marc.info/?l=linux-arch&m=118030601701617&w=2
Most of the time I got:
firstfirstfirst
firstfirstfirst
firstfirstfirst


Nothing changed after I passed MS_SYNC|MS_INVALIDATE to msync() and made the
flush_dcache_page() call unconditional in do_generic_mapping_read().
This differs from the behavior described in the mail thread above.

> void do_generic_mapping_read(struct address_space *mapping,
>                              struct file_ra_state *_ra,
>                              struct file *filp,
>                              loff_t *ppos,
>                              read_descriptor_t *desc,
>                              read_actor_t actor)
> {
>         ...
>         /* If users can be writing to this page using arbitrary
>          * virtual addresses, take care about potential aliasing
>          * before reading the page on the kernel side.
>          */
>         if (1 || mapping_writably_mapped(mapping))
>                 flush_dcache_page(page);

I then ran fsx-linux with the above modification, and it still failed in the
same way on both tmpfs and ext3.
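
For anyone who wants to check the mmap write followed by read() sequence
without running all of fsx-linux, it can be sketched roughly as below. This is
only a minimal illustration and not the test program from the linked thread:
the function name check_mmap_write_then_read(), the TEST_LEN size and the fill
pattern are made up for this sketch. It stores a pattern through a MAP_SHARED
mapping, calls msync() with MS_SYNC|MS_INVALIDATE, reads the file back through
the file descriptor and compares.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define TEST_LEN 4096	/* arbitrary size chosen for this sketch */

/*
 * Hypothetical helper, not taken from fsx-linux or the linked thread.
 * Returns 0 if read() sees the bytes stored through the mapping,
 * 1 if the data differs, -1 if a system call fails.
 */
int check_mmap_write_then_read(const char *path)
{
	char expected[TEST_LEN];
	char actual[TEST_LEN];
	char *map;
	ssize_t n;
	int fd, i;
	int result = -1;

	assert(path != NULL);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Give the file a size so the whole mapping is backed by it. */
	if (ftruncate(fd, TEST_LEN) != 0)
		goto out_close;

	map = mmap(NULL, TEST_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* Store a known pattern through the user-space mapping. */
	for (i = 0; i < TEST_LEN; i++)
		expected[i] = (char)('a' + i % 26);
	memcpy(map, expected, TEST_LEN);

	if (msync(map, TEST_LEN, MS_SYNC | MS_INVALIDATE) != 0)
		goto out_unmap;

	/* Read the same range back through the kernel read path. */
	n = pread(fd, actual, TEST_LEN, 0);
	if (n != TEST_LEN)
		goto out_unmap;

	result = memcmp(expected, actual, TEST_LEN) == 0 ? 0 : 1;

out_unmap:
	munmap(map, TEST_LEN);
out_close:
	close(fd);
	unlink(path);
	return result;
}
```

Calling it with a path on the file system under test (tmpfs or ext3 here)
should return 0 on a machine where the mapping and the read path stay coherent.
A return value of 1 would mean read() saw stale data. A single pass like this
may well not trigger the problem, so it is no substitute for fsx-linux.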