Re: 1352 NUL bytes at the end of a page? (was Re: Assertion `s &&s->tree' failed: The saga continues.)

From: Linus Torvalds
Date: Mon May 17 2004 - 10:03:54 EST




On Mon, 17 May 2004, Larry McVoy wrote:
>
> > And at some point earlier in the process you did an fflush(), or somebody
> > else had written a header of n*PAGE_SIZE + 0x4ff bytes, or something like
> > that. Since this was the ChangeSet file, I suspect that the "header" is
> > the checkin-comment section at the beginning, and the "second phase" is
> > the actual key list thing. You know how you write the ChangeSet file
> > better than I do.
>
> I don't think we flush along the way but let me look. Whoops, you're right,
> we do. Right where you thought too. But that doesn't explain there being
> 3 blocks of nulls (there should NEVER be a null in the s.ChangeSet file, we
> don't compress that, it's always ascii).

No, no, I'm not claiming that _you_ are writing the NUL bytes. I'm
claiming the kernel has a bug that triggers with non-page-aligned starting
offsets of writes (because we clear the bytes after "i_size", and we don't
synchronize those clears sufficiently), and concurrent flushes. There's
probably some other trigger needed too (CONFIG_PREEMPT being just the
thing that uncovers the race).

> But the bigger problem is that you are missing the point that I mentioned
> elsewhere, we are writing to a tmp file, the tmp file is NOT mmapped.

No, the mmap thing was Andrew's theory. My theory is that regular
"write()" calls can trigger it through the "commit_write()" function.

Of course, my theory also depended on a page unlock happening in a place
where it didn't actually happen, so the exact details of my theory are
crap. I'll need to re-think that part.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/