Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

From: Mike Snitzer
Date: Sat Dec 04 2010 - 14:39:07 EST


On Sat, Dec 04 2010 at 2:18pm -0500,
Matt <jackdachef@xxxxxxxxx> wrote:

> On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> > Matt and Jon,
> >
> > If you'd be up to it: could you try testing your dm-crypt+ext4
> > corruption reproducers against the following two 2.6.37-rc commits:
> >
> > 1) 1de3e3df917459422cb2aecac440febc8879d410
> > then
> > 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc
> >
> > Then, depending on results of no corruption for those commits, bonus
> > points for testing the same commits but with Andi and Milan's latest
> > dm-crypt cpu scalability patch applied too:
> > https://patchwork.kernel.org/patch/365542/
> >
> > Thanks!
> > Mike
> >
>
> Hi Mike,
>
> it seems like there isn't even much testing to do:
>
> I tested all 3 commits / checkouts by re-compiling gcc which was/is
> the 2nd easy way to trigger this "corruption", compiling google's
> chromium (v9) and looking at the output/existence of gcc, g++ and
> eselect opengl list

Can you be a bit more precise about what you're doing to reproduce?
What sequence? What (if any) builds are going in parallel? Etc.

> so far everything went fine
>
> After that I used the new patch (v6 or pre-v6), before that I had to
>
> replace WQ_MEM_RECLAIM with WQ_RESCUER
>
> and, re-compiled the kernels
>
> shortly after I had booted up the system with the first kernel
> (http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a87b7a5da250c9be6d757758425dfeaf8ed3179)
> the output of 'eselect opengl list' did show no opengl backend
> selected
>
> so it seems to manifest itself even earlier (ext4: call
> mpage_da_submit_io() from mpage_da_map_blocks()) even if only subtly
> and over time -
> I'm still currently running that kernel and posting from it & having tests run

OK.

> I'm not sure if it's even a problem with ext4 - I haven't had the time
> to test with XFS yet - maybe it's also happening with that so it more
> likely would be dm-core, like Milan suspected
> (http://marc.info/?l=linux-kernel&m=129123636223477&w=2) :(

It'd be interesting to try to reproduce with that same kernel but using
XFS. I'll check with Milan on what he thinks would be the best next
steps. Ideally we'll be able to reproduce your results to aid in
pinpointing the issue. I think Milan will be trying to do so shortly
(if he hasn't started already -- using gentoo emerge, etc).

> even though most of the time it's compiling I don't need to do much -
> I need the box for work so if my time allows next tests would be next
> weekend and I'm back to my other partition
>
> I really do hope that this bugger can be nailed down ASAP - I like the
> improvements made in 2.6.37 but without the dm-crypt multi-cpu patch
> it's only half the "fun" ;)

Sure, we'll need to get to the bottom of this before we can have
confidence sending the dm-crypt cpu scalability patch upstream.

Thanks for your testing,
Mike