Re: Filesystem corruption MD (imsm) Raid0 via 2 SSD's + discard

From: Holger Kiehl
Date: Fri May 22 2015 - 14:17:45 EST




On Thu, 21 May 2015, NeilBrown wrote:

On Thu, 21 May 2015 06:44:27 +0000 (UTC) Holger Kiehl <Holger.Kiehl@xxxxxx>
wrote:

On Thu, 21 May 2015, NeilBrown wrote:

On Thu, 21 May 2015 01:32:13 +0500 Roman Mamedov <rm@xxxxxxxxxxx> wrote:

On Wed, 20 May 2015 20:12:31 +0000 (UTC)
Holger Kiehl <Holger.Kiehl@xxxxxx> wrote:

The kernel I was running when I discovered the
problem was 4.0.2 from kernel.org. However, after reinstalling from DVD
I updated to Fedora's latest kernel, which was 3.19.? (I do not remember
the last numbers). So that kernel seems to be affected as well, but I assume it
contains many 'fixes' from 4.0.x. The filesystem is ext4, the distribution
is Fedora 21, and the hardware is a Xeon E3-1275 with 16GB ECC RAM.

My system now seems to have been running stably for some days with the
kernel.org kernel 4.0.3 and with discard DISABLED. But I am still unsure
what the real cause could be.

It is a bug in the 4.0.2 kernel, fixed in 4.0.3.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785672
https://bbs.archlinux.org/viewtopic.php?id=197400
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/d2dc317d564a46dfc683978a2e5a4f91434e9711



I suspect that is a different bug.
I think this one is
https://bugzilla.kernel.org/show_bug.cgi?id=98501

Should there not be a big fat warning going around telling users to disable
discard on Raid 0 until this is fixed? This breaks the filesystem completely
and I believe there is absolutely no way one can get back the data.

Probably. Would you like to do that?


Is this fixed in 4.0.4? And which kernels are affected? There could be many
people running systems who have not noticed this and do not know what a
dangerous situation they are in when they delete data.

The patch was only added to my tree today. I will send it to Linus tomorrow,
so it should appear in the next -rc.
Any -stable kernel released since mid-April probably has the bug. It was
caused by
commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd

Once the fix gets into Linus' tree, it should get into subsequent -stable releases.

The fix is here:

http://git.neil.brown.name/?p=md.git;a=commitdiff;h=a81157768a00e8cf8a7b43b5ea5cac931262374f

The commit id should remain unchanged.

I would like to confirm that with this patch and discard enabled, I no longer
see any corruption.

Many thanks for the quick fix!

Regards,
Holger