Re: OCFS2 Filesystem inconsistency across nodes

From: Claudio Martins
Date: Tue Feb 14 2006 - 01:14:17 EST



On Monday 13 February 2006 22:26, Mark Fasheh wrote:
> I was easily able to reproduce your problem on my cluster and was able to
> git-bisect my way to some JBD changes which seem to be causing the issue.
> Reverting those patches fixes things. Can you apply the attached patch and
> confirm that it also fixes this particular problem for you? You'll have to
> apply to all kernels in your cluster and either run fsck.ocfs2 or create a
> new file system before testing again.
>

Hi Mark,

This patch does indeed seem to fix this particular problem. Now creating and
deleting files/directories gives expected results across nodes.

The bad news is that it didn't last long. While doing some more tests I found
another problem, but judging from the kernel messages I think this one is
related to the DLM code.

The test was simple:

On node 1, I untarred a kernel tree and waited for tar to finish.

After tar finished, I ran tar on nodes 0 and 2, each one *concurrently*
creating a separate archive from *the same* kernel tree untarred on node 1.
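
The access pattern above (one writer finishes, then two readers concurrently
archive the same tree) can be sketched on a single machine as follows. This is
only an illustration under stated assumptions: it uses Python's tarfile module
and two threads instead of tar running on separate OCFS2 nodes, so it does not
exercise the DLM at all, and the tree is a tiny stand-in for a kernel source
tree. The names make_tree, archive and run_concurrent_archives are hypothetical.

```python
import os
import tarfile
import tempfile
import threading


def make_tree(root, n_dirs=3, n_files=4):
    """Create a small stand-in for the untarred kernel tree (assumption)."""
    for d in range(n_dirs):
        subdir = os.path.join(root, "tree", f"dir{d}")
        os.makedirs(subdir)
        for f in range(n_files):
            with open(os.path.join(subdir, f"file{f}.c"), "w") as fh:
                fh.write(f"/* dir{d} file{f} */\n")
    return os.path.join(root, "tree")


def archive(tree, out_path, errors):
    """Archive the tree; record an OSError instead of raising in the thread."""
    try:
        with tarfile.open(out_path, "w") as tar:
            tar.add(tree, arcname="tree")
    except OSError as exc:  # e.g. a failing stat() such as EINVAL
        errors.append((out_path, exc))


def run_concurrent_archives(workdir):
    """Write the tree first, then archive it from two readers at once."""
    tree = make_tree(workdir)  # the "node 1" step, finished before reading
    # Two readers standing in for nodes 0 and 2; archives live outside the tree.
    outputs = [os.path.join(workdir, f"reader{i}.tar") for i in (0, 2)]
    errors = []
    threads = [
        threading.Thread(target=archive, args=(tree, out, errors))
        for out in outputs
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Both readers should have seen exactly the same set of paths.
    listings = []
    for out in outputs:
        with tarfile.open(out) as tar:
            listings.append(sorted(tar.getnames()))
    return errors, listings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        errors, listings = run_concurrent_archives(workdir)
        print("errors:", errors)
        print("listings match:", listings[0] == listings[1])
```

On a consistent filesystem both readers should report no errors and identical
listings; a mismatch or a stat failure would correspond to the kind of symptom
described in this report.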

Again, since the log files are big, I've put them online:

Node0:
http://coyote.ist.utl.pt/ocfs2/Feb14/kern-iscsi-teste.log

Node1:
http://coyote.ist.utl.pt/ocfs2/Feb14/kern-orateste1.log
(this node's clock was 10 minutes off, sorry about that)

Node2:
http://coyote.ist.utl.pt/ocfs2/Feb14/kern-orateste2.log


On node 0, tar exited with:
tar: build-AMD-linux-2.6.16-rc2-git3-jbdfix1/drivers/media/video/cx25840/cx25840-core.c: Cannot stat: Invalid argument

On node 2, tar exited with a segmentation fault.

Anyway, after that I am still able to read and write files on all three nodes,
and the results are consistent across them.


Any ideas?

Thanks
Best regards

Claudio

