Re: Possible UDF locking error?

From: Steve Magnani
Date: Sat Mar 30 2019 - 15:49:53 EST


Jan -

On 3/25/19 11:42 AM, Jan Kara wrote:
> Hi!
>
> On Sat 23-03-19 15:14:05, Steve Magnani wrote:
> > I have been hunting a UDF bug that occasionally results in generation
> > of an Allocation Extent Descriptor with an incorrect tagLocation. So
> > far I haven't been able to see a path through the code that could
> > cause that. But, I noticed some inconsistency in locking during
> > AED generation and wonder if it could result in random corruption.
> >
> > The function udf_update_inode() has this general pattern:
> >
> > bh = udf_tgetblk(...); // calls sb_getblk()
> > lock_buffer(bh);
> > memset(bh->b_data, 0, inode->i_sb->s_blocksize);
> > // <snip>other code to populate FE/EFE data in the block</snip>
> > set_buffer_uptodate(bh);
> > unlock_buffer(bh);
> > mark_buffer_dirty(bh);
> >
> > This I can understand - the lock is held for as long as the buffer
> > contents are being assembled.
> >
> > In contrast, udf_setup_indirect_aext(), which constructs an AED,
> > has this sequence:
> >
> > bh = udf_tgetblk(...); // calls sb_getblk()
> > lock_buffer(bh);
> > memset(bh->b_data, 0, inode->i_sb->s_blocksize);
> >
> > set_buffer_uptodate(bh);
> > unlock_buffer(bh);
> > mark_buffer_dirty_inode(bh);
> >
> > // <snip>other code to populate AED data in the block</snip>
> >
> > In this case the population of the block occurs without
> > the protection of the lock.
> >
> > Because the block has been marked dirty, does this mean that
> > writeback could occur at any point during population?
> Yes. Thanks for noticing this!
>
> > There is one path through udf_setup_indirect_aext() where
> > mark_buffer_dirty_inode() gets called again after population is
> > complete, which I suppose could heal a partial writeout, but there is
> > also another path in which the buffer does not get marked dirty again.
> Generally, we add new extents to the created indirect extent which dirties
> the buffer and that should fix the problem. But you are definitely right
> that the code is suspicious and should be fixed. Will you send a patch?

I did a little archaeology to see how the code evolved to this point. It's been like this a long time.

I also did some research to understand why filesystems call lock_buffer() in some places but not in others. For example, the FAT driver never calls it. I ran across this thread from 2011:

https://lkml.org/lkml/2011/5/16/402

...from which I conclude that, while it is strictly correct to hold the buffer lock any time its contents are being modified, performance considerations make it preferable (or at least reasonable) to make some modifications without the lock, provided it's known that a subsequent write-out will "fix" any partial write-out before anyone else tries to read the block. I doubt that UDF sees common use with DIF/DIX block devices, which might make a decision in favor of performance a little easier. Since the FAT driver doesn't contain Darrick's proposed changes, I assume a decision was made that performance was more important there.

Certainly the call to udf_setup_indirect_aext() from udf_add_aext() meets those criteria. But the call from udf_table_free_blocks() may not dirty the AED block afterwards.
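
To illustrate the ordering concern, here is a minimal userspace sketch in C. It is a toy model, not kernel code: struct toy_buffer and every toy_* helper are hypothetical stand-ins for struct buffer_head, lock_buffer(), unlock_buffer(), mark_buffer_dirty_inode() and writeback. It assumes, as a simplification, that writeback only copies out a buffer that is dirty and unlocked, and that it runs at the worst possible moment.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16

/* Hypothetical stand-in for struct buffer_head; NOT the kernel type. */
struct toy_buffer {
	unsigned char data[TOY_BLOCK_SIZE];	/* in-memory block contents */
	unsigned char disk[TOY_BLOCK_SIZE];	/* what "writeback" last stored */
	bool locked;
	bool dirty;
};

static void toy_lock(struct toy_buffer *b)
{
	assert(!b->locked);
	b->locked = true;
}

static void toy_unlock(struct toy_buffer *b)
{
	assert(b->locked);
	b->locked = false;
}

static void toy_mark_dirty(struct toy_buffer *b)
{
	b->dirty = true;
}

/*
 * Simulated writeback (assumption of this model): a dirty, unlocked
 * buffer is copied to "disk" and becomes clean. Returns true if it wrote.
 */
static bool toy_writeback(struct toy_buffer *b)
{
	if (!b->dirty || b->locked)
		return false;
	memcpy(b->disk, b->data, TOY_BLOCK_SIZE);
	b->dirty = false;
	return true;
}

/* Stand-in for filling in the AED contents. */
static void toy_populate(struct toy_buffer *b)
{
	memset(b->data, 0xAE, TOY_BLOCK_SIZE);
}

/* Current ordering: zero, unlock, dirty, then populate without re-dirtying. */
bool old_order_disk_is_complete(void)
{
	struct toy_buffer b = {0};

	toy_lock(&b);
	memset(b.data, 0, TOY_BLOCK_SIZE);
	toy_unlock(&b);
	toy_mark_dirty(&b);
	toy_writeback(&b);	/* writeback gets in before population */
	toy_populate(&b);	/* buffer is never marked dirty again */
	toy_writeback(&b);	/* no-op: buffer is clean */
	return memcmp(b.disk, b.data, TOY_BLOCK_SIZE) == 0;
}

/* Proposed ordering: populate under the lock, then unlock and dirty. */
bool new_order_disk_is_complete(void)
{
	struct toy_buffer b = {0};

	toy_lock(&b);
	memset(b.data, 0, TOY_BLOCK_SIZE);
	toy_writeback(&b);	/* skipped: buffer is locked and not dirty */
	toy_populate(&b);
	toy_unlock(&b);
	toy_mark_dirty(&b);
	toy_writeback(&b);	/* writes the completed block */
	return memcmp(b.disk, b.data, TOY_BLOCK_SIZE) == 0;
}
```

In this model old_order_disk_is_complete() returns false (the "disk" copy is still all zeroes) and new_order_disk_is_complete() returns true. Real writeback is more involved than this, so treat it only as a picture of why the final dirtying has to come after population.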

So if this looks reasonable I will resend as a formal patch:

--- a/fs/udf/inode.c 2019-03-30 11:28:38.637759458 -0500
+++ b/fs/udf/inode.c 2019-03-30 11:33:00.357761250 -0500
@@ -1873,9 +1873,6 @@ int udf_setup_indirect_aext(struct inode
 		return -EIO;
 	lock_buffer(bh);
 	memset(bh->b_data, 0x00, sb->s_blocksize);
-	set_buffer_uptodate(bh);
-	unlock_buffer(bh);
-	mark_buffer_dirty_inode(bh, inode);

 	aed = (struct allocExtDesc *)(bh->b_data);
 	if (!UDF_QUERY_FLAG(sb, UDF_FLAG_STRICT)) {
@@ -1890,6 +1887,9 @@ int udf_setup_indirect_aext(struct inode
 	udf_new_tag(bh->b_data, TAG_IDENT_AED, ver, 1, block,
 		    sizeof(struct tag));

+	set_buffer_uptodate(bh);
+	unlock_buffer(bh);
+
 	nepos.block = neloc;
 	nepos.offset = sizeof(struct allocExtDesc);
 	nepos.bh = bh;
@@ -1913,6 +1913,8 @@ int udf_setup_indirect_aext(struct inode
 	} else {
 		__udf_add_aext(inode, epos, &nepos.block,
 			sb->s_blocksize | EXT_NEXT_EXTENT_ALLOCDECS, 0);
+		/* Make sure completed AED gets written out */
+		mark_buffer_dirty_inode(nepos.bh, inode);
 	}

 	brelse(epos->bh);

------------------------------------------------------------------------
Steven J. Magnani "I claim this network for MARS!
www.digidescorp.com Earthling, return my space modulator!"

#include <standard.disclaimer>