Re: possible deadlock in start_this_handle (2)

From: Tetsuo Handa
Date: Fri Feb 19 2021 - 05:18:05 EST


On 2021/02/15 23:29, Jan Kara wrote:
> On Mon 15-02-21 23:06:15, Tetsuo Handa wrote:
>> On 2021/02/15 21:45, Jan Kara wrote:
>>> On Sat 13-02-21 23:26:37, Tetsuo Handa wrote:
>>>> Excuse me, but it seems to me that nothing prevents
>>>> ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create()
>>>> without memalloc_nofs_save() when hitting ext4_get_nojournal() path.
>>>> Will you explain when ext4_get_nojournal() path is executed?
>>>
>>> That's a good question but sadly I don't think that's it.
>>> ext4_get_nojournal() is called when the filesystem is created without a
>>> journal. In that case we also don't acquire jbd2_handle lockdep map. In the
>>> syzbot report we can see:
>>
>> Since syzbot can test filesystem images, syzbot might have tested a filesystem
>> image created both with and without journal within this boot.
>
> a) I think that syzbot reboots the VM between executing different tests to
> get reproducible conditions. But in theory I agree the test may have
> contained one image with and one image without a journal.

syzkaller reboots the VM upon a crash.
syzkaller can test multiple filesystem images within one boot.

https://storage.googleapis.com/syzkaller/cover/ci-qemu-upstream-386.html (this
statistic snapshot is volatile) reports that ext4_get_nojournal() is partially covered
( https://github.com/google/syzkaller/blob/master/docs/coverage.md ) by syzkaller.

/* Just increment the non-pointer handle value */
static handle_t *ext4_get_nojournal(void)
{
86 handle_t *handle = current->journal_info;
unsigned long ref_cnt = (unsigned long)handle;

BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT);

86 ref_cnt++;
handle = (handle_t *)ref_cnt;

current->journal_info = handle;
2006 return handle;
}


/* Decrement the non-pointer handle value */
static void ext4_put_nojournal(handle_t *handle)
{
unsigned long ref_cnt = (unsigned long)handle;

85 BUG_ON(ref_cnt == 0);

85 ref_cnt--;
handle = (handle_t *)ref_cnt;

current->journal_info = handle;
}


handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
int type, int blocks, int rsv_blocks,
int revoke_creds)
{
journal_t *journal;
int err;

2006 trace_ext4_journal_start(sb, blocks, rsv_blocks, revoke_creds,
2006 _RET_IP_);
2006 err = ext4_journal_check_start(sb);
if (err < 0)
return ERR_PTR(err);

2005 journal = EXT4_SB(sb)->s_journal;
1969 if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))
2006 return ext4_get_nojournal();
1969 return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds,
GFP_NOFS, type, line);
}

>
> *but*
>
> b) as I wrote in the email you are replying to, the jbd2_handle key is
> private per filesystem. Thus for lockdep to complain about
> jbd2_handle->fs_reclaim->jbd2_handle deadlock, those jbd2_handle lockdep
> maps must come from the same filesystem.
>
> *and*
>
> c) filesystem without journal doesn't use jbd2_handle lockdep map at all so
> for such filesystems lockdep creates no dependency for jbd2_handle map.
>

What about "EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)" case?
Does this case happen on filesystem with journal, and can this case be executed
by fuzzing a crafted (a sort of erroneous) filesystem with journal, and are
the jbd2_handle for calling ext4_get_nojournal() case and the jbd2_handle for
calling jbd2__journal_start() case the same?

Also, I worry that jbd2__journal_restart() is problematic, for it calls
stop_this_handle() (which calls memalloc_nofs_restore()) and then calls
start_this_handle() (which fails to call memalloc_nofs_save() if an error
occurs). An error from start_this_handle() from journal restart operation
might need special handling (i.e. either remount-ro or panic) ?