Re: Problem in log_do_checkpoint()?

From: Badari Pulavarty
Date: Fri Apr 08 2005 - 12:28:37 EST


I get an Oops in log_do_checkpoint() while using ext3 quotas.
Is this in any way related to what you are working on?

Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
801aeee1
*pde = 52b31001
Oops: 0002 [#1]
PREEMPT SMP
Modules linked in:
CPU: 3
EIP: 0060:[<801aeee1>] Not tainted VLI
EFLAGS: 00010213 (2.6.11-22)
EIP is at log_do_checkpoint+0x91/0x220
eax: 00000002 ebx: b7d09e0c ecx: 00000001 edx: e24a2000
esi: 00000000 edi: c4bac47c ebp: cceb726c esp: e24a2d18
ds: 007b es: 007b ss: 0068
Process rm (pid: 8694, threadinfo=e24a2000 task=f7b79040)
Stack: f7dc70e4 a1d60b3c e24a2d44 e24a2d3c e24a2d40 e24a2000 00004df4 a6062200
f7dc70e4 00000000 00000000 95447db0 95447e4c ec6c1d7c b52210e4 ec032b40
ec032b0c 936a5800 e5a262b8 95447cac eb4c4354 936a57cc 936a5798 ac0e93bc
Call Trace:
[<801ae94f>] __log_wait_for_space+0x9f/0xc0
[<801a9b42>] start_this_handle+0x132/0x3f0
[<8012f720>] autoremove_wake_function+0x0/0x60
[<8012f720>] autoremove_wake_function+0x0/0x60
[<801a9efd>] journal_start+0xad/0xe0
[<801a68b1>] ext3_dquot_initialize+0x51/0x70
[<801a2d0d>] ext3_rmdir+0x4d/0x1c0
[<8031df76>] _spin_lock+0x16/0x90
[<80168aa9>] vfs_rmdir+0x189/0x230
[<80168be9>] sys_rmdir+0x99/0xf0
[<8010272f>] syscall_call+0x7/0xb
Code: 8b 54 24 1c 89 5c 24 28 8b 40 04 89 44 24 18 8b 5a 28 8b 6b 2c 89 df 8d 76 00 89 fb b8 01 00 00 00 8b 7f 28 8b 33 e8 cf 76 f6 ff <f0> 0f ba 2e 13 19 c0 85 c0 0f 85 3f 01 00 00 89 5c 24 04 8d 44
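
For reference, the value in the "Oops: 0002" line is the x86 page-fault
error code. A minimal sketch of decoding its low three bits follows; the
struct and the helper name decode_pf_error() are hypothetical and are not
kernel functions.

```c
#include <stdbool.h>

/* Hypothetical helper type: the low three bits of an x86 page-fault error code. */
struct pf_error {
    bool protection_violation; /* bit 0: 0 = page not present, 1 = protection fault */
    bool write;                /* bit 1: 0 = read access, 1 = write access */
    bool user_mode;            /* bit 2: 0 = kernel mode, 1 = user mode */
};

/* Split the error code printed after "Oops:" into its three low flag bits. */
struct pf_error decode_pf_error(unsigned long code)
{
    struct pf_error e;

    e.protection_violation = (code & 0x1) != 0;
    e.write = (code & 0x2) != 0;
    e.user_mode = (code & 0x4) != 0;
    return e;
}
```

For the code 0002 in this report, that decodes to a kernel-mode write to a
page that is not present, which matches the NULL pointer dereference
message at the top of the trace.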

Thanks,
Badari



On Mon, 2005-04-04 at 02:04, Jan Kara wrote:
> Hello,
>
> I've been looking through the JBD code while trying to understand the
> assertion failure in log_do_checkpoint() (it was on an old SUSE 2.6.5 kernel,
> though the reporter claims to be able to get the failure even with
> Stephen's patch fixing a race with journal_put_journal_head()), and I've
> spotted one place where I think there could be a race (the code around there
> seems to be the same in the latest kernels):
> In log_do_checkpoint() we go through the t_checkpoint_list of a
> transaction and call __flush_buffer() on each buffer. Suppose there is
> just one buffer on the list and it is dirty. __flush_buffer() sees it and
> puts it into an array of buffers for flushing. Then the loop finishes with
> retry=0, drop_count=0, batch_count=1. So __flush_batch() is called: we
> drop all locks and sleep. While we are sleeping, somebody else comes and
> makes the buffer dirty again (OK, that is not probable, but I think it
> is possible). Now we wake up and call __cleanup_transaction().
> It is not able to do anything and returns 0, so we fail on the assertion
> J_ASSERT(drop_count != 0 || cleanup_ret != 0).
> Am I missing something? In my opinion we should set retry=1 after we
> call __flush_batch() even if we call it outside of the "__flush_buffer-loop"...
>
> Honza
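
The scenario Jan describes can be walked through with a toy model. This is
only a sketch under stated assumptions: the enum, struct, and function names
are hypothetical and this is not the JBD code. Only the counters (retry,
drop_count, batch_count, cleanup_ret) and the assertion condition are taken
from the description above.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in fs/jbd. */
enum pass_result { PASS_OK, PASS_RETRY, PASS_ASSERT_FAILS };

struct toy_buffer {
    bool dirty;
};

/*
 * Models one pass over a checkpoint list holding a single dirty buffer.
 * redirtied_while_asleep: someone dirties the buffer while we sleep in the flush.
 * retry_after_flush: apply the suggested "retry=1 after __flush_batch()".
 */
enum pass_result simulate_checkpoint_pass(bool redirtied_while_asleep,
                                          bool retry_after_flush)
{
    struct toy_buffer buf = { .dirty = true };
    int retry = 0, drop_count = 0, batch_count = 0, cleanup_ret = 0;

    /* The __flush_buffer() step: a dirty buffer is queued, nothing is dropped. */
    if (buf.dirty)
        batch_count++;

    /* The __flush_batch() step: write out the batch, dropping locks and sleeping. */
    if (batch_count) {
        buf.dirty = false;          /* write-out completes */
        batch_count = 0;
        if (redirtied_while_asleep)
            buf.dirty = true;       /* the race window */
        if (retry_after_flush)
            retry = 1;              /* the suggested change */
    }

    if (retry)
        return PASS_RETRY;          /* rescan the list; assertion not reached */

    /* The __cleanup_transaction() step: only a clean buffer can be removed. */
    if (!buf.dirty)
        cleanup_ret = 1;

    /* Same condition as J_ASSERT(drop_count != 0 || cleanup_ret != 0). */
    if (drop_count != 0 || cleanup_ret != 0)
        return PASS_OK;
    return PASS_ASSERT_FAILS;
}
```

In this model, simulate_checkpoint_pass(true, false) reaches the failing
assertion, while simulate_checkpoint_pass(true, true) returns PASS_RETRY
and rescans instead. Whether the real code can actually hit this window
depends on locking details the model leaves out.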
