Re: [PATCH 1/6] ext2 balloc: fix _with_rsv freeze

From: Mingming Cao
Date: Tue Nov 28 2006 - 14:26:45 EST


On Tue, 2006-11-28 at 17:40 +0000, Hugh Dickins wrote:
> After several days of testing ext2 with reservations, it got caught inside
> ext2_try_to_allocate_with_rsv: alloc_new_reservation repeatedly succeeding
> on the window [12cff,12d0e], ext2_try_to_allocate repeatedly failing to
> find the free block guaranteed to be included (unless there's contention).
>

Hmm, I suspect there is other issue: alloc_new_reservation should not
repeatedly allocating the same window, if ext2_try_to_allocate
repeatedly fails to find a free block in that window.
find_next_reservable_window() takes my_rsv (the old window that he
thinks there is no free block) as a guide to find a window "after" the
end block of my_rsv, so how could this happen?


> Fix the range to find_next_usable_block's memscan: the scan from "here"
> (0xcfe) up to (but excluding) "maxblocks" (0xd0e) needs to scan 3 bytes
> not 2 (the relevant bytes of bitmap in this case being f7 df ff - none
> 00, but the premature cutoff implying that the last was found 00).
>

alloc_new_reservation() reserved a window with free block, when come to
the time to claim it, it scans the window again. So it seems that the
range of the the scan is too small:

p = ((char *)bh->b_data) + (here >> 3);
r = memscan(p, 0, (maxblocks - here + 7) >> 3);
next = (r - ((char *)bh->b_data)) << 3;

---------------------> next is -1
if (next < maxblocks && next >= here)
return next;

----------------------> falls to false branch

here = bitmap_search_next_usable_block(here, bh, maxblocks);
return here;

So we failed to find a free byte in the range. That's seems fine to me.
It's only a nice thing to have -- try to allocate a block in a place
where it's neighbors are all free also. If it fails, it will search the
window bit by bit. So I don't understand why it is not being recovered
by bitmap_search_next_usable_block(), which test the bitmap bit by bit?

> Is this a problem for mainline ext2? No, because the "size" in its memscan
> is always EXT2_BLOCKS_PER_GROUP(sb), which mkfs.ext2 requires to be a
> multiple of 8. Is this a problem for ext3 or ext4? No, because they have
> an additional extN_test_allocatable test which rescues them from the error.
>
Hmm, if the error is it prematurely think there is no free block in the
range (bitmap on disk), then even in ext3/4, it will not bother checking
the jbd copy of the bitmap. I am not sure this is the cause that ext3/4
may not has the problem.

> But the bigger question is, why does the my_rsv case come here to
> find_next_usable_block at all?

Because grp_goal is -1?

> Doesn't its 64-bit boundary limit, and its
> memscan, blithely ignore what the reservation prepared?

I agree with you that the double check is urgly. But it's necessary:( If
there to prevent contention: other file make steal that free block we
reserved for this file, in the case filesystem is full of reservation...

> It's messy too,
> the complement of the memscan being that "i < 7" loop over in
> ext2_try_to_allocate. I think this ought to be cleaned up,
> in ext2+reservations and ext3 and ext4.
>
The "i<7" loop there is for non reservation case. Since
find_next_usable_block() could find a free byte, it's trying to avoid
filesystem holes by shifting the start of the free block for at most 7
times.


Thanks!

Mingming
> fs/ext2/balloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- 2.6.19-rc6-mm2/fs/ext2/balloc.c 2006-11-24 08:18:02.000000000 +0000
> +++ linux/fs/ext2/balloc.c 2006-11-27 19:28:41.000000000 +0000
> @@ -570,7 +570,7 @@ find_next_usable_block(int start, struct
> here = 0;
>
> p = ((char *)bh->b_data) + (here >> 3);
> - r = memscan(p, 0, (maxblocks - here + 7) >> 3);
> + r = memscan(p, 0, ((maxblocks + 7) >> 3) - (here >> 3));
> next = (r - ((char *)bh->b_data)) << 3;
>
> if (next < maxblocks && next >= here)
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/