Re: [RFC] per thread page reservation patch

From: Nikita Danilov
Date: Sat Jan 08 2005 - 07:46:57 EST


Andrew Morton <akpm@xxxxxxxx> writes:

> Nikita Danilov <nikita@xxxxxxxxxxxxx> wrote:
>>
>> >
>> > Why does the filesystem risk going oom during the rebalance anyway? Is it
>> > doing atomic allocations?
>>
>> No, just __alloc_pages(GFP_KERNEL, 0, ...) returns NULL. When this
>> happens, the only thing balancing can do is to panic.
>
> __alloc_pages(GFP_KERNEL, ...) doesn't return NULL. It'll either succeed
> or never return ;) That behaviour may change at any time of course, but it

Hmm... it used to, when I wrote that code.

> does make me wonder why we're bothering with this at all. Maybe it's
> because of the possibility of a GFP_IO failure under your feet or
> something?

This is what happens:

- we start inserting a new item into the balanced tree,

- lock nodes on the leaf level and modify them

- go to the parent level

- lock nodes on the parent level and modify them. This may require
allocating new nodes. If the allocation fails, we have to panic,
because the tree is in an inconsistent state and there is no
roll-back; if the allocation hangs forever, a deadlock is on its way,
because we are still holding locks on nodes at the leaf level.
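The reservation idea this implies can be sketched in user space (hypothetical names, not the actual reiser4 or kernel API): grab the worst-case number of pages *before* taking any tree locks, so that a failure is still recoverable, and allocations made mid-balancing come from the pool and can never fail or block.

```c
#include <assert.h>
#include <stdlib.h>

enum { PAGE_SZ = 4096 };

struct reservation {
    void *pages[16];
    int nr;
};

/* Grab 'count' pages up front.  This may fail, but no tree locks are
 * held yet, so failure is recoverable: just return an error. */
static int reserve_pages(struct reservation *r, int count)
{
    r->nr = 0;
    for (int i = 0; i < count; i++) {
        void *p = malloc(PAGE_SZ);   /* stands in for __alloc_pages() */
        if (!p) {
            while (r->nr > 0)        /* undo partial reservation */
                free(r->pages[--r->nr]);
            return -1;               /* -ENOMEM: safe, nothing locked */
        }
        r->pages[r->nr++] = p;
    }
    return 0;
}

/* During balancing, allocation is served from the pool and cannot
 * fail, provided the reservation was sized for the worst case. */
static void *reserved_alloc(struct reservation *r)
{
    assert(r->nr > 0);
    return r->pages[--r->nr];
}

/* After the operation, return any unused pages. */
static void release_reservation(struct reservation *r)
{
    while (r->nr > 0)
        free(r->pages[--r->nr]);
}
```

A caller would do `reserve_pages()` before locking the leaf level, draw from `reserved_alloc()` while modifying parents, and `release_reservation()` once the tree is consistent again.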

>
> What happens if reiser4 simply doesn't use this code?

At the time I tested it, it panicked after getting NULL from
__alloc_pages(). With current `do_retry' logic in __alloc_pages() it
will deadlock, I guess.

>
>
> If we introduce this mechanism, people will end up using it all over the
> place. Probably we could remove radix_tree_preload(), which is the only
> similar code I can immediately think of.
>
> Page reservation is not a bad thing per-se, but it does need serious
> thought.
>
> How does reiser4 end up deciding how many pages to reserve? Gross
> overkill?

The worst-case behavior of tree algorithms is well-known. Yes, it's overkill.
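For a classic B-tree-style structure the bound is easy to state (illustrative arithmetic only, not reiser4's actual internal bound): a single insert can split at most one node per level, plus allocate a new root, so height + 1 new nodes always suffice.

```c
/* Hypothetical worst-case sizing for one insert into a balanced tree
 * of the given height: one split per level plus a possible new root.
 * This is the "gross overkill" figure -- the common case needs zero
 * new nodes, since most inserts fit into an existing leaf. */
static int worst_case_new_nodes(int height)
{
    return height + 1;
}
```

So a tree of height 5 would reserve 6 pages per insert, even though almost all inserts allocate nothing.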

Nikita.