Re: [BUG] infinite loop in find_get_pages()

From: Andrew Morton
Date: Tue Sep 13 2011 - 19:54:16 EST


On Tue, 13 Sep 2011 21:23:21 +0200
Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:

> Linus,
>
> It seems current kernels (3.1.0-rc6) are really unreliable, or maybe I
> expect too much from them.
>
> On my 4GB x86_64 machine (2 quad-core cpus, 2 threads per core), I can
> have a cpu locked in
>
> find_get_pages -> radix_tree_gang_lookup_slot -> __lookup
>
>
> Problem is : A bisection will be very hard, since a lot of kernels
> simply destroy my disk (the PCI MRRS horror stuff).

Yes, that's hard. Quite often my bisection efforts involve moving to a
new bisection point then hand-applying a few patches to make the
thing compile and/or work.

There have only been three commits to radix-tree.c this year, so a bit
of manual searching through those would be practical?
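
Something like the following should list them. It is only a rough
sketch: the helper name is made up for illustration, and it assumes the
tree is a git checkout at /usr/src/linux (as in the build loop quoted
below) with the file at lib/radix-tree.c.

```shell
# Hypothetical helper (not from the original thread): list the commits
# that touched one file since a given date, one line per commit.
commits_touching() {
    repo=$1    # path to a git checkout, e.g. /usr/src/linux
    since=$2   # any date git understands, e.g. 2011-01-01
    file=$3    # path relative to the repo root, e.g. lib/radix-tree.c
    git -C "$repo" log --oneline --since="$since" -- "$file"
}

# Example invocation (assumed paths):
#   commits_touching /usr/src/linux 2011-01-01 lib/radix-tree.c
```

Each commit id it prints can then be inspected with "git show <id>" or
reverted on top of the failing kernel to see whether the hang goes away.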

> Messages at console :
>
> INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by
> 11 t=60002 jiffies)
>
> perf top -C 1
>
> Events: 3K cycles
> + 43,08% bash [kernel.kallsyms] [k] __lookup
> + 41,51% bash [kernel.kallsyms] [k] find_get_pages
> + 15,31% bash [kernel.kallsyms] [k] radix_tree_gang_lookup_slot
>
> 43.08% bash [kernel.kallsyms] [k] __lookup
> |
> --- __lookup
> |
> |--97.09%-- radix_tree_gang_lookup_slot
> | find_get_pages
> | pagevec_lookup
> | invalidate_mapping_pages
> | drop_pagecache_sb
> | iterate_supers
> | drop_caches_sysctl_handler
> | proc_sys_call_handler.isra.3
> | proc_sys_write
> | vfs_write
> | sys_write
> | system_call_fastpath
> | __write
> |
>
>
> Steps to reproduce :
>
> In one terminal, kernel builds in a loop (defconfig + hpsa driver)
>
> cd /usr/src/linux
> while :
> do
> make clean
> make -j128
> done
>
>
> In another term :
>
> while :
> do
> echo 3 >/proc/sys/vm/drop_caches
> sleep 20
> done
>

This is a regression? 3.0 is OK?

Also, do you know that the hang is happening at the radix-tree level?
It might be at the filemap.c level or at the superblock level and we
just end up spending most cycles at the lower levels because they're
called so often? The iterate_supers/drop_pagecache_sb code is fairly
recent.

