RE: PAT wc & vmap mapping count issue ?

From: Pallipadi, Venkatesh
Date: Thu Jul 30 2009 - 13:58:53 EST




>-----Original Message-----
>From: Jerome Glisse [mailto:glisse@xxxxxxxxxxxxxxx]
>Sent: Thursday, July 30, 2009 10:07 AM
>To: linux-kernel@xxxxxxxxxxxxxxx
>Cc: Pallipadi, Venkatesh
>Subject: Re: PAT wc & vmap mapping count issue ?
>
>On Thu, 2009-07-30 at 13:11 +0200, Jerome Glisse wrote:
>> Hello,
>>
>> I think I am facing a PAT issue: the code at the bottom of this mail
>> leads to a mapping count problem like the one shown below. Is my test
>> code buggy? If so, what is wrong with it? Otherwise, how could I
>> track this down? (Tested with the latest Linus tree.) Note that the
>> mapping count is sometimes negative, and sometimes positive even
>> though the page has no mapping.
>>
>> (With AMD Athlon(tm) Dual Core Processor 4450e)
>>
>> Note that the bad page may take a while to show up; 256 pages is a
>> bit too few. Either increasing that number or running a memory-hungry
>> task will help trigger the bug faster.
>>
>> Cheers,
>> Jerome
>>
>> Jul 30 11:12:36 localhost kernel: BUG: Bad page state in process bash pfn:6daed
>> Jul 30 11:12:36 localhost kernel: page:ffffea0001b6bb40 flags:4000000000000000 count:1 mapcount:1 mapping:(null) index:6d8
>> Jul 30 11:12:36 localhost kernel: Pid: 1876, comm: bash Not tainted 2.6.31-rc2 #30
>> Jul 30 11:12:36 localhost kernel: Call Trace:
>> Jul 30 11:12:36 localhost kernel: [<ffffffff81098570>] bad_page+0xf8/0x10d
>> Jul 30 11:12:36 localhost kernel: [<ffffffff810997aa>] get_page_from_freelist+0x357/0x475
>> Jul 30 11:12:36 localhost kernel: [<ffffffff810a72e3>] ? cond_resched+0x9/0xb
>> Jul 30 11:12:36 localhost kernel: [<ffffffff810a9958>] ? copy_page_range+0x4cc/0x558
>> Jul 30 11:12:36 localhost kernel: [<ffffffff810999e0>] __alloc_pages_nodemask+0x118/0x562
>> Jul 30 11:12:36 localhost kernel: [<ffffffff812a92c3>] ? _spin_unlock_irq+0xe/0x11
>> Jul 30 11:12:36 localhost kernel: [<ffffffff810a9dda>] alloc_pages_node.clone.0+0x14/0x16
>> Jul 30 11:12:36 localhost kernel: [<ffffffff810aa0b1>] do_wp_page+0x2d5/0x57d
>> Jul 30 11:12:36 localhost kernel: [<ffffffff810aac00>] handle_mm_fault+0x586/0x5e0
>> Jul 30 11:12:36 localhost kernel: [<ffffffff812ab635>] do_page_fault+0x20a/0x21f
>> Jul 30 11:12:36 localhost kernel: [<ffffffff812a968f>] page_fault+0x1f/0x30
>> Jul 30 11:12:36 localhost kernel: Disabling lock debugging due to kernel taint
>>
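
The trace appears to come from the sanity check run when a page is taken off the free list (bad_page() reached via get_page_from_freelist()). A free page should have a zero reference count, a zero map count and no mapping, so "count:1 mapcount:1" on such a page suggests its struct page was modified after it was freed, or that it was freed while still in use. The map count is stored offset by one (page->_mapcount is -1 for an unmapped page and page_mapcount() adds 1). The following is a simplified user-space model of that check, not kernel code; struct fake_page and the helper names are hypothetical, and the real check also inspects page flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct page.
 * Like the kernel's _mapcount, mapcount_raw is stored offset by one:
 * -1 means "not mapped by any page table". */
struct fake_page {
	int count;        /* reference count: 0 while on the free list */
	int mapcount_raw; /* -1 for an unmapped page */
	void *mapping;    /* NULL for a free page */
};

/* Same idea as page_mapcount(): the value printed as "mapcount:". */
int fake_page_mapcount(const struct fake_page *p)
{
	return p->mapcount_raw + 1;
}

/* Simplified allocation-time check: a page coming off the free list
 * must have no references, no page-table mappings and no mapping.
 * (The real check also looks at page flags; omitted here.) */
bool fake_page_is_bad(const struct fake_page *p)
{
	return p->count != 0 || fake_page_mapcount(p) != 0 ||
	       p->mapping != NULL;
}
```

With count = 1 and mapcount_raw = 0 (the values in the log), fake_page_mapcount() returns 1 and fake_page_is_bad() returns true. In this model a negative "mapcount" corresponds to mapcount_raw below -1, i.e. more unmaps than maps.
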
>> #include <linux/mm.h>
>> #include <linux/vmalloc.h>
>> #include <asm/cacheflush.h>	/* set_memory_wc(), set_memory_wb() */
>>
>> #define NPAGEST 256
>>
>> void test_wc(void)
>> {
>> 	struct page *pages[NPAGEST];
>> 	int i;
>> 	void *virt;
>>
>> 	for (i = 0; i < NPAGEST; i++)
>> 		pages[i] = NULL;
>>
>> 	/* Allocate pages below 4GB and switch each lowmem page to WC. */
>> 	for (i = 0; i < NPAGEST; i++) {
>> 		pages[i] = alloc_page(__GFP_DMA32 | GFP_USER);
>> 		if (pages[i] == NULL) {
>> 			printk(KERN_ERR "Failed allocating page %d\n", i);
>> 			goto out_free;
>> 		}
>> 		if (!PageHighMem(pages[i]) &&
>> 		    set_memory_wc((unsigned long)page_address(pages[i]), 1)) {
>> 			printk(KERN_ERR "Failed setting page %d wc\n", i);
>> 			/* Never became WC: free it without set_memory_wb(). */
>> 			__free_page(pages[i]);
>> 			pages[i] = NULL;
>> 			goto out_free;
>> 		}
>> 	}
>>
>> 	/* Map all pages contiguously with a write-combining pgprot. */
>> 	virt = vmap(pages, NPAGEST, 0, pgprot_writecombine(PAGE_KERNEL));
>> 	if (virt == NULL) {
>> 		printk(KERN_ERR "Failed vmapping\n");
>> 		goto out_free;
>> 	}
>> 	vunmap(virt);
>>
>> out_free:
>> 	/* Restore write-back before handing pages back to the allocator. */
>> 	for (i = 0; i < NPAGEST; i++) {
>> 		if (pages[i]) {
>> 			if (!PageHighMem(pages[i]))
>> 				set_memory_wb((unsigned long)
>> 					      page_address(pages[i]), 1);
>> 			__free_page(pages[i]);
>> 		}
>> 	}
>> }
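
Whatever the root cause turns out to be, a reproducer like this should undo exactly the attribute changes that succeeded, so that its own cleanup path never issues a set_memory_wb() with no matching successful set_memory_wc(). The following user-space model sketches that unwinding pattern; stub_set_memory_wc(), stub_set_memory_wb(), wc_roundtrip() and wc_unmatched_wb() are hypothetical stand-ins that only track per-page state and are not the kernel API.

```c
#include <stdbool.h>

#define NR_PAGES 16

/* Hypothetical stand-ins for set_memory_wc()/set_memory_wb(): they only
 * record which pages are "WC" so the cleanup logic can be checked. */
static bool page_is_wc[NR_PAGES];
static int fail_index = -1; /* index at which the WC stub fails */
static int unmatched_wb;    /* WB calls on pages that were never WC */

static int stub_set_memory_wc(int idx)
{
	if (idx == fail_index)
		return -1;          /* failure: page left unchanged */
	page_is_wc[idx] = true;
	return 0;
}

static void stub_set_memory_wb(int idx)
{
	if (!page_is_wc[idx])
		unmatched_wb++;     /* reverting a change that never happened */
	page_is_wc[idx] = false;
}

/* Convert n pages to WC (failing at index fail_at, or never if -1),
 * then revert exactly the pages whose conversion succeeded. Returns the
 * number of pages still WC afterwards (0 when cleanup is correct), or
 * -1 if n is out of range. */
int wc_roundtrip(int n, int fail_at)
{
	int nr_wc, i, still_wc = 0;

	if (n < 0 || n > NR_PAGES)
		return -1;

	fail_index = fail_at;
	unmatched_wb = 0;
	for (i = 0; i < NR_PAGES; i++)
		page_is_wc[i] = false;

	/* After this loop, pages [0, nr_wc) are WC and no others are. */
	for (nr_wc = 0; nr_wc < n; nr_wc++)
		if (stub_set_memory_wc(nr_wc))
			break;

	/* Unwind in reverse order, touching only converted pages. */
	while (nr_wc > 0)
		stub_set_memory_wb(--nr_wc);

	for (i = 0; i < NR_PAGES; i++)
		still_wc += page_is_wc[i];
	return still_wc;
}

/* Unmatched set_memory_wb() stand-in calls in the last run. */
int wc_unmatched_wb(void)
{
	return unmatched_wb;
}
```

Keeping a count of converted pages, rather than re-deriving the state from which pages were allocated, is what guarantees the unwind touches only pages that actually changed.
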
>
>vmapping doesn't seem to be involved in the corruption; simply
>setting some pages to WC with set_memory_wc() is enough.
>

Hmm... We have been able to reproduce a problem with code similar to the above,
but the exact failure seems slightly different from the one reported here.
Digging into it a bit more to see what exactly is going on. Will get back...

Thanks,
Venki