Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers

From: FUJITA Tomonori
Date: Wed Oct 24 2007 - 18:23:06 EST


On Wed, 24 Oct 2007 21:38:30 +0530
Kamalesh Babulal <kamalesh@xxxxxxxxxxxxxxxxxx> wrote:

> FUJITA Tomonori wrote:
> > On Wed, 24 Oct 2007 12:54:36 +0100
> > Andy Whitcroft <apw@xxxxxxxxxxxx> wrote:
> >
> >> On Tue, Oct 23, 2007 at 08:44:20PM +0200, Jens Axboe wrote:
> >>> On Tue, Oct 23 2007, Kamalesh Babulal wrote:
> >>>> Hi,
> >>>>
> > >>>> Kernel oops is triggered while running the fsx-linux test, followed by a CPU soft lockup
> > >>>> on the AMD box
> >>>>
> >>>> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
> >>>> [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> >>>> PGD 10185b067 PUD 10075b067 PMD 0
> >>>> Oops: 0002 [1] SMP
> >>>> CPU 3
> >>>> Modules linked in:
> >>>> Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
> >>>> RIP: 0010:[<ffffffff8021f2f6>] [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> >>>> RSP: 0000:ffff810181edf948 EFLAGS: 00010002
> > >>> Can you check where gart_map_sg+0x26c is? Make sure you have
> > >>> CONFIG_DEBUG_INFO enabled, then do:
> > >>>
> > >>> $ gdb vmlinux
> > >>> (gdb) l *gart_map_sg+0x26c
> > >> Ok, this problem still seems to be present in 2.6.24-rc1. Here is the gdb
> > >> output from that version; the panic (also below) looks the same:
> >>
> >> (gdb) l *gart_map_sg+0x26c
> >> 0xffffffff8022011e is in gart_map_sg (arch/x86/kernel/pci-gart_64.c:433).
> >> 428 goto error;
> >> 429 out++;
> >> 430 flush_gart();
> >> 431 if (out < nents) {
> >> 432 sgmap = sg_next(sgmap);
> >> 433 sgmap->dma_length = 0;
> >> 434 }
> >> 435 return out;
> >> 436
> >> 437 error:
> >>
> > >> So it seems sg_next() has returned NULL.
> >
> > Have you tried this?
> >
> > http://marc.info/?l=linux-kernel&m=119317981406073&w=2
> > -
> Hi,
> Thanks, this patch solves the kernel oops.

Thanks for testing!

Jens, here's the proper changelog.

-
From: FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>
Subject: [PATCH] x86: pci-gart fix

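map_sg can copy the last sg element to another position (when it merges
some elements). The whole-struct copy also carries over page_link, including
its chain/end marker bits, which breaks sg chaining. This patch copies only
dma_address/dma_length instead of the whole sg element.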

Signed-off-by: FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>
---
arch/x86/kernel/pci-gart_64.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index c56e9ee..ae7e016 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -338,7 +338,6 @@ static int __dma_map_cont(struct scatterlist *start, int nelems,

BUG_ON(s != start && s->offset);
if (s == start) {
- *sout = *s;
sout->dma_address = iommu_bus_base;
sout->dma_address += iommu_page*PAGE_SIZE + s->offset;
sout->dma_length = s->length;
@@ -365,7 +364,7 @@ static inline int dma_map_cont(struct scatterlist *start, int nelems,
{
if (!need) {
BUG_ON(nelems != 1);
- *sout = *start;
+ sout->dma_address = start->dma_address;
sout->dma_length = start->length;
return 0;
}
--
1.5.2.4
