Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

From: Takashi Iwai
Date: Wed Aug 09 2023 - 07:46:08 EST


On Wed, 09 Aug 2023 13:42:09 +0200,
Karol Herbst wrote:
>
> On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <tiwai@xxxxxxx> wrote:
> >
> > On Tue, 08 Aug 2023 12:39:32 +0200,
> > Karol Herbst wrote:
> > >
> > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
> > > >
> > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > called with and without that patch?
> > > >
> > > > Well, me dumping those args I guess made the box not freeze before
> > > > catching a #PF over serial. Does that help?
> > > >
> > > > ....
> > > > [ 3.410135] Unpacking initramfs...
> > > > [ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
> > > > [ 3.418227] Initialise system trusted keyrings
> > > > [ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
> > > > [ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
> > > > [ 3.443368] fuse: init (API version 7.38)
> > > > [ 3.447601] 9p: Installing v9fs 9p2000 file system support
> > > > [ 3.453223] Key type asymmetric registered
> > > > [ 3.457332] Asymmetric key parser 'x509' registered
> > > > [ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> > > > [ 3.475865] efifb: probing for efifb
> > > > [ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
> > > > [ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
> > > > [ 3.491872] efifb: scrolling: redraw
> > > > [ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> > > > [ 3.502349] Console: switching to colour frame buffer device 100x37
> > > > [ 3.509564] fb0: EFI VGA frame buffer device
> > > > [ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
> > > > [ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
> > > > [ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
> > > > [ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
> > > > [ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
> > > > [ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
> > > > [ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
> > > > [ 3.544219] Freeing initrd memory: 8196K
> > > > [ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
> > > > [ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > > > [ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > > > [ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
> > > > [ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> > > > [ 3.642451] Linux agpgart interface v0.103
> > > > [ 3.647141] ACPI: bus type drm_connector registered
> > > > [ 3.653261] Console: switching to colour dummy device 80x25
> > > > [ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> > > > [ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> > > > [ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> > > > [ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> > > > [ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> > > > [ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> > > > [ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> > > > [ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
> > > > [ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
> > > > [ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> > > > [ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> > > > [ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
> > > > [ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> > > > [ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> > > > [ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> > > > [ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> > > > [ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > [ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > [ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> > > > [ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> > > > [ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > [ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
> > > > [ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
> > > > [ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
> > > > [ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
> > > > [ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
> > > > [ 4.129864] clocksource: Switched to clocksource tsc
> > > > [ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
> > > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > >
> > > ahh, that would have been good to know :) Mind figuring out what's
> > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > out what's wrong here.
> >
> > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > Confirmed that reverting the patch cured the issue.
> >
> > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > dereference:
> >
>
> mh, I wonder if one of those `return -EINVAL;` branches is hit where
> it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> returns -EINVAL with that patch where it didn't before? I wonder if
> it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` and
> if remove that fixes the crash?

Please give a patch, then I can build a kernel and let the reporter
testing it :)


thanks,

Takashi