Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

From: Takashi Iwai
Date: Wed Aug 09 2023 - 09:13:28 EST


On Wed, 09 Aug 2023 14:19:23 +0200,
Karol Herbst wrote:
>
> On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <tiwai@xxxxxxx> wrote:
> >
> > On Wed, 09 Aug 2023 13:42:09 +0200,
> > Karol Herbst wrote:
> > >
> > > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <tiwai@xxxxxxx> wrote:
> > > >
> > > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > > Karol Herbst wrote:
> > > > >
> > > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > > called with and without that patch?
> > > > > >
> > > > > > Well, me dumping those args I guess made the box not freeze before
> > > > > > catching a #PF over serial. Does that help?
> > > > > >
> > > > > > ....
> > > > > > [ 3.410135] Unpacking initramfs...
> > > > > > [ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
> > > > > > [ 3.418227] Initialise system trusted keyrings
> > > > > > [ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
> > > > > > [ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
> > > > > > [ 3.443368] fuse: init (API version 7.38)
> > > > > > [ 3.447601] 9p: Installing v9fs 9p2000 file system support
> > > > > > [ 3.453223] Key type asymmetric registered
> > > > > > [ 3.457332] Asymmetric key parser 'x509' registered
> > > > > > [ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> > > > > > [ 3.475865] efifb: probing for efifb
> > > > > > [ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
> > > > > > [ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
> > > > > > [ 3.491872] efifb: scrolling: redraw
> > > > > > [ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> > > > > > [ 3.502349] Console: switching to colour frame buffer device 100x37
> > > > > > [ 3.509564] fb0: EFI VGA frame buffer device
> > > > > > [ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
> > > > > > [ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
> > > > > > [ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
> > > > > > [ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
> > > > > > [ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
> > > > > > [ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
> > > > > > [ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
> > > > > > [ 3.544219] Freeing initrd memory: 8196K
> > > > > > [ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
> > > > > > [ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > > > > > [ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > > > > > [ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
> > > > > > [ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> > > > > > [ 3.642451] Linux agpgart interface v0.103
> > > > > > [ 3.647141] ACPI: bus type drm_connector registered
> > > > > > [ 3.653261] Console: switching to colour dummy device 80x25
> > > > > > [ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> > > > > > [ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> > > > > > [ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> > > > > > [ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> > > > > > [ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> > > > > > [ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> > > > > > [ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> > > > > > [ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
> > > > > > [ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
> > > > > > [ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> > > > > > [ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> > > > > > [ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
> > > > > > [ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> > > > > > [ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> > > > > > [ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> > > > > > [ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> > > > > > [ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > [ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > [ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> > > > > > [ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> > > > > > [ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > [ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
> > > > > > [ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
> > > > > > [ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
> > > > > > [ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
> > > > > > [ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
> > > > > > [ 4.129864] clocksource: Switched to clocksource tsc
> > > > > > [ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
> > > > > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > > > >
> > > > > ahh, that would have been good to know :) Mind figuring out what's
> > > > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > > out what's wrong here.
> > > >
> > > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > > Confirmed that reverting the patch cured the issue.
> > > >
> > > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > > dereference:
> > > >
> > >
> > > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > > it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> > > returns -EINVAL with that patch where it didn't before? I wonder if
> > > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` and
> > > if remove that fixes the crash?
> >
> > Please give a patch, then I can build a kernel and let the reporter
> > testing it :)
> >
>
> attached a patch.

Thanks. Now I'm building a test kernel and asked the reporter for
testing it.

> Anyway, I'll be on PTO for the rest of the week and I kinda wished
> somebody else would have time to figure out what's going wrong there,
> or at least simply figuring out what the difference is. Not having
> direct access to such a GPU also makes it a bit harder. Once I'm back
> I'll check with all my GPUs if there is one hitting a difference here,
> but the ones I've tested it with so far were all fine sadly.

If this can't be fixed quickly, I suppose it's safer to revert it from
6.4.y for now. 6.5 is still being cooked, but 6.4.x is already in
wide deployment, hence the regression has to be addressed quickly.


Takashi

>
> >
> > thanks,
> >
> > Takashi
> >
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/uconn.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/uconn.c
> index 46b057fe1412e..3666dfb7ecbf4 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/uconn.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/uconn.c
> @@ -85,8 +85,8 @@ nvkm_uconn_uevent(struct nvkm_object *object, void *argv, u32 argc, struct nvkm_
> break;
> }
>
> - if (&outp->head == &conn->disp->outps)
> - return -EINVAL;
> +// if (&outp->head == &conn->disp->outps)
> +// return -EINVAL;
>
> if (outp->dp.aux && !outp->info.location) {
> if (args->v0.types & NVIF_CONN_EVENT_V0_PLUG ) bits |= NVKM_I2C_PLUG;