Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

From: Oleksandr Natalenko
Date: Mon Oct 02 2023 - 07:03:15 EST


/cc Matthew, Andrew (please see below)

On Monday, 2 October 2023 12:42:42 CEST Bagas Sanjaya wrote:
> On Mon, Oct 02, 2023 at 08:20:15AM +0200, Oleksandr Natalenko wrote:
> > Hello.
> >
> > On Monday, 2 October 2023 1:45:44 CEST Bagas Sanjaya wrote:
> > > On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
> > > > Hello.
> > > >
> > > > I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
> > > >
> > > > ```
> > > > BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > > >
> > > > Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > > drm_gem_put_pages+0x186/0x250
> > > > drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > > drm_gem_shmem_object_vunmap+0x83/0xe0
> > > > drm_gem_vunmap_unlocked+0x46/0xb0
> > > > drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > > drm_fb_helper_damage_work+0x96/0x170
> > > > process_one_work+0x254/0x470
> > > > worker_thread+0x55/0x4f0
> > > > kthread+0xe8/0x120
> > > > ret_from_fork+0x34/0x50
> > > > ret_from_fork_asm+0x1b/0x30
> > > >
> > > > kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k
> > > >
> > > > allocated by task 51 on cpu 0 at 14.668667s:
> > > > drm_gem_get_pages+0x94/0x2b0
> > > > drm_gem_shmem_get_pages+0x5d/0x110
> > > > drm_gem_shmem_object_vmap+0xc4/0x1e0
> > > > drm_gem_vmap_unlocked+0x3c/0x70
> > > > drm_client_buffer_vmap+0x23/0x50
> > > > drm_fbdev_generic_helper_fb_dirty+0xae/0x310
> > > > drm_fb_helper_damage_work+0x96/0x170
> > > > process_one_work+0x254/0x470
> > > > worker_thread+0x55/0x4f0
> > > > kthread+0xe8/0x120
> > > > ret_from_fork+0x34/0x50
> > > > ret_from_fork_asm+0x1b/0x30
> > > >
> > > > freed by task 51 on cpu 0 at 14.668697s:
> > > > drm_gem_put_pages+0x186/0x250
> > > > drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > > drm_gem_shmem_object_vunmap+0x83/0xe0
> > > > drm_gem_vunmap_unlocked+0x46/0xb0
> > > > drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > > drm_fb_helper_damage_work+0x96/0x170
> > > > process_one_work+0x254/0x470
> > > > worker_thread+0x55/0x4f0
> > > > kthread+0xe8/0x120
> > > > ret_from_fork+0x34/0x50
> > > > ret_from_fork_asm+0x1b/0x30
> > > >
> > > > CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
> > > > Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> > > > Workqueue: events drm_fb_helper_damage_work
> > > > ```
> > > >
> > > > This repeats a couple of times and then stops.
> > > >
> > > > Currently, I'm running v6.5.5. So far, there's no impact on how the VM functions for me.
> > > >
> > > > The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
> > > >
> > >
> > > Do you have this issue on v6.4?
> >
> > No, I did not have this issue with v6.4.
> >
>
> Then proceed with kernel bisection. You can refer to
> Documentation/admin-guide/bug-bisect.rst in the kernel sources for the
> process.

Matthew, before I start bisecting, do you think the splat above could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2, which was fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
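A quick sanity check on the corrupted object. The allocation and free stacks suggest that the 3072-byte kfence-#108 object is the per-page pointer array that drm_gem_get_pages() allocates and drm_gem_put_pages() frees. That is an assumption. Also assuming 8-byte pointers and 4 KiB pages on x86_64 (neither is stated in the report), the sizes work out like this:

```c
#include <assert.h>

/* Assumed x86_64 values; neither is taken from the KFENCE report. */
#define ASSUMED_PTR_SIZE  8UL     /* sizeof(struct page *) */
#define ASSUMED_PAGE_SIZE 4096UL  /* PAGE_SIZE */

/* How many page pointers fit in an array of obj_size bytes. */
static unsigned long page_ptrs_in(unsigned long obj_size)
{
	assert(obj_size % ASSUMED_PTR_SIZE == 0);
	return obj_size / ASSUMED_PTR_SIZE;
}

/* Size in bytes of a GEM object backed by npages pages. */
static unsigned long gem_object_bytes(unsigned long npages)
{
	return npages * ASSUMED_PAGE_SIZE;
}
```

Under those assumptions, page_ptrs_in(3072) is 384, and gem_object_bytes(384) is 1572864 bytes (1.5 MiB). So the array would describe a 384-page buffer.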

In the git log between v6.4 and v6.5 I see this:

```
commit 3291e09a463870610b8227f32b16b19a587edf33
Author: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
Date: Wed Jun 21 17:45:49 2023 +0100

drm: convert drm_gem_put_pages() to use a folio_batch

Remove a few hidden compound_head() calls by converting the returned page
to a folio once and using the folio APIs.
```
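KFENCE reports this kind of corruption when the object is freed, after finding its redzone bytes modified. The out-of-bounds write could therefore have happened anywhere between the allocation in drm_gem_get_pages() and the free in drm_gem_put_pages(). It did not necessarily happen in drm_gem_put_pages() itself. One way a folio conversion could cause such a write, offered only as a hypothesis to check, is a loop that fills a per-page array one whole folio at a time without clamping to the array length. The sketch below is a hypothetical userspace model of that pattern. The function names and the folio_pages[] layout are illustrative and are not the kernel's drm_gem code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, NOT the kernel's drm_gem code.
 * pages[] has npages slots; folio_pages[i] is the assumed size (in pages)
 * of the folio that starts at slot i. Each function walks the array one
 * folio at a time and returns the highest slot index it would write.
 */

/* Unclamped: fills every page of each folio, even past the array end. */
static size_t last_slot_unclamped(const size_t *folio_pages, size_t npages)
{
	size_t i = 0, last = 0;

	while (i < npages) {
		size_t nr = folio_pages[i];

		assert(nr >= 1);     /* a folio has at least one page */
		last = i + nr - 1;   /* can be >= npages: out-of-bounds write */
		i += nr;
	}
	return last;
}

/* Clamped: never fills more slots than remain in the array. */
static size_t last_slot_clamped(const size_t *folio_pages, size_t npages)
{
	size_t i = 0, last = 0;

	while (i < npages) {
		size_t nr = folio_pages[i];

		assert(nr >= 1);
		if (nr > npages - i)
			nr = npages - i; /* stop at the end of pages[] */
		last = i + nr - 1;
		i += nr;
	}
	return last;
}
```

Take npages = 4, with folios of 2 and 4 pages starting at slots 0 and 2 (folio_pages = {2, 0, 4, 0}). The unclamped walk would write up to slot 5, past the end of a 4-slot array. The clamped walk stops at slot 3. Whether v6.5 actually contains a walk like this is what review of the folio conversion, or bisection, would need to establish.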

Thanks.

--
Oleksandr Natalenko (post-factum)
