Re: [REGRESSION] BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

From: Oleksandr Natalenko
Date: Thu Oct 05 2023 - 12:15:24 EST


Hello.

On Thursday, 5 October 2023 14:19:44 CEST Matthew Wilcox wrote:
> On Thu, Oct 05, 2023 at 09:56:03AM +0200, Oleksandr Natalenko wrote:
> > Hello.
> >
> > On Thursday, 5 October 2023 9:44:42 CEST Thomas Zimmermann wrote:
> > > Hi
> > >
> > > On 02.10.23 at 17:38, Oleksandr Natalenko wrote:
> > > > On Monday, 2 October 2023 16:32:45 CEST Matthew Wilcox wrote:
> > > >> On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> > > >>>>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
> > > >>>>>>>
> > > >>>>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> > > >>>>>>> drm_gem_put_pages+0x186/0x250
> > > >>>>>>> drm_gem_shmem_put_pages_locked+0x43/0xc0
> > > >>>>>>> drm_gem_shmem_object_vunmap+0x83/0xe0
> > > >>>>>>> drm_gem_vunmap_unlocked+0x46/0xb0
> > > >>>>>>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> > > >>>>>>> drm_fb_helper_damage_work+0x96/0x170
> > > >>>
> > > >>> Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
> > > >>
> > > >> Yes, entirely plausible. I think you have two useful points to look at
> > > >> before delving into a full bisect -- 863a8e and the parent of 0b62af.
> > > >> If either of them work, I think you have no more work to do.
> > > >
> > > > OK, I did this against v6.5.5:
> > > >
> > > > ```
> > > > git log --oneline HEAD~3..
> > > > 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
> > > > 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
> > > > fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
> > > > ```
> > > >
> > > > then rebooted the host multiple times, and the issue no longer appears.
> > > >
> > > > So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
> > >
> > > Ignore my other email. It's apparently been fixed already. Thanks!
> >
> > Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.
>
> I don't understand; you said reverting those DRM commits fixed the
> problem, so 863a8eb3f270 is the solution. No?

No, no, sorry for the confusion. Let me explain again:

1. We had an issue with i915, which was introduced by 0b62af28f249 and was later fixed by 863a8eb3f270.
2. Now I've discovered another issue, which looks very similar to (1) but occurs in a VM with Cirrus VGA, and it happens even with 863a8eb3f270 applied.
3. I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with Cirrus VGA, but clearly no fix for it has been discussed.

IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.

I hope this makes it clear.

Thanks.

--
Oleksandr Natalenko (post-factum)
