Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()

From: Michal Hocko
Date: Wed Feb 01 2023 - 03:07:49 EST


On Mon 30-01-23 11:30:47, Yang Shi wrote:
> On Mon, Jan 30, 2023 at 4:20 AM Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> >
> >
> >
> > On 2023/1/30 16:48, Michal Hocko wrote:
> > > On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
> > >>
> > >>
> > >> On 2023/1/30 5:48, Andrew Morton wrote:
> > >>> On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> > >>>
> > >>>> As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> > >>>
> > >>> Merged in 2017.
> > >>>
> > >>>> hwpoison will forcibly uncharge an LRU hwpoisoned page, so folio_memcg
> > >>>> could be NULL and mem_cgroup_track_foreign_dirty_slowpath() could then
> > >>>> hit a NULL pointer dereference. Fix it by not recording foreign
> > >>>> writebacks in mem_cgroup_track_foreign_dirty() when the folio memcg
> > >>>> is NULL.
> > >>>>
> > >>>> Reported-by: Ma Wupeng <mawupeng1@xxxxxxxxxx>
> > >>>> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> > >>>
> > >>> Merged in 2019.
> > >>>
> > ...
> > >
> > > Just to make sure I understand. The page has been hwpoisoned, uncharged
> > > but stayed in the page cache, so the next page fault on the address has
> > > blown up?
> > >
> > > Say we address the NULL memcg case. What is the resulting behavior?
> > > Doesn't userspace access a poisoned page and get silent memory
> > > corruption?
> >
> > + Yang Shi
> >
> > Check previous link[1], seems that it is a known issue, and there is a
> > TODO list for storage backed filesystems from Yang.
>
> For tmpfs and hugetlbfs, the page still stays in the page cache, and a
> later page fault will handle the case gracefully. Other filesystems
> backed by real storage will have the page cache truncated.
>
> The page cache will be uncharged before truncate. If the truncate
> fails, we may end up in this case.

This would be a good addendum to the changelog. What would be a typical
failure in the truncation path?
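
To make the NULL memcg case concrete, here is a rough userspace model of
the check being discussed. The struct and function names are simplified
stand-ins chosen for illustration, not the kernel definitions, and the
assert() only models the dereference that happens in the slowpath:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; illustration only. */
struct css { int id; };
struct memcg { struct css css; };
struct wb { struct css *memcg_css; };
struct folio { struct memcg *memcg; /* NULL once forcibly uncharged */ };

static int slowpath_calls;

/* Models the slowpath: it relies on the folio still having a memcg. */
static void track_foreign_dirty_slowpath(struct folio *folio, struct wb *wb)
{
	(void)wb;
	assert(folio->memcg != NULL); /* stands in for the NULL dereference */
	slowpath_calls++;
}

/*
 * Models the guarded fast path: skip foreign-dirty tracking when the
 * folio has no memcg, and only take the slowpath when the folio's memcg
 * differs from the one the writeback structure belongs to.
 * Returns true if the slowpath was taken.
 */
static bool track_foreign_dirty(struct folio *folio, struct wb *wb)
{
	struct memcg *memcg = folio->memcg;

	if (memcg && &memcg->css != wb->memcg_css) {
		track_foreign_dirty_slowpath(folio, wb);
		return true;
	}
	return false;
}
```

With such a guard, a folio whose memcg is NULL is simply not recorded as
a foreign writeback. It does not answer what happens to the poisoned page
afterwards, which is the question above.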

> >
> >
> > [1]
> > https://lore.kernel.org/all/20211020210755.23964-6-shy828301@xxxxxxxxx/T/#m1d40559ca2dcf94396df5369214288f69dec379b

--
Michal Hocko
SUSE Labs