Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()

From: Yang Shi
Date: Wed Feb 01 2023 - 12:21:53 EST


On Wed, Feb 1, 2023 at 12:07 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Mon 30-01-23 11:30:47, Yang Shi wrote:
> > On Mon, Jan 30, 2023 at 4:20 AM Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> > >
> > >
> > >
> > > On 2023/1/30 16:48, Michal Hocko wrote:
> > > > On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
> > > >>
> > > >>
> > > >> On 2023/1/30 5:48, Andrew Morton wrote:
> > > >>> On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> > > >>>
> > > > >>>> Since commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> > > >>>
> > > >>> Merged in 2017.
> > > >>>
> > > > >>>> hwpoison will forcibly uncharge an LRU hwpoisoned page, so the folio_memcg
> > > > >>>> could be NULL, and mem_cgroup_track_foreign_dirty_slowpath() could then
> > > > >>>> hit a NULL pointer dereference. Do not record the foreign writeback
> > > > >>>> when the folio memcg is NULL in mem_cgroup_track_foreign_dirty() to
> > > > >>>> fix it.
> > > >>>>
> > > >>>> Reported-by: Ma Wupeng <mawupeng1@xxxxxxxxxx>
> > > >>>> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> > > >>>
> > > >>> Merged in 2019.
> > > >>>
> > > ...
> > > >
> > > > > Just to make sure I understand. The page has been hwpoisoned and uncharged,
> > > > > but stayed in the page cache, so the next page fault on the address has
> > > > > blown up?
> > > >
> > > > > Say we address the NULL memcg case. What is the resulting behavior?
> > > > > Doesn't userspace access a poisoned page and get silent memory
> > > > > corruption?
> > >
> > > + Yang Shi
> > >
> > > See the previous link [1]. It seems to be a known issue, and Yang has a
> > > TODO list for storage-backed filesystems.
> >
> > For tmpfs and hugetlbfs, the page stays in the page cache, and a
> > later page fault will handle the case gracefully. Filesystems backed
> > by real storage will have the page cache truncated.
> >
> > The page is uncharged before the truncate. If the truncate
> > fails, we may end up in this case.
>
> This would be a good addendum to the changelog. What would be a typical
> failure in the truncation path?

For the memory failure path, there may be a couple of cases: for example,
the page does not belong to a regular file (it may belong to a directory),
or the buffers fail to be released.
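
To make the sequence concrete, here is a rough userspace sketch of the
ordering described above: uncharge first, then truncate, and skip the
foreign dirty record when the memcg is NULL. All the mock_* names and
the counter are made up for illustration; this is not the kernel code
and not the patch itself, only a model of the state the folio can end
up in.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for kernel types; NOT the real kernel definitions. */
struct mock_memcg {
	int id;
};

struct mock_folio {
	struct mock_memcg *memcg;	/* NULL once the folio is forcibly uncharged */
	bool in_page_cache;		/* stays true if the truncate fails */
};

/* Hypothetical counter of recorded foreign writebacks. */
int foreign_dirty_records;

/*
 * Models the ordering in the memory failure path: the folio is
 * uncharged before the truncate is attempted, so a failed truncate
 * leaves an uncharged folio in the page cache.
 */
void mock_memory_failure(struct mock_folio *folio, bool truncate_ok)
{
	folio->memcg = NULL;
	if (truncate_ok)
		folio->in_page_cache = false;
}

/*
 * Models the guarded tracking path: skip the record when the folio
 * has no memcg instead of dereferencing a NULL pointer.
 * Returns true if a foreign writeback was recorded.
 */
bool mock_track_foreign_dirty(struct mock_folio *folio,
			      struct mock_memcg *writer)
{
	if (folio->memcg == NULL)
		return false;
	if (folio->memcg != writer) {
		foreign_dirty_records++;
		return true;
	}
	return false;
}
```

With a failed truncate (truncate_ok == false) the folio is still in the
page cache but has no memcg, which is the state where the unguarded
path would dereference NULL. With the check, the record is simply
skipped.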

>
> > >
> > >
> > > [1]
> > > https://lore.kernel.org/all/20211020210755.23964-6-shy828301@xxxxxxxxx/T/#m1d40559ca2dcf94396df5369214288f69dec379b
>
> --
> Michal Hocko
> SUSE Labs