Re: [Linaro-mm-sig] [PATCH 1/2] mm: replace BUG_ON in vm_insert_page with a return of an error

From: Christian König
Date: Thu Feb 04 2021 - 03:18:05 EST


Am 03.02.21 um 22:41 schrieb Suren Baghdasaryan:
[SNIP]
How many semi-unrelated buffer accounting schemes does Google come up with?

We're at three with this one.

And also we _cannot_ require that all dma-bufs are backed by struct
page, so requiring struct page to make this work is a no-go.

Second, we do not want to allow get_user_pages and friends to work on
dma-buf, it causes all kinds of pain. Yes on SoC where dma-buf are
exclusively in system memory you can maybe get away with this, but
dma-buf is supposed to work in more places than just Android SoCs.
I just realized that vm_insert_page doesn't even work for CMA, it would
upset get_user_pages pretty badly - you're trying to pin a page in
ZONE_MOVABLE but you can't move it because it's rather special.
VM_SPECIAL is exactly meant to catch this stuff.
Thanks for the input, Daniel! Let me think about the cases you pointed out.

IMHO, the issue with PSS is the difficulty of calculating this metric
without struct page usage. I don't think that problem becomes easier
if we use cgroups or any other API. I wanted to enable existing PSS
calculation mechanisms for the dmabufs known to be backed by struct
pages (since we know how the heap allocated that memory), but sounds
like this would lead to problems that I did not consider.

Yeah, using struct page indeed won't work. We discussed that multiple times now and Daniel even has a patch to mangle the struct page pointers inside the sg_table object to prevent abuse in that direction.

On the other hand I totally agree that we need to do something on this side which goes beyond what cgroups provide.

A few years ago I came up with patches to improve the OOM killer to include resources bound to the processes through file descriptors. I unfortunately can't find them offhand any more and I'm currently too busy to dig them up.

In general I think we need to make it possible for both the in-kernel OOM killer and userspace processes and handlers to access that kind of data.

The fdinfo approach as suggested in the other thread sounds like the easiest solution to me.

Regards,
Christian.

Thanks,
Suren.