Re: [PATCH v19 22/30] drm/shmem-helper: Add common memory shrinker

From: Dmitry Osipenko
Date: Mon Jan 29 2024 - 01:16:23 EST


On 1/26/24 21:12, Boris Brezillon wrote:
> On Fri, 26 Jan 2024 19:27:49 +0300
> Dmitry Osipenko <dmitry.osipenko@xxxxxxxxxxxxx> wrote:
>
>> On 1/26/24 12:55, Boris Brezillon wrote:
>>> On Fri, 26 Jan 2024 00:56:47 +0300
>>> Dmitry Osipenko <dmitry.osipenko@xxxxxxxxxxxxx> wrote:
>>>
>>>> On 1/25/24 13:19, Boris Brezillon wrote:
>>>>> On Fri, 5 Jan 2024 21:46:16 +0300
>>>>> Dmitry Osipenko <dmitry.osipenko@xxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> +static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object *shmem)
>>>>>> +{
>>>>>> + return (shmem->madv >= 0) && shmem->base.funcs->evict &&
>>>>>> + refcount_read(&shmem->pages_use_count) &&
>>>>>> + !refcount_read(&shmem->pages_pin_count) &&
>>>>>> + !shmem->base.dma_buf && !shmem->base.import_attach &&
>>>>>> + !shmem->evicted;
>>>>>
>>>>> Are we missing
>>>>>
>>>>> && dma_resv_test_signaled(shmem->base.resv,
>>>>> DMA_RESV_USAGE_BOOKKEEP)
>>>>>
>>>>> to make sure the GPU is done using the BO?
>>>>> The same applies to drm_gem_shmem_is_purgeable() BTW.
>>>>>
>>>>> If you don't want to do this test here, we need a way to let drivers
>>>>> provide a custom is_{evictable,purgeable}() test.
>>>>>
>>>>> I guess we should also expose drm_gem_shmem_shrinker_update_lru_locked()
>>>>> to let drivers move the GEMs that were used most recently (those
>>>>> referenced by a GPU job) at the end of the evictable LRU.
>>>>
>>>> We have the signaled-check in the common drm_gem_evict() helper:
>>>>
>>>> https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/gpu/drm/drm_gem.c#L1496
>>>
>>> Ah, indeed. I'll need DMA_RESV_USAGE_BOOKKEEP instead of
>>> DMA_RESV_USAGE_READ in panthor, but I can add it in the driver specific
>>> ->evict() hook (though that means calling dma_resv_test_signaled()
>>> twice, which is not great, oh well).
>>
>> Maybe we should change drm_gem_evict() to use BOOKKEEP. The
>> test_signaled(BOOKKEEP) should be a "stronger" check than
>> test_signaled(READ)?
>
> It is, just wondering if some users have a good reason to want
> READ here.
>
>>
>>> The problem about the evictable LRU remains though: we need a way to let
>>> drivers put their BOs at the end of the list when the BO has been used
>>> by the GPU, don't we?
>>
>> If a BO is in use, then it won't be evicted, while idling BOs will be
>> evicted. Hence, the used BOs will be naturally moved down the LRU list
>> each time shrinker is invoked.
>>
>
> That only does the trick if the BOs being used most often are busy when
> the shrinker kicks in though. Let's take this scenario:
>
>
>   BO 1                                  BO 2                                shrinker
>
>                                         busy
>                                         idle (first-pos-in-evictable-LRU)
>
>   busy
>   idle (second-pos-in-evictable-LRU)
>
>                                         busy
>                                         idle
>
>                                         busy
>                                         idle
>
>                                         busy
>                                         idle
>
>                                                                             find a BO to evict
>                                                                             pick BO 2
>
>                                         busy (swapin)
>                                         idle
>
> If the LRU had been updated at each busy event, BO 1 should have
> been picked for eviction. But we evicted the BO that was first
> recorded idle instead of the one that was least recently
> recorded busy.

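You have to swapin(BO) every time the BO goes to the busy state, and
swapin does drm_gem_lru_move_tail(BO). Hence, each time a BO goes
idle->busy, it's moved to the tail of the LRU list, so the shrinker
(which scans from the head) sees the least recently used BO first.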

For example, please see patch #29 where virtio-gpu invokes swapin for each job's BO in the submit()->virtio_gpu_array_prepare() code path.
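
To illustrate the ordering, here is a minimal userspace sketch of the
scenario quoted above. It is not kernel code: every toy_* name is made
up for this example, and toy_lru_move_tail() only models the list
effect of drm_gem_lru_move_tail(), without locking or the real shrinker
accounting. With move_on_swapin set, each idle->busy transition
re-queues the BO at the tail; with it cleared, a BO keeps the position
it got the first time it was put on the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a GEM object on the evictable LRU (hypothetical). */
struct toy_bo {
	int id;
	bool busy;
	struct toy_bo *prev, *next;	/* NULL while not on any list */
};

/* Circular list; head.next is the first candidate the shrinker sees. */
struct toy_lru {
	struct toy_bo head;
};

static void toy_lru_init(struct toy_lru *lru)
{
	lru->head.prev = lru->head.next = &lru->head;
}

/* Models the list effect of drm_gem_lru_move_tail(): unlink, then append. */
static void toy_lru_move_tail(struct toy_lru *lru, struct toy_bo *bo)
{
	if (bo->next) {
		bo->prev->next = bo->next;
		bo->next->prev = bo->prev;
	}
	bo->prev = lru->head.prev;
	bo->next = &lru->head;
	lru->head.prev->next = bo;
	lru->head.prev = bo;
}

/* One job using the BO: swapin on submit, then the job completes. */
static void toy_use_bo(struct toy_lru *lru, struct toy_bo *bo,
		       bool move_on_swapin)
{
	bo->busy = true;
	/* Without the move, a BO is only queued the first time it is seen. */
	if (move_on_swapin || !bo->next)
		toy_lru_move_tail(lru, bo);
	bo->busy = false;
}

/* Shrinker scan: the first idle BO from the head of the list, or NULL. */
static struct toy_bo *toy_shrinker_pick(struct toy_lru *lru)
{
	for (struct toy_bo *bo = lru->head.next; bo != &lru->head; bo = bo->next)
		if (!bo->busy)
			return bo;
	return NULL;
}

/* Replays the quoted scenario and returns the id of the evicted BO. */
static int toy_replay_scenario(bool move_on_swapin)
{
	struct toy_lru lru;
	struct toy_bo bo1 = { .id = 1 }, bo2 = { .id = 2 };
	struct toy_bo *victim;

	toy_lru_init(&lru);
	toy_use_bo(&lru, &bo2, move_on_swapin);	/* BO 2 used first */
	toy_use_bo(&lru, &bo1, move_on_swapin);	/* then BO 1, once */
	for (int i = 0; i < 3; i++)		/* BO 2 keeps being used */
		toy_use_bo(&lru, &bo2, move_on_swapin);

	victim = toy_shrinker_pick(&lru);
	return victim ? victim->id : 0;
}
```

In this model toy_replay_scenario(true) evicts BO 1, the least recently
used one, while toy_replay_scenario(false) evicts BO 2, which is the
outcome the quoted diagram worries about.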

--
Best regards,
Dmitry