Re: [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh

From: Kemeng Shi
Date: Tue Jan 23 2024 - 21:02:18 EST




on 1/24/2024 4:43 AM, Tejun Heo wrote:
> On Wed, Jan 24, 2024 at 02:33:29AM +0800, Kemeng Shi wrote:
>> The wb_calc_thresh will calculate wb's share in global wb domain. We need
>> to wb's share in mem_cgroup_wb_domain for mdtc. Call __wb_calc_thresh
>> instead of wb_calc_thresh to fix this.
>
> That function is calculating the wb's portion of wb portion in the whole
> system so that threshold can be distributed accordingly. So, it has to be
> compared in the global domain. If you look at the comment on top of struct
> wb_domain, it says:
>
> /*
> * A wb_domain represents a domain that wb's (bdi_writeback's) belong to
> * and are measured against each other in. There always is one global
> * domain, global_wb_domain, that every wb in the system is a member of.
> * This allows measuring the relative bandwidth of each wb to distribute
> * dirtyable memory accordingly.
> */
>
Hi Tejun, thanks for reply. For cgroup wb, it will belongs to a global wb
domain and a cgroup domain. I agree the way how we calculate wb's threshold
in global domain as you described above. This patch tries to fix calculation
of wb's threshold in cgroup domain which now is wb_calc_thresh(mdtc->wb,
mdtc->bg_thresh)), means:
(wb bandwidth) / (system bandwidth) * (*cgroup domain threshold*)
The cgroup domain threshold is
(memory of cgroup domain) / (memory of system) * (system threshold).
Then the wb's threshold in cgroup will be smaller than expected.

Consider following domain hierarchy:
global domain (100G)
/ \
cgroup domain1(50G) cgroup domain2(50G)
| |
bdi wb1 wb2
Assume wb1 and wb2 has the same bandwidth.
We have global domain bg_thresh 10G, cgroup domain bg_thresh 5G.
Then we have:
wb's thresh in global domain = 10G * (wb bandwidth) / (system bandwidth)
= 10G * 1/2 = 5G
wb's thresh in cgroup domain = 5G * (wb bandwidth) / (system bandwidth)
= 5G * 1/2 = 2.5G
At last, wb1 and wb2 will be limited at 2.5G, the system will be limited
at 5G which is less than global domain bg_thresh 10G.

After the fix, threshold in cgroup domain will be:
(wb bandwidth) / (cgroup bandwidth) * (cgroup domain threshold)
The wb1 and wb2 will be limited at 5G, the system will be limited at
10G which equals to global domain bg_thresh 10G.

As I didn't take a deep look into memory cgroup, please correct me if
anything is wrong. Thanks!
> Also, how is this tested? Was there a case where the existing code
> misbehaved that's improved by this patch? Or is this just from reading code?
This is jut from reading code. Would the case showed above convince you
a bit. Look forward to your reply, thanks!.
>
> Thanks.
>