Re: [PATCH 26/51] writeback: let balance_dirty_pages() work on the matching cgroup bdi_writeback

From: Jan Kara
Date: Tue Jun 30 2015 - 10:31:33 EST


On Fri 22-05-15 17:13:40, Tejun Heo wrote:
> Currently, balance_dirty_pages() always work on bdi->wb. This patch
> updates it to work on the wb (bdi_writeback) matching memcg and blkcg
> of the current task as that's what the inode is being dirtied against.
>
> balance_dirty_pages_ratelimited() now pins the current wb and passes
> it to balance_dirty_pages().
>
> As no filesystem has FS_CGROUP_WRITEBACK yet, this doesn't lead to
> visible behavior differences.
...
> void balance_dirty_pages_ratelimited(struct address_space *mapping)
> {
> - struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
> - struct bdi_writeback *wb = &bdi->wb;
> + struct inode *inode = mapping->host;
> + struct backing_dev_info *bdi = inode_to_bdi(inode);
> + struct bdi_writeback *wb = NULL;
> int ratelimit;
> int *p;
>
> if (!bdi_cap_account_dirty(bdi))
> return;
>
> + if (inode_cgwb_enabled(inode))
> + wb = wb_get_create_current(bdi, GFP_KERNEL);
> + if (!wb)
> + wb = &bdi->wb;
> +

So this effectively adds a radix tree lookup (of wb belonging to memcg) for
every set_page_dirty() call. That seems relatively costly to me. And all
that just to check wb->dirty_exceeded. Cannot we just use inode_to_wb()
instead? I understand results may be different if multiple memcgs share an
inode and that's the reason why you use wb_get_create_current(), right?
But for dirty_exceeded check it may be good enough?

Honza

> ratelimit = current->nr_dirtied_pause;
> if (wb->dirty_exceeded)
> ratelimit = min(ratelimit, 32 >> (PAGE_SHIFT - 10));
> @@ -1616,7 +1622,9 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
> preempt_enable();
>
> if (unlikely(current->nr_dirtied >= ratelimit))
> - balance_dirty_pages(mapping, current->nr_dirtied);
> + balance_dirty_pages(mapping, wb, current->nr_dirtied);
> +
> + wb_put(wb);
> }
> EXPORT_SYMBOL(balance_dirty_pages_ratelimited);
>
> --
> 2.4.0
>
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/