Re: RFC: default group_isolation to 1, remove option

From: Justin TerAvest
Date: Mon Mar 07 2011 - 18:42:20 EST


On Mon, Mar 7, 2011 at 12:47 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 2011-03-07 21:46, Vivek Goyal wrote:
>> On Mon, Mar 07, 2011 at 09:32:54PM +0100, Jens Axboe wrote:
>>
>> [..]
>>>> So given the fact that per-ioc-per-disk accounting of request descriptors
>>>> makes the accounting complicated and also makes it hard for block IO
>>>> controller to use it, the other approach of implementing per group limit
>>>> and per-group-per-bdi congested might be reasonable. Having said that, the
>>>> patch I had written for per group descriptor was also not necessarily very
>>>> simple.
>>>
>>> So before all of this gets over designed a lot... If we get rid of the
>>> one remaining direct buffered writeback in bdp(), then only the flusher
>>> threads should be sending huge amounts of IO. So if we attack the
>>> problem from that end instead, have it do that accounting in the bdi.
>>> With that in place, I'm fairly confident that we can remove the request
>>> limits.
>>>
>>> Basically just replace the congestion_wait() in there with a bit of
>>> accounting logic. Since it's per bdi anyway, we don't even have to
>>> maintain that state in the bdi itself. It can remain in the thread
>>> stack.
>>
>> Moving the accounting up sounds interesting. For cgroup stuff we again
>> shall have to do something additional like having per cgroup per bdi
>> flusher threads or maintaining the number of pending IOs per group and
>> having the flusher thread not submit IOs for groups which have lots of
>> pending IOs (to avoid a faster group getting blocked behind a slower one).
>
> So since there are at least two use cases, we could easily provide
> helpers to do that sort of blocking to not throw too much work at it.
>
> I think we are making progress :-)

This generally sounds good to me, though I didn't think per-cgroup limits
were terribly complicated.

I wanted to make a quick note: it sounds like part of the intent here is to
avoid doing any page tracking in the page_cgroup structure, but I think that
we will inevitably have to do some tracking there for css ids, to provide
isolation between buffered writers. I'd like to send out a patchset soon
to track buffered writers, but we should probably work out the request
descriptor limits first.
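
To make the per-cgroup limit idea a bit more concrete, here is a rough
userspace sketch of what I have in mind. It is not kernel code and not taken
from any posted patch: struct group_rq_acct and the group_*() helpers are
hypothetical names, and locking is left out entirely. Each group is charged
for the request descriptors it holds and is reported as congested only when
it has used up its own share, so a slow group cannot block a fast one.

```c
#include <stdbool.h>

/* Hypothetical per-group accounting; not an existing kernel structure. */
struct group_rq_acct {
    unsigned int nr_pending;  /* request descriptors currently held */
    unsigned int limit;       /* maximum descriptors this group may hold */
};

static void group_acct_init(struct group_rq_acct *g, unsigned int limit)
{
    g->nr_pending = 0;
    g->limit = limit;
}

/* A group is congested once it has used up its own descriptors. */
static bool group_congested(const struct group_rq_acct *g)
{
    return g->nr_pending >= g->limit;
}

/*
 * Charge one descriptor to the group. Returns false when the group is
 * at its limit, in which case the submitter should back off for this
 * group only and keep servicing the others.
 */
static bool group_get_request(struct group_rq_acct *g)
{
    if (group_congested(g))
        return false;
    g->nr_pending++;
    return true;
}

/* Uncharge one descriptor when a request completes. */
static void group_put_request(struct group_rq_acct *g)
{
    if (g->nr_pending > 0)
        g->nr_pending--;
}
```

A flusher-style submitter would check group_congested() before picking work
for a group and skip groups that are over their limit, rather than sleeping
on a single queue-wide congestion state.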
