Re: [PATCH 5/7] vfs: Add wbcflush sysfs knob to disable storagedevice writeback cache flushes

From: Jens Axboe
Date: Tue Mar 31 2009 - 06:38:29 EST


On Tue, Mar 31 2009, Fernando Luis Vázquez Cao wrote:
> Jens Axboe wrote:
>> On Mon, Mar 30 2009, Fernando Luis Vázquez Cao wrote:
>>> Jens Axboe wrote:
>>>> On Mon, Mar 30 2009, Fernando Luis Vázquez Cao wrote:
>>>>> Add a sysfs knob to disable storage device writeback cache flushes.
>>>>>
>>>>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@xxxxxxxxxxxxx>
>>>>> ---
>>>>>
>>>>> diff -urNp linux-2.6.29-orig/block/blk-barrier.c linux-2.6.29/block/blk-barrier.c
>>>>> --- linux-2.6.29-orig/block/blk-barrier.c 2009-03-24 08:12:14.000000000 +0900
>>>>> +++ linux-2.6.29/block/blk-barrier.c 2009-03-30 17:08:28.000000000 +0900
>>>>> @@ -318,6 +318,9 @@ int blkdev_issue_flush(struct block_devi
>>>>> if (!q)
>>>>> return -ENXIO;
>>>>>
>>>>> + if (blk_queue_nowbcflush(q))
>>>>> + return -EOPNOTSUPP;
>>>>> +
>>>>> bio = bio_alloc(GFP_KERNEL, 0);
>>>>> if (!bio)
>>>>> return -ENOMEM;
>>>>> diff -urNp linux-2.6.29-orig/block/blk-core.c linux-2.6.29/block/blk-core.c
>>>>> --- linux-2.6.29-orig/block/blk-core.c 2009-03-24 08:12:14.000000000 +0900
>>>>> +++ linux-2.6.29/block/blk-core.c 2009-03-30 17:08:28.000000000 +0900
>>>>> @@ -1452,7 +1452,8 @@ static inline void __generic_make_reques
>>>>> goto end_io;
>>>>> }
>>>>> if (bio_barrier(bio) && bio_has_data(bio) &&
>>>>> - (q->next_ordered == QUEUE_ORDERED_NONE)) {
>>>>> + (blk_queue_nowbcflush(q) ||
>>>>> + q->next_ordered == QUEUE_ORDERED_NONE)) {
>>>>> err = -EOPNOTSUPP;
>>>>> goto end_io;
>>>>> }
>>>> This (and the above hunk) should be changed. -EOPNOTSUPP means the
>>>> target does not support barriers, that is a different thing to flushes
>>>> not being needed. A file system issuing a barrier and getting
>>>> -EOPNOTSUPP back will disable barriers, since it now thinks that
>>>> ordering cannot be guaranteed.
>>> The reason I decided to use -EOPNOTSUPP was that I wanted to keep
>>> barriers and device flushes from entering the block layer when
>>> they are not needed. I feared that if we pass them down the block
>>> stack (knowing in advance they will not be actually submitted to
>>> disk) we may end up slowing things down unnecessarily.
>>
>> But that's just wrong, you need to make sure that the block layer / io
>> scheduler doesn't reorder as well. It's a lot more complex than just the
>> device end. So just returning -EOPNOTSUPP and pretending that you need
>> not use barriers at the fs end is just wrong.
>
> I should have mentioned that in this patch set I was trying to tackle the
> blkdev_issue_flush() case only. As you pointed out, with the code above
> requests may get silently reordered across barriers inside the block layer.
>
> The follow-up patch I am working on implements blkdev_issue_empty_barrier(),
> which should be used by filesystems that want to emit an empty barrier (as
> opposed to just triggering a device flush). Doing this we can optimize
> fsync() flushes (block_flush_device()) and filesystem-originated barriers
> (blkdev_issue_empty_barrier()) independently in the block layer.

Not sure it makes sense to abstract that out into an api, it's basically
just a bio_alloc(gfp, 0); with setting the bio fields and then
submitting. Otherwise you'd have to either pass a ton of parameters, the
caller will want to set end_io, bdev, etc anyway. And after that it's
just submit_bio().

> I agree with you that the we should pass barriers down in
> __generic_make_request, but the optimization above for fsync()-originated
> blkdev_issue_flush()'s seems valid to me.

Of course, we need to do that. Anything else would be broken. The
blkdev_issue_flush() should be changed to return 0, with the -EOPNOTSUPP
being flag cached.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/