Re: [PATCH 5/7] vfs: Add wbcflush sysfs knob to disable storagedevice writeback cache flushes

From: Fernando Luis Vázquez Cao
Date: Tue Mar 31 2009 - 02:49:22 EST


Jens Axboe wrote:
On Mon, Mar 30 2009, Fernando Luis Vázquez Cao wrote:
Jens Axboe wrote:
On Mon, Mar 30 2009, Fernando Luis Vázquez Cao wrote:
Add a sysfs knob to disable storage device writeback cache flushes.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@xxxxxxxxxxxxx>
---

diff -urNp linux-2.6.29-orig/block/blk-barrier.c linux-2.6.29/block/blk-barrier.c
--- linux-2.6.29-orig/block/blk-barrier.c 2009-03-24 08:12:14.000000000 +0900
+++ linux-2.6.29/block/blk-barrier.c 2009-03-30 17:08:28.000000000 +0900
@@ -318,6 +318,9 @@ int blkdev_issue_flush(struct block_devi
if (!q)
return -ENXIO;

+ if (blk_queue_nowbcflush(q))
+ return -EOPNOTSUPP;
+
bio = bio_alloc(GFP_KERNEL, 0);
if (!bio)
return -ENOMEM;
diff -urNp linux-2.6.29-orig/block/blk-core.c linux-2.6.29/block/blk-core.c
--- linux-2.6.29-orig/block/blk-core.c 2009-03-24 08:12:14.000000000 +0900
+++ linux-2.6.29/block/blk-core.c 2009-03-30 17:08:28.000000000 +0900
@@ -1452,7 +1452,8 @@ static inline void __generic_make_reques
goto end_io;
}
if (bio_barrier(bio) && bio_has_data(bio) &&
- (q->next_ordered == QUEUE_ORDERED_NONE)) {
+ (blk_queue_nowbcflush(q) ||
+ q->next_ordered == QUEUE_ORDERED_NONE)) {
err = -EOPNOTSUPP;
goto end_io;
}
This (and the above hunk) should be changed. -EOPNOTSUPP means the
target does not support barriers, that is a different thing to flushes
not being needed. A file system issuing a barrier and getting
-EOPNOTSUPP back will disable barriers, since it now thinks that
ordering cannot be guaranteed.
The reason I decided to use -EOPNOTSUPP was that I wanted to keep
barriers and device flushes from entering the block layer when
they are not needed. I feared that if we pass them down the block
stack (knowing in advance they will not be actually submitted to
disk) we may end up slowing things down unnecessarily.

But that's just wrong, you need to make sure that the block layer / io
scheduler doesn't reorder as well. It's a lot more complex than just the
device end. So just returning -EOPNOTSUPP and pretending that you need
not use barriers at the fs end is just wrong.

I should have mentioned that in this patch set I was trying to tackle the
blkdev_issue_flush() case only. As you pointed out, with the code above
requests may get silently reordered across barriers inside the block layer.

The follow-up patch I am working on implements blkdev_issue_empty_barrier(),
which should be used by filesystems that want to emit an empty barrier (as
opposed to just triggering a device flush). Doing this we can optimize
fsync() flushes (block_flush_device()) and filesystem-originated barriers
(blkdev_issue_empty_barrier()) independently in the block layer.

I agree with you that the we should pass barriers down in
__generic_make_request, but the optimization above for fsync()-originated
blkdev_issue_flush()'s seems valid to me.

Does it make sense now?

As you mentioned, filesystems such as ext3/4 will disable
barriers if they get -EOPNOTSUPP when issuing one. I was planning
to add a notifier mechanism so that we can notify filesystems has
been a change in the barrier settings. This might be
over-engineering, though. Especially considering that "-o
remount,barrier=1" will bring us the barriers back.

I think that is over-engineering.

Yep, we certainly agree on that point.

A more appropriate change would be to successfully complete a flush
without actually sending it down to the device if blk_queue_nowbcflush()
is true. Then blkdev_issue_flush() would just work as well. It also
needs to take stacking into account, or stacked drivers will have to
propagate the settings up the stack. If you allow simply the barrier to
be passed down, you get that for free.
Aren't we risking slowing things down? Does the small optimization above
make sense (especially taking the remount trick into account)?

It's not, I think you are missing the bigger picture.

Sorry for not explaining myself properly. I will add a changelog and better
documentation for the patches.

Thank you for your feedback!

- Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/