Re: [PATCH 04/16] DRBD: bitmap

From: Lars Ellenberg
Date: Sat May 02 2009 - 13:29:47 EST


On Sat, May 02, 2009 at 10:41:58AM -0500, James Bottomley wrote:
> On Thu, 2009-04-30 at 13:26 +0200, Philipp Reisner wrote:
> > DRBD maintains a dirty bitmap in case it has to run without peer node or
> > without local disk. Writes to the on disk dirty bitmap are minimized by the
> > activity log (=AL). Each time an extent is evicted from the AL the part of
> > the bitmap no longer covered by the AL is written to disk.
> >
> > Signed-off-by: Philipp Reisner <philipp.reisner@xxxxxxxxxx>
> > Signed-off-by: Lars Ellenberg <lars.ellenberg@xxxxxxxxxx>
>
> The way the bitmap and activity log work are very similar to the way the
> md bitmap works (and are implemented for almost exactly the same
> reason). Is there any way we could combine them?

in principle yes.
the DRBD bitmap has a granularity of 4 kB per bit,
and the "activity log" covers 4 MB per what we call "al extent".
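
To put numbers on that granularity, here is a back-of-the-envelope sketch. The constant and function names are made up for illustration (they are not DRBD identifiers), and it assumes kB and MB mean 1024-based units.

```python
# Illustrative arithmetic only; all names here are hypothetical.
BM_BLOCK_SIZE = 4 * 1024            # bytes covered by one bitmap bit (4 kB)
AL_EXTENT_SIZE = 4 * 1024 * 1024    # bytes covered by one "al extent" (4 MB)

# How much bitmap belongs to a single al extent.
bits_per_al_extent = AL_EXTENT_SIZE // BM_BLOCK_SIZE
bitmap_bytes_per_al_extent = bits_per_al_extent // 8


def al_extent_of(byte_offset):
    """Index of the al extent that contains byte_offset."""
    return byte_offset // AL_EXTENT_SIZE


def bitmap_bit_of(byte_offset):
    """Index of the bitmap bit that covers byte_offset."""
    return byte_offset // BM_BLOCK_SIZE


def bitmap_bytes_for_device(device_bytes):
    """In-memory bitmap size for a device, rounding up to whole bytes."""
    bits = -(-device_bytes // BM_BLOCK_SIZE)   # ceiling division
    return -(-bits // 8)
```

So one al extent corresponds to 1024 bitmap bits (128 bytes of bitmap), and under these assumptions a 1 TiB device needs 32 MiB of bitmap.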

though there is a very important difference.

in MD, when the bitmap is in use, I think the approach is:

  for each write queued to the lower level devices,
      dirty bits in memory
  for every newly dirtied bitmap page,
      flush bitmap pages to disk
  wait for these bitmap writes to complete
  then unplug the lower level devices

  in background: periodically try to clean some pages,
      and write them to disk
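
As a toy model of the steps above (my reading of the approach, not the actual MD code), the following sketch only counts bitmap page writes. The class name, the chunk size and the bits-per-page value are assumptions chosen for illustration.

```python
class MdStyleBitmap:
    """Toy model: count bitmap page writes for the scheme sketched above."""

    def __init__(self, chunk_bytes=4096, bits_per_page=32768):
        self.chunk_bytes = chunk_bytes      # data covered by one bit (assumed)
        self.bits_per_page = bits_per_page  # bits in one on-disk bitmap page (assumed)
        self.dirty = set()                  # bit numbers currently set
        self.page_writes = 0                # total bitmap page writes so far

    def queue_writes(self, offsets):
        """Dirty bits in memory, then flush every newly dirtied page.

        Returns the number of bitmap pages that had to be written (and
        waited for) before the data writes may be unplugged.
        """
        pages_to_flush = set()
        for off in offsets:
            bit = off // self.chunk_bytes
            if bit not in self.dirty:
                self.dirty.add(bit)
                pages_to_flush.add(bit // self.bits_per_page)
        self.page_writes += len(pages_to_flush)
        return len(pages_to_flush)

    def background_clean(self, bits):
        """Clear the given bits and write back the pages that changed."""
        pages = set()
        for bit in bits:
            if bit in self.dirty:
                self.dirty.discard(bit)
                pages.add(bit // self.bits_per_page)
        self.page_writes += len(pages)
        return len(pages)
```

In this model, a write to a region that the background cleaner has already cleaned needs a bitmap flush again before the data write can go out.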

the DRBD approach is:
  if target "al extent" of this write request
     is NOT in the in-memory "lru_cache" already,
      get it into the cache,
          if that means we have to kick an
          old element from the cache, and
          the associated bitmap is dirty
              write that part of the bitmap
      write an "al transaction" (synchronous single sector write)
  else
      FAST PATH, no additional "meta data" write needed.

  submit to lower level device.
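
The same logic as a small runnable sketch. This is a simplified model rather than DRBD's implementation: the class and method names are hypothetical, the cache is a plain LRU built on OrderedDict, and mark_bitmap_dirty() stands in for whatever makes an extent's part of the bitmap dirty.

```python
from collections import OrderedDict

AL_EXTENT_SIZE = 4 * 1024 * 1024   # 4 MB per "al extent"


class ActivityLog:
    """Toy model of the activity log: counts the meta data writes."""

    def __init__(self, nr_extents):
        self.nr_extents = nr_extents
        self.lru = OrderedDict()            # extent number -> None, oldest first
        self.dirty_bitmap_extents = set()   # extents with unwritten bitmap changes
        self.bitmap_writes = 0
        self.al_transactions = 0

    def mark_bitmap_dirty(self, offset):
        """Hypothetical helper: note that this extent's bitmap part changed."""
        self.dirty_bitmap_extents.add(offset // AL_EXTENT_SIZE)

    def begin_write(self, offset):
        """Return True if the write took the fast path (no meta data write)."""
        enr = offset // AL_EXTENT_SIZE
        if enr in self.lru:
            self.lru.move_to_end(enr)       # refresh LRU position
            return True
        if len(self.lru) >= self.nr_extents:
            old, _ = self.lru.popitem(last=False)   # kick the oldest extent
            if old in self.dirty_bitmap_extents:
                self.dirty_bitmap_extents.discard(old)
                self.bitmap_writes += 1     # write that part of the bitmap
        self.lru[enr] = None
        self.al_transactions += 1           # synchronous single sector write
        return False
```

With a two-slot log, writes to offsets 0 and 4096 land in the same extent, so only the first one costs an al transaction; the bitmap is written only when a dirty extent gets evicted.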


MD most of the time just _needs_ the additional "meta data" writes.
DRBD most of the time does not (unless you have completely random
writes, always requesting an extent not yet/anymore in the activity log).

I'm in the process of generalizing DRBD's approach to allow more than one
"al extent" to change during a "prepare" step, and cover several such changes
in one "al transaction", so the number of meta data updates can be
reduced even further.
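
A rough illustration of why that helps, and not the actual design: the function below is hypothetical, ignores eviction entirely (the log never fills up), and simply compares one al transaction per newly activated extent with one al transaction per "prepare" step.

```python
AL_EXTENT_SIZE = 4 * 1024 * 1024   # 4 MB per "al extent"


def transactions_needed(batches, batched):
    """Count al transactions for a list of write batches.

    batches: list of lists of byte offsets, one inner list per prepare step.
    batched: if True, all extent changes of one step share a transaction;
             if False, every newly activated extent costs its own.
    """
    active = set()
    count = 0
    for batch in batches:
        new = {off // AL_EXTENT_SIZE for off in batch} - active
        if new:
            count += 1 if batched else len(new)
        active |= new
    return count
```

For two steps touching extents {0, 1, 2} and then {0, 3}, the unbatched count is 4 transactions and the batched count is 2.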

adopting this "activity log" approach would make MD even better, IMO.

Thanks,

Lars