[PATCH 0/2] Avoid memory allocation for O_DIRECT IO.

From: NeilBrown
Date: Wed Mar 04 2015 - 18:58:24 EST


Hi Al,
I wonder if you would consider these two patches.

They extend the functionality of mlockall(MCL_FUTURE) to apply
to memory allocations made when performing O_DIRECT I/O.
That is, the first read or write to an O_DIRECT file descriptor will,
if MCL_FUTURE is in effect, cache any allocated memory so that
it doesn't need to be allocated on subsequent reads or writes.
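
For illustration only (this is not taken from the patches), the userspace
side of such a setup might look like the sketch below. The 4096-byte
alignment, the helper names, and the idea of keeping the descriptor open
for the life of the process are assumptions; the real alignment
requirement depends on the device and filesystem.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment for buffers, lengths and offsets (hypothetical). */
#define DIO_ALIGN 4096

/* Lock current and future mappings into RAM.
 * Returns 0 on success, -1 on error (e.g. RLIMIT_MEMLOCK too low). */
int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Allocate a buffer aligned for O_DIRECT; returns NULL on failure.
 * The length must be a non-zero multiple of DIO_ALIGN. */
void *alloc_dio_buffer(size_t len)
{
    void *buf = NULL;

    assert(len > 0 && len % DIO_ALIGN == 0);
    if (posix_memalign(&buf, DIO_ALIGN, len) != 0)
        return NULL;
    return buf;
}

/* Open the metadata device once and keep the descriptor around, so
 * that any per-file state set up by the first I/O can be reused.
 * Returns the descriptor, or -1 on error. */
int open_metadata(const char *path)
{
    return open(path, O_RDWR | O_DIRECT);
}

/* Write one aligned block at an aligned offset.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_metadata(int fd, const void *buf, size_t len, off_t off)
{
    return pwrite(fd, buf, len, off);
}
```

A daemon would call lock_all_memory() and alloc_dio_buffer() at startup,
then perform one read or write on the descriptor before it is ever needed
in a failure path, so that the first-use allocations happen while memory
is still plentiful.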

This is needed for reliable handling of RAID metadata in userspace.
When a device fails, it is necessary to record this failure in the
metadata before further writes are allowed to complete.
As a GFP_KERNEL allocation may block waiting for arbitrary writes
to complete, we must not allow any GFP_KERNEL allocation while
updating the metadata.

The approach I have taken to avoiding GFP_KERNEL allocations in
O_DIRECT handling is to cache the necessary data structures the first
time they are allocated.
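
The general pattern of keeping the first allocation for reuse can be
sketched in plain userspace C as below. This is an analogy, not the
kernel code from the patches: struct io_state, struct file_ctx and both
functions are hypothetical names, and the sketch is single-threaded,
whereas a kernel version would have to hand the cached object over
atomically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-I/O structure such as 'struct dio'. */
struct io_state {
    char scratch[256];
};

/* Hypothetical stand-in for per-file state. */
struct file_ctx {
    struct io_state *cached;  /* at most one spare object, or NULL */
    int cache_enabled;        /* plays the role of "MCL_FUTURE is in effect" */
};

/* Take the cached object if there is one; otherwise allocate.
 * Only the fallback path can block or fail. */
struct io_state *io_state_get(struct file_ctx *f)
{
    struct io_state *s;

    assert(f != NULL);
    s = f->cached;
    if (s != NULL) {
        f->cached = NULL;     /* the caller now owns it */
        return s;
    }
    return malloc(sizeof(*s));
}

/* Return an object: keep it for the next I/O if caching is enabled
 * and the slot is empty, otherwise free it. */
void io_state_put(struct file_ctx *f, struct io_state *s)
{
    assert(f != NULL);
    if (s == NULL)
        return;
    if (f->cache_enabled && f->cached == NULL) {
        f->cached = s;
        return;
    }
    free(s);
}
```

After the first get/put cycle on a file with caching enabled, later
cycles reuse the same object and never reach malloc().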

There are two data structures, "struct dio" and "struct bio".
I have seen a host in a memory deadlock where mdmon (which does the
metadata management) was stuck waiting to allocate a 'struct dio',
but couldn't until writeout was allowed to proceed, and writeout
couldn't proceed until the metadata had been updated.

I have not seen a machine deadlock waiting for a bio. That is a
much less likely deadlock scenario. The bio is allocated from a
mempool so the allocation will very often succeed. Exhausting the
mempool is unlikely but I believe it is theoretically possible as the
mempool is shared over multiple devices.

Thanks,
NeilBrown

---

NeilBrown (2):
      block_dev/DIO: Optionally allocate single 'struct dio' per file.
      block_dev/DIO - cache one bio allocation when caching a DIO.


 fs/block_dev.c     |    7 +++++-
 fs/direct-io.c     |   61 ++++++++++++++++++++++++++++++++++++++++++++++------
 include/linux/fs.h |    6 +++++
 3 files changed, 66 insertions(+), 8 deletions(-)
