[RFC v2 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle()

From: Luis Chamberlain
Date: Thu Apr 09 2020 - 17:45:41 EST

Next message: Luis Chamberlain: "[RFC v2 0/5] blktrace: fix use after free"
Previous message: Luis Chamberlain: "[RFC v2 1/5] block: move main block debugfs initialization to its own file"
In reply to: Bart Van Assche: "Re: [RFC v2 1/5] block: move main block debugfs initialization to its own file"
Next in thread: Bart Van Assche: "Re: [RFC v2 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle()"

Block devices are refcounted to ensure that once the final user goes away
they can be cleaned up properly by the lower layers. The block device's
request_queue structure is also refcounted; however, if the last
blk_put_queue() is called in atomic context, the block layer has
to defer removal.

By refcounting the block device during the use of blkcg_schedule_throttle(),
we ensure two things:

1) the block device remains available during the call
2) we avoid having to deal with the fact that we're using the
request_queue structure in atomic context, since the last
blk_put_queue() will be called upon disk_release(), *after*
our own bdput().

This means this code path is *not* going to remove the request_queue
structure, as we are ensuring some later upper layer disk_release()
will be the one to release the request_queue structure for us.
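
The ordering argued above can be illustrated with a small userspace model.
This is only a sketch: the model_* names are hypothetical stand-ins, not
kernel APIs, and it assumes the ordering described in this commit message,
namely that the owner's final put (standing in for disk_release()) happens
after our own put.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_queue {
	int refs;
	int released;	/* set once the last queue reference is dropped */
};

struct model_bdev {
	int refs;
	struct model_queue *q;
};

/* Plays the role of blk_put_queue(). */
static void model_put_queue(struct model_queue *q)
{
	assert(q->refs > 0);
	if (--q->refs == 0)
		q->released = 1;
}

/* Plays the role of bdgrab(): take an extra device reference. */
static struct model_bdev *model_bdgrab(struct model_bdev *bdev)
{
	assert(bdev->refs > 0);
	bdev->refs++;
	return bdev;
}

/* Plays the role of bdput(); the final put stands in for disk_release(). */
static void model_bdput(struct model_bdev *bdev)
{
	assert(bdev->refs > 0);
	if (--bdev->refs == 0)
		model_put_queue(bdev->q);
}

/*
 * Returns 1 if the queue stayed alive across the modeled throttle call
 * and was released only by the owner's final put, 0 otherwise.
 */
static int model_throttle_scenario(void)
{
	struct model_queue q = { .refs = 1, .released = 0 };
	struct model_bdev bdev = { .refs = 1, .q = &q };  /* owner's reference */
	struct model_bdev *grabbed;
	int alive_during_call, alive_after_put;

	grabbed = model_bdgrab(&bdev);		/* device refs: 1 -> 2 */
	/* The throttle call would use the queue at this point. */
	alive_during_call = !grabbed->q->released;
	model_bdput(grabbed);			/* device refs: 2 -> 1, not final */
	alive_after_put = !q.released;

	model_bdput(&bdev);			/* owner's final put drops the queue */
	return alive_during_call && alive_after_put && q.released;
}
```

In this model the put paired with our grab never drops the last reference,
so the queue release is left to the owner's later put. Whether the real
code paths guarantee that ordering is the assumption this RFC relies on.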

Cc: Bart Van Assche <bvanassche@xxxxxxx>
Cc: Omar Sandoval <osandov@xxxxxx>
Cc: Hannes Reinecke <hare@xxxxxxxx>
Cc: Nicolai Stange <nstange@xxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: yu kuai <yukuai3@xxxxxxxxxx>
Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
---
 mm/swapfile.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6659ab563448..5f6f3a61b5c0 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3753,6 +3753,7 @@ static void free_swap_count_continuations(struct swap_info_struct *si)
 void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
 				  gfp_t gfp_mask)
 {
+	struct block_device *bdev;
 	struct swap_info_struct *si, *next;
 	if (!(gfp_mask & __GFP_IO) || !memcg)
 		return;
@@ -3771,8 +3772,18 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
 	plist_for_each_entry_safe(si, next, &swap_avail_heads[node],
 				  avail_lists[node]) {
 		if (si->bdev) {
+			bdev = bdgrab(si->bdev);
+			if (!bdev)
+				continue;
+			/*
+			 * By adding our own bdgrab() we ensure the queue
+			 * sticks around until disk_release(), and so we ensure
+			 * our release of the request_queue does not happen in
+			 * atomic context.
+			 */
 			blkcg_schedule_throttle(bdev_get_queue(si->bdev),
 						true);
+			bdput(bdev);
 			break;
 		}
 	}
--
2.25.1
