Re: 2.6.27-rc4: lots of 'in_atomic():1, irqs_disabled():0' with software-raid1

From: Jens Axboe
Date: Fri Aug 29 2008 - 03:12:12 EST


On Fri, Aug 29 2008, Neil Brown wrote:
> On Friday August 29, neilb@xxxxxxx wrote:
> >
> > Here is my (untested yet) patch to address the problem.
> > I'll try to get some testing done and push it out early next week, but
> > if anyone could review and/or test that would be a great help.
>
> Actually, that was my "haven't even compiled it yet" patch. The one I
> meant to send was this one.
>
> Sorry for the noise.
>
> NeilBrown
>
>
> From 3a0646137016c69dbcaeed0114558f67daa4a6f0 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@xxxxxxx>
> Date: Fri, 29 Aug 2008 15:46:38 +1000
> Subject: [PATCH] Fix problem with waiting while holding rcu read lock in md/bitmap.c
>
> A recent patch to protect the rdev list with rcu locking leaves us
> with a problem because we can sleep on memalloc while holding the
> rcu lock.
>
> The rcu lock is only needed while walking the linked list as
> uninteresting devices (failed or spares) can be removed at any time.
>
> So only take the rcu lock while actually walking the linked list.
> Take a refcount on the rdev while we drop the lock
> and do the memalloc to start the IO.
> When we return to the locked code, none of the interesting devices
> on the list will have moved, so we can simply use
> list_for_each_continue_rcu to pick up where we left off.
>
> Signed-off-by: NeilBrown <neilb@xxxxxxx>
> ---
> drivers/md/bitmap.c | 46 +++++++++++++++++++++++++++++++++++++++++++-------
> 1 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 7e65bad..c17eb4f 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -238,15 +238,48 @@ static struct page *read_sb_page(mddev_t *mddev, long offset, unsigned long inde
>
> }
>
> +static mdk_rdev_t *next_active_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
> +{
> +        /* Iterate the disks of an mddev, using rcu to protect access to the
> +         * linked list, and raising the refcount of devices we return to ensure
> +         * they don't disappear while in use.
> +         * As devices are only added or removed when raid_disk is < 0 and
> +         * nr_pending is 0 and In_sync is clear, the entries we return will
> +         * still be in the same position on the list when we re-enter
> +         * list_for_each_continue_rcu.
> +         */
> +        struct list_head *pos;
> +        rcu_read_lock();
> +        if (rdev == NULL)
> +                /* start at the beginning */
> +                pos = &mddev->disks;
> +        else {
> +                /* release the previous rdev */
> +                rdev_dec_pending(rdev, mddev);
> +                pos = &rdev->same_set;
> +        }
> +
> +        list_for_each_continue_rcu(pos, &mddev->disks) {
> +                rdev = list_entry(pos, mdk_rdev_t, same_set);
> +                if (rdev->raid_disk >= 0 &&
> +                    test_bit(In_sync, &rdev->flags) &&
> +                    !test_bit(Faulty, &rdev->flags)) {
> +                        /* this is a usable device */
> +                        atomic_inc(&rdev->nr_pending);
> +                        rcu_read_unlock();
> +                        return rdev;
> +                }
> +        }
> +        rcu_read_unlock();
> +        return NULL;
> +}
> +
> static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
> {
> -        mdk_rdev_t *rdev;
> +        mdk_rdev_t *rdev = NULL;
>         mddev_t *mddev = bitmap->mddev;
>
> -        rcu_read_lock();
> -        rdev_for_each_rcu(rdev, mddev)
> -                if (test_bit(In_sync, &rdev->flags)
> -                    && !test_bit(Faulty, &rdev->flags)) {
> +        while ((rdev = next_active_rdev(rdev, mddev)) != NULL) {
>                         int size = PAGE_SIZE;
>                         if (page->index == bitmap->file_pages-1)
>                                 size = roundup(bitmap->last_page_size,
> @@ -281,8 +312,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>                                        + page->index * (PAGE_SIZE/512),
>                                        size,
>                                        page);
> -                }
> -        rcu_read_unlock();
> +        }
>
>         if (wait)
>                 md_super_wait(mddev);
> --
> 1.5.6.5

Looks like an elegant solution to the problem, Neil, good stuff!
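
For anyone else chasing the warnings in the subject line: the splat fires because, with CONFIG_PREEMPT (and classic RCU), rcu_read_lock() disables preemption, so anything that can sleep inside the RCU read section (here, the allocations done while submitting the bitmap/superblock IO) trips the "sleeping function called from invalid context" check. The shape of the fix above is the usual one: pin the current entry with a refcount, leave the RCU section before doing anything that can sleep, and resume the list walk afterwards. A minimal kernel-style sketch of just that shape, with made-up "widget" names and the 2.6.2x-era list helpers rather than the real md/bitmap.c code:

/*
 * Illustrative sketch only -- not the md code.  "pending" plays the role
 * of rdev->nr_pending: the (hypothetical) code that removes widgets from
 * the list only does so once an entry is no longer usable and its pending
 * count has dropped to zero, so a pinned entry keeps its list position
 * between calls.
 */
#include <linux/rculist.h>      /* list_for_each_continue_rcu (2.6.2x era) */
#include <linux/slab.h>
#include <asm/atomic.h>         /* <linux/atomic.h> on newer kernels */

struct widget {
        struct list_head node;
        atomic_t pending;       /* non-zero: entry is pinned in place */
        int usable;
};

static LIST_HEAD(widget_list);  /* additions/removals use list_*_rcu() */

static struct widget *next_usable_widget(struct widget *prev)
{
        struct list_head *pos;

        rcu_read_lock();
        if (prev == NULL) {
                /* first call: start at the head of the list */
                pos = &widget_list;
        } else {
                /* unpin the previous entry and continue right after it;
                 * prev cannot be freed here, we hold rcu_read_lock() */
                atomic_dec(&prev->pending);
                pos = &prev->node;
        }
        list_for_each_continue_rcu(pos, &widget_list) {
                struct widget *w = list_entry(pos, struct widget, node);

                if (w->usable) {
                        /* pin it before leaving the RCU read section */
                        atomic_inc(&w->pending);
                        rcu_read_unlock();
                        return w;
                }
        }
        rcu_read_unlock();
        return NULL;
}

static void write_to_all_widgets(void)
{
        struct widget *w = NULL;

        /* No rcu_read_lock() held here, so sleeping is allowed again. */
        while ((w = next_usable_widget(w)) != NULL) {
                void *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);     /* may sleep */

                /* ... fill buf and submit the IO for w here ... */
                kfree(buf);
        }
}

The invariant the sketch leans on is the one Neil's comment spells out: an entry this helper can return is never removed from the list while it is pinned, so continuing the walk from it after re-taking rcu_read_lock() stays safe.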

--
Jens Axboe
