Re: Lockdep complain for zram

From: Jerome Marchand
Date: Fri Nov 23 2012 - 09:47:04 EST


On 11/23/2012 12:34 AM, Minchan Kim wrote:
> On Thu, Nov 22, 2012 at 12:13:24PM +0100, Jerome Marchand wrote:
>> On 11/21/2012 09:37 AM, Minchan Kim wrote:
>>> Hi alls,
>>>
>>> Today, I saw below complain of lockdep.
>>> As a matter of fact, I knew it long time ago but forgot that.
>>> The reason lockdep complains is that now zram uses GFP_KERNEL
>>> in reclaim path(ex, __zram_make_request) :(
>>> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
>>> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
>>> Of course, I can change it with __vmalloc which can receive gfp_t.
>>> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
>>> allocation of GFP_KERNEL. That's why I sent the patch.
>>> https://lkml.org/lkml/2012/4/23/77
>>> Since then, I forgot it, saw the bug today and poped the question again.
>>>
>>> Yes. Fundamental problem is utter crap API vmalloc.
>>> If we can fix it, everyone would be happy. But life isn't simple like seeing
>>> my thread of the patch.
>>>
>>> So next option is to move zram_init_device into setting disksize time.
>>> But it makes unnecessary metadata waste until zram is used really(That's why
>>> Nitin move zram_init_device from disksize setting time to make_request) and
>>> it makes user should set the disksize before using, which are behavior change.
>>>
>>> I would like to clean up this issue before promoting because it might change
>>> usage behavior.
>>>
>>> Do you have any idea?
>>
>> This is a false positive due to the memory allocation in
>> zram_init_device() called from zram_make_request(). It appears to
>> lockdep that the allocation might trigger a request on the device that
>> would try to take init_lock again, but in fact it doesn't. The device
>> is not initialized yet, even less swapped on.
>
> That's not a only swap case.
> Let's think following usecase.
>
> 1) Booting
> 2) echo $((DISKSIZE)) > /sys/block/zram0/disksize
> 3) dd if=/dev/zero of=/dev/zram0 bs=4K count=1
> 4) Written 4K page(page-A) is still page cache and isn't submitted
> to zram block device.
> 5) Memory pressure happen by some memory hogger.
> 6) VM start to reclaim and write page-A to zram0.
> 7) zram_init_device is called at last.
> 8) allocate GFP_KERNEL in zram_init_device
> 9) goto reclaim path again.
> 10) deadlock.
>
> So I think it's not false positive.

I guess you're right. That's a scenario I haven't imagined. At any rate, my
patch fixes that.

> Even if it is, I think lock split isn't a good idea to just avoid
> lockdep warn. It makes code unnecessary complicated and it would be more
> error-prone. Let's not add another lock without performance trouble report
> by the lock.
>
> As I discussed with Nitin in this thread, lazy initialization don't have
> much point and disksize setting option isn't consistent for user behavior.
> And I expect Nitin will send patch "diet of table" soonish.
>
> So just moving the initialzation part from reclaim context to process's one
> is simple and clear solution, I believe.

Although that would avoid deadlocks (I guess, I'm not sure anymore...), it
won't stop lockdep from complaining. It still makes an allocation while
holding a lock that is also taken in a reclaim context.
Anyway, I like the idea to removes the lazy initialization. It makes things
more complicated without any actual advantage.

Jerome

>
>>
>> The following (quickly tested) patch should prevent lockdep complain.
>>
>> Jerome
>>
>> ---
>> >From ebb3514c4ee18276da7c5ca08025991b493ac204 Mon Sep 17 00:00:00 2001
>> From: Jerome Marchand <jmarchan@xxxxxxxxxx>
>> Date: Thu, 22 Nov 2012 09:07:40 +0100
>> Subject: [PATCH] staging: zram: Avoid lockdep warning
>>
>> zram triggers a lockdep warning. The cause of it is the call to
>> zram_init_device() from zram_make_request(). The memory allocation in
>> zram_init_device() could start a memory reclaim which in turn could
>> cause swapout and (as it appears to lockdep) a call to
>> zram_make_request(). However this is a false positive: an
>> unititialized device can't be used as swap.
>> A solution is to split init_lock in two lock. One mutex that protects
>> init, reset and size setting and a rw_semaphore that protects requests
>> and reset. Thus init and request would be protected by different locks
>> and lockdep will be happy.
>>
>> Signed-off-by: Jerome Marchand <jmarchan@xxxxxxxxxx>
>> ---
>> drivers/staging/zram/zram_drv.c | 41 +++++++++++++++++++-----------------
>> drivers/staging/zram/zram_drv.h | 16 ++++++++++---
>> drivers/staging/zram/zram_sysfs.c | 20 +++++++++---------
>> 3 files changed, 44 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
>> index fb4a7c9..b3bc3c4 100644
>> --- a/drivers/staging/zram/zram_drv.c
>> +++ b/drivers/staging/zram/zram_drv.c
>> @@ -470,11 +470,11 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>> {
>> struct zram *zram = queue->queuedata;
>>
>> - if (unlikely(!zram->init_done) && zram_init_device(zram))
>> + if (unlikely(!is_initialized(zram)) && zram_init_device(zram))
>> goto error;
>>
>> - down_read(&zram->init_lock);
>> - if (unlikely(!zram->init_done))
>> + down_read(&zram->req_lock);
>> + if (unlikely(!is_initialized(zram)))
>> goto error_unlock;
>>
>> if (!valid_io_request(zram, bio)) {
>> @@ -483,12 +483,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>> }
>>
>> __zram_make_request(zram, bio, bio_data_dir(bio));
>> - up_read(&zram->init_lock);
>> + up_read(&zram->req_lock);
>>
>> return;
>>
>> error_unlock:
>> - up_read(&zram->init_lock);
>> + up_read(&zram->req_lock);
>> error:
>> bio_io_error(bio);
>> }
>> @@ -497,7 +497,7 @@ void __zram_reset_device(struct zram *zram)
>> {
>> size_t index;
>>
>> - zram->init_done = 0;
>> + atomic_set(&zram->init_done, 0);
>>
>> /* Free various per-device buffers */
>> kfree(zram->compress_workmem);
>> @@ -529,9 +529,12 @@ void __zram_reset_device(struct zram *zram)
>>
>> void zram_reset_device(struct zram *zram)
>> {
>> - down_write(&zram->init_lock);
>> - __zram_reset_device(zram);
>> - up_write(&zram->init_lock);
>> + mutex_lock(&zram->init_lock);
>> + down_write(&zram->req_lock);
>> + if (is_initialized(zram))
>> + __zram_reset_device(zram);
>> + up_write(&zram->req_lock);
>> + mutex_unlock(&zram->init_lock);
>> }
>>
>> int zram_init_device(struct zram *zram)
>> @@ -539,10 +542,10 @@ int zram_init_device(struct zram *zram)
>> int ret;
>> size_t num_pages;
>>
>> - down_write(&zram->init_lock);
>> + mutex_lock(&zram->init_lock);
>>
>> - if (zram->init_done) {
>> - up_write(&zram->init_lock);
>> + if (is_initialized(zram)) {
>> + mutex_unlock(&zram->init_lock);
>> return 0;
>> }
>>
>> @@ -583,8 +586,8 @@ int zram_init_device(struct zram *zram)
>> goto fail;
>> }
>>
>> - zram->init_done = 1;
>> - up_write(&zram->init_lock);
>> + atomic_set(&zram->init_done, 1);
>> + mutex_unlock(&zram->init_lock);
>>
>> pr_debug("Initialization done!\n");
>> return 0;
>> @@ -594,7 +597,7 @@ fail_no_table:
>> zram->disksize = 0;
>> fail:
>> __zram_reset_device(zram);
>> - up_write(&zram->init_lock);
>> + mutex_unlock(&zram->init_lock);
>> pr_err("Initialization failed: err=%d\n", ret);
>> return ret;
>> }
>> @@ -619,7 +622,8 @@ static int create_device(struct zram *zram, int device_id)
>> int ret = 0;
>>
>> init_rwsem(&zram->lock);
>> - init_rwsem(&zram->init_lock);
>> + mutex_init(&zram->init_lock);
>> + init_rwsem(&zram->req_lock);
>> spin_lock_init(&zram->stat64_lock);
>>
>> zram->queue = blk_alloc_queue(GFP_KERNEL);
>> @@ -672,7 +676,7 @@ static int create_device(struct zram *zram, int device_id)
>> goto out;
>> }
>>
>> - zram->init_done = 0;
>> + atomic_set(&zram->init_done, 0);
>>
>> out:
>> return ret;
>> @@ -755,8 +759,7 @@ static void __exit zram_exit(void)
>> zram = &zram_devices[i];
>>
>> destroy_device(zram);
>> - if (zram->init_done)
>> - zram_reset_device(zram);
>> + zram_reset_device(zram);
>> }
>>
>> unregister_blkdev(zram_major, "zram");
>> diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
>> index df2eec4..f6bcead 100644
>> --- a/drivers/staging/zram/zram_drv.h
>> +++ b/drivers/staging/zram/zram_drv.h
>> @@ -96,9 +96,12 @@ struct zram {
>> * against concurrent read and writes */
>> struct request_queue *queue;
>> struct gendisk *disk;
>> - int init_done;
>> - /* Prevent concurrent execution of device init, reset and R/W request */
>> - struct rw_semaphore init_lock;
>> + atomic_t init_done;
>> + /* Prevent concurrent execution of device init, reset and
>> + * disksize_store */
>> + struct mutex init_lock;
>> + /* Prevent concurent execution device reset and R/W requests */
>> + struct rw_semaphore req_lock;
>> /*
>> * This is the limit on amount of *uncompressed* worth of data
>> * we can store in a disk.
>> @@ -108,6 +111,11 @@ struct zram {
>> struct zram_stats stats;
>> };
>>
>> +static inline int is_initialized(struct zram *zram)
>> +{
>> + return atomic_read(&zram->init_done);
>> +}
>> +
>> extern struct zram *zram_devices;
>> unsigned int zram_get_num_devices(void);
>> #ifdef CONFIG_SYSFS
>> @@ -115,6 +123,6 @@ extern struct attribute_group zram_disk_attr_group;
>> #endif
>>
>> extern int zram_init_device(struct zram *zram);
>> -extern void __zram_reset_device(struct zram *zram);
>> +extern void zram_reset_device(struct zram *zram);
>>
>> #endif
>> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
>> index de1eacf..b300881 100644
>> --- a/drivers/staging/zram/zram_sysfs.c
>> +++ b/drivers/staging/zram/zram_sysfs.c
>> @@ -62,16 +62,19 @@ static ssize_t disksize_store(struct device *dev,
>> if (!disksize)
>> return -EINVAL;
>>
>> - down_write(&zram->init_lock);
>> - if (zram->init_done) {
>> - up_write(&zram->init_lock);
>> + mutex_lock(&zram->init_lock);
>> + down_write(&zram->req_lock);
>> + if (is_initialized(zram)) {
>> + up_write(&zram->req_lock);
>> + mutex_unlock(&zram->init_lock);
>> pr_info("Cannot change disksize for initialized device\n");
>> return -EBUSY;
>> }
>>
>> zram->disksize = PAGE_ALIGN(disksize);
>> set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
>> - up_write(&zram->init_lock);
>> + up_write(&zram->req_lock);
>> + mutex_unlock(&zram->init_lock);
>>
>> return len;
>> }
>> @@ -81,7 +84,7 @@ static ssize_t initstate_show(struct device *dev,
>> {
>> struct zram *zram = dev_to_zram(dev);
>>
>> - return sprintf(buf, "%u\n", zram->init_done);
>> + return sprintf(buf, "%u\n", atomic_read(&zram->init_done));
>> }
>>
>> static ssize_t reset_store(struct device *dev,
>> @@ -110,10 +113,7 @@ static ssize_t reset_store(struct device *dev,
>> if (bdev)
>> fsync_bdev(bdev);
>>
>> - down_write(&zram->init_lock);
>> - if (zram->init_done)
>> - __zram_reset_device(zram);
>> - up_write(&zram->init_lock);
>> + zram_reset_device(zram);
>>
>> return len;
>> }
>> @@ -186,7 +186,7 @@ static ssize_t mem_used_total_show(struct device *dev,
>> u64 val = 0;
>> struct zram *zram = dev_to_zram(dev);
>>
>> - if (zram->init_done)
>> + if (is_initialized(zram))
>> val = zs_get_total_size_bytes(zram->mem_pool);
>>
>> return sprintf(buf, "%llu\n", val);
>> --
>> 1.7.7.6
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/