Re: [RFC 3/4] block: set mapping order for the block cache in set_init_blocksize

From: Pankaj Raghav
Date: Wed Jun 21 2023 - 06:45:43 EST


>>       bdev->bd_inode->i_blkbits = blksize_bits(bsize);
>> +    order = bdev->bd_inode->i_blkbits - PAGE_SHIFT;
>> +    folio_order = mapping_min_folio_order(bdev->bd_inode->i_mapping);
>> +
>> +    if (!IS_ENABLED(CONFIG_BUFFER_HEAD)) {
>> +        /* Do not allow changing the folio order after it is set */
>> +        WARN_ON_ONCE(folio_order && (folio_order != order));
>> +        mapping_set_folio_orders(bdev->bd_inode->i_mapping, order, 31);
>> +    }
>>   }
>>
>>   int set_blocksize(struct block_device *bdev, int size)
>
> This really has nothing to do with buffer heads.
>
> In fact, I've got a patchset to make it work _with_ buffer heads.
>
> So please, don't make it conditional on CONFIG_BUFFER_HEAD.
>
> And we should be calling into 'mapping_set_folio_order()' only if the
> 'order' argument is larger than PAGE_ORDER, otherwise we end up enabling
> large folio support for _every_ block device.
> Which I doubt we want.
>

Hmm, which aops are you using for the block device? If you are using the old aops, then we will be
using helpers from buffer.c and mpage.c, which do not support large folios. I am getting a BUG_ON
when I don't use iomap-based aops for the block device:

[ 11.596239] kernel BUG at fs/buffer.c:2384!
[ 11.596609] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[ 11.597064] CPU: 3 PID: 10 Comm: kworker/u8:0 Not tainted 6.4.0-rc7-next-20230621-00010-g87171074c649-dirty #183
[ 11.597934] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 11.598882] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[ 11.599370] RIP: 0010:block_read_full_folio+0x70d/0x8f0
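
For reference, the order arithmetic from the quoted hunk can be modelled in
userspace roughly as below. This is only a sketch, not kernel code: PAGE_SHIFT
is assumed to be 12 (4 KiB pages), blksize_bits() here is a simplified
stand-in for the kernel helper, and min_folio_order() is a hypothetical name
used purely for illustration. Clamping to 0 reflects one reading of the
suggestion above, namely that a minimum order should only be requested when
the block size exceeds the page size.

```c
#include <stdio.h>

/* Assumption: 4 KiB pages. The real value is architecture-dependent. */
#define PAGE_SHIFT 12

/*
 * Simplified userspace stand-in for the kernel's blksize_bits():
 * returns log2 of a power-of-two block size.
 */
static unsigned int blksize_bits(unsigned int size)
{
	unsigned int bits = 0;

	while (size > 1) {
		size >>= 1;
		bits++;
	}
	return bits;
}

/*
 * Hypothetical helper: the minimum folio order a given block size would
 * need. Block sizes up to the page size yield 0, i.e. no large-folio
 * requirement.
 */
static int min_folio_order(unsigned int bsize)
{
	int order = (int)blksize_bits(bsize) - PAGE_SHIFT;

	return order > 0 ? order : 0;
}
```

With these assumptions, a 512-byte or 4096-byte block size gives order 0, while
a 16 KiB block size gives order 2 (four 4 KiB pages per folio).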

Let me know what you think!

> Cheers,
>
> Hannes
>
>