Re: [czhong@xxxxxxxxxx: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]

From: Christian Brauner
Date: Wed Aug 23 2023 - 04:50:06 EST


On Wed, Aug 23, 2023 at 12:06:14PM +0800, Ming Lei wrote:
>
> Looks the issue is more related with vfs, so forward to vfs list.
>
> ----- Forwarded message from Changhui Zhong <czhong@xxxxxxxxxx> -----
>
> Date: Wed, 23 Aug 2023 11:17:55 +0800
> From: Changhui Zhong <czhong@xxxxxxxxxx>
> To: linux-scsi@xxxxxxxxxxxxxxx
> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
> Subject: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
>
> Hello,
>
> triggered below warning issue with branch
> "
> Tree: mainline.kernel.org-clang
> Repository: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> @ master
> Commit Hash: 89bf6209cad66214d3774dac86b6bbf2aec6a30d
> Commit Name: v6.5-rc7-18-g89bf6209cad6
> Kernel information:
> Commit message: Merge tag 'devicetree-fixes-for-6.5-2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
> "
> for more detail,please check
> https://datawarehouse.cki-project.org/kcidb/tests/9232643
>
> #modprobe scsi_debug virtual_gb=128
> #echo none > /sys/block/sdb/queue/scheduler
> #fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw
> --name=sdb-libaio-randrw-4k --filename=/dev/sdb --direct=1 --size=60G
> --runtime=60

Looking at this issue, it seems unlikely that this is a vfs bug.
If it were, we should see this all over the place and not just on arm64.

The sequence here seems to be:

echo 3 > /proc/sys/vm/drop_caches
rmmod scsi_debug > /dev/null 2>&1

[ 3117.059778] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
[ 3117.067601] Modules linked in: scsi_debug nvme nvme_core nvme_common null_blk pktcdvd ipmi_watchdog ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu cppc_cpufreq arm_dsu_pmu acpi_tad loop fuse zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce sbsa_gwdt ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug]

So my money is on some device that still holds an elevated refcount
when it gets removed and so keeps the dentry pinned. An immediate
suspect would be:

7882541ca06d ("of/platform: increase refcount of fwnode")

but that part is complete speculation on my part.
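
To make the hypothesis a bit more concrete, here is a toy model of the
failure mode I have in mind. None of this is kernel code: ToyDentry,
ToyDevice, kill_dentry() and simulate() are hypothetical names made up
for illustration, and the check in kill_dentry() is a stand-in, not the
actual condition at fs/dcache.c:365. It only shows how a reference that
is not dropped on device removal would leave the dentry pinned at
teardown.

```python
class ToyDentry:
    """Hypothetical stand-in for a dentry: a name plus a reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0


class ToyDevice:
    """Hypothetical device that pins a dentry for as long as it exists."""

    def __init__(self, name, dentry):
        self.name = name
        self.dentry = dentry
        dentry.refcount += 1  # the device takes a reference

    def remove(self, leak=False):
        # A correct removal path drops the reference it took.
        # leak=True models a removal path that forgets to do so.
        if not leak:
            self.dentry.refcount -= 1


def kill_dentry(dentry):
    """Return a list of warnings; an empty list means a clean teardown."""
    if dentry.refcount != 0:
        return [f"WARNING: {dentry.name} killed with refcount {dentry.refcount}"]
    return []


def simulate(leak):
    """Create a device, remove it, then tear down the dentry it pinned."""
    dentry = ToyDentry("sdb")
    device = ToyDevice("scsi_debug", dentry)
    device.remove(leak=leak)
    return kill_dentry(dentry)


if __name__ == "__main__":
    print(simulate(leak=False))
    print(simulate(leak=True))
```

With leak=False the teardown is clean; with leak=True the dentry is
still referenced when it is killed, which is the kind of imbalance I
would look for in the device removal path.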