Re: WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry_free+0x100/0x128

From: Helge Deller
Date: Sat Jul 30 2022 - 16:22:47 EST


On 7/21/22 05:54, Helge Deller wrote:
> On 7/21/22 01:15, Sam James wrote:
>>> On 20 Jul 2022, at 18:06, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> On Wed, Jul 20, 2022 at 07:00:32PM +0800, Hillf Danton wrote:
>>>
>>>> To help debug it, de-union d_in_lookup_hash with d_alias and add debug
>>>> info after dentry is killed. If any warning hits, we know where to add
>>>> something like
>>>>
>>>> WARN_ON(dentry->d_flags & DCACHE_DENTRY_KILLED);
>>>>
>>>> before hlist_bl_add or hlist_add.
>>
>>> [snip]
>>> I wonder if anyone had seen anything similar outside of parisc...
>
> Me too.
> Of course it could be caused by the platform code, as we have had
> issues with caches, spinlocks and so on.
> On older kernels we also have seen RCU stalls in d_alloc_parallel().
>
>>> I don't know if I have any chance to reproduce it here - the only
>>> parisc box I've got is a 715/100 (assuming the disk is still alive)
>>> and it's 32bit, unlike the reported setups and, er, not fast.
>
> It's fun to boot it, but it will be too slow for actual testing.
>
>>> qemu seems to have some parisc support, but it's 32bit-only at the
>>> moment...
>
> Yes. I think it will be hard to reproduce it in the VM.
>
>> I don't think I've seen this on parisc either, but I don't think
>> I've used tmpfs that heavily. I'll try it in case it's somehow more
>> likely to trigger it.
>
> It happened on the debian buildd server with tmpfs. To rule out tmpfs
> I switched to ext4 (on SATA SSD) and it happened there as well.
> I assume Dave's report is on ext3/ext4 with SCSI discs.
>
>> Helge, were there any particular steps to reproduce this? Or just
>> start doing your normal Debian builds on a tmpfs and it happens
>> soon enough?
>
> Currently it's not easy to reproduce for me either.
> It happens on the debian buildd server (4-way c8000 machine) while building
> the webkit2gtk package. I think it happens at the end when sbuild
> cleans the build directories by deleting all files.
> Maybe there is a filesystem test toolkit which you could try which hammers
> the fs by deleting lots of files in parallel?

I currently can't reproduce the issue any longer.
In case it pops up again, I'll follow up here again.

Helge