Re: [PATCH v2] lockdep: Allow tuning tracing capacity constants.

From: Tetsuo Handa
Date: Sun Sep 27 2020 - 20:25:21 EST


On 2020/09/16 21:14, Dmitry Vyukov wrote:
> On Wed, Sep 16, 2020 at 1:51 PM <peterz@xxxxxxxxxxxxx> wrote:
>>
>> On Wed, Sep 16, 2020 at 01:28:19PM +0200, Dmitry Vyukov wrote:
>>> On Fri, Sep 4, 2020 at 6:05 PM Tetsuo Handa
>>> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> Hello. Can we apply this patch?
>>>>
>>>> This patch addresses top crashers for syzbot, and applying this patch
>>>> will help utilizing syzbot's resource for finding other bugs.
>>>
>>> Acked-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
>>>
>>> Peter, do you still have concerns with this?
>>
>> Yeah, I still hate it with a passion; it discourages thinking. A bad
>> annotation that blows up the lockdep storage, no worries, we'll just
>> increase this :/
>>
>> IIRC the issue with syzbot is that the current sysfs annotation is
>> pretty terrible and generates a gazillion classes, and syzbot likes
>> poking at /sys a lot and thus floods the system.
>>
>> I don't know enough about sysfs to suggest an alternative, and haven't
>> exactly had spare time to look into it either :/
>>
>> Examples of bad annotations is getting every CPU a separate class, that
>> leads to nr_cpus! chains if CPUs arbitrarily nest (nr_cpus^2 if there's
>> only a single nesting level).
>
> Maybe on "BUG: MAX_LOCKDEP_CHAINS too low!" we should then aggregate,
> sort and show existing chains so that it's possible to identify if
> there are any worst offenders and who they are.
>
> Currently we only have a hypothesis that there are some worst
> offenders vs lots of normal load. And we can't point fingers which
> means that, say, sysfs, or other maintainers won't be too inclined to
> fix anything.
>
> If we would know for sure that lock class X is guilty. That would make
> the situation much more actionable.
>

Dmitry thinks that we need to use CONFIG_LOCKDEP=n temporarily until the lockdep
problems are resolved. ( https://github.com/google/syzkaller/issues/2140 )

But I think it is better to apply this patch (and revert it once it becomes
possible to identify whether there are any worst offenders and who they are) than
to use CONFIG_LOCKDEP=n.

With CONFIG_LOCKDEP=n, a "#syz test" request would give false responses for
locking-related issues, because we are not ready to enforce "retest without the
proposed patch when the test with the proposed patch did not reproduce the crash".

I think that "not detecting lock-related problems introduced by new patches" costs
more than "postponing fixes for lock-related problems in existing code".