Re: [PATCHv7 2/2] watchdog/softlockup: report the most frequent interrupts

From: Thomas Gleixner
Date: Mon Feb 19 2024 - 17:15:25 EST


On Mon, Feb 19 2024 at 17:12, Bitao Hu wrote:
> On 2024/2/15 19:30, Thomas Gleixner wrote:
>> On Wed, Feb 14 2024 at 10:14, Bitao Hu wrote:
>>> + for (i = 0; i < NUM_HARDIRQ_REPORT; i++) {
>>> + if (irq_counts_sorted[i].irq == -1)
>>> + break;
>>> +
>>> + desc = irq_to_desc(irq_counts_sorted[i].irq);
>>> + if (desc && desc->action)
>>> + printk(KERN_CRIT "\t#%u: %-10u\tirq#%d(%s)\n",
>>> + i + 1, irq_counts_sorted[i].counts,
>>> + irq_counts_sorted[i].irq, desc->action->name);
>>
>> You cannot dereference desc->action here:
>>
>> 1) It can be NULL'ed between check and dereference.
>>
>> 2) Both 'action' and 'action->name' can be freed in parallel
>>
>> And no, you cannot take desc->lock here to prevent this. Stop fiddling
>> in the internals of interrupt descriptors.
>
> Thanks for your analysis. However, I have a question. 'action->name'
> cannot be accessed here, and it seems that merely outputting the
> irq number provides insufficient information?

That's what you can access without risk. It's better than nothing, no?

>> and the analysis boils down to:
>>
>> u64 cnt, sorted[3] = {};
>> unsigned int irq, i;
>>
>> for_each_active_irq(irq) {
>> cnt = kstat_get_irq_since_snapshot(irq);
>>
>> if (cnt) {
>> for (cnt = (cnt << 32) + irq, i = 0; i < 3; i++) {
>> if (cnt > sorted[i])
>> swap(cnt, sorted[i]);
> Hmm, I think the approach here isn't optimal. If some interrupts
> have the same count, then it effectively results in sorting by the
> irq number. Is my understanding correct?

Sure, but what's the problem? If two interrupts have the same count then
the ordering is pretty much irrelevant, no?

Thanks,

tglx