UI messages in event thread hangs perf top

From: David Miller
Date: Sun Oct 28 2018 - 00:11:07 EST



If I run perf top during a "make -j128" kernel build, I get ring buffer
event processing timeouts, which result in:

	ui__warning("Too slow to read ring buffer.\n"
		    "Please try increasing the period (-c) or\n"
		    "decreasing the freq (-F) or\n"
		    "limiting the number of CPUs (-C)\n");

from perf_top__mmap_read().

This hangs the main event thread. Only the display thread runs after
this point.

We can't issue UI messages from the event thread, because those will
hang waiting for a keypress. The display thread will eat any keys
we press and the event thread thus hangs forever.

I can tell this is what has happened because the histogram entries
continue to decay, yet the event count stops increasing.

If I attach gdb to the perf process, the backtrace of the event
processing thread is indeed in the select() call made by ui__getch().

Adding insult to injury, the display thread immediately overwrites the
warning message printed by the event thread, and thus the user has no
chance to even see it.

I really wonder how this was tested.

Perhaps we should mark the event thread in a special way and trigger
assertions if UI messages are printed from it. Again, any such
operation will hang the thread and stop all event processing.
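
A minimal sketch of what such marking could look like, assuming a
thread-local flag is acceptable. None of these names exist in perf;
mark_event_thread(), ui_message_allowed() and checked_ui_warning() are
hypothetical, and the fprintf() only stands in for a real blocking UI
call such as ui__warning():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-thread marker; not an existing perf symbol. */
static _Thread_local bool in_event_thread;

/* Hypothetical: call once at the top of the event processing thread. */
void mark_event_thread(void)
{
	in_event_thread = true;
}

/* Hypothetical: may the calling thread issue a blocking UI message? */
bool ui_message_allowed(void)
{
	return !in_event_thread;
}

/*
 * Hypothetical wrapper standing in for a blocking UI call such as
 * ui__warning(). It asserts instead of silently hanging when it is
 * called from the thread that was marked as the event thread.
 */
int checked_ui_warning(const char *msg)
{
	assert(ui_message_allowed());
	return fprintf(stderr, "%s", msg);
}
```

With something like this, a warning issued from the marked thread would
abort right at the call site rather than leave the thread stuck in
select() with no visible symptom.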