Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())

From: Mirsad Goran Todorovac
Date: Sun Nov 20 2022 - 23:05:04 EST


On 20. 11. 2022. 20:21, Paul E. McKenney wrote:

And what about Mr. Robert Elliott's observation about calling cond_resched()?

How big can these readahead sizes be? Should one of the loops include
cond_resched() calls?

That is IMHO better than allowing 21000 millisecond stalls on one or more cores.

I don't think it is correct to stay in kernel mode for more than one timer unit
without yielding the CPU. It creates stalls in multimedia and audio (chirps like those
from scratched CD-ROMs). This is especially noticeable with a KASAN build.
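
To make the cond_resched() suggestion concrete, here is a minimal userspace
sketch of the pattern: a long loop that gives up the CPU every so often instead
of running to completion in one go. This is not squashfs or kernel code. The
names process_block(), process_all() and YIELD_EVERY are hypothetical, the
per-block "work" is just a checksum, and sched_yield() stands in for
cond_resched() (which, unlike sched_yield(), only reschedules when a reschedule
is actually pending).

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch size: offer to give up the CPU after this many blocks. */
#define YIELD_EVERY 64

/* Stand-in for per-block work (e.g. decompression); here just a byte sum. */
static uint32_t process_block(const uint8_t *block, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += block[i];
    return sum;
}

/*
 * Process nblocks blocks of block_len bytes each, yielding the CPU after
 * every YIELD_EVERY blocks. If yields is non-NULL, the number of yield
 * points that were hit is stored there. Returns the sum over all blocks.
 */
uint32_t process_all(const uint8_t *data, size_t nblocks, size_t block_len,
                     size_t *yields)
{
    uint32_t total = 0;
    size_t hit = 0;

    assert(data != NULL || nblocks == 0);

    for (size_t i = 0; i < nblocks; i++) {
        total += process_block(data + i * block_len, block_len);

        if ((i + 1) % YIELD_EVERY == 0) {
            sched_yield();  /* userspace analogue of cond_resched() */
            hit++;
        }
    }

    if (yields)
        *yields = hit;
    return total;
}
```

The point of the sketch is only that the time between yield points is bounded
by the batch size, not by the total size of the input. Whether and where such
a call belongs in the real readahead or decompression loops is a question for
the squashfs maintainers.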

Since Firefox and most snaps use squashfs as a compressed read-only filesystem, Firefox
appears to perform worse than Chrome since snaps were introduced.

IMHO, if we want something like realtime and multimedia processing (which is the specific
area of my research), it seems that anything trying to hold a processor for 21000 ms (21 secs)
is either buggy or deliberately malicious. 20 ms is quite enough work for a thread
in one allotted timeslot.

I do not agree with Mr. Lougher's observation that I am thrashing my laptop. I think that
a system has to endure stress and torture testing. I was raised on Digital MicroVAX systems
running Ultrix, which compiled for a whole lab at a time in an amount of memory that would
sound funny today. :)

I personally think that it would be great if you were to work to decrease
the Linux kernel's latency. Doing so would not be fixing a regression,
but I personally would welcome it. Others might have different opinions,
but please do CC me on any resulting patches.

And I will see your MicroVAX and raise you a videogame written on a
PDP-12 whose fastest instruction executed in 1.6 microseconds (-not-
nanoseconds!). ;-)

I'm afraid that I would lose in Far Cry miserably if my cores
decided to all lock up for 21 secs. :-(

You can see a couple of people playing the game on a PDP-12 in
a computer museum: https://www.rcsri.org/collection/pdp-12/

Besides, this is the very idea behind the MG-LRU algorithm commit: to test eviction of
memory pages in a system under heavy load and low on memory.

I will probably test your commits, but now I have to do my own evening ritual, unwinding,
and knowledge and memory consolidation (called "sleep").

And yes, sleep is often one of the best debugging tools available.

I appreciate your many commits on kernel.org and I hope I do not sound like
I think you are a village idiot :(

I am trying to adhere to the Code of Conduct with mutual respect and politeness.

Skepticism is not necessarily a bad thing, especially given that I
am not immune from errors and confusion. Me, I just thought you were
forcefully reporting the regression, so I forcefully pointed you at the
fix for that regression.

Again, I have absolutely no objection to your improving the kernel's
response time.

This is at present just wishful thinking, as I lack your 30 years of
experience with the kernel and the RCU update system. I am only beginning to realise
why it is more efficient than traditional locking, and IMHO it should
avoid locking up cores instead of increasing the number of complaints.

But even if the Linux kernel source were magically "memory mapped" into my
mind, I still would not see how it could be done. My Linux kernel learning curve
has not yet climbed that high, but I have no doubt that it is designed by
Intelligent Designers who are very witty people, and not village idiots ;-)

I know that the Linux kernel is about 30 million lines by now, and according to security
experts we should expect 30,000 bugs in such a solid piece of written code (one per thousand
lines). Yet Mr. Thorsten mentioned only 950 unresolved in the "open" list.

At least 30,000 bugs, of which we know of maybe 950. ;-)

So I see no point in banning the kernel from screaming to the logs that it had
core stalls that needed a physical NMI to recover from, or they would potentially
last much longer.

Knowing all of this is difficult, but I still believe in open source and open systems
interconnected.

If it was easy, where would be the challenge?

AFAIK, the point I was taught in life was obedience, not overcoming challenges.

Of course, I always remember a proverb "Who hath despised the day of the small beginnings?"

Hope this helps. My $0.02.

I think we are good. ;-)

Yes, you guys do an amazing job of keeping 30 million lines of code organised
and making some sense. I will cut the small talk as I know you are a busy man.
If I make progress and actually produce any patches fixing these lockups and
stalls, I will be sure to include you in CC: as you requested.

Have a nice day!

Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union