Re: regression/bisected/6.8 commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b prevents the system going into suspend mode

From: Alexey Dobriyan
Date: Fri Mar 08 2024 - 12:04:04 EST


On Fri, Mar 08, 2024 at 05:48:04PM +0500, Mikhail Gavrilov wrote:
> On Fri, Mar 8, 2024 at 11:15 AM Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
> >
> > What? Deleting unused defines breaks suspend?
> >
> > Collect fs/smb/client/smbencrypt.o with and without patch and
> > see them being identical.
> >
> > Enum in stddef.h are
> >
> > enum {
> > false = 0,
> > true = 1,
> > };
> >
> > so if defines were used somehow they would expand to same values of
> > same type.
> >
> > Something else is going on.
>
> I understand your confusion.
> But I didn't come up with it. And moreover, I saw what the revert does.

> Why did this really help is a question to which I would like to find an answer.

OK, let's rule out newbie mistakes.

Exclude CIFS:

* start with a clean compile into an out-of-tree directory

mkdir ../obj-001
cp .config ../obj-001/.config
make -k -j$(nproc) O=../obj-001 # buggy kernel
sudo rm -rf /lib/modules/$(uname -r) # no mixed module copies
sudo make O=../obj-001 modules_install
sudo make O=../obj-001 install

[patch]

mkdir ../obj-002
...

This is what I use in Production(tm):

#!/bin/sh -x
sudo rm -rf /lib/modules/$(uname -r) &&\
sudo make modules_install &&\
sudo make install &&\
sudo emerge @module-rebuild &&\
sudo grub-mkconfig -o /boot/grub/grub.cfg &&\
sync &&\
sudo nvme flush /dev/nvme*n1

* After rebooting, double-check that the build number in /proc/version
matches .version in the ../obj directory:

$ cat /proc/version
Linux version 6.7.4-100.fc38.x86_64 (mockbuild@68dbdffd8a2b4619991006cfcbec2871) (gcc (GCC) 13.2.1 20231011 (Red Hat 13.2.1-4), GNU ld version 2.39-16.fc38) [[[[[ ===> #1 <=== ]]]]] SMP PREEMPT_DYNAMIC Mon Feb 5 22:19:06 UTC 2024

$ cat ../obj/.version
1

This verifies that you've rebooted into the correct kernel.
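
A minimal sketch of that check, in case you want to script it. The
helper names build_number and check_build are made up for illustration,
and it assumes the build number shows up as " #N " in /proc/version, as
in the output above:

```shell
#!/bin/sh
# Sketch only: build_number and check_build are hypothetical helper names.

# Print the build number N from the " #N " field of a
# /proc/version-style string.
build_number() {
    printf '%s\n' "$1" | sed -n 's/.* #\([0-9][0-9]*\) .*/\1/p'
}

# Compare that number against the contents of a .version file.
# $1 = version string, $2 = path to .version
check_build() {
    running=$(build_number "$1")
    built=$(cat "$2")
    if [ -n "$running" ] && [ "$running" = "$built" ]; then
        echo "OK: running build #$running matches $2"
    else
        echo "MISMATCH: running build #$running, $2 says $built"
        return 1
    fi
}
```

Usage would be something like:
check_build "$(cat /proc/version)" ../obj-001/.version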

* keep both full kernel trees in two separate directories

If both vmlinux files are identical, you can try to find which modules
differ.
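
One possible way to list differing modules is sketched below. The
function name diff_modules is hypothetical, and it only does a
byte-for-byte comparison with cmp, so a module may show up as different
for reasons other than generated code (debug info, for example). Treat
the output as a hint about where to look, not as proof:

```shell
#!/bin/sh
# Sketch only: diff_modules is a hypothetical helper name.
# $1 = first build tree, $2 = second build tree
diff_modules() {
    old=$1
    new=$2
    # Walk every .ko in the first tree and compare it with the file
    # at the same relative path in the second tree.
    (cd "$old" && find . -type f -name '*.ko' | sort) |
    while IFS= read -r ko; do
        if [ ! -f "$new/$ko" ]; then
            echo "missing: $ko"
        elif ! cmp -s "$old/$ko" "$new/$ko"; then
            echo "differs: $ko"
        fi
    done
}
```

Usage would be something like:
diff_modules ../obj-001 ../obj-002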

* disassemble fs/smb/client/smbencrypt.o (or cifs.ko) for both kernels

objdump -M intel -dr $(find ../obj-001 -type f -name cifs.ko) >000.s
objdump -M intel -dr $(find ../obj-002 -type f -name cifs.ko) >001.s
diff -u0 000.s 001.s

For your experiment, the build number should be 1 (first clean recompile
from scratch) and then 2 (after applying the one patch).

If the bug is not 100% reproducible, then bisecting gets more
entertaining because you can't be really sure each step is in the right
direction.
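
If that is the case here, one common way to reduce the doubt is to
repeat the test several times on each kernel before calling it good. A
rough sketch, where repeat_test is a hypothetical helper and the test
command is whatever you use to trigger and check suspend:

```shell
#!/bin/sh
# Sketch only: repeat_test is a hypothetical helper name.
# $1 = number of runs, remaining arguments = test command to run
repeat_test() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        # A single failure is enough to call this kernel bad.
        if ! "$@"; then
            echo "run $i of $n failed: treat this kernel as bad"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $n runs passed: treat this kernel as good"
}
```

Usage would be something like "repeat_test 5 ./suspend-test.sh", where
suspend-test.sh is a placeholder for your own test. More runs give more
confidence in "good", but never certainty.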

> The most interesting thing is that I have two identical systems:
> Identical:
> - M/B - MSI MPG B650I EDGE WIFI
> - CPU - AMD Ryzen 7950x
> - GPU - AMD Radeon 7900XTX
> - SSD1 for system - Intel Optane 905P SSDPE21D480GAM3
> - SSD2 for data - Intel D5 P5316 Series SSDPF2NV307TZN1
> - PSU - Asus ROG LOKI SFX-L 1000W Platinum
> - Mouse - Logitech MX Master 3s
> - Keyboard - MX Keys Mini
> - Linux distro (identical version of all software) - Fedora Rawhide
> On one system this bug is present, on the other it is not.
>
> Affected system: https://linux-hardware.org/?probe=9a5a8c0338
> Not affected system: https://linux-hardware.org/?probe=37c62300bb