Re: [PATCH v3 1/1] ACPICA: utdebug: use spinlocks to fix the data-races reported by the KCSAN

From: Mirsad Todorovac
Date: Thu Sep 21 2023 - 15:38:30 EST




On 9/20/23 19:03, Rafael J. Wysocki wrote:
On Wed, Sep 20, 2023 at 6:30 PM Mirsad Goran Todorovac
<mirsad.todorovac@xxxxxxxxxxxx> wrote:

KCSAN reported hundreds of instances of data-races in ACPICA like this one:

[ 6.994149] ==================================================================
[ 6.994443] BUG: KCSAN: data-race in acpi_ut_status_exit / acpi_ut_trace

[ 6.994795] write to 0xffffffffbae5a884 of 4 bytes by task 0 on cpu 2:
[ 6.994944] acpi_ut_status_exit (/home/marvin/linux/kernel/torvalds2/drivers/acpi/acpica/utdebug.c:467)
[ 6.994957] acpi_hw_register_read (/home/marvin/linux/kernel/torvalds2/drivers/acpi/acpica/hwregs.c:563)
[ 6.994968] acpi_read_bit_register (/home/marvin/linux/kernel/torvalds2/drivers/acpi/acpica/hwxface.c:171)
[ 6.994980] acpi_idle_bm_check (/home/marvin/linux/kernel/torvalds2/drivers/acpi/processor_idle.c:511)
[ 6.994990] acpi_idle_enter_bm (/home/marvin/linux/kernel/torvalds2/drivers/acpi/processor_idle.c:644 (discriminator 1))
[ 6.995000] acpi_idle_enter (/home/marvin/linux/kernel/torvalds2/drivers/acpi/processor_idle.c:695)
[ 6.995010] cpuidle_enter_state (/home/marvin/linux/kernel/torvalds2/drivers/cpuidle/cpuidle.c:267)
[ 6.995019] cpuidle_enter (/home/marvin/linux/kernel/torvalds2/drivers/cpuidle/cpuidle.c:390)
[ 6.995027] call_cpuidle (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:135)
[ 6.995038] do_idle (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:219 /home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:282)
[ 6.995046] cpu_startup_entry (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:378 (discriminator 1))
[ 6.995055] start_secondary (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/smpboot.c:210 /home/marvin/linux/kernel/torvalds2/arch/x86/kernel/smpboot.c:294)
[ 6.995066] secondary_startup_64_no_verify (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/head_64.S:433)

[ 6.995121] read to 0xffffffffbae5a884 of 4 bytes by task 0 on cpu 9:
[ 6.995267] acpi_ut_trace (/home/marvin/linux/kernel/torvalds2/drivers/acpi/acpica/utdebug.c:263)
[ 6.995279] acpi_hw_validate_io_request (/home/marvin/linux/kernel/torvalds2/drivers/acpi/acpica/hwvalid.c:101)
[ 6.995291] acpi_hw_read_port (/home/marvin/linux/kernel/torvalds2/drivers/acpi/acpica/hwvalid.c:202)
[ 6.995303] acpi_hw_read (/home/marvin/linux/kernel/torvalds2/drivers/acpi/acpica/hwregs.c:251)
[ 6.995313] acpi_hw_register_read (/home/marvin/linux/kernel/torvalds2/drivers/acpi/acpica/hwregs.c:725 /home/marvin/linux/kernel/torvalds2/drivers/acpi/acpica/hwregs.c:499)
[ 6.995325] acpi_read_bit_register (/home/marvin/linux/kernel/torvalds2/drivers/acpi/acpica/hwxface.c:171)
[ 6.995336] acpi_idle_bm_check (/home/marvin/linux/kernel/torvalds2/drivers/acpi/processor_idle.c:511)
[ 6.995346] acpi_idle_enter_bm (/home/marvin/linux/kernel/torvalds2/drivers/acpi/processor_idle.c:644 (discriminator 1))
[ 6.995356] acpi_idle_enter (/home/marvin/linux/kernel/torvalds2/drivers/acpi/processor_idle.c:695)
[ 6.995366] cpuidle_enter_state (/home/marvin/linux/kernel/torvalds2/drivers/cpuidle/cpuidle.c:267)
[ 6.995375] cpuidle_enter (/home/marvin/linux/kernel/torvalds2/drivers/cpuidle/cpuidle.c:390)
[ 6.995383] call_cpuidle (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:135)
[ 6.995394] do_idle (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:219 /home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:282)
[ 6.995402] cpu_startup_entry (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:378 (discriminator 1))
[ 6.995411] start_secondary (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/smpboot.c:210 /home/marvin/linux/kernel/torvalds2/arch/x86/kernel/smpboot.c:294)
[ 6.995422] secondary_startup_64_no_verify (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/head_64.S:433)

[ 6.995476] value changed: 0x00000004 -> 0x00000002

[ 6.995629] Reported by Kernel Concurrency Sanitizer on:
[ 6.995748] CPU: 9 PID: 0 Comm: swapper/9 Not tainted 6.6.0-rc2-kcsan-00003-g16819584c239-dirty #21
[ 6.995758] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 6.995765] ==================================================================

Please find the complete list at: https://domac.alu.unizg.hr/~mtodorov/linux/patches/acpica_utdebug/acpi_ut_status_exit.log.xz

A number of unprotected increments:

acpi_gbl_nesting_level++;

and conditional statements:

if (acpi_gbl_nesting_level) {
acpi_gbl_nesting_level--;
}

no longer work in SMP environment.

Proper locking like

spin_lock(&acpi_utdebug_lock);
acpi_gbl_nesting_level++;
spin_unlock(&acpi_utdebug_lock);

and

spin_lock(&acpi_utdebug_lock);
if (acpi_gbl_nesting_level) {
acpi_gbl_nesting_level--;
}
spin_unlock(&acpi_utdebug_lock);

makes these data-races go away.

Additionally, READ_ONCE() or WRITE_ONCE() is required with the global variable
acpi_gbl_nesting_level to prevent unwanted read or write reordering or other funny
stuff the optmisers do.

The patch eliminates KCSAN BUG warnings.

Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@xxxxxxxxxxxx>
Fixes: 6be2d72b18649 ("ACPICA: Update for a few debug output statements")
Fixes: bf9b448ef8430 ("ACPICA: Debug output: Do not emit function nesting level for kernel build.")
Fixes: 6e875fa0480c1 ("ACPICA: Debugger: fix slight indentation issue")
Fixes: ^1da177e4c3f4 ("Initial git repository build.")
Cc: Jung-uk Kim <jkim@xxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Erik Kaneda <erik.kaneda@xxxxxxxxx>
Cc: Bob Moore <robert.moore@xxxxxxxxx>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@xxxxxxxxx>
Cc: Len Brown <lenb@xxxxxxxxxx>
Cc: linux-acpi@xxxxxxxxxxxxxxx
Cc: acpica-devel@xxxxxxxxxxxxxxxxxxxxxxxxx
Reported-by: kernel test robot <lkp@xxxxxxxxx>
Closes: https://lore.kernel.org/oe-kbuild-all/202309201331.S2c1JL2h-lkp@xxxxxxxxx/
Closes: https://lore.kernel.org/oe-kbuild-all/202309201308.5ZBJFbjh-lkp@xxxxxxxxx/
Link: https://github.com/acpica/acpica/pull/893
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@xxxxxxxxxxxx>
---
v2 -> v3:
Made the pull request at the ACPICA github.
Fixed DEFINE_SPINLOCK() undefined warning by putting it inside #ifdef ACPI_DEBUG_OUTPUT
... #endif
Capitalised ACPICA according to the convention.

v1 -> v2:
Moved the declaration of 'u32 nesting_level' inside #ifdef ACPI_APPLICATION ... #endif
according to the unused variable warning of the kernel test robot.

v1:
Preliminary RFC version of the patch.

drivers/acpi/acpica/utdebug.c | 49 +++++++++++++++++++++++++----------
1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/acpica/utdebug.c b/drivers/acpi/acpica/utdebug.c
index c5f6c85a3a09..594eee294f0f 100644
--- a/drivers/acpi/acpica/utdebug.c
+++ b/drivers/acpi/acpica/utdebug.c
@@ -15,8 +15,9 @@

#define _COMPONENT ACPI_UTILITIES
ACPI_MODULE_NAME("utdebug")
-
#ifdef ACPI_DEBUG_OUTPUT
+static DEFINE_SPINLOCK(acpi_utdebug_lock);

No, you can't do that in the ACPICA code. Likewise for other
Linux-specific pieces of code in the patch below.

I'd very much prefer you to send a proper problem report instead of
sending patches like this.

I was so enthusiastic about the patch that cleared all the KCSAN data-race bug warnings,
that I must have forgot about sending one.

Patching of hundreds of annoying KCSAN acpi warnings seemed so good that I forgot about the
procedure, sorry.

I am aware that the maintainers have to carefully evaluate any contribution in order for the
kernel not becoming bloated and unstable ...

As it stands, I'm not even sure what problem exactly this is
attempting to address.

I hope the attached excerpt from the dmesg log will help.

Thanks!

Anytime.

Best regards,
Mirsad Todorovac

Attachment: acpi-kcsan-20230921-01.log.xz
Description: application/xz

Attachment: config-6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0.xz
Description: application/xz