[REPORT] Request for reviewing crypto code wrt wait_for_completion()

From: Byungchul Park
Date: Fri Aug 06 2021 - 04:04:39 EST


Hello crypto folks,

I developed a tool for tracking waiters and reporting if any of the
events that the waiters are waiting for would never happen, say, a
deadlock. Yes, it would look like Lockdep but more inclusive.

While I ran the tool(Dept: Dependency Tracker) on v5.4.96, I got some
reports from the tool. One of them is related to crypto subsystem.
Because I'm not that familiar with the code, I'd like to ask you guys to
review the related code.

If I understand correctly, it doesn't actually cause deadlock but looks
like a problematic code. I know you are not used to the format of the
report from Dept so.. let me summerize the result.

The simplified call trace looks like when the problem araised :

THREAD A
--------
A1 crypto_alg_mod_lookup()
A2 crypto_probing_notify(CRYPTO_MSG_ALG_REQUEST)
A3 cryptomgr_schedule_probe()
A4 kthread_run(cyptomgr_probe) ---> Start THREAD B

A5 crypto_larval_wait()
A6 wait_for_completion_killable_timeout(c) /* waiting for B10 */

THREAD B
--------
B1 cryptomgr_probe()
B2 pkcslpad_create()
B3 crypto_wait_for_test()
B4 crypto_probing_notify(CRYPTO_MSG_ALG_REGISTER)
B5 cryptomgr_schedule_test()
B6 kthread_run(cyptomgr_test) ---> Start THREAD C

B7 tmpl->alloc()
B8 crupto_register_instance()
B9 wait_for_completion_killable(c) /* waiting for C3 */
B10 complete_all(c)

THREAD C
--------
C1 cryptomgr_test()
C2 crypto_alg_tested()
C3 complete_all(c)

---

For example, in this situation, I think C3 could wake up both A6 and B9
before THREAD B reaches B10 which is not desired by A6. Say, is it okay
to wake up A6 with B7 ~ B9 having yet to complete?

Sorry if I misunderstand the code. It looks so complicated to me. Could
you check if the code is good?

Just FYI, the below is the report from the tool, Dept, I developed.

Thanks,
Byungchul

---

[ 10.520128 ] ===================================================
[ 10.526037 ] Dept: Circular dependency has been detected.
[ 10.531337 ] 5.4.96-242 #1 Tainted: G W
[ 10.536375 ] ---------------------------------------------------
[ 10.542280 ] summary
[ 10.544366 ] ---------------------------------------------------
[ 10.550271 ] *** AA DEADLOCK ***
[ 10.550271 ]
[ 10.554875 ] context A
[ 10.557136 ] [S] (unknown)(&larval->completion:0)
[ 10.562087 ] [W] wait_for_completion_killable(&larval->completion:0)
[ 10.568688 ] [E] complete_all(&larval->completion:0)
[ 10.573898 ]
[ 10.575377 ] [S]: start of the event context
[ 10.579546 ] [W]: the wait blocked
[ 10.582848 ] [E]: the event not reachable
[ 10.586757 ] ---------------------------------------------------
[ 10.592662 ] context A's detail
[ 10.595703 ] ---------------------------------------------------
[ 10.601608 ] context A
[ 10.603868 ] [S] (unknown)(&larval->completion:0)
[ 10.608819 ] [W] wait_for_completion_killable(&larval->completion:0)
[ 10.615419 ] [E] complete_all(&larval->completion:0)
[ 10.620630 ]
[ 10.622109 ] [S] (unknown)(&larval->completion:0):
[ 10.626799 ] (N/A)
[ 10.628712 ]
[ 10.630191 ] [W] wait_for_completion_killable(&larval->completion:0):
[ 10.636537 ] [<ffffffc0104dfc20>] crypto_wait_for_test+0x40/0x80
[ 10.642443 ] stacktrace:
[ 10.644881 ] wait_for_completion_killable+0x34/0x160
[ 10.650267 ] crypto_wait_for_test+0x40/0x80
[ 10.654871 ] crypto_register_instance+0xb0/0xe0
[ 10.659824 ] akcipher_register_instance+0x30/0x38
[ 10.664950 ] pkcs1pad_create+0x238/0x2b0
[ 10.669295 ] cryptomgr_probe+0x40/0xd0
[ 10.673467 ] kthread+0x150/0x188
[ 10.677118 ] ret_from_fork+0x10/0x18
[ 10.681114 ]
[ 10.682592 ] [E] complete_all(&larval->completion:0):
[ 10.687544 ] [<ffffffc0104e8d20>] cryptomgr_probe+0xb0/0xd0
[ 10.693016 ] stacktrace:
[ 10.695452 ] complete_all+0x30/0x70
[ 10.699362 ] cryptomgr_probe+0xb0/0xd0
[ 10.703532 ] kthread+0x150/0x188
[ 10.707181 ] ret_from_fork+0x10/0x18
[ 10.711177 ] ---------------------------------------------------
[ 10.717083 ] information that might be helpful
[ 10.721426 ] ---------------------------------------------------
[ 10.727334 ] CPU: 0 PID: 1787 Comm: cryptomgr_probe Tainted: G W
5.4.96-242 #1
[ 10.735757 ] Hardware name: LG Electronics, DTV SoC LG1213 (AArch64) (DT)
[ 10.742444 ] Call trace:
[ 10.744879 ] dump_backtrace+0x0/0x148
[ 10.748529 ] show_stack+0x14/0x20
[ 10.751833 ] dump_stack+0xd0/0x12c
[ 10.755223 ] print_circle+0x3b0/0x3f8
[ 10.758873 ] cb_check_dl+0x54/0x70
[ 10.762262 ] bfs+0x64/0x1a0
[ 10.765043 ] add_dep+0x90/0xb8
[ 10.768086 ] dept_event+0x4c8/0x560
[ 10.771562 ] complete_all+0x30/0x70
[ 10.775038 ] cryptomgr_probe+0xb0/0xd0
[ 10.778774 ] kthread+0x150/0x188
[ 10.781989 ] ret_from_fork+0x10/0x18
[ 10.786091 ] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[ 10.792783 ] platform regulatory.0: Direct firmware load for
regulatory.db failed with error -2
[ 10.796148 ] ALSA device list:
[ 10.801423 ] cfg80211: failed to load regulatory.db