Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot

From: Michael Bringmann
Date: Tue Jun 06 2017 - 12:18:49 EST




On 05/25/2017 10:30 AM, Michael Bringmann wrote:
> I will try that patch shortly. I also updated my patch to be conditional
> on whether the pool's cpumask attribute was empty. You should have received
> V2 of that patch by now.

Let's try this again.

The hotplug problem goes away with the changes that you provided earlier, which
are shown in the patch below. I kept this change to get_unbound_pool() as a
precaution, to help explain the crash in the event that it occurs again:

	if (!cpumask_weight(pool->attrs->cpumask))
		cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));

I could also insert

	BUG_ON(!cpumask_weight(pool->attrs->cpumask));

at that point, but I would really prefer not to crash the system if there is a
workaround.
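
To make the intent of that guard concrete, here is a minimal userspace sketch
of the fallback. It is only a model: model_cpumask_t, model_cpumask_weight(),
model_cpumask_of() and model_fixup_empty_cpumask() are hypothetical stand-ins
that I made up for illustration (a 64-bit word instead of the kernel's
struct cpumask), not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel cpumask: one bit per CPU, 64 CPUs max. */
typedef uint64_t model_cpumask_t;

/* Models cpumask_weight(): count the CPUs set in the mask. */
static int model_cpumask_weight(model_cpumask_t mask)
{
	int weight = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		weight++;
	}
	return weight;
}

/* Models cpumask_of(cpu): a mask containing only @cpu. */
static model_cpumask_t model_cpumask_of(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (model_cpumask_t)1 << cpu;
}

/*
 * Models the guard: if the pool's mask is empty, fall back to the CPU we
 * are currently running on instead of crashing.
 * Returns true if the fallback was applied.
 */
static bool model_fixup_empty_cpumask(model_cpumask_t *pool_mask, int current_cpu)
{
	if (model_cpumask_weight(*pool_mask))
		return false;

	*pool_mask = model_cpumask_of(current_cpu);
	return true;
}
```

Calling model_fixup_empty_cpumask() on an empty mask with current_cpu = 5
leaves only bit 5 set; a non-empty mask is returned untouched.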


> On 05/25/2017 10:07 AM, Tejun Heo wrote:
>> On Thu, May 25, 2017 at 11:03:53AM -0400, Tejun Heo wrote:
>>> wq_update_unbound_numa() should have never called into
>>> alloc_unbound_pwq() w/ empty node cpu mask. It should have fallen
>>> back to the dfl_pwq. It looks like I just messed up the logic there
>>> from the initial commit of the feature. Can you please see whether
>>> the following fixes the problem?
>>
>> Can you please try the following instead? On second thought, I
>> don't think the current logic is wrong. If this fixes the issue,
>> somehow your setup is having a situation where online cpumask for a
>> node is a proper superset of possible cpumask for the node.
>>
>> Thanks.
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index c74bf39ef764..4da5ff649ff8 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -3559,13 +3559,13 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
>> * stable.
>> *
>> * Return: %true if the resulting @cpumask is different from @attrs->cpumask,
>> - * %false if equal.
>> + * %false if equal. On %false return, the content of @cpumask is undefined.
>> */
>> static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
>> int cpu_going_down, cpumask_t *cpumask)
>> {
>> if (!wq_numa_enabled || attrs->no_numa)
>> - goto use_dfl;
>> + return false;
>>
>> /* does @node have any online CPUs @attrs wants? */
>> cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
>> @@ -3573,15 +3573,13 @@ static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
>> cpumask_clear_cpu(cpu_going_down, cpumask);
>>
>> if (cpumask_empty(cpumask))
>> - goto use_dfl;
>> + return false;
>>
>> /* yeap, return possible CPUs in @node that @attrs wants */
>> cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
>> - return !cpumask_equal(cpumask, attrs->cpumask);
>>
>> -use_dfl:
>> - cpumask_copy(cpumask, attrs->cpumask);
>> - return false;
>> + return !cpumask_empty(cpumask) &&
>> + !cpumask_equal(cpumask, attrs->cpumask);
>> }
>>
>> /* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */
>>
>>

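For anyone following the quoted patch, the decision it makes can be modeled in
userspace roughly as follows. This is a sketch under assumptions, not the
kernel code: model_cpumask_t, struct model_node and model_calc_node_cpumask()
are hypothetical names, masks are plain 64-bit words, and the numa_enabled
flag stands in for the (!wq_numa_enabled || attrs->no_numa) test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel cpumask: one bit per CPU, 64 CPUs max. */
typedef uint64_t model_cpumask_t;

/* Per-node masks; the names are illustrative only. */
struct model_node {
	model_cpumask_t online;		/* models cpumask_of_node(node) */
	model_cpumask_t possible;	/* models wq_numa_possible_cpumask[node] */
};

/*
 * Models the patched wq_calc_node_cpumask().
 * Returns true only if @out is a non-empty mask that differs from
 * @attrs_mask; on false the content of @out must not be used.
 */
static bool model_calc_node_cpumask(model_cpumask_t attrs_mask,
				    const struct model_node *node,
				    int cpu_going_down, bool numa_enabled,
				    model_cpumask_t *out)
{
	assert(node && out);

	if (!numa_enabled)
		return false;

	/* does the node have any online CPUs the attrs want? */
	*out = node->online & attrs_mask;
	if (cpu_going_down >= 0 && cpu_going_down < 64)
		*out &= ~((model_cpumask_t)1 << cpu_going_down);

	if (*out == 0)
		return false;

	/* possible CPUs in the node that the attrs want */
	*out = attrs_mask & node->possible;

	/* the added emptiness check: never report an empty mask as usable */
	return *out != 0 && *out != attrs_mask;
}
```

With online = 0xff and possible = 0 (online CPUs on a node that has no
possible CPUs recorded), the model returns false, so the caller would fall
back to the default pwq. Without the emptiness check the last line would
return true with an empty mask, which, as I read it, is how an empty cpumask
could reach alloc_unbound_pwq().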
> Can you please post the messages with the debug patch from the prev
> thread? In fact, let's please continue on that thread. I'm having a
> hard time following what's going wrong with the code.

Are these the failure logs that you requested?


Red Hat Enterprise Linux Server 7.3 (Maipo)
Kernel 4.12.0-rc1.wi91275_debug_03.ppc64le+ on an ppc64le

ltcalpine2-lp20 login: root
Password:
Last login: Wed May 24 18:45:40 from oc1554177480.austin.ibm.com
[root@ltcalpine2-lp20 ~]# numactl -H
available: 2 nodes (0,6)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 6 size: 19858 MB
node 6 free: 16920 MB
node distances:
node 0 6
0: 10 40
6: 40 10
[root@ltcalpine2-lp20 ~]# numactl -H
available: 2 nodes (0,6)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
node 6 size: 19858 MB
node 6 free: 16362 MB
node distances:
node 0 6
0: 10 40
6: 40 10
[root@ltcalpine2-lp20 ~]# [ 321.310943] workqueue:get_unbound_pool has empty cpumask for pool attrs
[ 321.310961] ------------[ cut here ]------------
[ 321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 alloc_unbound_pwq+0x5c0/0x5e0
[ 321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[ 321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
[ 321.311106] task: c000000408961080 task.stack: c000000406394000
[ 321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: 0000000000000000
[ 321.311121] REGS: c0000004063977b0 TRAP: 0700 Not tainted (4.12.0-rc1.wi91275_debug_03.ppc64le+)
[ 321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
[ 321.311150] CR: 28000082 XER: 00000000
[ 321.311159] CFAR: c000000000a2dc80 SOFTE: 1
[ 321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 000000000000003b
[ 321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 ffffffffa4a5a483
[ 321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 c000000408961a38
[ 321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 c00000040a902ac0
[ 321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 0000000000000000
[ 321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 c0000000015a4c88
[ 321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 c0000004fe052800
[ 321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
[ 321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
[ 321.311305] Call Trace:
[ 321.311310] [c000000406397a30] [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
[ 321.311323] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
[ 321.311334] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
[ 321.311345] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
[ 321.311355] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
[ 321.311365] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
[ 321.311376] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
[ 321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
[ 321.311397] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68
[ 321.311406] Instruction dump:
[ 321.311413] 3d42fff0 892ac565 2f890000 40fefd98 39200001 3c62ff89 3c82ff6c 3863d590
[ 321.311437] 38847cb0 992ac565 48916fc9 60000000 <0fe00000> 4bfffd70 60000000 60420000
[ 321.311462] ---[ end trace 9f7c1cd616b26de8 ]---
[ 321.318347] Unable to handle kernel paging request for unaligned access at address 0xc0000003c5577ebf
[ 321.318448] Faulting instruction address: 0xc00000000055ec8c
[ 321.318457] Oops: Kernel access of bad area, sig: 7 [#1]
[ 321.318462] SMP NR_CPUS=2048
[ 321.318463] NUMA
[ 321.318468] pSeries
[ 321.318473] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[ 321.318524] CPU: 184 PID: 13201 Comm: cpuhp/184 Tainted: G W 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
[ 321.318532] task: c000000408961080 task.stack: c000000406394000
[ 321.318537] NIP: c00000000055ec8c LR: c0000000001312d4 CTR: c000000000145d50
[ 321.318544] REGS: c000000406397690 TRAP: 0600 Tainted: G W (4.12.0-rc1.wi91275_debug_03.ppc64le+)
[ 321.318551] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
[ 321.318563] CR: 28000024 XER: 00000000
[ 321.318571] CFAR: c0000000001312d0 DAR: c0000003c5577ebf DSISR: 00000000 SOFTE: 0
[ 321.318571] GPR00: c000000000131298 c000000406397910 c0000000013ae900 c0000004b6d22820
[ 321.318571] GPR04: c0000004b6d22820 c0000003c5577ebf 0000000000000000 00000004f1230000
[ 321.318571] GPR08: 0000000d8ddb1ea7 0000000000000000 0000000000000008 c000000408961aa8
[ 321.318571] GPR12: c000000000145d50 c00000000fb38c00 c00000000011e858 c00000040a902ac0
[ 321.318571] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 321.318571] GPR20: c000000406394000 0000000000000002 0000000000004000 c000000000fb7700
[ 321.318571] GPR24: c0000000013f5d00 c0000000013f9d48 0000000000000000 c0000004b6d230e8
[ 321.318571] GPR28: 0000000000000004 00000003c45bfc57 0000000000000800 c0000004b6d22800
[ 321.318664] NIP [c00000000055ec8c] llist_add_batch+0xc/0x40
[ 321.318670] LR [c0000000001312d4] try_to_wake_up+0x524/0x850
[ 321.318675] Call Trace:
[ 321.318679] [c000000406397910] [c000000000131298] try_to_wake_up+0x4e8/0x850 (unreliable)
[ 321.318689] [c000000406397990] [c000000000111bf8] create_worker+0x148/0x220
[ 321.318696] [c000000406397a30] [c000000000116ae8] alloc_unbound_pwq+0x428/0x5e0
[ 321.318705] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
[ 321.318713] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
[ 321.318721] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
[ 321.318729] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
[ 321.318737] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
[ 321.318745] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
[ 321.318754] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
[ 321.318762] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68
[ 321.318769] Instruction dump:
[ 321.318775] 60420000 38600000 4e800020 60000000 60420000 7c832378 4e800020 60000000
[ 321.318790] 60000000 e9250000 f9240000 7c0004ac <7d4028a8> 7c2a4800 40c20010 7c6029ad
[ 321.318808] ---[ end trace 9f7c1cd616b26de9 ]---
[ 321.322303]
[ 323.322505] Kernel panic - not syncing: Fatal exception
[ 323.429027] Rebooting in 10 seconds..


Regards,

--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
mwb@xxxxxxxxxxxxxxxxxx