Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

From: Thomas Zeitlhofer
Date: Sat Jan 06 2018 - 16:39:02 EST


On Thu, Jan 04, 2018 at 07:38:00PM +0100, Thomas Zeitlhofer wrote:
> On Thu, Jan 04, 2018 at 06:07:12PM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 04, 2018 at 04:37:24PM +0100, Thomas Gleixner wrote:
> > > > Yes:
> > > >
> > > > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
> > > > caller is native_flush_tlb_single+0x57/0xc0
> > > > CPU: 2 PID: 4498 Comm: ovsdb-server Not tainted 4.15.0-rc6-kvm-00423-gea1908c252eb #3
> > > > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > > > Call Trace:
> > > > dump_stack+0x5c/0x86
> > > > check_preemption_disabled+0xdd/0xe0
> > > > native_flush_tlb_single+0x57/0xc0
> > > > ? __set_pte_vaddr+0x2d/0x40
> > > > __set_pte_vaddr+0x2d/0x40
> > > > set_pte_vaddr+0x2f/0x40
> > > > cea_set_pte+0x30/0x40
> > > > ds_update_cea.constprop.4+0x4d/0x70
> > > > reserve_ds_buffers+0x159/0x410
> > > > ? wp_page_copy+0x370/0x6c0
> > > > x86_reserve_hardware+0x150/0x160
> > > > x86_pmu_event_init+0x3e/0x1f0
> > > > perf_try_init_event+0x69/0x80
> > > > perf_event_alloc+0x652/0x740
> > > > SyS_perf_event_open+0x3f6/0xd60
> > > > do_syscall_64+0x5c/0x190
> > > > entry_SYSCALL64_slow_path+0x25/0x25
> > > > RIP: 0033:0x72bff0a3c0b9
> > > > RSP: 002b:00007ffed11c2f18 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > > > RAX: ffffffffffffffda RBX: 00007ffed11c30f0 RCX: 000072bff0a3c0b9
> > > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007ffed11c2f20
> > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > > > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > > > R13: 0000000000000000 R14: 00007ffed11c30d0 R15: 000060986ecfb600
> >
> > Fun, so set_pte_vaddr() and the whole cpu_entry_area are supposed to be
> > per CPU. But the DS crud does cross CPU updates of those tables.
> >
> > So we need some additional fun and games..
> >
> > How's the below?
> [...]
>
> Looks good - I have successfully tested it on top of 4.14.11 and
> 4.15-rc6. In both cases, the error message is gone when this patch is
> applied.

While it fixes the previous problem, this patch also introduces new "fun
and games"...

Now, terminating a systemd-nspawn container reliably crashes the host
(so far tested only on Haswell, if that matters). On one occasion, I was
able to capture the following trace:

BUG: unable to handle kernel paging request at 0000000000206ccc
IP: __task_pid_nr_ns+0x57/0xc0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
Modules linked in: uinput veth ip_vti ip_tunnel esp4 xfrm6_mode_tunnel fuse ccm xt_CHECKSUM tun bridge stp llc xfrm_user xfrm_algo ebtable_filter twofish_generic twofish_avx_x86_64 ebtables twofish_x86_64_3way twofish_x86_64 twofish_common vxlan ip6_udp_tunnel udp_tunnel serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic devlink blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic algif_skcipher camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 ablk_helper camellia_x86_64 xcbc openvswitch nf_nat_ipv6 md4 algif_hash af_alg cmac rfcomm bnep xt_policy nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat msr nf_nat_ipv4 nf_nat xt_TCPMSS iptable_mangle ipt_REJECT
nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack binfmt_misc iptable_filter snd_hda_codec_hdmi hid_sensor_als hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_incl_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf industrialio rtsx_pci_sdmmc mmc_core iTCO_wdt wmi_bmof arc4 x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel uvcvideo joydev wacom videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_sensor_hub videodev btusb btrtl hid_multitouch btbcm media btintel rtsx_pci i915 bluetooth snd_hda_codec_conexant lpc_ich snd_hda_codec_generic mfd_core iwlmvm iosf_mbi i2c_algo_bit ecdh_generic
drm_kms_helper mac80211 snd_hda_intel syscopyarea snd_hda_codec sysfillrect sysimgblt snd_hda_core snd_pcm_oss iwlwifi fb_sys_fops thinkpad_acpi snd_mixer_oss drm nvram snd_pcm video cfg80211 intel_gtt snd_timer rfkill snd evdev wmi ecryptfs nfsd ip_tables x_tables ipv6 crc_ccitt
CPU: 2 PID: 1 Comm: systemd Not tainted 4.14.12-kvm-00437-gd6765c06f03d #4
Hardware name: LENOVO 20CD0035GE/20CD0035GE, BIOS GQET40WW (1.20 ) 11/07/2014
task: ffff9c66560e0d00 task.stack: ffffbc6a00038000
RIP: 0010:__task_pid_nr_ns+0x57/0xc0
RSP: 0018:ffffbc6a0003bdb0 EFLAGS: 00010246
RAX: ffff9c66560e8680 RBX: 0000000000000000 RCX: 0000000000206cc8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000004d0
RBP: 0000000000000000 R08: ffffffffb0237b10 R09: 0000000000000005
R10: ffffbc6a0003bee0 R11: ffff9c65aa33c004 R12: ffffffffb02309a0
R13: 0000000000001000 R14: ffff9c65ecbd4a00 R15: ffff9c6624516b00
FS: 0000767a01669980(0000) GS:ffff9c665f280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000206ccc CR3: 0000000215476003 CR4: 00000000001606e0
Call Trace:
cgroup_procs_show+0x10/0x30
seq_read+0x30c/0x3d0
__vfs_read+0x2e/0x150
vfs_read+0x84/0x110
SyS_read+0x4d/0xc0
do_syscall_64+0x5c/0x190
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x767a00fa671d
RSP: 002b:00007ffca8edc6e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000057d4d8a02c10 RCX: 0000767a00fa671d
RDX: 0000000000001000 RSI: 000057d4d8a05320 RDI: 0000000000000083
RBP: 0000000000000d68 R08: 0000767a01265178 R09: 0000000000001010
R10: 000057d4d8a03490 R11: 0000000000000293 R12: 0000767a01261440
R13: 0000767a01260900 R14: 00000000ffffffff R15: 0000000000000000
Code: 74 0d 48 8d 44 6d 00 48 8d 3c c5 d0 04 00 00 48 8b 9b 98 04 00 00 48 01 fb 48 8b 0b 48 85 c9 74 37 41 8b b4 24 30 08 00 00 31 db <3b> 71 04 77 0d 48 c1 e6 05 48 01 f1 4c 3b 61 38 74 0c e8 12 db
RIP: __task_pid_nr_ns+0x57/0xc0 RSP: ffffbc6a0003bdb0
CR2: 0000000000206ccc
---[ end trace ce7578070732b5ee ]---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
IP: pids_free+0xb/0x30
PGD 0 P4D 0
Oops: 0000 [#2] PREEMPT SMP PTI
Modules linked in: uinput veth ip_vti ip_tunnel esp4 xfrm6_mode_tunnel fuse ccm xt_CHECKSUM tun bridge stp llc xfrm_user xfrm_algo ebtable_filter twofish_generic twofish_avx_x86_64 ebtables twofish_x86_64_3way twofish_x86_64 twofish_common vxlan ip6_udp_tunnel udp_tunnel serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic devlink blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic algif_skcipher camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 ablk_helper camellia_x86_64 xcbc openvswitch nf_nat_ipv6 md4 algif_hash af_alg cmac rfcomm bnep xt_policy nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat msr nf_nat_ipv4 nf_nat xt_TCPMSS iptable_mangle ipt_REJECT
nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack binfmt_misc iptable_filter snd_hda_codec_hdmi hid_sensor_als hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_incl_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf industrialio rtsx_pci_sdmmc mmc_core iTCO_wdt wmi_bmof arc4 x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel uvcvideo joydev wacom videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_sensor_hub videodev btusb btrtl hid_multitouch btbcm media btintel rtsx_pci i915 bluetooth snd_hda_codec_conexant lpc_ich snd_hda_codec_generic mfd_core iwlmvm iosf_mbi i2c_algo_bit ecdh_generic
drm_kms_helper mac80211 snd_hda_intel syscopyarea snd_hda_codec sysfillrect sysimgblt snd_hda_core snd_pcm_oss iwlwifi fb_sys_fops thinkpad_acpi snd_mixer_oss drm nvram snd_pcm video cfg80211 intel_gtt snd_timer rfkill snd evdev wmi ecryptfs nfsd ip_tables x_tables ipv6 crc_ccitt
CPU: 2 PID: 1 Comm: systemd Tainted: G D 4.14.12-kvm-00437-gd6765c06f03d #4
Hardware name: LENOVO 20CD0035GE/20CD0035GE, BIOS GQET40WW (1.20 ) 11/07/2014
task: ffff9c66560e0d00 task.stack: ffffbc6a00038000
RIP: 0010:pids_free+0xb/0x30
RSP: 0018:ffffbc6a0003bdd8 EFLAGS: 00010297
RAX: 0000000000000000 RBX: 000000000000000a RCX: 000000000000000a
RDX: 000000000000000a RSI: 000000000000000c RDI: ffff9c6624516b00
RBP: ffff9c6624516b00 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9c65bf8a8510 R11: ffff9c6656003800 R12: ffffffffb02387e0
R13: ffff9c662ac6d590 R14: ffff9c66534cc7a0 R15: ffff9c6625d5f1e0
FS: 0000000000000000(0000) GS:ffff9c665f280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000b0 CR3: 000000008220a006 CR4: 00000000001606e0
Call Trace:
cgroup_free+0x57/0xd0
__put_task_struct+0x38/0x130
cgroup_procs_release+0x12/0x20
kernfs_fop_release+0x82/0x90
__fput+0x9d/0x220
task_work_run+0x84/0xa0
do_exit+0x2b1/0xab0
rewind_stack_do_exit+0x17/0x20
Code: c7 e8 6a fd ff ff 48 8b 80 b0 00 00 00 48 83 b8 b0 00 00 00 00 75 e7 f3 c3 0f 1f 80 00 00 00 00 48 8b 87 88 07 00 00 48 8b 40 50 <48> 83 b8 b0 00 00 00 00 74 19 48 89 c7 e8 33 fd ff ff 48 8b 80
RIP: pids_free+0xb/0x30 RSP: ffffbc6a0003bdd8
CR2: 00000000000000b0
---[ end trace ce7578070732b5ef ]---
Fixing recursive fault but reboot is needed!
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x27/0x350
Modules linked in: uinput veth ip_vti ip_tunnel esp4 xfrm6_mode_tunnel fuse ccm xt_CHECKSUM tun bridge stp llc xfrm_user xfrm_algo ebtable_filter twofish_generic twofish_avx_x86_64 ebtables twofish_x86_64_3way twofish_x86_64 twofish_common vxlan ip6_udp_tunnel udp_tunnel serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic devlink blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic algif_skcipher camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 ablk_helper camellia_x86_64 xcbc openvswitch nf_nat_ipv6 md4 algif_hash af_alg cmac rfcomm bnep xt_policy nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat msr nf_nat_ipv4 nf_nat xt_TCPMSS iptable_mangle ipt_REJECT
nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack binfmt_misc iptable_filter snd_hda_codec_hdmi hid_sensor_als hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_incl_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf industrialio rtsx_pci_sdmmc mmc_core iTCO_wdt wmi_bmof arc4 x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel uvcvideo joydev wacom videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_sensor_hub videodev btusb btrtl hid_multitouch btbcm media btintel rtsx_pci i915 bluetooth snd_hda_codec_conexant lpc_ich snd_hda_codec_generic mfd_core iwlmvm iosf_mbi i2c_algo_bit ecdh_generic
drm_kms_helper mac80211 snd_hda_intel syscopyarea snd_hda_codec sysfillrect sysimgblt snd_hda_core snd_pcm_oss iwlwifi fb_sys_fops thinkpad_acpi snd_mixer_oss drm nvram snd_pcm video cfg80211 intel_gtt snd_timer rfkill snd evdev wmi ecryptfs nfsd ip_tables x_tables ipv6 crc_ccitt
CPU: 2 PID: 1 Comm: systemd Tainted: G D 4.14.12-kvm-00437-gd6765c06f03d #4
Hardware name: LENOVO 20CD0035GE/20CD0035GE, BIOS GQET40WW (1.20 ) 11/07/2014
task: ffff9c66560e0d00 task.stack: ffffbc6a00038000
RIP: 0010:rcu_note_context_switch+0x27/0x350
RSP: 0018:ffffbc6a0003be58 EFLAGS: 00010002
RAX: 0000000000000001 RBX: ffff9c66560e0d00 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffffafff992f RDI: ffffffffaffb7ead
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000365
R10: 0000000000000086 R11: 0000000000000000 R12: ffff9c665f29fbc0
R13: ffff9c66560e0d00 R14: ffff9c66560e12a8 R15: 000000000001fbc0
FS: 0000000000000000(0000) GS:ffff9c665f280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000b0 CR3: 000000008220a006 CR4: 00000000001606e0
Call Trace:
__schedule+0x84/0x6f0
schedule+0x37/0x90
do_exit+0x8c2/0xab0
rewind_stack_do_exit+0x17/0x20
Code: 00 00 00 00 41 56 41 55 41 54 55 89 fd 53 65 48 8b 1c 25 00 4d 01 00 e8 48 da ff ff 40 84 ed 8b 83 f8 02 00 00 75 7d 85 c0 7e 7d <0f> ff 80 bb fc 02 00 00 00 0f 84 89 00 00 00 e8 c5 ca ff ff e8
---[ end trace ce7578070732b5f0 ]---
INFO: rcu_preempt detected stalls on CPUs/tasks:
Tasks blocked on level-0 rcu_node (CPUs 0-7): P1
(detected by 2, t=60002 jiffies, g=551687, c=551686, q=11683)
systemd D 0 1 0 0x80080002
Call Trace:
? __schedule+0x292/0x6f0
schedule+0x37/0x90
do_exit+0x8c2/0xab0
rewind_stack_do_exit+0x17/0x20
systemd D 0 1 0 0x80080002
Call Trace:
? __schedule+0x292/0x6f0
schedule+0x37/0x90
do_exit+0x8c2/0xab0
rewind_stack_do_exit+0x17/0x20

The crash does not happen with plain 4.14.11. However, it does happen
when this patch (*) is applied on top of 4.14.11, 4.14.12, or 4.14.12
plus the following set of patches from the current 4.14 stable-queue:

x86-mm-set-modules_end-to-0xffffffffff000000.patch
x86-mm-map-cpu_entry_area-at-the-same-place-on-4-5-level.patch
x86-kaslr-fix-the-vaddr_end-mess.patch
(*) x86-events-intel-ds-use-the-proper-cache-flush-method-for-mapping-ds-buffers.patch
x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
mm-mprotect-add-a-cond_resched-inside-change_pmd_range.patch
mm-sparse.c-wrong-allocation-for-mem_section.patch
userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails.patch
btrfs-fix-refcount_t-usage-when-deleting-btrfs_delayed_nodes.patch
efi-capsule-loader-reinstate-virtual-capsule-mapping.patch
crypto-n2-cure-use-after-free.patch
crypto-chacha20poly1305-validate-the-digest-size.patch
crypto-pcrypt-fix-freeing-pcrypt-instances.patch
crypto-chelsio-select-crypto_gf128mul.patch
drm-i915-disable-dc-states-around-gmbus-on-glk.patch
drm-i915-apply-display-wa-1183-on-skl-kbl-and-cfl.patch
sunxi-rsb-include-of-based-modalias-in-device-uevent.patch
fscache-fix-the-default-for-fscache_maybe_release_page.patch

Thanks,

Thomas