[3.8 Regression] backporting "[PATCH stable pre 3.9] mm, gup: close FOLL MAP_PRIVATE race"

From: Brian Norris
Date: Fri Oct 21 2016 - 02:39:50 EST


(Preface: this wasn't a clean backport, I'm a bit under the weather, and
it's getting late here. So forgive me if my head's not on straight.)

Hi,

I'm not sure of the best way to report this, but the Chrome OS test
infrastructure noticed some problems when testing the following patch
backported to our 3.8 kernels:

http://www.spinics.net/lists/stable/msg147998.html

Specifically (if you can hold your nose and stand Gerrit), this change:

https://chromium-review.googlesource.com/#/c/401041/

I believe this is partly because there were some context differences in
applying to 3.8 (where FOLL_NUMA was added), whereas Michal claimed
support for 3.0, 3.2, and 3.4. At any rate, I thought I'd post my
findings here, in case anyone is going through the same troubles.

I see problems simply by running `gdb -ex run -ex quit /path/to/a.out`,
where a.out can be anything. I just use the following program:

---
int main()
{
	return 0;
}
---

This quickly triggers a spinlock bug:

[ 935.064578] BUG: spinlock already unlocked on CPU#1, gdb/30697
[ 935.064600] lock: 0xffff8801434611e8, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
[ 935.064610] Pid: 30697, comm: gdb Tainted: G WC 3.8.11 #1
[ 935.064617] Call Trace:
[ 935.064629] [<ffffffffba9f4f33>] spin_dump+0x93/0x98
[ 935.064638] [<ffffffffba9f4f5e>] spin_bug+0x26/0x28
[ 935.064647] [<ffffffffba9f512b>] do_raw_spin_unlock+0x33/0x82
[ 935.064657] [<ffffffffbacc9fe8>] _raw_spin_unlock+0xe/0x10
[ 935.064668] [<ffffffffba8d9441>] follow_page+0x29e/0x2e0
[ 935.064677] [<ffffffffba8dab8a>] __get_user_pages+0x296/0x3ee
[ 935.064686] [<ffffffffba8dad24>] get_user_pages+0x42/0x44
[ 935.064695] [<ffffffffba8dadb4>] __access_remote_vm+0x8e/0x1c6
[ 935.064704] [<ffffffffba8db285>] access_process_vm+0x4e/0x68
[ 935.064714] [<ffffffffba83e6c7>] generic_ptrace_pokedata+0x22/0x31
[ 935.064724] [<ffffffffba83e758>] ptrace_request+0x82/0x427
[ 935.064733] [<ffffffffbacc9f91>] ? _raw_spin_lock+0xe/0x10
[ 935.064742] [<ffffffffba85b092>] ? task_rq_unlock+0x22/0x27
[ 935.064753] [<ffffffffba85e254>] ? wait_task_inactive+0xa6/0x144
[ 935.064766] [<ffffffffba8f5d60>] ? fsnotify_access+0x5a/0x61
[ 935.064778] [<ffffffffba80c864>] arch_ptrace+0x1a4/0x1b2
[ 935.064789] [<ffffffffba83e603>] sys_ptrace+0xcc/0x108
[ 935.064800] [<ffffffffbaccadc2>] system_call_fastpath+0x16/0x1b
[ 935.065893] traps: gdb[30697] trap int3 ip:7f4729e66fe1 sp:7ffcc0e2a278 error:0

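The first trace enters follow_page() via generic_ptrace_pokedata(), i.e.
a PTRACE_POKEDATA write into the tracee, which is presumably gdb
planting a breakpoint in the read-only, privately mapped text. For
anyone who wants to poke at this without gdb, here is a minimal sketch
that does a peek/poke round trip on a forked child. It is an assumption
that this reaches the same code path; it has not been run against the
affected 3.8 kernel, and the names target() and poke_child_text() are
made up for illustration.

```c
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical marker: any address inside the text mapping will do. */
static void target(void)
{
}

/*
 * Fork a child, let it stop under ptrace, then read one word of its
 * text and write the same word back with PTRACE_POKEDATA. The write
 * goes to a read-only private mapping, so the kernel has to force a
 * COW through get_user_pages().
 * Returns 0 if the round trip succeeded, -1 otherwise.
 */
int poke_child_text(void)
{
	int status;
	int rc = -1;
	long word;
	void *addr;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: request tracing, then stop and wait for the parent. */
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
			_exit(1);
		raise(SIGSTOP);
		_exit(0);
	}

	/* Parent: wait until the child has stopped itself. */
	if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
		return -1;

	/* Word-align the address so the access stays within one page. */
	addr = (void *)((uintptr_t)target & ~(uintptr_t)(sizeof(long) - 1));

	/* PEEKTEXT returns the data itself, so errors are flagged via errno. */
	errno = 0;
	word = ptrace(PTRACE_PEEKTEXT, pid, addr, NULL);
	if (!(word == -1 && errno != 0)) {
		/* Writing the identical word back leaves the child's code intact. */
		if (ptrace(PTRACE_POKEDATA, pid, addr, (void *)word) == 0)
			rc = 0;
	}

	/* Clean up the stopped child regardless of the outcome. */
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return rc;
}
```

Call poke_child_text() from a trivial main() and watch dmesg. On a
healthy kernel it should simply return 0; on a kernel with a bad
backport, expect it to be as unfriendly to the machine as the gdb run
above.
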
... our crash reporter goes crazy for a few seconds, and eventually the soft lockup detector kicks in ...

[ 947.773250] BUG: soft lockup - CPU#3 stuck for 11s! [a.out:30699]
[ 947.773257] Modules linked in: uinput i2c_dev memconsole rfcomm snd_hda_codec_hdmi aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper snd_hda_codec_ca0132 cryptd snd_hda_intel isl29018(C) snd_hda_codec industrialio snd_hwdep snd_pcm snd_page_alloc fuse zram(C) zsmalloc(C) smsc95xx nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer smsc75xx usbnet ath9k_btcoex ath9k_common_btcoex ath9k_hw_btcoex ath mac80211 cfg80211 ath3k btusb btrtl btbcm btintel bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core joydev
[ 947.773394] CPU 3
[ 947.773399] Pid: 30699, comm: a.out Tainted: G WC 3.8.11 #1
[ 947.773407] RIP: 0010:[<ffffffffba9ef408>] [<ffffffffba9ef408>] delay_tsc+0x19/0x4a
[ 947.773421] RSP: 0000:ffff880130bb7af8 EFLAGS: 00000203
[ 947.773427] RAX: 000000001d8ee08a RBX: 0000000000000000 RCX: 000000001d8ee04a
[ 947.773433] RDX: 000000000000018d RSI: 0000000000000003 RDI: 0000000000000001
[ 947.773440] RBP: ffff880130bb7af8 R08: 0000000000000000 R09: ffff88014f1fef90
[ 947.773446] R10: 0000000000000004 R11: 000000000000003e R12: 0000000000011d40
[ 947.773454] R13: ffff880130bb7ce4 R14: ffffffffba865dbc R15: ffff880130bb7b20
[ 947.773462] FS: 0000000000000000(0000) GS:ffff88014f380000(0000) knlGS:0000000000000000
[ 947.773469] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 947.773474] CR2: 00007fd5dceba048 CR3: 000000003b00c000 CR4: 00000000001407e0
[ 947.773481] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 947.773488] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 947.773495] Process a.out (pid: 30699, threadinfo ffff880130bb6000, task ffff88011bca2d00)
[ 947.773501] Stack:
[ 947.773505] ffff880130bb7b08 ffffffffba9ef399 ffff880130bb7b30 ffffffffba9f506b
[ 947.773518] 0000555555554000 ffff88011dce0210 0000000000000000 ffff880130bb7b40
[ 947.773531] ffffffffbacc9f91 ffff880130bb7c10 ffffffffba8d8307 0000000000000000
[ 947.773544] Call Trace:
[ 947.773552] [<ffffffffba9ef399>] __delay+0xf/0x11
[ 947.773560] [<ffffffffba9f506b>] do_raw_spin_lock+0xad/0xfb
[ 947.773569] [<ffffffffbacc9f91>] _raw_spin_lock+0xe/0x10
[ 947.773577] [<ffffffffba8d8307>] unmap_single_vma+0x2c8/0x5c7
[ 947.773585] [<ffffffffba8d90c4>] unmap_vmas+0x3a/0x49
[ 947.773593] [<ffffffffba8de3f5>] exit_mmap+0x6f/0x135
[ 947.773602] [<ffffffffba854a7c>] ? hrtimer_try_to_cancel+0x95/0xb5
[ 947.773612] [<ffffffffba830ba5>] mmput+0x49/0xdf
[ 947.773619] [<ffffffffba836f79>] do_exit+0x364/0x8fc
[ 947.773626] [<ffffffffbacc9334>] ? __schedule+0x589/0x5d3
[ 947.773634] [<ffffffffba83821b>] do_group_exit+0x42/0xb0
[ 947.773643] [<ffffffffba84535f>] get_signal_to_deliver+0x541/0x560
[ 947.773651] [<ffffffffba84448a>] ? do_send_sig_info+0x74/0x98
[ 947.773659] [<ffffffffba8017d7>] do_signal+0x43/0x533
[ 947.773666] [<ffffffffba8edc1c>] ? kfree+0xb0/0xe3
[ 947.773675] [<ffffffffba8ff865>] ? final_putname+0x34/0x37
[ 947.773682] [<ffffffffba8edafe>] ? kmem_cache_free+0x8a/0xc5
[ 947.773689] [<ffffffffba8edafe>] ? kmem_cache_free+0x8a/0xc5
[ 947.773696] [<ffffffffba801cf0>] do_notify_resume+0x29/0x5b
[ 947.773704] [<ffffffffbaccafc8>] int_signal+0x12/0x17
[ 947.773710] Code: 44 00 00 55 48 8d 3c bf 48 89 e5 e8 ae ff ff ff 5d c3 0f 1f 44 00 00 55 48 89 e5 65 8b 34 25 1c a0 00 00 66 66 90 0f ae e8 0f 31 <89> c1 66 66 90 0f ae e8 0f 31 89 c0 48 c1 e2 20 48 09 c2 89 d0
[ 947.773839] Kernel panic - not syncing: softlockup: hung tasks
[ 947.773846] Pid: 30699, comm: a.out Tainted: G WC 3.8.11 #1
[ 947.773851] Call Trace:
[ 947.773855] <IRQ> [<ffffffffbacc616a>] panic+0xd2/0x1d3
[ 947.773868] [<ffffffffba88b729>] watchdog_timer_fn+0x123/0x145
[ 947.773875] [<ffffffffba88b606>] ? __touch_watchdog+0x25/0x25
[ 947.773883] [<ffffffffba8547f7>] __run_hrtimer+0x95/0x148
[ 947.773890] [<ffffffffba85508d>] hrtimer_interrupt+0xe1/0x1d6
[ 947.773899] [<ffffffffbaccc4b8>] smp_apic_timer_interrupt+0x77/0x8a
[ 947.773907] [<ffffffffbaccb8ca>] apic_timer_interrupt+0x6a/0x70
[ 947.773912] <EOI> [<ffffffffba9ef408>] ? delay_tsc+0x19/0x4a
[ 947.773924] [<ffffffffba9ef399>] __delay+0xf/0x11
[ 947.773932] [<ffffffffba9f506b>] do_raw_spin_lock+0xad/0xfb
[ 947.773940] [<ffffffffbacc9f91>] _raw_spin_lock+0xe/0x10
[ 947.773947] [<ffffffffba8d8307>] unmap_single_vma+0x2c8/0x5c7
[ 947.773955] [<ffffffffba8d90c4>] unmap_vmas+0x3a/0x49
[ 947.773962] [<ffffffffba8de3f5>] exit_mmap+0x6f/0x135
[ 947.773970] [<ffffffffba854a7c>] ? hrtimer_try_to_cancel+0x95/0xb5
[ 947.773978] [<ffffffffba830ba5>] mmput+0x49/0xdf
[ 947.773985] [<ffffffffba836f79>] do_exit+0x364/0x8fc
[ 947.773993] [<ffffffffbacc9334>] ? __schedule+0x589/0x5d3
[ 947.774000] [<ffffffffba83821b>] do_group_exit+0x42/0xb0
[ 947.774008] [<ffffffffba84535f>] get_signal_to_deliver+0x541/0x560
[ 947.774016] [<ffffffffba84448a>] ? do_send_sig_info+0x74/0x98
[ 947.774023] [<ffffffffba8017d7>] do_signal+0x43/0x533
[ 947.774030] [<ffffffffba8edc1c>] ? kfree+0xb0/0xe3
[ 947.774037] [<ffffffffba8ff865>] ? final_putname+0x34/0x37
[ 947.774044] [<ffffffffba8edafe>] ? kmem_cache_free+0x8a/0xc5
[ 947.774051] [<ffffffffba8edafe>] ? kmem_cache_free+0x8a/0xc5
[ 947.774059] [<ffffffffba801cf0>] do_notify_resume+0x29/0x5b
[ 947.774066] [<ffffffffbaccafc8>] int_signal+0x12/0x17
[ 947.774076] Kernel Offset: 0x39800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 947.777734] gsmi: Log Shutdown Reason 0x02

Sorry I don't have any more analysis than that so far. I just wanted to
get this out there in case anyone was grabbing this patch naively like I
was.

Regards,
Brian