[RFC 0/2] Potential race condition with page lock

From: Chintan Pandya
Date: Mon Feb 11 2019 - 07:53:58 EST


On a 4.14 kernel, I observed the following two BUG_ON(!PageLocked(page))
scenarios. Both appear to have a similar cause.

Case: 1
[127823.176076] try_to_free_buffers+0xfc/0x108 (BUG_ON(), page lock was freed
somehow)
[127823.176079] jbd2_journal_try_to_free_buffers+0x15c/0x194 (page lock was
available till this function)
[127823.176083] ext4_releasepage+0xe0/0x110
[127823.176087] try_to_release_page+0x68/0x90 (page lock was available till
this function)
[127823.176090] invalidate_inode_page+0x94/0xa8
[127823.176093] invalidate_mapping_pages_without_uidlru+0xec/0x1a4 (page lock
taken here)
...
...

Case: 2
[<ffffff9547a82fb0>] el1_dbg+0x18
[<ffffff9547bfb544>] __remove_mapping+0x160 (BUG_ON(), page lock is not
held; someone might have released it)
[<ffffff9547bfb3c8>] remove_mapping+0x28
[<ffffff9547bf8404>] invalidate_inode_page+0xa4
[<ffffff9547bf8bcc>] invalidate_mapping_pages+0xd4 (acquired the page lock)
[<ffffff9547c7f934>] inode_lru_isolate+0x128
[<ffffff9547c1b500>] __list_lru_walk_one+0x10c
[<ffffff9547c1b3e0>] list_lru_walk_one+0x58
[<ffffff9547c7f7d4>] prune_icache_sb+0x50
[<ffffff9547c64fbc>] super_cache_scan+0xfc
[<ffffff9547bfb17c>] shrink_slab+0x304
[<ffffff9547bffb38>] shrink_node+0x254
[<ffffff9547bfd4fc>] do_try_to_free_pages+0x144
[<ffffff9547bfd2d8>] try_to_free_pages+0x390
[<ffffff9547bebb80>] __alloc_pages_nodemask+0x940
[<ffffff9547becedc>] __get_free_pages+0x28
[<ffffff9547cd6870>] proc_pid_readlink+0x6c
[<ffffff9547c7075c>] vfs_readlink+0x124
[<ffffff9547c66374>] SyS_readlinkat+0xc8
[<ffffff9547a83818>] __sys_trace_return+0x0

Both traces suggest that the current call path took the page lock, but the
lock was released in the meantime by someone else. There are two
possibilities:

1) Someone updated the page flags non-atomically and, due to a race,
cleared the PG_locked bit unintentionally.
2) Someone else took the lock without checking whether the page was already
locked, since there are explicit APIs to set PG_locked.
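
To make the two possibilities concrete, here is a minimal userspace sketch
(not kernel code). The interleaving of two CPUs is replayed by hand on a
plain flags word so the outcome is deterministic. The DEMO_* bit positions
and the demo_* function names are hypothetical and do not match the
kernel's real layout or API.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions are illustrative only, not the kernel's real layout. */
#define DEMO_PG_LOCKED (1UL << 0)
#define DEMO_PG_DIRTY  (1UL << 1)

/*
 * Possibility 1: a non-atomic read-modify-write on the flags word.
 * Returns true if the lock bit survived the replayed interleaving.
 */
static bool demo_lost_update(void)
{
	unsigned long flags = 0;
	unsigned long snapshot;

	snapshot = flags;                  /* CPU B: load flags (lock bit clear) */
	flags |= DEMO_PG_LOCKED;           /* CPU A: takes the page lock */
	assert(flags & DEMO_PG_LOCKED);    /* CPU A holds the lock at this point */
	flags = snapshot | DEMO_PG_DIRTY;  /* CPU B: store stale value + own bit */

	return (flags & DEMO_PG_LOCKED) != 0;
}

/*
 * Possibility 2: a second context sets the lock bit without testing it,
 * believes it owns the lock, and later unlocks. Returns true if the first
 * holder still sees the page as locked.
 */
static bool demo_unchecked_setter(void)
{
	unsigned long flags = 0;

	flags |= DEMO_PG_LOCKED;    /* CPU A: legitimate lock holder */
	flags |= DEMO_PG_LOCKED;    /* CPU B: unconditional set, no test */
	flags &= ~DEMO_PG_LOCKED;   /* CPU B: releases "its" lock */

	return (flags & DEMO_PG_LOCKED) != 0;  /* CPU A's PageLocked() check */
}
```

In both replays the first holder ends up with the lock bit cleared underneath
it, which is the condition BUG_ON(!PageLocked(page)) would catch.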

I could not find any history explaining why PG_locked is set non-atomically
in some places. I believe it is for performance reasons, but I am not sure.
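
For comparison, an atomic test-and-set makes a second setter visible, at
the cost of an atomic read-modify-write instead of a plain store. The
following is only a userspace sketch using C11 atomics under that
assumption; it is not the code from the patches, and demo_trylock(),
demo_unlock() and DEMO_PG_LOCKED are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_PG_LOCKED (1UL << 0)  /* illustrative bit position */

/* Atomic test-and-set: returns true only for the caller that set the bit. */
static bool demo_trylock(atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_or_explicit(flags, DEMO_PG_LOCKED,
						     memory_order_acquire);

	return (old & DEMO_PG_LOCKED) == 0;
}

/* Atomic clear that also checks the bit was set before the unlock. */
static void demo_unlock(atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_and_explicit(flags, ~DEMO_PG_LOCKED,
						      memory_order_release);

	assert(old & DEMO_PG_LOCKED);  /* unlocking an unlocked page is a bug */
}
```

With this shape, a double setter shows up as a failed demo_trylock() rather
than silently succeeding, and other bits in the same word are never
overwritten by a stale value.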

Chintan Pandya (2):
  page-flags: Make page lock operation atomic
  page-flags: Catch the double setter of page flags

 fs/cifs/file.c             | 8 ++++----
 fs/pipe.c                  | 2 +-
 include/linux/page-flags.h | 4 ++--
 include/linux/pagemap.h    | 6 +++---
 mm/filemap.c               | 4 ++--
 mm/khugepaged.c            | 2 +-
 mm/ksm.c                   | 2 +-
 mm/memory-failure.c        | 2 +-
 mm/memory.c                | 2 +-
 mm/migrate.c               | 2 +-
 mm/shmem.c                 | 6 +++---
 mm/swap_state.c            | 4 ++--
 mm/vmscan.c                | 2 +-
 13 files changed, 23 insertions(+), 23 deletions(-)

--
2.17.1