Re: fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner store task pointer of last owni

From: Jan Harkes
Date: Tue Apr 02 2019 - 15:17:10 EST


On Sun, Mar 31, 2019 at 03:13:47PM -0400, Jan Harkes wrote:
> On Sun, Mar 31, 2019 at 02:14:13PM -0400, Waiman Long wrote:
> > One possibility is that there is a previous reference to the memory
> > currently occupied by the spinlock. If the memory location is previously
> > part of a rwsem structure and someone is still using it, you may get
> > memory corruption.
>
> Ah, I hadn't even thought of that possibility. Good, it will open up

First of all, I have to thank you for your original patch because
otherwise I probably would never have discovered that something was
seriously wrong. Your patch made the problem visible.

I ended up changing 'owner' to '_RET_IP_' and dumping the value of the
clobbered Coda inode spinlock and the surrounding memory. The 'culprit'
turned out to be ext4_filemap_fault, but despite it being in ext4, it is
still a Coda-specific problem.

Effectively Coda overlays other filesystems' inodes for mmap, but
vma->vm_file still points at Coda's file. So when ext4_filemap_fault
calls file_inode() it ends up with the Coda inode instead of the ext4
inode. EXT4_I() then treats that inode as if it were embedded in an
ext4_inode_info, and when we try to grab ext4's i_mmap_sem we really just
scribble over the memory region that happens to contain the Coda inode
spinlock. A fix is to use vm_file->f_mapping->host instead of
file_inode(vm_file).
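
To make the difference between the two lookups concrete, here is a small
userspace sketch. It is not kernel code: the structs are stripped-down
mocks that only borrow the kernel's field names, and mapping_host() and
overlay_host_mapping() are hypothetical helpers made up for illustration.

```c
/*
 * Userspace mock of the kernel structures involved. Field names follow
 * the kernel's; everything else is simplified for illustration.
 */
struct inode {
	const char *fs_name;		/* which filesystem owns this inode */
};

struct address_space {
	struct inode *host;		/* inode that owns the page cache */
};

struct file {
	struct inode *f_inode;		/* what file_inode() returns */
	struct address_space *f_mapping;
};

/* Same shape as the kernel helper: returns the file's own inode. */
static inline struct inode *file_inode(const struct file *f)
{
	return f->f_inode;
}

/* Hypothetical name for the lookup the fix uses: mapping, then host. */
static inline struct inode *mapping_host(const struct file *f)
{
	return f->f_mapping->host;
}

/*
 * Hypothetical helper modelling the overlay: the Coda file keeps its
 * own inode but borrows the host file's mapping.
 */
static inline void overlay_host_mapping(struct file *coda_file,
					const struct file *host_file)
{
	coda_file->f_mapping = host_file->f_mapping;
}
```

After overlay_host_mapping(&coda_file, &host_file), file_inode(&coda_file)
still returns the Coda inode, while mapping_host(&coda_file) returns the
host (ext4) inode, which is the one the ext4 fault handlers actually
expect.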

Of course everyone looks at ext4 as a canonical example, so this problem
has spread pretty much everywhere, and I'm wondering how best to resolve
this. The options I see:

- change file_inode() to follow file->f_mapping->host

would fix most places, but maybe f_mapping is not always guaranteed to
point at a usable place?

- change Coda's mmap to replace vma->vm_file with the host file

we'd probably no longer get notified when the last reference to the
host file goes away, so we'd call coda_release and notify userspace on
close() even when there are still active mmap regions.

- fix every in-tree file system to use vma->vm_file->f_mapping->host.

Jan


diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d49837b..122d691d3eda 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -284,7 +284,7 @@ static vm_fault_t ext4_dax_huge_fault(struct vm_fault *vmf,
 	vm_fault_t result;
 	int retries = 0;
 	handle_t *handle = NULL;
-	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct inode *inode = vmf->vma->vm_file->f_mapping->host;
 	struct super_block *sb = inode->i_sb;

 	/*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b54b261ded36..62a0025ce7f8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6211,7 +6211,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 	int err;
 	vm_fault_t ret;
 	struct file *file = vma->vm_file;
-	struct inode *inode = file_inode(file);
+	struct inode *inode = file->f_mapping->host;
 	struct address_space *mapping = inode->i_mapping;
 	handle_t *handle;
 	get_block_t *get_block;
@@ -6302,7 +6302,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)

 vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)
 {
-	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct inode *inode = vmf->vma->vm_file->f_mapping->host;
 	vm_fault_t ret;

 	down_read(&EXT4_I(inode)->i_mmap_sem);