Re: [RFC][PATCH] vfs: In mntput run deactivate_super on a shallow stack.

From: Al Viro
Date: Sun Apr 13 2014 - 17:52:59 EST


On Sun, Apr 13, 2014 at 12:51:56AM -0700, Eric W. Biederman wrote:

> btrfs_init_test_fs, and of course I was silly when I thought the module
> ref count would be useful for something before init_module succeeds.
>
> Still I suspect I was on the right track. We do have the get_fs_type,
> get_filesystem and put_filesystem. Which ought to be enough to allow
> us to convert unregister_filesystem into an appropriate barrier.

Umm... I don't think so - register_filesystem and unregister_filesystem
are only about the fs being visible in fs type list. You do *not* have
to register the sucker at all in order to be able to do kern_mount().
So the call of unregister_filesystem() is not guaranteed to happen at
all - as the matter of fact, I think we ought to stop registering any
mount_pseudo()-based ones (pipefs, etc.) and the only reason for _not_
doing that is that some scripts might be poking in /proc/filesystems.
We definitely do not want that for anything *new* that isn't mountable
from userland...

BTW, could somebody explain this: in fs/ext4/super.c we have
#define IS_EXT2_SB(sb) ((sb)->s_bdev->bd_holder == &ext2_fs_type)
What's wrong with simple ((sb)->s_type == &ext2_fs_type)? What am
I missing there? Ted?

FWIW, I have found one bug of similar sort, but it's already
present with the current semantics: init_hugetlb_fs() has
for_each_hstate(h) {
char buf[50];
unsigned ps_kb = 1U << (h->order + PAGE_SHIFT - 10);

snprintf(buf, sizeof(buf), "pagesize=%uK", ps_kb);
hugetlbfs_vfsmount[i] = kern_mount_data(&hugetlbfs_fs_type,
buf);

if (IS_ERR(hugetlbfs_vfsmount[i])) {
pr_err("hugetlb: Cannot mount internal hugetlbfs for "
"page size %uK", ps_kb);
error = PTR_ERR(hugetlbfs_vfsmount[i]);
hugetlbfs_vfsmount[i] = NULL;
}
i++;
}
/* Non default hstates are optional */
if (!IS_ERR_OR_NULL(hugetlbfs_vfsmount[default_hstate_idx]))
return 0;
followed by
kmem_cache_destroy(hugetlbfs_inode_cachep);

If some of those kern_mount_data() succeed, but not the one we really
want to have, we'll end up with kmem_cache_destroy() on non-empy cache...
Granted, it's very unlikely to happen, but it's still a bug. BTW, that
IS_ERR_OR_NULL() there looks like cargo-culting - it's ERR_PTR() is
explicitly turned into NULL a few lines above...

Another thing: does anybody seriously expect the system to survive with
every pipe(2) failing (and oopsing, actually)? If not, we probably should
just make init_pipe_fs() panic on failure... The same goes for sock_init(),
IMO...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/