Re: [PATCH 0/2] Fix debugfs bind mount regression

From: Serge Hallyn
Date: Wed Mar 09 2016 - 16:18:42 EST


Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
> Seth Forshee <seth.forshee@xxxxxxxxxxxxx> writes:
>
> > Some full-OS container software bind mounts debugfs into containers to
> > satisfy the assumptions of older userspaces which expect to be able to
> > mount debugfs. This regressed in 4.1 due to the addition of tracefs,
> > which gets automounted in the tracing subdirectory of debugfs. In a
> > cloned mount namespace the bind mount now fails because the tracefs
> > mount is a locked child of the debugfs mount.
> >
> > For new mounts we already make an exception to the "locked child mount"
> > rule. Directories in pseudo filesystems created for the sole purpose of
> > being mountpoints are created as permanently empty directories which can
> > never contain any entries, therefore the kernel can know that any mounts
> > on these directories are not for security purposes. These mounts are
> > then excluded from locked mount tests in some circumstances.
> >
> > The same logic clearly applies to directories created in
> > debugfs_create_automount(). The following patches update this function
> > to create permanently empty directories for mountpoints and add an
> > exclusion to the tests for bind mounts to exclude child mounts on
> > permanently empty directories.
>
> So I don't know that this approach is bad. However, in reading through
> your patch descriptions I do not see any consideration of using
> "mount --rbind" instead of "mount --bind", i.e. adding the MS_REC flag
> to your bind mount.
>
> I would think simply using MS_REC would solve this problem, without
> needing any additional kernel support. Am I missing something?

That's what we're doing to work around it fwiw, but it would be nice to
not have to.
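
For anyone following along, the workaround comes down to passing MS_REC
together with MS_BIND to mount(2), so the tracefs child mount is carried
along with debugfs instead of blocking the bind. A minimal sketch follows;
the helper names bind_flags() and bind_mount() are made up for illustration
and are not taken from any container runtime:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Flags for a bind mount. With recursive set, MS_REC is added, which is
 * what "mount --rbind" requests; without it this matches "mount --bind".
 */
static unsigned long bind_flags(int recursive)
{
	return MS_BIND | (recursive ? MS_REC : 0UL);
}

/*
 * Hypothetical helper: bind mount src onto dst.
 * Returns 0 on success or a negative errno value on failure.
 */
static int bind_mount(const char *src, const char *dst, int recursive)
{
	assert(src != NULL && dst != NULL);

	/* fstype and data are ignored by the kernel for bind mounts. */
	if (mount(src, dst, NULL, bind_flags(recursive), NULL) < 0)
		return -errno;
	return 0;
}
```

A caller would use something like bind_mount("/sys/kernel/debug", target, 1),
where target is whatever path the container expects debugfs at. It needs
CAP_SYS_ADMIN in the relevant namespace, same as the mount command.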