Re: [Regression?] 1ea0ce4069 ("selinux: allow changing labels for cgroupfs") stops Android from booting

From: Paul Moore
Date: Tue Feb 28 2017 - 12:24:07 EST


On Tue, Feb 28, 2017 at 10:29 AM, Stephen Smalley <sds@xxxxxxxxxxxxx> wrote:
> On Mon, 2017-02-27 at 19:18 -0500, Paul Moore wrote:
>> On Mon, Feb 27, 2017 at 4:23 PM, Stephen Smalley <sds@xxxxxxxxxxxxx>
>> wrote:
>> >
>> > On Mon, 2017-02-27 at 12:48 -0800, Nick Kralevich wrote:
>> > >
>> > > On Mon, Feb 27, 2017 at 11:53 AM, Stephen Smalley <sds@xxxxxxxxxx
>> > > gov>
>> > > wrote:
>> > > >
>> > > >
>> > > > >
>> > > > >
>> > > > > I can reproduce it on angler (with a back-port of just that
>> > > > > patch),
>> > > > > although I am unclear on the cause. The patch is only
>> > > > > supposed
>> > > > > to
>> > > > > enable explicit setting of security labels by userspace on
>> > > > > cgroup
>> > > > > files, so it isn't supposed to cause any breakage under
>> > > > > existing
>> > > > > policy. Prior to the patch, the kernel would always just
>> > > > > return
>> > > > > -1
>> > > > > with errno EOPNOTSUPP upon attempts to set security labels on
>> > > > > cgroup
>> > > > > files; with the patch, the kernel may instead return -1 with
>> > > > > errno
>> > > > > EACCES if not allowed. So I suppose if userspace was
>> > > > > explicitly
>> > > > > testing for EOPNOTSUPP and not failing hard in that case, it
>> > > > > might
>> > > > > cause breakage. Not sure why existing userspace would be
>> > > > > trying
>> > > > > to
>> > > > > relabel cgroup files, unless it is just a recursive
>> > > > > restorecon
>> > > > > that
>> > > > > happens to traverse into a cgroup mount (and in that case,
>> > > > > not
>> > > > > sure
>> > > > > why
>> > > > > it would be fatal). Other possible interaction would be use
>> > > > > of
>> > > > > setfscreatecon() prior to creating a file in cgroup.
>> > > >
>> > > > Oh, I see - it is the latter.
>> > > >
>> > > > For example, init.rc does mkdir /dev/cpuctl/bg_non_interactive,
>> > > > which
>> > > > internally looks up the context for that directory from
>> > > > file_contexts
>> > > > and does a setfscreatecon() followed by a mkdir(). Previously,
>> > > > that
>> > > > was ignored because cgroup did not support anything other than
>> > > > the
>> > > > policy-defined label. But now it will try to use that label,
>> > > > which
>> > > > in
>> > > > turn will trigger a denial in enforcing mode and the create
>> > > > will
>> > > > fail.
>> > > >
>> > > > So this is an incompatible change and needs to be reverted.
>> > > > We'll need to wrap it up with a policy capability or something
>> > > > to
>> > > > allow
>> > > > it to be enabled only if the policy correctly supports
>> > > > it. Even
>> > > > better, we should instead just allow the policy to specify
>> > > > which
>> > > > filesystems should support this behavior (already on the issues
>> > > > list).
>> > > >
>> > >
>> > > If Android is the only system affected by this bug, I would
>> > > prefer to
>> > > just fix Android to allow for this patch, rather than having
>> > > additional kernel complexity.
>> >
>> > Well, it does break userspace (even if it happens to only affect
>> > Android, which isn't clear, e.g. possibly a distribution would
>> > likewise
>> > suffer breakage under a tighter policy), and we already have a
>> > long-
>> > standing open issue to replace the current set of whitelisted
>> > filesystem types with something configuration-driven. So I'm ok
>> > with
>> > reverting it and requiring it to be done in a more general
>> > way. The
>> > latter is something we want regardless.
>>
>> This went up to Linus during the current merge window via the
>> stable-4.11 branch and I know the container guys really want this so
>> I'd prefer to fix this up in 4.11 with a policy capability if
>> possible
>> (and I believe it should be). I agree with Stephen that we need a
>> better long term solution, but I think a policy capability should
>> work
>> in the short term.
>>
>> Who wants to send me a patch? ;)
>
> So, there are a couple of caveats with doing that:
>
> 1) It still requires the container folks to update their kernel,
> libsepol, and policy in order to make use of the new policy capability.

A kernel upgrade is going to be necessary regardless at this original
change just landed in Linus' tree this merge window. As far the
userspace and policy updates, I think that is a reasonable
expectation. I just want to protect from the new-kernel/old-userspace
case that appears to affect Android systems at a minimum.

> 2) The determination of whether a given mount should be assigned this
> flag is made at mount time, so you can't simply reload policy with a
> policy that defines this capability and have it automatically applied
> to existing cgroup mounts. You'd have to unmount and re-mount them
> (more likely reboot).

Yes, but once again, I don't think that is an unreasonable requirement
for new functionality.

> Not saying you can't do that, just understand what is required.

--
paul moore
www.paul-moore.com