Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

From: Eric W. Biederman
Date: Tue Dec 20 2016 - 15:25:39 EST


"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:

> Hello Eric,
>
> On 12/19/2016 11:53 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>
>>> Eric,
>>>
>>> The code proposed in this patch series is pretty small. Is there any
>>> chance we could make the 4.10 merge window, if the changes seem
>>> acceptable to you?
>>
>> I see why you are asking but I am not comfortable with aiming for
>> the merge window that is on-going and could close at any moment.
>> I have recently seen too many patches that should work fine have
>> some odd minor issue. Like an extra _ in a label used in an ifdef
>> that resulted in memory stomps. Linus might be more brave but I would
>> rather wait until the next merge window, so I don't need to worry about
>> spoiling anyone's holidays with a typo someone overlooked.
>
> I'll just gently ask if you'll reconsider and take another look at the
> patches. The patches are very small, and don't change any existing
> behavior. And if we see a problem in the next weeks they could be pulled.
> In the meantime, I'd be aiming to publicize this API somewhat, so that we
> might get some eyeballs to spot design bugs. But, I do understand your
> position, if the answer is still "not for this merge window".

My position is still not this merge window. I am more than happy to
queue up the changes for the next one. Even on the best of days there
is a reasonable chance Linus would not be happy to receive code
development done in the merge window.

I think there is also just a little bit of discussion that needs
to happen with these new userspace APIs (below). And I have seen
user space APIs added too quickly, and then having to be repaired
afterwards, way too many times.

>> At first glance these patches seem reasonable. I don't see any problem
>> with the ioctls you have added.
>>
>> That said I have a question. Should we provide a more direct way to
>> find the answer to your question? Something like the access system
>> call?
>>
>> I think a more direct answer would be more maintainable in the long run
>> as it does not bind tools to specific implementation details in the
>> future. Which could allow us to account for LSM policies and the like.
>
> My thoughts:
>
> 1. Regarding NS_GET_NSTYPE... It always struck me as a little odd
> that you could ask setns() to check if the supplied FD referred
> to a certain type of NS (and thus, in a roundabout way, setns()
> gives us the same information as NS_GET_NSTYPE), but you can't
> directly ask what the NS type is. The fact that setns() has this
> facility suggests that there could be other uses for the operation
> "tell me what type of NS this FD refers to".

Yes. I have no problem with that one.
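
For readers following along, a minimal sketch of how user space might
call the proposed NS_GET_NSTYPE operation. This assumes the ioctl is
merged roughly as posted and returns a CLONE_NEW* value; the request
number below is an assumption that would normally come from
<linux/nsfs.h>, and ns_get_type() and ns_type_name() are hypothetical
helper names, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/sched.h>   /* CLONE_NEW* constants */
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed request number; normally provided by <linux/nsfs.h>. */
#ifndef NSIO
#define NSIO 0xb7
#endif
#ifndef NS_GET_NSTYPE
#define NS_GET_NSTYPE _IO(NSIO, 0x3)
#endif

/* Map a CLONE_NEW* value to a short name; "unknown" for anything else. */
static const char *ns_type_name(int nstype)
{
    switch (nstype) {
    case CLONE_NEWNS:     return "mnt";
    case CLONE_NEWUTS:    return "uts";
    case CLONE_NEWIPC:    return "ipc";
    case CLONE_NEWUSER:   return "user";
    case CLONE_NEWPID:    return "pid";
    case CLONE_NEWNET:    return "net";
#ifdef CLONE_NEWCGROUP
    case CLONE_NEWCGROUP: return "cgroup";
#endif
    default:              return "unknown";
    }
}

/*
 * Open a namespace file such as /proc/PID/ns/user and ask the kernel
 * what type it is. Returns a CLONE_NEW* value, or -1 with errno set.
 */
static int ns_get_type(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int nstype = ioctl(fd, NS_GET_NSTYPE);

    /* Preserve the ioctl's errno across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return nstype;
}
```

A caller would pass something like "/proc/self/ns/net" and feed the
result to ns_type_name(). On a kernel without the operation the ioctl
is expected to fail, so the -1 case needs handling.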

> 2. Regarding NS_GET_CREATOR_UID... There are defined rules about what
> this UID means with respect to capabilities in a namespace. It's
> not an implementation detail, as I understand it. Also in terms of
> introspecting to try to understand the structure of namespaces on
> a running system, knowing this UID is useful in and of itself.

I am not quite sold on the name NS_GET_CREATOR_UID. NS_GET_OWNER_UID
seems to match the code better. The owner is the creator but
the important part seems to be the ownership not the act of creation.

> 3. NS_GET_NSTYPE and NS_GET_CREATOR_UID solve my problem, but
> obviously your idea would make life simpler for user space.
> Am I correct to understand that you mean an API that takes
> three pieces of info: a PID, a capability, and an fd referring
> to a /proc/PID/ns/xxx, and tells us whether PID has the specified
> capability for operations in the specified namespace?

Something like that. But yes, something we can wire up to
ns_capable_noaudit and be told the result. That will let the
LSMs and any future kernel changes have their say, without any extra
maintenance burden in the kernel.

What I really don't want is for userspace to start depending on the
current formula being the only factor that decides whether it has a
capability in a certain situation, because in practice that just isn't
true. Permission checks just keep evolving in the kernel.
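
To make "the current formula" concrete, here is a rough user-space
model of the walk up the user namespace hierarchy that a tool would
have to reimplement from NS_GET_NSTYPE and the owner UID. It is a
simplified sketch of the commoncap rule only; the struct and function
names are hypothetical stand-ins, and it deliberately ignores LSMs,
which is exactly why depending on it is fragile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a user namespace. */
struct model_userns {
    const struct model_userns *parent; /* NULL for the initial namespace */
    unsigned int owner_uid;            /* effective UID of the owner */
    int level;                         /* 0 for the initial namespace */
};

/* Hypothetical, simplified stand-in for a process's credentials. */
struct model_cred {
    const struct model_userns *user_ns; /* namespace the process lives in */
    unsigned int euid;
    uint64_t cap_effective;             /* bit n set => capability n effective */
};

/*
 * Model of the capability-only check: does the process described by c
 * have capability cap in the target namespace? No LSM input at all.
 */
static bool model_ns_capable(const struct model_cred *c,
                             const struct model_userns *target, int cap)
{
    assert(cap >= 0 && cap < 64);

    for (const struct model_userns *ns = target; ns != NULL; ns = ns->parent) {
        /* Same namespace: the effective set decides. */
        if (ns == c->user_ns)
            return (c->cap_effective >> cap) & 1;

        /* Target is not below the process's namespace: no capabilities. */
        if (ns->level <= c->user_ns->level)
            return false;

        /* Owner of a direct child namespace holds all capabilities in it. */
        if (ns->parent == c->user_ns && ns->owner_uid == c->euid)
            return true;
    }
    return false;
}
```

Anything the kernel layers on top of this walk (LSM policy, future
rule changes) is invisible to such a model, whereas an interface wired
to ns_capable_noaudit would pick it up automatically.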

Eric