RE: [PATCH] [RFC] Limit dump_pipe program's permission to init for container

From: Zhao Lei
Date: Tue Jul 12 2016 - 10:18:27 EST


Hi, StÃphane Graber

Many thanks for your review!

> -----Original Message-----
> From: StÃphane Graber [mailto:stgraber@xxxxxxxxxx]
> Sent: Friday, July 08, 2016 11:26 PM
> To: Zhao Lei <zhaolei@xxxxxxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; containers@xxxxxxxxxxxxxxxxxxxxxxxxxx; Eric W.
> Biederman <ebiederm@xxxxxxxxxxxx>
> Subject: Re: [PATCH] [RFC] Limit dump_pipe program's permission to init for
> container
>
> On Fri, Jul 08, 2016 at 07:08:10PM +0800, Zhao Lei wrote:
> > Currently when we set core_pattern to a pipe, the pipe program is
> > forked by kthread running with root's permission, and write dumpfile
> > into host's filesystem.
> > Same thing happened for container, the dumper and dumpfile are also
> > in host(not in container).
> >
> > It have following program:
> > 1: Not consistent with file_type core_pattern
> > When we set core_pattern to a file, the container will write dump
> > into container's filesystem instead of host.
> > 2: Not safe for privileged container
> > In a privileged container, user can destroy host system by following
> > command:
> > # # In a container
> > # echo "|/bin/dd of=/boot/vmlinuz" >/proc/sys/kernel/core_pattern
> > # make_dump
> >
> > This patch switch dumper program's permission to init task, it is to say,
> > for container, dumper program have same permission with init task in
> > container, which make dumper program put in container's filesystem, and
> > write coredump into container's filesystem.
> > The dumper's permission is also limited into subset of container's init
> > process.
> >
> > See following discussion for detail:
> > http://www.gossamer-threads.com/lists/linux/kernel/2455363
> > http://www.gossamer-threads.com/lists/linux/kernel/2397602
> >
> > Todo:
> > 1: Set different core_pattern to host and each container.
> > 2: Keep compatibility with current design
> > All above are done in previous version of this patch,
> > need some restruct.
> >
> > Any suggestion is welcome for this patch.
>
>
> Ok, so those two todo items are I think absolute musts before we can
> consider any of this.
>
> Changing the default behavior that we have had ever since pipe in
> core_pattern and namespaces have existed would cause serious problem for
> existing users.
>
>
> Speaking for myself, LXC does setup limitations on privileged containers
> which prevent them from changing the core_pattern so that it can't be
> abused for privileged containers.
> The core_pattern on Ubuntu then sends all crashes to apport, as root, on
> the host. Apport is then container aware and handles forwarding the
> crash (through a unix socket) to the apport process inside the
> container.
>
> Your suggested change would likely make this all fail as apport on the
> host must be called as root to be able to access the container's
> filesystem (through /proc/<PID>/root).
>
>
> With namespacing in place, then we wouldn't need the apport relay trick
> on the host, so that'd be fine.
>
Yes, keeping compatibility with existed container program
is important, we can not break these tools.

The compatibility problem is discussed in previous impliment of this
patch before, from:
http://www.gossamer-threads.com/lists/linux/kernel/2399347#2399347
to
http://www.gossamer-threads.com/lists/linux/kernel/2400776#2400776

We get this way in discusstion: Use new behavior(dump into container)
only if user re-set core_pattern in container.
And for lxc, because core_pattern is set by host, the dump will
keeping old behavior.

It is already done in the previous patch, and it can be move to
this patch, I'll start it.

> And otherwise, making this behavior optional (different pattern or
> something) would be fine too (not breaking existing users).
>
Yes, it is a good idea, it give us another selection for
compatibility problem.

Thanks
Zhaolei

> >
> > Suggested-by: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> > Suggested-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
> >
> > Signed-off-by: Zhao Lei <zhaolei@xxxxxxxxxxxxxx>
> > ---
> > fs/coredump.c | 84
> ++++++++++++++++++++++++++++++++++++++++++++++++-
> > include/linux/binfmts.h | 1 +
> > 2 files changed, 84 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index 281b768..92dc343 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -516,6 +516,8 @@ static int umh_pipe_setup(struct subprocess_info
> *info, struct cred *new)
> > {
> > struct file *files[2];
> > struct coredump_params *cp = (struct coredump_params *)info->data;
> > + struct task_struct *base_task;
> > +
> > int err = create_pipe_files(files, 0);
> > if (err)
> > return err;
> > @@ -524,10 +526,78 @@ static int umh_pipe_setup(struct subprocess_info
> *info, struct cred *new)
> >
> > err = replace_fd(0, files[0], 0);
> > fput(files[0]);
> > + if (err)
> > + return err;
> > +
> > /* and disallow core files too */
> > current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1};
> >
> > - return err;
> > + base_task = cp->base_task;
> > + if (base_task) {
> > + const struct cred *base_cred;
> > +
> > + /* Set fs_root to base_task */
> > + spin_lock(&base_task->fs->lock);
> > + set_fs_root(current->fs, &base_task->fs->root);
> > + spin_unlock(&base_task->fs->lock);
> > +
> > + /* Set namespaces to base_task */
> > + switch_task_namespaces(current, base_task->nsproxy);
> > +
> > + /* Set cgroup to base_task */
> > + current->flags &= ~PF_NO_SETAFFINITY;
> > + err = cgroup_attach_task_all(base_task, current);
> > + if (err < 0)
> > + return err;
> > +
> > + /* Set cred to base_task */
> > + base_cred = get_task_cred(base_task);
> > +
> > + new->uid = base_cred->uid;
> > + new->gid = base_cred->gid;
> > + new->suid = base_cred->suid;
> > + new->sgid = base_cred->sgid;
> > + new->euid = base_cred->euid;
> > + new->egid = base_cred->egid;
> > + new->fsuid = base_cred->fsuid;
> > + new->fsgid = base_cred->fsgid;
> > +
> > + new->securebits = base_cred->securebits;
> > +
> > + new->cap_inheritable = base_cred->cap_inheritable;
> > + new->cap_permitted = base_cred->cap_permitted;
> > + new->cap_effective = base_cred->cap_effective;
> > + new->cap_bset = base_cred->cap_bset;
> > + new->cap_ambient = base_cred->cap_ambient;
> > +
> > + security_cred_free(new);
> > +#ifdef CONFIG_SECURITY
> > + new->security = NULL;
> > +#endif
> > + err = security_prepare_creds(new, base_cred, GFP_KERNEL);
> > + if (err < 0) {
> > + put_cred(base_cred);
> > + return err;
> > + }
> > +
> > + free_uid(new->user);
> > + new->user = base_cred->user;
> > + get_uid(new->user);
> > +
> > + put_user_ns(new->user_ns);
> > + new->user_ns = base_cred->user_ns;
> > + get_user_ns(new->user_ns);
> > +
> > + put_group_info(new->group_info);
> > + new->group_info = base_cred->group_info;
> > + get_group_info(new->group_info);
> > +
> > + put_cred(base_cred);
> > +
> > + validate_creds(new);
> > + }
> > +
> > + return 0;
> > }
> >
> > void do_coredump(const siginfo_t *siginfo)
> > @@ -590,6 +660,7 @@ void do_coredump(const siginfo_t *siginfo)
> >
> > if (ispipe) {
> > int dump_count;
> > + struct task_struct *vinit_task;
> > char **helper_argv;
> > struct subprocess_info *sub_info;
> >
> > @@ -631,6 +702,12 @@ void do_coredump(const siginfo_t *siginfo)
> > goto fail_dropcount;
> > }
> >
> > + vinit_task = find_task_by_vpid(1);
> > + if (!vinit_task) {
> > + printk(KERN_WARNING "failed getting init task info, skipping
> core dump\n");
> > + goto fail_dropcount;
> > + }
> > +
> > helper_argv = argv_split(GFP_KERNEL, cn.corename, NULL);
> > if (!helper_argv) {
> > printk(KERN_WARNING "%s failed to allocate memory\n",
> > @@ -638,6 +715,10 @@ void do_coredump(const siginfo_t *siginfo)
> > goto fail_dropcount;
> > }
> >
> > + get_task_struct(vinit_task);
> > +
> > + cprm.base_task = vinit_task;
> > +
> > retval = -ENOMEM;
> > sub_info = call_usermodehelper_setup(helper_argv[0],
> > helper_argv, NULL, GFP_KERNEL,
> > @@ -646,6 +727,7 @@ void do_coredump(const siginfo_t *siginfo)
> > retval = call_usermodehelper_exec(sub_info,
> > UMH_WAIT_EXEC);
> >
> > + put_task_struct(vinit_task);
> > argv_free(helper_argv);
> > if (retval) {
> > printk(KERN_INFO "Core dump to |%s pipe failed\n",
> > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > index 314b3ca..0c9a72c 100644
> > --- a/include/linux/binfmts.h
> > +++ b/include/linux/binfmts.h
> > @@ -59,6 +59,7 @@ struct linux_binprm {
> >
> > /* Function parameter for binfmt->coredump */
> > struct coredump_params {
> > + struct task_struct *base_task;
> > const siginfo_t *siginfo;
> > struct pt_regs *regs;
> > struct file *file;
> > --
> > 1.8.5.1
> >
> >
> >
> > _______________________________________________
> > Containers mailing list
> > Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > https://lists.linuxfoundation.org/mailman/listinfo/containers
>
> --
> StÃphane Graber
> Ubuntu developer
> http://www.ubuntu.com