Re: [PATCH] procfs: Do not release pid_ns->proc_mnt too early

From: Louis Rilling
Date: Fri Jun 18 2010 - 04:19:50 EST


On 17/06/10 23:20 +0200, Oleg Nesterov wrote:
> On 06/16, Louis Rilling wrote:
> >
> > Detached tasks are not seen by zap_pid_ns_processes()->sys_wait4(), so
> > that release_task()->proc_flush_task() of container init can be called
> > before it is for some detached tasks in the namespace.
> >
> > Pin proc_mnt's in copy_process(), so that proc_flush_task() becomes safe
> > whatever the ordering of tasks.
>
> I must have missed something, but can't we just move mntput() ?

See the log of the commit introducing pid_ns_release_proc() (6f4e6433):

Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the
superblock holds the namespace we must explicitly break this circle to destroy
all the stuff. This is done after the init of the namespace dies. Running a
few steps forward - when init exits it will kill all its children, so no
proc_mnt will be needed after its death.

Thanks,

Louis

>
> Oleg.
>
> --- x/kernel/pid_namespace.c
> +++ x/kernel/pid_namespace.c
> @@ -110,6 +110,9 @@ static void destroy_pid_namespace(struct
> {
> int i;
>
> + if (ns->proc_mount)
> + mntput(ns->proc_mount);
> +
> for (i = 0; i < PIDMAP_ENTRIES; i++)
> kfree(ns->pidmap[i].page);
> kmem_cache_free(pid_ns_cachep, ns);
> --- x/fs/proc/base.c
> +++ x/fs/proc/base.c
> @@ -2745,10 +2745,6 @@ void proc_flush_task(struct task_struct
> proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr,
> tgid->numbers[i].nr);
> }
> -
> - upid = &pid->numbers[pid->level];
> - if (upid->nr == 1)
> - pid_ns_release_proc(upid->ns);
> }
>
> static struct dentry *proc_pid_instantiate(struct inode *dir,
>

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes

Attachment: signature.asc
Description: Digital signature