Re: [RFC][PATCH 04/20] pspace: Allow multiple instaces of the processid namespace

From: Kirill Korotaev
Date: Fri Feb 10 2006 - 15:27:54 EST


Eric,

All my commments are inline below.

This patch modifies the fork/exit, signal handling, and pid and
process group manipulating syscalls to support multiple process
spaces, and implements the data for allow multiple instaces of the pid
namespace.

[ ... skipped .... ]

+extern struct pspace init_pspace;
+
+#define INVALID_PID 0x7fffffff
<<<< what is it for?

+
+static inline int pspace_task_visible(struct pspace *pspace, struct task_struct *tsk)
+{
+ return (tsk->pspace == pspace) || + ((tsk->pspace->child_reaper.pspace == pspace) &&
+ (tsk->pspace->child_reaper.task == tsk));
<<< the logic with child_reaper which can be somehow partly inside pspace... and this check is not that abvious.

Actually I can't say your patch is cleaner somehow.
It is very big and most of the changes are trivial, which creates an illusion that it is straightforward and clean.

[ ... skipped .... ]

@@ -788,6 +801,16 @@ fastcall NORET_TYPE void do_exit(long co
panic("Attempted to kill the idle task!");
if (unlikely(is_init(tsk)))
panic("Attempted to kill init!");
+
+ /*
+ * If we are the pspace leader it is nonsense for the pspace
+ * to continue so kill everyone else in the pspace.
+ */
+ if (pspace_leader(tsk)) {
+ tsk->pspace->flags = PSPACE_EXIT;
+ kill_pspace_info(SIGKILL, (void *)1, tsk->pspace);
+ }
+
<<<<

1.
flags are neither atomic nor protected with any lock.
So if it will be used later for something else in future, there will be 100% race. You also assigns them here, while everywhere in other places a bit is checked.

2. due to 1) you code is buggy. in this respect do_exit() is not serialized with copy_process().

3. due to the same 1) reason
> + kill_pspace_info(SIGKILL, (void *)1, tsk->pspace);
can miss a task being forked. Bang!!!

4.
So you are effectively inserting a code for cleaning up pspace here and the same is actually required for other subsystems like networking/IPC etc.
I think you suppose that other resources are freed when the last reference is dropped, but to tell the truth this is a way to deadlocks. Because refs are put in too many places, where you can't make a real cleanup due to locking etc. You can't for example call syncronize_net() from bh, which is required for network cleanup.

Just an another argument why containers are easier/better and why you will eventually end up with it.

if (tsk->io_context)
exit_io_context();

[ ... skipped ... ]

@@ -1147,11 +1150,55 @@ retry:
}
/*
+ * kill_pspace_info() sends a signal to all processes in a process space.
+ * This is what kill(-1, sig) does.
+ */
+
+int __kill_pspace_info(int sig, struct siginfo *info, struct pspace *pspace)
+{
+ struct task_struct *p = NULL;
+ int retval = 0, count = 0;
+
+ for_each_process(p) {
+ int err;
+ /* Skip the current pspace leader */
+ if (current_pspace_leader(p))
+ continue;
+
+ /* Skip the sender of the signal */
+ if (p->signal == current->signal)
+ continue;
+
+ /* Skip processes outside the target process space */
+ if (!in_pspace(pspace, p))
+ continue;
+
+ /* Finally it is a good process send the signal. */
+ err = group_send_sig_info(sig, info, p);
+ ++count;
+ if (err != -EPERM)
+ retval = err;
<<<<
why EPERM is ok?
do you want to miss some tasks?
+ }
+ return count ? retval : -ESRCH;
+}
+

Thanks,
Kirill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/