[PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

From: Denys Vlasenko
Date: Wed Jun 24 2009 - 19:01:08 EST


In some circumstances a running process needs to re-execute
its own image.

Among other useful cases, it is _crucial_ for NOMMU arches.

They need it to perform daemonization. The classic sequence
of "fork, parent dies, child continues" can't be used
due to the lack of fork on NOMMU; instead we have to do
"vfork, child re-executes itself (with a flag to not daemonize)
and thereby unblocks the parent, parent dies".
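
The vfork-based sequence above can be sketched in userspace roughly as
follows. This is only an illustration, not code from any particular
daemon: the flag name "--no-daemonize" and both function names are
made up for the example, and the caller supplies the path to re-exec.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical marker: the re-executed child sees this as argv[1]. */
#define NO_DAEMONIZE_FLAG "--no-daemonize"

/* Returns 1 if this invocation is the already re-executed child. */
int is_reexeced_child(int argc, char **argv)
{
	return argc > 1 && strcmp(argv[1], NO_DAEMONIZE_FLAG) == 0;
}

/*
 * Daemonize without fork(): vfork, re-exec self_path in the child
 * with the marker flag, then let the parent exit. The parent stays
 * blocked in vfork() until the child has exec'ed or exited.
 * Returns -1 only if vfork() itself failed; otherwise never returns.
 */
int daemonize_by_reexec(const char *self_path)
{
	/* Built before vfork(): the child must not touch parent memory. */
	char *child_argv[] = {
		(char *)self_path, (char *)NO_DAEMONIZE_FLAG, NULL
	};
	pid_t pid = vfork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execv(self_path, child_argv);
		_exit(127);	/* exec failed */
	}
	_exit(0);		/* parent dies, child continues */
}
```

A main() would first check is_reexeced_child(argc, argv) and call
daemonize_by_reexec("/proc/self/exe") only when it returns 0. The
sketch does not report an exec failure back to the parent.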

Another crucial use case on NOMMU is POSIX shell support.
Imagine a shell command of the form "func1 | func2 | func3".
This can be implemented on NOMMU by vforking three times,
re-executing the shell in every child in the form
"<shell> -c 'body of funcN'", and letting the parent wait and collect
exit codes and such. As far as I can see, it's the only way
to implement it correctly on NOMMU.
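
As a rough illustration of one stage of such a pipeline (a sketch
only, not hush's actual code; spawn_stage and wait_stage are
hypothetical names), the parent could start each "<shell> -c 'body'"
child and collect its exit code like this:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start one pipeline stage as "<shell_path> -c <body>".
 * in_fd/out_fd become the stage's stdin/stdout; pass -1 to inherit.
 * Returns the child's pid, or -1 if vfork() failed.
 */
pid_t spawn_stage(const char *shell_path, const char *body,
		  int in_fd, int out_fd)
{
	pid_t pid = vfork();

	if (pid == 0) {
		/* Child: redirect, exec, and touch nothing else. */
		if (in_fd >= 0 && in_fd != 0)
			dup2(in_fd, 0);
		if (out_fd >= 0 && out_fd != 1)
			dup2(out_fd, 1);
		execl(shell_path, shell_path, "-c", body, (char *)NULL);
		_exit(127);	/* exec failed */
	}
	return pid;
}

/* Wait for one stage; returns its exit code, or -1 if it did not
 * exit normally. */
int wait_stage(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

In the scenario described here, shell_path would be the shell's own
image (for example "/proc/self/exe"), and the parent would connect
the stages with pipe() before spawning them. A real shell also has to
hand the function definitions and other state to the child, which
this sketch leaves out.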

A program may re-execute itself by name if it knows the name,
but in general it cannot be sure of it: the binary may be renamed,
or even deleted, while it is running.

A more elegant way is to execute /proc/self/exe.
This works just fine as long as /proc is mounted.

But it breaks if /proc isn't mounted, and this can happen in real-world
usage. For example, when a shell is invoked very early in initrd/initramfs.

With this patch, it is possible to execute /proc/self/exe
even if /proc is not mounted. In the example below,
./sh is a static shell binary:

# chroot . ./sh
/ # echo $0
./sh
/ # . /proc/self/exe
hush: /proc/self/exe: No such file or directory
/ # /proc/self/exe <==========
/ # echo $0
/proc/self/exe
/ # exit
/ # exit
#

On an unpatched kernel, the command marked with <=== would fail.

How the patch does it: when the execve syscall finds that opening the
binary image has failed, a small bit of added code special-cases the
"/proc/self/exe" string. If the binary name is *exactly* that string, and
the error is ENOENT or EACCES, then exec still succeeds, using the current
binary's image.
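
The condition described above can be modelled in userspace as a small
predicate. This is only a sketch to make the rule concrete: the
function name is hypothetical, and err is assumed to be a negative
errno value as the kernel passes it around.

```c
#include <errno.h>
#include <string.h>

/*
 * Userspace model of the fallback condition: returns 1 if a failed
 * open of `name` with error `err` (negative errno) should fall back
 * to the current binary's image, 0 otherwise.
 */
int wants_self_exe_fallback(long err, const char *name)
{
	if (err != -ENOENT && err != -EACCES)
		return 0;
	/* Exact string match only: no path normalization is done. */
	return strcmp(name, "/proc/self/exe") == 0;
}
```

Note that the match is a plain string comparison, so equivalent
spellings such as "/proc//self/exe" do not trigger the fallback.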

Please apply.

Signed-off-by: Denys Vlasenko <vda.linux@xxxxxxxxxxxxxx>
--
vda
diff -urp ../linux-2.6.30.org/fs/exec.c linux-2.6.30/fs/exec.c
--- ../linux-2.6.30.org/fs/exec.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/fs/exec.c 2009-06-25 00:20:13.000000000 +0200
@@ -652,9 +652,25 @@ struct file *open_exec(const char *name)
 	file = do_filp_open(AT_FDCWD, name,
 				O_LARGEFILE | O_RDONLY | FMODE_EXEC, 0,
 				MAY_EXEC | MAY_OPEN);
-	if (IS_ERR(file))
-		goto out;
+	if (IS_ERR(file)) {
+		if ((PTR_ERR(file) == -ENOENT || PTR_ERR(file) == -EACCES)
+		 && strcmp(name, "/proc/self/exe") == 0
+		) {
+			struct file *sv = file;
+			struct mm_struct *mm;
 
+			mm = get_task_mm(current);
+			if (!mm)
+				goto out;
+			file = get_mm_exe_file(mm);
+			mmput(mm);
+			if (file)
+				goto ok;
+			file = sv;
+		}
+		goto out;
+	}
+ok:
 	err = -EACCES;
 	if (!S_ISREG(file->f_path.dentry->d_inode->i_mode))
 		goto exit;