Re: + fix-procfs-task-exe-symlink.patch added to -mm tree

From: Oleg Nesterov
Date: Tue Jan 29 2008 - 06:34:11 EST


s/mm-commits/lkml/

On 01/28, Matt Helsley wrote:
>
> On Sun, 2008-01-27 at 19:25 +0300, Oleg Nesterov wrote:
> >
> > Why? Linux doesn't allow sys_mmap(MAP_EXECUTABLE), so any VM_EXECUTABLE vma
> > should refer to the same bprm->file which was mapped by elf_map(), no?
>
> You're right. So in this case the kernel wouldn't find any VMA marked
> VM_EXECUTABLE and it would return with -ENOENT. The same would happen
> with this patch since we drop the extra file reference once the
> VM_EXECTUABLE VMAs disappear.

OK, thanks. This leads to another question which I forgot to ask.

This patch has a lot of complications because it tries to preserve the
current behaviour: we release the bprm->file when all VM_EXECTUABLE vmas
are unmapped. Q: is this so important/useful? I don't think this is very
common case, and I don't quite understand why it is critical to release
the file. To unmount fs after starting the app? One can always copy
the file before execing, or do "/lib/ld-linus.so application" and then
unmap the vmas.

(I am not arguing, just curious).

> > I don't understand why do we need ->exe_file_lock. Afaics, all callers of
> > added_exe_file_vma/removed_exe_file_vma must hold ->mmap_sem, yes? But this
> > means get_mm_exe_file() can use down_read(mm->map_sem). No?
>
> Yes, I could get the task's ->mmap_sem there too and reuse the mmap_sem
> rather than add a lock. That allows nearly any task to grab another
> tasks mmap_sem simply by doing a readlink on /proc/pid/exe. So I thought
> avoiding reuse of the mmap_sem might be best.
>
> Do you still think it would be better to reuse mmap_sem?

Well, we only need down_read(mmap_sem) for the very short time. /proc/pid/maps
is much "worse" in this sense.

> > > @@ -409,6 +410,7 @@ void mmput(struct mm_struct *mm)
> > > if (atomic_dec_and_test(&mm->mm_users)) {
> > > exit_aio(mm);
> > > exit_mmap(mm);
> > > + set_mm_exe_file(mm, NULL);
> >
> > This change looks unneeded. exit_mmap() removes all vmas. The last VM_EXECUTABLE
> > vma should clear ->exe_file via removed_exe_file_vma() ?
>
> You're right -- it's redundant. I'll fix that.

Sorry, I was wrong.

mmput() has to release ->exe_file if it is called when exec fails before the
first do_mmmap(MAP_EXECUTABLE). This also means that it is not completely
trivial to set ->exe_file before exec_mmap(), it can fail. This is solvable,
but I'm not sure we should do this.

Still, the accounting looks a little bit fragile to me. flush_old_exec()
increments ->f_count but sets ->num_exe_file_vmas = 0 because we know that
the next elf_map() will bump ->num_exe_file_vmas and thus "sync" 2 counters.
But I don't see how to do better if we really want to release the file when
VM_EXECUTABLE disappears.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/