Re: [PATCH v2] do_exit(): Make sure we run with get_fs() == USER_DS.

From: Andrew Morton
Date: Wed Dec 01 2010 - 19:30:24 EST


On Wed, 1 Dec 2010 11:50:32 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:

> > If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not
> > otherwise reset before do_exit(). do_exit may later (via mm_release in fork.c)
> > do a put_user to a user-controlled address, potentially allowing a user to
> > leverage an oops into a controlled write into kernel memory.
> >
> > A more logical place to put this might be when we know an oops has occurred,
> > before we call do_exit(), but that would involve changing every architecture, in
> > multiple places. Let's just stick it in do_exit instead.
> >
> > Signed-off-by: Nelson Elhage <nelhage@xxxxxxxxxxx>
> > ---
> > kernel/exit.c | 8 ++++++++
> > 1 files changed, 8 insertions(+), 0 deletions(-)
> >
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index 21aa7b3..68899b3 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -914,6 +914,14 @@ NORET_TYPE void do_exit(long code)
> > if (unlikely(!tsk->pid))
> > panic("Attempted to kill the idle task!");
> >
> > + /*
> > + * If do_exit is called because this processes oopsed, it's possible
> > + * that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
> > + * continuing. This is relevant at least for clearing clear_child_tid in
> > + * mm_release.
> > + */
> > + set_fs(USER_DS);
>
> "This is relevant" is not a good explanation ;)
> Please recognize that this is tricky code, and please consider writing more
> careful and looooong comments.

I've seen worse comments. And occasionally none at all :)

Is this better?

--- a/kernel/exit.c~do_exit-make-sure-we-run-with-get_fs-==-user_ds-fix
+++ a/kernel/exit.c
@@ -917,8 +917,9 @@ NORET_TYPE void do_exit(long code)
/*
* If do_exit is called because this processes oopsed, it's possible
* that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
- * continuing. This is relevant at least for clearing clear_child_tid in
- * mm_release.
+ * continuing. Amongst other possible reasons, this is to prevent
+ * mm_release()->clear_child_tid() from writing to a user-controlled
+ * kernel address.
*/
set_fs(USER_DS);

_
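
For anyone following the thread who wants the failure mode spelled out, here
is a toy userspace model of it. This is a sketch only, not kernel code: every
model_* name, the 16-byte "address space" and the limit values are invented
for illustration, and the real get_fs()/set_fs()/put_user() are per-task and
architecture-specific. The one idea it tries to show is that put_user()
checks the target against the current address limit, so a limit left at
KERNEL_DS lets a user-chosen clear_child_tid pointer reach kernel memory.

```c
#include <stddef.h>

/*
 * Toy model, NOT kernel code. Addresses 0..7 stand in for user memory,
 * addresses 8..15 for kernel memory. All model_* names are hypothetical.
 */
enum { MODEL_USER_DS = 8, MODEL_KERNEL_DS = 16 };
#define MODEL_EFAULT 14

static unsigned char model_mem[MODEL_KERNEL_DS];
static size_t model_addr_limit = MODEL_USER_DS;

/* Stand-in for set_fs(): change the limit that model_put_user() enforces. */
static void model_set_fs(size_t limit)
{
	model_addr_limit = limit;
}

/* Stand-in for put_user(): refuse any address at or above the limit. */
static int model_put_user(unsigned char val, size_t addr)
{
	if (addr >= model_addr_limit)
		return -MODEL_EFAULT;
	model_mem[addr] = val;
	return 0;
}

/*
 * Stand-in for the exit path. clear_child_tid is chosen by the user.
 * With reset_fs != 0 the limit is forced back to MODEL_USER_DS first,
 * which is what the patch does at the top of do_exit().
 */
static int model_do_exit(size_t clear_child_tid, int reset_fs)
{
	if (reset_fs)
		model_set_fs(MODEL_USER_DS);
	return model_put_user(0, clear_child_tid);
}
```

In this model, calling model_set_fs(MODEL_KERNEL_DS) and then
model_do_exit(12, 0) zeroes a "kernel" byte, while model_do_exit(12, 1)
fails with -MODEL_EFAULT and leaves that byte alone.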
