Re: FSGSBASE ABI considerations

From: Stas Sergeev
Date: Mon Aug 07 2017 - 18:14:12 EST


07.08.2017 19:20, Andy Lutomirski wrote:
I think
this is a half-step. It clearly shows that you don't want
such a state to ever exist, but why not go a step further
and just make the bases be reset not only by any
unrelated modify_ldt() call, but always on schedule?
You can state that using wrgsbase with a non-zero selector
is invalid, reset the base to the LDT state and maybe send a signal
to the program so that it knows it did something wrong.
This may sound too rough, but I really don't see how it
differs from resetting all LDT bases on some unrelated
modify_ldt() that was done for read, not write.
Or you may want to reset selector to 0 rather than
base to LDT.
Windows does something sort of like this (I think), but I don't like
this solution. I fully expect that someone will write a program that
does:

old = rdgsbase();
wrgsbase(new);
call_very_fast_function();
wrgsbase(old);

This will work if GS == 0, which is fine. The problem is that it will
*also* work if GS != 0 with very high probability, especially if this
code sequence is right after some operation that sleeps. And then
we'll get random crashes with very low probability, depending on where
the scheduler hits.
So, as Linus already pointed out, if the fixup is to
zero out the selector, then this will still work fine.
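
To make the two fixup options concrete, here is a toy user-space
model of the sequence quoted above. It is only a sketch: the struct,
the enum and every toy_* function are hypothetical names invented for
illustration, not kernel code, and the "fixup" is simply the policy
discussed in this thread, applied at a simulated context switch.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_thread {
    uint16_t gs_sel;     /* GS selector; 0 means no descriptor in use */
    uint64_t gs_base;    /* live GS base, as wrgsbase would set it */
    uint64_t desc_base;  /* base stored in the descriptor gs_sel names */
};

enum toy_policy { RESET_BASE_TO_DESCRIPTOR, ZERO_SELECTOR };

/* Hypothetical fixup at a context switch. Returns 1 if state changed. */
static int toy_schedule_fixup(struct toy_thread *t, enum toy_policy p)
{
    if (t->gs_sel == 0 || t->gs_base == t->desc_base)
        return 0;                   /* selector and base agree */
    if (p == RESET_BASE_TO_DESCRIPTOR)
        t->gs_base = t->desc_base;  /* the wrgsbase value is lost */
    else
        t->gs_sel = 0;              /* the wrgsbase value survives */
    return 1;
}

/* The quoted sequence, optionally preempted between the two writes.
 * Returns the base that call_very_fast_function() would observe. */
static uint64_t toy_run_sequence(struct toy_thread *t, uint64_t new_base,
                                 int preempt, enum toy_policy p)
{
    uint64_t old = t->gs_base;      /* old = rdgsbase();          */
    t->gs_base = new_base;          /* wrgsbase(new);             */
    if (preempt)
        toy_schedule_fixup(t, p);
    uint64_t seen = t->gs_base;     /* call_very_fast_function(); */
    t->gs_base = old;               /* wrgsbase(old);             */
    return seen;
}

/* Walks through the cases discussed in the thread. */
void toy_self_check(void)
{
    struct toy_thread a = { 0, 0x1000, 0 };       /* GS == 0 */
    assert(toy_run_sequence(&a, 0x2000, 1, RESET_BASE_TO_DESCRIPTOR) == 0x2000);

    struct toy_thread b = { 7, 0x1000, 0x1000 };  /* GS != 0, not preempted */
    assert(toy_run_sequence(&b, 0x2000, 0, RESET_BASE_TO_DESCRIPTOR) == 0x2000);

    struct toy_thread c = { 7, 0x1000, 0x1000 };  /* GS != 0, preempted */
    assert(toy_run_sequence(&c, 0x2000, 1, RESET_BASE_TO_DESCRIPTOR) == 0x1000);

    struct toy_thread d = { 7, 0x1000, 0x1000 };  /* same, selector zeroed */
    assert(toy_run_sequence(&d, 0x2000, 1, ZERO_SELECTOR) == 0x2000);
    assert(d.gs_sel == 0);
}
```

In this model only case c, GS != 0 with the base reset to the
descriptor, makes the fast function see the wrong base, and only when
the switch lands between the two writes. With the selector zeroed
instead, the function sees the new base, at the price of the thread
ending up with GS == 0.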


I am far from kernel development, so my thoughts
may be naive, but IMHO you should just disallow this
by some means (like by doing a fixup on schedule() and
sending a signal). No one will suffer, people will just
write 0 to the segreg first. Note that such a problem can
be provoked by the fact that the sighandler does not
reset the segregs to their default values, and someone
may simply forget to reset them to 0. You need to remind
them to do so rather than invent tricky code to
do something theoretically correct.
I would *love* to disallow it. The problem is that I don't believe it
to be possible in a way that doesn't cause more problems than it
solves.
I wonder if sending a signal (after doing a fixup)
is too much of a punishment?

I'm trying to avoid a situation where we implement that policy and the
interaction with modify_ldt() becomes very strange.
IMHO if you do the fixup on schedule (like setting
the selector to zero), then the interaction with
modify_ldt() is completely avoided, i.e. modify_ldt()
should then never special-case the threads that
did wrgsbase. So if something inconsistent comes
out, then it was likely there already without wrgsbase.
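
The argument above can be sketched with another toy model. Everything
here is an assumption made for illustration: the toy_* names are
hypothetical, and "modify_ldt() reloads the base of every task that
still uses a non-zero selector" is just a simplified stand-in for the
behaviour discussed in this thread, not a description of the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not kernel code. */
struct toy_task {
    uint16_t gs_sel;    /* 0 = no descriptor in use */
    uint64_t gs_base;   /* live base */
    uint64_t ldt_base;  /* base recorded in the LDT entry */
};

/* Assumed fixup at context switch: a base that disagrees with a
 * non-zero selector is resolved by zeroing the selector. */
static void toy_fixup_on_schedule(struct toy_task *t)
{
    if (t->gs_sel != 0 && t->gs_base != t->ldt_base)
        t->gs_sel = 0;
}

/* Assumed effect of modify_ldt(): tasks with a non-zero selector get
 * their base reloaded from the LDT. Returns how many bases changed. */
static size_t toy_modify_ldt_refresh(struct toy_task *tasks, size_t n)
{
    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].gs_sel == 0)
            continue;                       /* nothing to reload */
        if (tasks[i].gs_base != tasks[i].ldt_base) {
            tasks[i].gs_base = tasks[i].ldt_base;
            changed++;
        }
    }
    return changed;
}

void toy_ldt_self_check(void)
{
    struct toy_task tasks[2] = {
        { 7, 0x2000, 0x1000 },  /* did wrgsbase with GS != 0 */
        { 7, 0x1000, 0x1000 },  /* ordinary LDT user */
    };

    /* Without the fixup, the refresh clobbers the wrgsbase value. */
    struct toy_task copy = tasks[0];
    assert(toy_modify_ldt_refresh(&copy, 1) == 1);
    assert(copy.gs_base == 0x1000);

    /* With the fixup applied first, the refresh finds nothing to do. */
    toy_fixup_on_schedule(&tasks[0]);
    toy_fixup_on_schedule(&tasks[1]);
    assert(toy_modify_ldt_refresh(tasks, 2) == 0);
    assert(tasks[0].gs_base == 0x2000);
}
```

In this model, once every task has passed through the fixup, no task
has a base that disagrees with a non-zero selector, so the refresh
needs no special case. A task that has not been through a context
switch since its wrgsbase is not covered by the sketch.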