Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

From: Andy Lutomirski
Date: Wed Jul 08 2015 - 15:59:55 EST


On Wed, Jul 8, 2015 at 12:39 PM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
> On Wed, Jul 8, 2015 at 3:14 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Wed, Jul 8, 2015 at 12:05 PM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
>>> On Wed, Jul 8, 2015 at 1:30 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>> On Wed, Jul 8, 2015 at 9:59 AM, Linus Torvalds
>>>> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>>> On Tue, Jul 7, 2015 at 7:33 PM, Arjan van de Ven <arjan@xxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> if this patch would not be acceptable, at minimum we need some sort of
>>>>>> "off by default unless the sysadmin flips a sysfs thing", which is really
>>>>>> just a huge hack.
>>>>>
>>>>> The only thing that matters is whether people use this or not.
>>>>>
>>>>
>>>> I think that the world contains precisely two programs that use the
>>>> vm86 syscalls. One is dosemu, and one is a test case I wrote. (There
>>>> are probably some exploits written by other people that I don't know
>>>> about. Certainly Spender has been patching vm86 for long enough that
>>>> he must have an exploit or two up his sleeve.)
>>>>
>>>> As far as I can tell (and I'll try to test this better for real later
>>>> this week), dosemu already knows how to emulate real mode if vm86 is
>>>> unavailable. So it's unclear that turning off the vm86 syscalls
>>>> actually breaks anything whatsoever.
>>>>
>>>> On the other hand, sys_vm86 fails if the syscall slow path is in use.
>>>> That means that quite a few Fedora versions (auditing), anything with
>>>> ptrace, seccomp (before 3.16 IIRC), and anything with context tracking
>>>> is probably actually *improved* by turning off the vm86 syscalls even
>>>> for dosemu users.
>>>>
>>>> And apparently Ubuntu has had CONFIG_VM86 disabled forever.
>>>>
>>>> IOW, vm86 really is broken.
>>>>
>>>>> If people use vm86 mode, we can't just disable it. It's that simple.
>>>>> "It's poorly maintained" isn't an argument for removal. Only "nobody
>>>>> cares" works as an argument for that.
>>>>>
>>>>> My suspicion is that people still do use vm86 mode, but who knows..
>>>>> Quite frankly, rather than disable it, I'd much rather see people who
>>>>> modify low-level x86 code (yes, that means you, Luto) *test* it. If
>>>>> you aren't willing to test the modifications you make, I don't think
>>>>> those modifications should be merged, regardless of how nice a cleanup
>>>>> they are.
>>>>
>>>> I tried to test it. As far as I know, my changes in -tip have no
>>>> effect on vm86, and the changes I'm planning on sending this week will
>>>> make it work better. I still think that Linux users should have it
>>>> configured out or deleted altogether. Especially people who care at
>>>> all about security.
>>>>
>>>> It's easy to try the easy case (run from tools/testing/selftests/x86)
>>>> -- this is v4.2-rc1, but most recent versions should be identical:
>>>>
>>>> $ ./entry_from_vm86_32
>>>> [RUN] #BR from vm86 mode
>>>> [OK] Exited vm86 mode due to #BR
>>>> [RUN] SYSENTER from vm86 mode
>>>> [OK] Exited vm86 mode due to unhandled GP fault
>>>>
>>>> $ strace -e vm86 ./entry_from_vm86_32
>>>> [RUN] #BR from vm86 mode
>>>> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS (Function not implemented)
>>>> [OK] Exited vm86 mode due to type 0, arg 0
>>>> [RUN] SYSENTER from vm86 mode
>>>> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS (Function not implemented)
>>>> [OK] Exited vm86 mode due to type 0, arg 0
>>>>
>>>> It only says "[OK]" because my test case isn't careful enough. That's
>>>> a failure. I suspect it was a much worse failure a couple versions
>>>> ago before my ENOSYS-reworking patch went in.
>>>>
>>>> Replace "-e vm86" with "-e write" and be puzzled. The failure mode is
>>>> really pretty bad.
>>>>
>>>> This only tests easy stuff. The integration between vm86 and fault
>>>> handling is truly awful and I don't even know how to approach testing
>>>> it. I'd probably have to run twenty or thirty old real-mode games to
>>>> even exercise those code paths.
>>>>
>>>> I'll try to confirm later this week that dosemu can really handle real
>>>> mode without sys_vm86.
>>>
>>> None of these issues are unfixable. As I said before, many of them
>>> can be resolved if vm86 is changed to use the normal syscall/exception
>>> exit paths. Give me a few days to finish off that patch set.
>>>
>>
>> I look forward to it.
>>
>> However: I imagine that, if you do this, you may need to be quite
>> careful about an x86_32-ism. Currently, if you have a pt_regs pointer
>> for the current entry and user_mode(regs) returns true, then regs ==
>> current_pt_regs(). If you let user mode run with EFLAGS.VM set with
>> the normal tss.sp0, then this will no longer be true, as the
>> extra-long entry-from-v8086 frame will shift pt_regs by a few bytes.
>> I don't know whether this matters, but I can imagine it causing
>> do_signal to explode. *shudder*
>
> I am aware that pt_regs is in a fixed location on the stack. What I
> plan to do is increase the padding at the top of the stack if VM86 is
> configured, to reserve space for the extra segment registers. Then it
> will move tss.sp0 up 16 bytes when entering vm86 mode so that the
> longer IRET frame is in the right place.
>

Hmm, should work.
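
The 16-byte figure follows from the hardware frame sizes: an entry from
ordinary 32-bit user mode pushes five words (SS, ESP, EFLAGS, CS, EIP),
while an entry from v8086 mode pushes nine (GS, FS, DS, ES first, then the
same five). Here is a minimal userspace sketch of that arithmetic, assuming
a stack that grows down from tss.sp0. The helper names are made up for
illustration and this is not kernel code:

```c
#include <assert.h>
#include <stdint.h>

enum {
    WORD_BYTES         = 4,
    NORMAL_FRAME_WORDS = 5,  /* SS, ESP, EFLAGS, CS, EIP */
    VM86_FRAME_WORDS   = 9,  /* GS, FS, DS, ES, then the five above */
};

/* Address of the saved EIP slot (the last word the CPU pushes),
 * given the stack top that was loaded from tss.sp0. */
static uint32_t saved_eip_addr(uint32_t sp0, int from_vm86)
{
    uint32_t words = from_vm86 ? VM86_FRAME_WORDS : NORMAL_FRAME_WORDS;
    return sp0 - words * WORD_BYTES;
}

/* Hypothetical helper: the sp0 to use while the task runs in v8086 mode,
 * i.e. the normal sp0 raised by the size of the four extra selectors. */
static uint32_t vm86_sp0(uint32_t normal_sp0)
{
    return normal_sp0 + (VM86_FRAME_WORDS - NORMAL_FRAME_WORDS) * WORD_BYTES;
}

/* With the raised sp0, the long frame ends exactly where the short one
 * would, so pt_regs stays at its fixed location. */
static void check_vm86_sp0(uint32_t normal_sp0)
{
    assert(saved_eip_addr(vm86_sp0(normal_sp0), 1) ==
           saved_eip_addr(normal_sp0, 0));
}
```

Calling check_vm86_sp0() with any stack-top value exercises the claim. The
extra padding at the top of the stack is what makes the raised sp0 a valid
address in the first place.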

I wonder if the right way to do this is to set a TIF_VM86 flag and do
the fixups in enter_from_user_mode and prepare_return_to_usermode.
See the patches I just sent (and tip/x86/asm, which they apply to).

Without something like that, we'll be in the awkward position of
having some of the selectors (DS, ES, FS, and GS) in both the normal
pt_regs slot and in the extended hardware frame during execution of
normal vm86-unaware kernel code. If, on the other hand, we copied the
selectors across in enter_from_user_mode and
prepare_return_to_usermode, then pt_regs would work normally even
for tasks that are running in v8086 mode.

regs->flags & X86_EFLAGS_VM will be true regardless, so all of the asm
that decides to invoke those helpers should work fine.
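
A rough userspace model of the "copy the selectors across" idea, as a sketch
only. The struct layouts are reduced stand-ins, the function names are
hypothetical, and the TIF_VM86 flag is not modelled; the fixups key off
X86_EFLAGS_VM in the saved flags instead:

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_VM 0x00020000u  /* EFLAGS bit 17: v8086 mode */

/* Reduced stand-in for the 32-bit pt_regs: only the fields used here. */
struct model_regs {
    uint32_t ds, es, fs, gs;
    uint32_t flags;
};

/* pt_regs followed by the four selectors the CPU pushes on entry
 * from v8086 mode (the "extended hardware frame"). */
struct model_vm86_frame {
    struct model_regs regs;
    uint32_t hw_es, hw_ds, hw_fs, hw_gs;
};

/* Hypothetical entry-side fixup: make the normal pt_regs slots hold
 * the v8086 task's selectors so vm86-unaware code sees them. */
static void model_enter_fixup(struct model_vm86_frame *f)
{
    if (!(f->regs.flags & X86_EFLAGS_VM))
        return;  /* ordinary user mode: nothing to copy */
    f->regs.es = f->hw_es & 0xffff;
    f->regs.ds = f->hw_ds & 0xffff;
    f->regs.fs = f->hw_fs & 0xffff;
    f->regs.gs = f->hw_gs & 0xffff;
}

/* Hypothetical exit-side fixup: write any changes made through
 * pt_regs back to the slots that IRET will actually load. */
static void model_exit_fixup(struct model_vm86_frame *f)
{
    if (!(f->regs.flags & X86_EFLAGS_VM))
        return;
    f->hw_es = f->regs.es & 0xffff;
    f->hw_ds = f->regs.ds & 0xffff;
    f->hw_fs = f->regs.fs & 0xffff;
    f->hw_gs = f->regs.gs & 0xffff;
}

/* Round trip: a selector set by the "hardware" shows up in pt_regs,
 * and a change made through pt_regs reaches the hardware slot. */
static void model_selfcheck(void)
{
    struct model_vm86_frame f = {
        .regs = { .flags = X86_EFLAGS_VM },
        .hw_ds = 0x1234,
    };

    model_enter_fixup(&f);
    assert(f.regs.ds == 0x1234);

    f.regs.es = 0x2000;
    model_exit_fixup(&f);
    assert(f.hw_es == 0x2000);
}
```

Between the two fixups there is a single authoritative copy of each
selector, the one in pt_regs, which is what avoids the awkward
two-places-at-once state described above.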

--Andy
