Re: [PATCH v7 09/26] x86/insn-eval: Add utility function to identify string instructions

From: Ricardo Neri
Date: Tue Jun 06 2017 - 02:01:45 EST


On Mon, 2017-05-29 at 23:48 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:07AM -0700, Ricardo Neri wrote:
> > String instructions are special because in protected mode, the linear
> > address is always obtained via the ES segment register in operands that
> > use the (E)DI register.
>
> ... and DS for rSI.

Right, I omitted this in the commit message.
>
> If we're going to account for both operands of string instructions with
> two operands.
>
> Btw, LODS and OUTS use only DS:rSI as a source operand. So we have to be
> careful with the generalization here. So if ES:rDI is the only seg. reg
> we want, then we don't need to look at those insns... (we assume DS by
> default).

My intention with this function is to write a function that does only
one thing: identify string instructions, irrespective of the operands
they use. A separate function, resolve_seg_register, will have the logic
to decide what to segment register to use based on the registers used as
operands, whether we are looking at a string instruction, whether we
have segment override prefixes and whether such overrides should be
ignored.

If I was to leave out string instructions from this function it should
be renamed as is_string_instruction_non_lods_outs. In my opinion this
separation makes the code more clear and I would end up having logic to
decide which segment register to use in two places. Does it makes sense
to you?

>
> ...
>
> > +/**
> > + * is_string_instruction - Determine if instruction is a string instruction
> > + * @insn: Instruction structure containing the opcode
> > + *
> > + * Return: true if the instruction, determined by the opcode, is any of the
> > + * string instructions as defined in the Intel Software Development manual.
> > + * False otherwise.
> > + */
> > +static bool is_string_instruction(struct insn *insn)
> > +{
> > + insn_get_opcode(insn);
> > +
> > + /* all string instructions have a 1-byte opcode */
> > + if (insn->opcode.nbytes != 1)
> > + return false;
> > +
> > + switch (insn->opcode.bytes[0]) {
> > + case INSB:
> > + /* fall through */
> > + case INSW_INSD:
> > + /* fall through */
> > + case OUTSB:
> > + /* fall through */
> > + case OUTSW_OUTSD:
> > + /* fall through */
> > + case MOVSB:
> > + /* fall through */
> > + case MOVSW_MOVSD:
> > + /* fall through */
> > + case CMPSB:
> > + /* fall through */
> > + case CMPSW_CMPSD:
> > + /* fall through */
> > + case STOSB:
> > + /* fall through */
> > + case STOSW_STOSD:
> > + /* fall through */
> > + case LODSB:
> > + /* fall through */
> > + case LODSW_LODSD:
> > + /* fall through */
> > + case SCASB:
> > + /* fall through */
>
> That "fall through" for every opcode is just too much. Also, you can use
> the regularity of the x86 opcode space and do:
>
> case 0x6c ... 0x6f: /* INS/OUTS */
> case 0xa4 ... 0xa7: /* MOVS/CMPS */
> case 0xaa ... 0xaf: /* STOS/LODS/SCAS */
> return true;
> default:
> return false;
> }
>
> And voila, there's your compact is_string_insn() function! :^)

Thanks for the suggestion! It looks really nice. I will implement
accordingly.

Thanks and BR,
Ricardo