Re: [patch 4/7] Immediate Values - i386 Optimization

From: Mathieu Desnoyers
Date: Wed Sep 19 2007 - 12:04:01 EST


* Mathieu Desnoyers (mathieu.desnoyers@xxxxxxxxxx) wrote:
> * Jeremy Fitzhardinge (jeremy@xxxxxxxx) wrote:
> > H. Peter Anvin wrote:
> > > Allowing different registers should be doable, but if so, one would have
> > > to put 0: at the *end* of the instruction and use (0f)-4 instead, since
> > > the non-%eax forms are one byte longer.
> > >
> >
> > OK, that's already a problem since it's using "=r" as the constraint.
> >
> > > This also seems "safer", since an imm32 is always the last thing in the
> > > instruction.
> >
> > Good idea. If gas/gcc generates entirely the wrong addressing mode,
> > then we've got bigger problems.
> >
>
> Ok, let's have a good look at what we want:
>
> 1 - get a pointer to the beginning of the immediate value within the
>     instruction.
> 2 - make sure that the immediate value, within the instruction, is
>     written to atomically wrt all CPUs, even on older architectures
>     where non aligned writes are not atomic.
>
> Effectively, placing a label at the end of the instruction, and then
> offsetting backward from there, will give us (1).
>
> Then, for the alignment, we have to take a good look at the mov entry
> in the instruction set reference, to see which variants of mov
> immediate value to register exist on i386.
>
> First, let's look at the possible prefixes:
> - Lock and repeat prefixes: no
> - Segment override prefixes: no memory reference there, so doesn't
>   apply.
> - Branch hints: no, it's a mov instruction
> - Operand-size override prefix: *yes*, can be used for 2 bytes mov
> - Address-size override prefix: no address in there, only immediate
>   value and register.
>
> (looking at the Compat/Leg Mode)
>
> 3 cases:
>
> * 1 byte
>   B0 + rb        MOV r8, imm8      (1-byte opcode)
>   REX + B0 + rb  MOV r8, imm8      (only on 64-bit archs, never generated)
>   C6 /0          MOV r/m8, imm8    (2-byte opcode)
>   (this one doesn't require alignment at all, since we do a 1-byte write)
>
> * 2 bytes
>   B8 + rw        MOV r16, imm16    (1-byte opcode)
>   66 B8 + rw     MOV r16, imm16    (2-byte opcode) (with 66H prefix)
>   C7 /0          MOV r/m16, imm16  (2-byte opcode)
>   (Would alignment on 4-byte boundaries be required because of the
>   possible 1-byte opcode? Or is the 66H prefix mandatory there? If it
>   is, then we can safely align on 2-byte boundaries.)
>
> * 4 bytes
>   B8 + rd        MOV r32, imm32    (1-byte opcode)
>   C7 /0          MOV r/m32, imm32  (2-byte opcode)
>   (the 2-byte opcode can be a problem)
>
> Have I missed anything?
>
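
To make points (1) and (2) above concrete, here is a minimal sketch in
plain C. The helper names imm_addr() and imm_is_aligned() are made up for
illustration and are not part of the patch; the sketch only assumes that
the immediate is the last field of the instruction, as discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): given the address just past
 * the end of a mov-immediate instruction, return the address of the
 * immediate. This works for both the 1-byte and 2-byte opcode forms
 * because the immediate is always the last field of the instruction.
 */
static uintptr_t imm_addr(uintptr_t insn_end, size_t imm_size)
{
	assert(imm_size == 1 || imm_size == 2 || imm_size == 4);
	return insn_end - imm_size;
}

/* Nonzero if an imm_size-byte write at addr is naturally aligned. */
static int imm_is_aligned(uintptr_t addr, size_t imm_size)
{
	return (addr % imm_size) == 0;
}
```

For example, a 5-byte B8 + rd form starting at 0x1003 ends at 0x1008, so
its imm32 sits at 0x1004 and is aligned. The 6-byte C7 /0 form starting
at the same address ends at 0x1009, which puts its imm32 at 0x1005: this
is the 2-byte opcode case flagged above.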

The alignment issue is not a problem in practice, because we place a
breakpoint and use a bypass when updating immediate values that can be
used concurrently. So let's not worry about it too much. It just means
that the breakpoint approach might have to be used on more
architectures in the future to address this kind of issue.
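
As a rough illustration of the breakpoint approach, here is a sketch that
patches a copy of the instruction held in an ordinary byte buffer. The
function name update_imm() and the exact ordering of the steps are
assumptions made for this example, not the code from the patch series. A
real implementation would also need cross-CPU serialization between the
steps and a breakpoint handler providing the bypass; both are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xCC	/* x86 one-byte breakpoint instruction */

/*
 * Hypothetical sketch: replace the imm_size-byte immediate at the end of
 * the insn_len-byte instruction at insn. While the first byte is int3, a
 * CPU reaching the instruction would trap instead of executing a
 * half-written immediate, so the immediate write need not be atomic.
 */
static void update_imm(uint8_t *insn, size_t insn_len,
		       const void *new_imm, size_t imm_size)
{
	uint8_t first;

	assert(insn_len > imm_size);	/* at least one opcode byte */
	first = insn[0];

	insn[0] = INT3_OPCODE;		/* 1: trap concurrent users */
	memcpy(insn + insn_len - imm_size, new_imm, imm_size); /* 2: new value */
	insn[0] = first;		/* 3: restore the opcode byte */
}
```

Since the immediate is located from the end of the instruction, the same
helper handles both the B8 + rd and the C7 /0 encodings.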

Mathieu


--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68