>Hi Andrea,
>On Mon, 29 Mar 1999, Andrea Arcangeli wrote:
>
>> On Mon, 29 Mar 1999, Tigran Aivazian wrote:
>>
>> >which would enforce "Von Neumann execution stream", e.g. by doing CPUID
>>
>> What is a Von Neumann execution stream? ;)
>The P6 architecture (PPro, PII etc.) introduces speculative execution, i.e. if
>you for example try to "profile" fdiv by putting a couple of rdtsc before
>and after you will be told that fdiv took 0 cycles which is obviously not
>true (I wish it was :). This happens because the processor decides that
>the second rdtsc is independent from the fdiv and executes it first. So,
>one needs to serialize it somehow and the easiest way I know of doing it
>is cpuid (but one needs to remember that it clobbers registers).
Ah, OK. The right thing to do is to add 0 to the word at the stack
pointer, as wmb() does.
The point is that you should do that in the caller if you want that
behavior.

	barrier();
	get_cycles();
	barrier();

will be equivalent to your __volatile__. Note that barrier() also makes
the compiler flush the values it keeps in registers, while using only
volatile would preserve them and give a more accurate profile, but it
depends on what you have to profile...
>some other purpose. Putting __volatile__ does not make the current usage
>of get_cycles() any worse so why not, if it gives you extra choice?
The compiler could be under register pressure just before your rdtsc, and
I think that being free to reorder it could in some cases let the compiler
save some memory accesses. It's certainly not critical, but the point is
that get_cycles(), as it is used now, _doesn't_ need __volatile__ in my
opinion.
>I personally use it to count the number of cycles it takes for a
>particular code path (i.e. without having to enable profiling globally). I
That's a different usage!!
First of all, get_cycles() is fine right now and there's no bug.
Currently get_cycles() is used only to measure the time delta between two
calls to schedule(), and the delta will be the _same_ even if rdtsc is
reordered. Do you see my point now? These are the offset and the delta I
was talking about in my previous email.
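To make the offset/delta argument concrete, here is a minimal sketch in
plain C. It is only a model, not kernel code: sample_at(), slice_delta()
and delta_example() are hypothetical names, and it assumes the reordering
shifts both reads by the same number of cycles (the "offset"). Under that
assumption the offset cancels in the subtraction:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not kernel code: a timestamp requested at
 * true_cycles is actually sampled skew cycles away from that point. */
static uint32_t sample_at(uint32_t true_cycles, uint32_t skew)
{
	return true_cycles + skew;
}

/* Delta between two samples that carry the same skew. Unsigned
 * subtraction also keeps the result right across counter wraparound. */
static uint32_t slice_delta(uint32_t prev_true, uint32_t now_true,
			    uint32_t skew)
{
	return sample_at(now_true, skew) - sample_at(prev_true, skew);
}

/* Worked numbers: the skew drops out of the difference. */
static void delta_example(void)
{
	assert(slice_delta(1000, 5000, 0) == 4000);
	assert(slice_delta(1000, 5000, 37) == 4000);
}
```

If the two reads were shifted by different amounts, the difference of the
two shifts would show up in the delta; for a delta spanning a whole
timeslice that error is a handful of cycles at most.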
Your point is that if you use get_cycles() around a piece of code to
profile it, you also have to add an mb() around each get_cycles() call.
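A rough userspace sketch of that profiling usage, with the serialization
done in the caller's helper rather than inside get_cycles() itself. All
names here (serialized_cycles(), profile_cycles(), sample_fdiv_work()) are
hypothetical, and the sketch uses cpuid, the serializing instruction
suggested earlier in the thread, in place of mb(); it does not check
whether mb() orders rdtsc the same way. On non-x86 builds it falls back to
clock(), which is only an assumption to keep the example compilable:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper, not a kernel API: serialize, then read the TSC. */
static inline uint64_t serialized_cycles(void)
{
#if defined(__i386__) || defined(__x86_64__)
	uint32_t eax, ebx, ecx, edx, lo, hi;

	/* cpuid serializes the instruction stream. It overwrites
	 * eax, ebx, ecx and edx, so all four are listed as outputs. */
	__asm__ __volatile__("cpuid"
			     : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
			     : "0" (0)
			     : "memory");
	(void)eax; (void)ebx; (void)ecx; (void)edx;

	/* volatile plus the "memory" clobber keep the compiler from
	 * moving or merging the two counter reads. */
	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi) : : "memory");
	return ((uint64_t)hi << 32) | lo;
#else
	/* Assumption: no TSC here, so use the C library clock instead. */
	return (uint64_t)clock();
#endif
}

/* Sample workload: a chain of dependent floating-point divisions. */
static volatile double sample_sink;

static void sample_fdiv_work(void)
{
	double x = 1.0;
	int i;

	for (i = 0; i < 1000; i++)
		x /= 1.0001;
	sample_sink = x;	/* keep the result live */
}

/* Cycles (or clock ticks in the fallback) spent in fn(). The cpuid in
 * the second serialized_cycles() call waits for fn()'s instructions to
 * finish before the closing rdtsc is executed. */
static uint64_t profile_cycles(void (*fn)(void))
{
	uint64_t start, end;

	assert(fn != NULL);
	start = serialized_cycles();
	fn();
	end = serialized_cycles();
	return end - start;
}
```

A call such as profile_cycles(sample_fdiv_work) then returns the measured
count. The cost of cpuid itself is included in that count, so for very
short code paths it would have to be measured separately and subtracted.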
Andrea Arcangeli