Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

From: Milton Miller
Date: Fri Feb 09 2007 - 14:47:34 EST

Next message: Greg KH: "Re: [stable] [patch 00/50] -stable review"
Previous message: Andrew Morton: "Re: [patch 3/3] ext2: use perform_write aop"
In reply to: Arnd Bergmann: "Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch"
Next in thread: Maynard Johnson: "Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updatedpatch"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Feb 9, 2007, at 1:10 PM, Arnd Bergmann wrote:

On Friday 09 February 2007 19:47, Milton Miller wrote:
On Feb 8, 2007, at 11:21 AM, Arnd Bergmann wrote:

Doing the translation in two stages in user space, as you
suggest here, definitely makes sense to me. I think it
can be done a little simpler though:

Why would you need the accurate dcookie information to be
provided by the kernel? The ELF loader is done in user
space, and the kernel only reproduces what it thinks that
came up with. If the kernel only gives the dcookie information
about the SPU ELF binary to the oprofile user space, then
that can easily recreate the same mapping.

Actually, I was trying to cover issues such as anonymous
memory. If the kernel doesn't generate dcookies for
the load segments the information is not recoverable once
the task exits. This would also allow the loader to create
an artifical elf header that covered both the base executable
and a dynamicly linked one.

Other alternatives exist, such as a structure for context-id
that would have its own identifing magic and an array of elf
header pointers.

But _why_ do you want to solve that problem? we don't have
dynamically linked binaries and I really don't see why the loader
would want to create artificial elf headers...

I'm explainign how they could be handled. Actully I think
the other proposal (an identified structure that points to
sevear elf headers) would be more approprate. As you point
out, if there are presently no dynamic libraries in use, it
doesn't have to be solved today. I'm just trying to make
the code future proof, or at least a clear path forward.

The kernel still needs to provide the overlay identifiers
though.

Yes, which means it needs to parse the header (or libpse
be enhanced to write the monitor info to a spufs file).

we thought about this in the past and discarded it because of
the complexity of an spufs interface that would handle this
correctly.

Not sure what would be difficuult, and it would allow other
binary formats. But parsing the headers in the kernel
means existing userspace doesn't have to be upgraded, so I
am not proposing this requirement.

yes, this sounds nice. But tt does not at all help accuracy,
only performance, right?

It allows the user space to know when the sample was taken
and be aware of the ambiguity. If the sample rate is
high enough in relation to the overlay switch rate, user space
could decide to discard the ambiguous samples.

yes, good point.

This approach allows multiple objects by its nature. A new
elf header could be constructed in memory that contained
the union of the elf objects load segments, and the tools
will magically work. Alternatively the object id could
point to a new structure, identified via a new header, that
it points to other elf headers (easily differentiated by the
elf magic headers). Other binary formats, including several
objects in a ar archive, could be supported.

Yes, that would be a new feature if the kernel passed dcookie
information for every section, but I doubt that it is worth
it. I have not seen any program that allows loading code
from more than one ELF file. In particular, the ELF format
on the SPU is currently lacking the relocation mechanisms
that you would need for resolving spu-side symbols at load
time

Actually, It could check all load segments, and only report
those where the dcookie changes (as opposed to the offset).

I'm not really following you here, but probably you misunderstood
my point as well.

I was thinking in terms of dyanmic libraries, and totally skipped
your comment about the relocaiton info being missing. My reply
point was that the table could be compressed to the current
entry if all hased to the same vm area.

This seems to incur a run-time overhead on the SPU even if not
profiling, I would consider that not acceptable.

It definitely is overhead. Which means it would have to be
optional, like gprof.

There is some work going on for another profiler independent
of the hardware feature that only relies on instrumenting the
spu executable for things like DMA transfers and overlay
changes.

Regardless, its beyond the current scope.

milton

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Greg KH: "Re: [stable] [patch 00/50] -stable review"
Previous message: Andrew Morton: "Re: [patch 3/3] ext2: use perform_write aop"
In reply to: Arnd Bergmann: "Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch"
Next in thread: Maynard Johnson: "Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updatedpatch"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]