Re: [PATCH 27/53] perf/core: Put size of a sample at the end of it by PERF_SAMPLE_TAILSIZE

From: Wangnan (F)
Date: Tue Jan 12 2016 - 23:34:56 EST




On 2016/1/13 3:56, Alexei Starovoitov wrote:
On Tue, Jan 12, 2016 at 08:36:23PM +0800, Wangnan (F) wrote:
Hmm, in this kernel patch I see that you're adding 8 bytes to
every record via this extra TAILSIZE flag, and in perf you're
walking the ring buffer backwards by reading these 8-byte
sizes, comparing header sizes and so on until reaching the beginning,
where you start dumping it as normal.
So for this 'signal to perf' approach to work, the ring buffer
will contain tail sizes everywhere just so that user space can
find the beginning. That's not very pretty. IMO, if the kernel
could do the header read to adjust data_tail, it would make the user
space side clean. Maybe there are other solutions.
Adding tailsize seems like a brute-force hack.
There must be some nicer way.
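
For reference, the backward walk discussed above can be sketched roughly as
follows. This is only an illustration in user-space C, not the code from the
patch: the flat (non-wrapping) buffer, the assumption that the trailing u64
holds the total record size (header and tail included), and the names
rec_header, put_record and find_first_record are all hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct perf_event_header (hypothetical name). */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* assumed: total record size, header and tail included */
};

/* Append a record at offset `end`: header, zeroed payload, u64 tail size.
 * Returns the new end offset. The caller must provide enough room. */
static size_t put_record(uint8_t *buf, size_t end, uint32_t type,
			 size_t payload_len)
{
	struct rec_header h;
	uint64_t tail;

	h.type = type;
	h.misc = 0;
	h.size = (uint16_t)(sizeof(h) + payload_len + sizeof(tail));
	tail = h.size;

	memcpy(buf + end, &h, sizeof(h));
	memset(buf + end + sizeof(h), 0, payload_len);
	memcpy(buf + end + h.size - sizeof(tail), &tail, sizeof(tail));
	return end + h.size;
}

/* Walk backwards from `end` using the tail sizes. Stop as soon as a tail
 * size is implausible or disagrees with the header it points back to.
 * Returns the offset of the oldest record that still looks intact. */
static size_t find_first_record(const uint8_t *buf, size_t end)
{
	size_t pos = end;

	while (pos >= sizeof(uint64_t)) {
		struct rec_header h;
		uint64_t tail;

		memcpy(&tail, buf + pos - sizeof(tail), sizeof(tail));
		if (tail < sizeof(h) + sizeof(tail) || tail > pos)
			break;
		memcpy(&h, buf + pos - (size_t)tail, sizeof(h));
		if (h.size != tail)
			break;
		pos -= (size_t)tail;
	}
	return pos;
}
```

Once find_first_record() returns, the records can be dumped forward from that
offset in the usual way, using only the header sizes.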
Hi Peter,

What's your opinion? Should we reconsider moving the size field from the
header to the end?
Or moving the whole header to the end of a record?
I think moving the whole header under a new TAILHEADER flag is
actually a very good idea. The ring buffer will be fully utilized
and no extra bytes are necessary. User space would need to parse it
backwards, but for this use case it fits well.

I have another crazy suggestion: can we make the kernel write to
the ring buffer from the end to the beginning? For example:

This is the initial state of the ring buffer; the head pointer
points to the end of it:

-------------> Address increase

                                  head
                                     |
                                     V
+--+---+-------+----------+------+---+
|                                    |
+--+---+-------+----------+------+---+


Write the first event at the end of the ring buffer, and *decrease*
the head pointer:

                                head
                                 |
                                 V
+--+---+-------+----------+------+---+
|                                | A |
+--+---+-------+----------+------+---+


Another record:
                         head
                          |
                          V
+--+---+-------+----------+------+---+
|                         |  B   | A |
+--+---+-------+----------+------+---+


The ring buffer wraps around; A is fully overwritten and B is broken:

                               head
                                |
                                V
+--+---+-------+----------+-----+----+
|F | E |   D   |    C     | ... | F  |
+--+---+-------+----------+-----+----+

At this point the user can parse the ring buffer normally from
F to C. From the timestamps in the records, he knows which one is the
oldest.

This way perf doesn't need much extra work. There's no
performance penalty at all, and the 8 bytes are saved.
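
To make the idea concrete, here is a minimal user-space sketch of a ring
buffer that is written from the end towards the beginning. It is not kernel
code and makes several assumptions: the names ring, rb_write and
rb_read_types are hypothetical, the header only mimics the layout of
struct perf_event_header, the buffer size is a power of two, and there are
no memory barriers or concurrent readers.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define RB_SIZE 64u			/* must be a power of two */
#define RB_MASK (RB_SIZE - 1)

/* Simplified stand-in for struct perf_event_header (hypothetical name). */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total record size, header included */
};

struct ring {
	uint8_t  data[RB_SIZE];
	uint64_t head;	/* free-running offset; it only ever decreases */
};

/* Copy into the ring at logical offset `off`, wrapping at the buffer end. */
static void rb_copy_in(struct ring *rb, uint64_t off, const void *src,
		       size_t len)
{
	for (size_t i = 0; i < len; i++)
		rb->data[(off + i) & RB_MASK] = ((const uint8_t *)src)[i];
}

/* Copy out of the ring from logical offset `off`, wrapping likewise. */
static void rb_copy_out(const struct ring *rb, uint64_t off, void *dst,
			size_t len)
{
	for (size_t i = 0; i < len; i++)
		((uint8_t *)dst)[i] = rb->data[(off + i) & RB_MASK];
}

/* Writer: move head down by the record size, then store the record there. */
static void rb_write(struct ring *rb, uint32_t type, const void *payload,
		     uint16_t payload_len)
{
	struct rec_header h;

	h.type = type;
	h.misc = 0;
	h.size = (uint16_t)(sizeof(h) + payload_len);

	rb->head -= h.size;
	rb_copy_in(rb, rb->head, &h, sizeof(h));
	rb_copy_in(rb, rb->head + sizeof(h), payload, payload_len);
}

/* Reader: parse forward from head, newest record first. Stops at unwritten
 * space or at a record whose tail has already been overwritten. Stores the
 * record types in `out` and returns how many intact records were found. */
static size_t rb_read_types(const struct ring *rb, uint32_t *out, size_t max)
{
	uint64_t off = rb->head;
	size_t used = 0, n = 0;

	while (n < max && used + sizeof(struct rec_header) <= RB_SIZE) {
		struct rec_header h;

		rb_copy_out(rb, off, &h, sizeof(h));
		if (h.size < sizeof(h) || used + h.size > RB_SIZE)
			break;
		out[n++] = h.type;
		off += h.size;
		used += h.size;
	}
	return n;
}
```

With records A, B, C and D of 16, 24, 16 and 16 bytes written in that order,
D wraps around and clobbers the tail of A, so the reader returns D, C, B and
stops at A's header because A no longer fits in what is left of the buffer.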

Thoughts?

Thank you.