From: Wojciech Zmuda wzmuda@n7space.com
This patchset adds a notion of time to perf instruction and branch samples, to allow coarse measurement of code block execution time.

The simplest verification is to check that the time field is visible in 'perf script' output:
root@zynq:~# perf record -e cs_etm/timestamp,@fe970000.etr/u -a sleep 1
Couldn't synthesize bpf events.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.262 MB perf.data ]
root@zynq:~# perf script --ns -F cpu,comm,time
perf   [002]  9546.053455325:
perf   [002]  9546.053455340:
perf   [002]  9546.053455344:
(...)
sleep  [003]  9546.060163742:
sleep  [003]  9546.060163754:
sleep  [003]  9546.060163766:
(...)
ntpd   [001]  9546.389083194:
ntpd   [001]  9546.389083400:
ntpd   [001]  9546.389086319:
(...)
The step above works only if the trace has been collected in CPU-wide mode, due to a perf event flags mismatch that I am working on fixing.
Timestamps in subsequent samples are monotonically increasing. The only exceptions are discontinuities in the trace. From my understanding, we cannot reasonably timestamp discontinuities, since after the decoder re-synchronizes following a trace loss, it has to wait for another timestamp packet. Thus, the time value of such samples stays at 0x0.
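In practice this means post-processing code should skip such zero-valued samples instead of doing time arithmetic on them. A minimal, hypothetical guard for the perf script Python interface (using the same sample dict that the validation script below uses) could look like this:

from __future__ import print_function

def process_event(param_dict):
    sample = param_dict['sample']
    # Samples emitted between a trace discontinuity and the next timestamp
    # packet carry time == 0; ignore them to avoid bogus time deltas.
    if not sample['time']:
        return
    print("cpu {0}: ts 0x{1:x}".format(sample['cpu'], sample['time']))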
Beyond 'perf script -F time' output, these values can also be accessed programmatically through the perf script engine, which I used to validate the feature. The script below calculates the timestamp difference between two consecutive branches sharing the same branch source address; it is a simple example of an execution time fluctuation detector.
from __future__ import print_function

import os
import sys

sys.path.append(os.environ['PERF_EXEC_PATH'] + \
    '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')

from perf_trace_context import *

target_start_addr = int('4005e4', 16) # 0x4005e4 is func() from listing below

branch = dict()
branch['from'] = 0
branches = []

def process_event(s):
    global branch
    global branches

    sample = s['sample']

    branch['cpu'] = sample['cpu']

    if not branch['from']:
        branch['from'] = sample['addr']
        branch['ts'] = sample['time']
        return

    branch['to'] = sample['ip']
    if not branch['to']:
        branch['from'] = 0
        branch['ts'] = 0
        return

    if branch['from'] and branch['to']:
        branches.append(branch.copy())
        branch['from'] = 0
        return

def trace_end():
    global branches

    count = 0
    timestamp_start = 0

    print("Got {0} samples:".format(len(branches)))

    for b in branches:
        if b['from'] == target_start_addr:
            if not timestamp_start:
                timestamp_start = b['ts']
                continue

            print("[{0}]: ts diff = 0x{1:x} - 0x{2:x} = {3:d}".format(count,
                b['ts'], timestamp_start, b['ts'] - timestamp_start))
            count = count + 1
            timestamp_start = b['ts']
The following function was traced:
static int func(int cnt)
{
	volatile int x = 0;
	static int i;

	x += cnt + 0xdeadbeefcafeb00b;
	(...) /* repeats ~100 times */

	if (i++ % 3 == 0) // Every third execution is longer
		usleep(1000);

	return x;
}
root@zynq:~# perf record -m,16K -e cs_etm/timestamp,@fe970000.etr/u \
	--filter 'filter func @./program' \
	--per-thread ./program
Couldn't synthesize bpf events.
CTRL+C me when you find appropriate.
^C[ perf record: Woken up 12 times to write data ]
[ perf record: Captured and wrote 0.135 MB perf.data ]
root@zynq:~# perf script -s exectime.py
Got 2469 samples:
[0]: ts diff = 0x92f2752e512 - 0x92f274a7ae9 = 551465
[1]: ts diff = 0x92f2752e694 - 0x92f2752e512 = 386
[2]: ts diff = 0x92f2752e817 - 0x92f2752e694 = 387
[3]: ts diff = 0x92f275bef12 - 0x92f2752e817 = 591611
[4]: ts diff = 0x92f275bf093 - 0x92f275bef12 = 385
[5]: ts diff = 0x92f275bf211 - 0x92f275bf093 = 382
[6]: ts diff = 0x92f276451d7 - 0x92f275bf211 = 548806
[7]: ts diff = 0x92f2764535a - 0x92f276451d7 = 387
[8]: ts diff = 0x92f276454d7 - 0x92f2764535a = 381
[9]: ts diff = 0x92f276cb256 - 0x92f276454d7 = 548223
[10]: ts diff = 0x92f276cb3d9 - 0x92f276cb256 = 387
[11]: ts diff = 0x92f276cb556 - 0x92f276cb3d9 = 381
(...)
The listing above shows that every third execution of the function lasted longer than the other two. This is a naive example; it could be enhanced to point at the area that caused the disruption by examining events 'in the middle' of the traced code range.
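As one possible extension, sketched below, the script could accumulate per-address time deltas for every branch taken inside the traced range and report which addresses account for most of the extra time. This is only a sketch: the end address is a placeholder that would have to be taken from the disassembly of the traced program.

from __future__ import print_function

# Hypothetical sketch: attribute time to branch source addresses inside the
# traced function. region_end below is a placeholder; take the real value
# from the disassembly of the traced program.
region_start = int('4005e4', 16)   # func() entry, as above
region_end = int('400700', 16)     # assumed end of func()

region_time = dict()               # branch source address -> accumulated delta
last_ts = 0

def process_event(s):
    global last_ts

    sample = s['sample']
    addr = sample['addr']
    ts = sample['time']

    # Skip un-timestamped samples (e.g. after a discontinuity) and branches
    # originating outside the traced range.
    if not ts or not (region_start <= addr < region_end):
        return

    if last_ts:
        region_time[addr] = region_time.get(addr, 0) + (ts - last_ts)
    last_ts = ts

def trace_end():
    print("Accumulated time per branch source address:")
    for addr in sorted(region_time, key=region_time.get, reverse=True):
        print("0x{0:x}: {1:d}".format(addr, region_time[addr]))

In the example program above, the largest accumulated delta would be expected around the usleep() call site.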
Applies cleanly on Mathieu's 5.1-rc3-cpu-wide-v3 branch.
Changes for V2:
 - move packet timestamping logic to the decoder. The front end only uses
   this information to timestamp samples (as suggested by Mathieu).
 - leave the original behaviour of CPU-wide mode, where the decoder is
   stopped and the front end is notified about a pending queue with a
   timestamp packet. At the same time, always adjust the next and current
   timestamps in both CPU-wide and per-thread modes (as suggested by Mathieu).
 - when a timestamp packet is encountered, timestamp the range and
   discontinuity packets waiting in the queue that have not yet been
   consumed by the front end (as suggested by Mathieu).
 - don't timestamp exceptions, since they are turned into neither branch
   nor instruction samples.
 - fix timestamping of the last branch sample before a discontinuity
   appears (as suggested by Leo).
Wojciech Zmuda (1):
  perf cs-etm: Set time value for samples
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 70 ++++++++++++++++++++-----
 tools/perf/util/cs-etm.c                        |  3 ++
 tools/perf/util/cs-etm.h                        |  1 +
 3 files changed, 61 insertions(+), 13 deletions(-)
From: Wojciech Zmuda wzmuda@n7space.com
Populate the 'time' field in branch and instruction samples generated from range and discontinuity packets. The time value is based on timestamp packets that may appear in the trace.

The time value may be accessed through the perf script built-in scripting engine, in the sample dict 'time' field, or printed in 'perf script -F time' output (the latter only in CPU-wide mode at the moment).
Signed-off-by: Wojciech Zmuda wzmuda@n7space.com
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 70 ++++++++++++++++++++-----
 tools/perf/util/cs-etm.c                        |  3 ++
 tools/perf/util/cs-etm.h                        |  1 +
 3 files changed, 61 insertions(+), 13 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 33e975c8d11b..9263c803487f 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -269,10 +269,39 @@ cs_etm_decoder__create_etm_packet_printer(struct cs_etm_trace_params *t_params,
 					  trace_config);
 }
 
+static void
+cs_etm_decoder__timestamp_pending_packets(struct cs_etm_packet_queue *packet_queue)
+{
+	u32 count, head;
+	struct cs_etm_packet *packet, *prev_packet;
+
+	count = packet_queue->packet_count;
+	if (count == 0)
+		return;
+
+	prev_packet = NULL;
+	head = packet_queue->head;
+	while (count--) {
+		head = (head + 1) & (CS_ETM_PACKET_MAX_BUFFER - 1);
+
+		packet = &packet_queue->packet_buffer[head];
+
+		if (packet->sample_type != CS_ETM_RANGE &&
+		    packet->sample_type != CS_ETM_DISCONTINUITY)
+			continue;
+
+		/* TODO: find better approximation of the number of instructions per cycle. */
+		if (!prev_packet)
+			packet->timestamp = packet_queue->timestamp;
+		else
+			packet->timestamp = prev_packet->timestamp + prev_packet->instr_count;
+
+		prev_packet = packet;
+	}
+}
+
 static ocsd_datapath_resp_t
-cs_etm_decoder__do_soft_timestamp(struct cs_etm_queue *etmq,
-				  struct cs_etm_packet_queue *packet_queue,
-				  const uint8_t trace_chan_id)
+cs_etm_decoder__do_soft_timestamp(struct cs_etm_packet_queue *packet_queue)
 {
 	/* No timestamp packet has been received, nothing to do */
 	if (!packet_queue->timestamp)
@@ -284,9 +313,6 @@ cs_etm_decoder__do_soft_timestamp(struct cs_etm_queue *etmq,
 	packet_queue->next_timestamp += packet_queue->instr_count;
 	packet_queue->instr_count = 0;
 
-	/* Tell the front end which traceid_queue needs attention */
-	cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id);
-
 	return OCSD_RESP_WAIT;
 }
 
@@ -324,6 +350,9 @@ cs_etm_decoder__do_hard_timestamp(struct cs_etm_queue *etmq,
 	packet_queue->next_timestamp = elem->timestamp;
 	packet_queue->instr_count = 0;
 
+	/* Timestamp range packets that got queued up to this point */
+	cs_etm_decoder__timestamp_pending_packets(packet_queue);
+
 	/* Tell the front end which traceid_queue needs attention */
 	cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id);
 
@@ -372,6 +401,7 @@ cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue,
 	packet_queue->packet_buffer[et].flags = 0;
 	packet_queue->packet_buffer[et].exception_number = UINT32_MAX;
 	packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
+	packet_queue->packet_buffer[et].timestamp = 0;
 
 	if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -436,10 +466,6 @@ cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
 
 	packet->last_instr_size = elem->last_instr_sz;
 
-	/* per-thread scenario, no need to generate a timestamp */
-	if (cs_etm__etmq_is_timeless(etmq))
-		goto out;
-
 	/*
 	 * The packet queue is full and we haven't seen a timestamp (had we
 	 * seen one the packet queue wouldn't be full). Let the front end
@@ -449,9 +475,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
 		goto out;
 
 	packet_queue->instr_count += elem->num_instr_range;
+	ret = cs_etm_decoder__do_soft_timestamp(packet_queue);
+	if (ret == OCSD_RESP_CONT)
+		goto out;
+
 	/* Tell the front end we have a new timestamp to process */
-	ret = cs_etm_decoder__do_soft_timestamp(etmq, packet_queue,
-						trace_chan_id);
+	packet->timestamp = packet_queue->timestamp;
+	if (!cs_etm__etmq_is_timeless(etmq))
+		cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id);
 out:
 	return ret;
 }
@@ -460,13 +491,26 @@ static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_discontinuity(struct cs_etm_packet_queue *queue,
 				     const uint8_t trace_chan_id)
 {
+	int ret;
+	struct cs_etm_packet *packet;
+	u64 timestamp;
+
 	/*
 	 * Something happened and who knows when we'll get new traces so
 	 * reset time statistics.
+	 * Record the last observed timestamp, so the last branch sample
+	 * before discontinuity can be marked with it.
 	 */
+	timestamp = queue->timestamp;
 	cs_etm_decoder__reset_timestamp(queue);
-	return cs_etm_decoder__buffer_packet(queue, trace_chan_id,
+
+	ret = cs_etm_decoder__buffer_packet(queue, trace_chan_id,
 				      CS_ETM_DISCONTINUITY);
+
+	packet = &queue->packet_buffer[queue->tail];
+	packet->timestamp = timestamp;
+
+	return ret;
 }
 
 static ocsd_datapath_resp_t
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8496190ad553..84d615ae3a1a 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -204,6 +204,7 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue)
 		queue->packet_buffer[i].exception_number = UINT32_MAX;
 		queue->packet_buffer[i].trace_chan_id = UINT8_MAX;
 		queue->packet_buffer[i].cpu = INT_MIN;
+		queue->packet_buffer[i].timestamp = 0;
 	}
 }
 
@@ -1098,6 +1099,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 	sample.flags = tidq->prev_packet->flags;
 	sample.insn_len = 1;
 	sample.cpumode = event->sample.header.misc;
+	sample.time = tidq->packet->timestamp;
 
 	if (etm->synth_opts.last_branch) {
 		cs_etm__copy_last_branch_rb(etmq, tidq);
@@ -1157,6 +1159,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	sample.cpu = tidq->packet->cpu;
 	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;
+	sample.time = tidq->packet->timestamp;
 
 	/*
 	 * perf report cannot handle events without a branch stack
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index 1f6ec7babe70..a14b0220eaa7 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -122,6 +122,7 @@ struct cs_etm_packet {
 	enum cs_etm_isa isa;
 	u64 start_addr;
 	u64 end_addr;
+	u64 timestamp;
 	u32 instr_count;
 	u32 last_instr_type;
 	u32 last_instr_subtype;
Hi Wojciech,
On Tue, May 28, 2019 at 06:18:50PM +0000, Wojciech Żmuda wrote:
> From: Wojciech Zmuda wzmuda@n7space.com
>
> Populate the 'time' field in branch and instruction samples generated
> from range and discontinuity packets. The time value is based on
> timestamp packets that may appear in the trace.
>
> The time value may be accessed through the perf script built-in scripting
> engine, in the sample dict 'time' field, or printed in 'perf script -F time'
> output (the latter only in CPU-wide mode at the moment).
Did checkpatch return any errors on your side?
> Signed-off-by: Wojciech Zmuda wzmuda@n7space.com
> ---
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 70 ++++++++++++++++++++-----
>  tools/perf/util/cs-etm.c                        |  3 ++
>  tools/perf/util/cs-etm.h                        |  1 +
>  3 files changed, 61 insertions(+), 13 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> index 33e975c8d11b..9263c803487f 100644
> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> @@ -269,10 +269,39 @@ cs_etm_decoder__create_etm_packet_printer(struct cs_etm_trace_params *t_params,
>  					  trace_config);
>  }
>  
> +static void
> +cs_etm_decoder__timestamp_pending_packets(struct cs_etm_packet_queue *packet_queue)
> +{
> +	u32 count, head;
> +	struct cs_etm_packet *packet, *prev_packet;
> +
> +	count = packet_queue->packet_count;
> +	if (count == 0)
> +		return;
> +
> +	prev_packet = NULL;
> +	head = packet_queue->head;
> +	while (count--) {
> +		head = (head + 1) & (CS_ETM_PACKET_MAX_BUFFER - 1);
> +
> +		packet = &packet_queue->packet_buffer[head];
> +
> +		if (packet->sample_type != CS_ETM_RANGE &&
> +		    packet->sample_type != CS_ETM_DISCONTINUITY)
I don't think we should care about discontinuity packets if we haven't received a timestamp since the timestamp we just received is for range packets that appeared after the discontinuity.
> +			continue;
> +
> +		/* TODO: find better approximation of the number of instructions per cycle. */
> +		if (!prev_packet)
> +			packet->timestamp = packet_queue->timestamp;
> +		else
> +			packet->timestamp = prev_packet->timestamp + prev_packet->instr_count;
> +
> +		prev_packet = packet;
> +	}
> +}
> +
>  static ocsd_datapath_resp_t
> -cs_etm_decoder__do_soft_timestamp(struct cs_etm_queue *etmq,
> -				  struct cs_etm_packet_queue *packet_queue,
> -				  const uint8_t trace_chan_id)
> +cs_etm_decoder__do_soft_timestamp(struct cs_etm_packet_queue *packet_queue)
>  {
>  	/* No timestamp packet has been received, nothing to do */
>  	if (!packet_queue->timestamp)
> @@ -284,9 +313,6 @@ cs_etm_decoder__do_soft_timestamp(struct cs_etm_queue *etmq,
>  	packet_queue->next_timestamp += packet_queue->instr_count;
>  	packet_queue->instr_count = 0;
>  
> -	/* Tell the front end which traceid_queue needs attention */
> -	cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id);
> -
>  	return OCSD_RESP_WAIT;
The new cs_etm_decoder__do_soft_timestamp() should only concern itself with setting the timestamp for the packet queue and let the calling code deal with everything else. As such the return type should be a "void". More on that below.
>  }
>  
> @@ -324,6 +350,9 @@ cs_etm_decoder__do_hard_timestamp(struct cs_etm_queue *etmq,
>  	packet_queue->next_timestamp = elem->timestamp;
>  	packet_queue->instr_count = 0;
>  
> +	/* Timestamp range packets that got queued up to this point */
> +	cs_etm_decoder__timestamp_pending_packets(packet_queue);
> +
>  	/* Tell the front end which traceid_queue needs attention */
>  	cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id);
>  
> @@ -372,6 +401,7 @@ cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue,
>  	packet_queue->packet_buffer[et].flags = 0;
>  	packet_queue->packet_buffer[et].exception_number = UINT32_MAX;
>  	packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
> +	packet_queue->packet_buffer[et].timestamp = 0;
>  
>  	if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1)
>  		return OCSD_RESP_WAIT;
> @@ -436,10 +466,6 @@ cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
>  
>  	packet->last_instr_size = elem->last_instr_sz;
>  
Function cs_etm_decoder__do_soft_timestamp() (along with timestamping the packet) should be done right here. That way we can still get a timestamp for the packet even if the packet queue is full.
> -	/* per-thread scenario, no need to generate a timestamp */
> -	if (cs_etm__etmq_is_timeless(etmq))
> -		goto out;
> -
>  	/*
>  	 * The packet queue is full and we haven't seen a timestamp (had we
>  	 * seen one the packet queue wouldn't be full). Let the front end
> @@ -449,9 +475,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
>  		goto out;
>  
>  	packet_queue->instr_count += elem->num_instr_range;
> +	ret = cs_etm_decoder__do_soft_timestamp(packet_queue);
> +	if (ret == OCSD_RESP_CONT)
> +		goto out;
> +
>  	/* Tell the front end we have a new timestamp to process */
> -	ret = cs_etm_decoder__do_soft_timestamp(etmq, packet_queue,
> -						trace_chan_id);
> +	packet->timestamp = packet_queue->timestamp;
> +	if (!cs_etm__etmq_is_timeless(etmq))
> +		cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id);
>  out:
>  	return ret;
>  }
> @@ -460,13 +491,26 @@ static ocsd_datapath_resp_t
>  cs_etm_decoder__buffer_discontinuity(struct cs_etm_packet_queue *queue,
>  				     const uint8_t trace_chan_id)
>  {
> +	int ret;
> +	struct cs_etm_packet *packet;
> +	u64 timestamp;
> +
>  	/*
>  	 * Something happened and who knows when we'll get new traces so
>  	 * reset time statistics.
> +	 * Record the last observed timestamp, so the last branch sample
> +	 * before discontinuity can be marked with it.
>  	 */
> +	timestamp = queue->timestamp;
>  	cs_etm_decoder__reset_timestamp(queue);
If you move this below "packet->timestamp = timestamp;", along with the comment, it won't be necessary to declare a new variable.
Thanks,
Mathieu
> -	return cs_etm_decoder__buffer_packet(queue, trace_chan_id,
> +
> +	ret = cs_etm_decoder__buffer_packet(queue, trace_chan_id,
>  				      CS_ETM_DISCONTINUITY);
> +
> +	packet = &queue->packet_buffer[queue->tail];
> +	packet->timestamp = timestamp;
> +
> +	return ret;
>  }
>  
>  static ocsd_datapath_resp_t
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 8496190ad553..84d615ae3a1a 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -204,6 +204,7 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue)
>  		queue->packet_buffer[i].exception_number = UINT32_MAX;
>  		queue->packet_buffer[i].trace_chan_id = UINT8_MAX;
>  		queue->packet_buffer[i].cpu = INT_MIN;
> +		queue->packet_buffer[i].timestamp = 0;
>  	}
>  }
>  
> @@ -1098,6 +1099,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>  	sample.flags = tidq->prev_packet->flags;
>  	sample.insn_len = 1;
>  	sample.cpumode = event->sample.header.misc;
> +	sample.time = tidq->packet->timestamp;
>  
>  	if (etm->synth_opts.last_branch) {
>  		cs_etm__copy_last_branch_rb(etmq, tidq);
> @@ -1157,6 +1159,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
>  	sample.cpu = tidq->packet->cpu;
>  	sample.flags = tidq->prev_packet->flags;
>  	sample.cpumode = event->sample.header.misc;
> +	sample.time = tidq->packet->timestamp;
>  
>  	/*
>  	 * perf report cannot handle events without a branch stack
> diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
> index 1f6ec7babe70..a14b0220eaa7 100644
> --- a/tools/perf/util/cs-etm.h
> +++ b/tools/perf/util/cs-etm.h
> @@ -122,6 +122,7 @@ struct cs_etm_packet {
>  	enum cs_etm_isa isa;
>  	u64 start_addr;
>  	u64 end_addr;
> +	u64 timestamp;
>  	u32 instr_count;
>  	u32 last_instr_type;
>  	u32 last_instr_subtype;
> -- 
> 2.11.0