When we use perf to catpure Coresight trace and generate instruction trace using 'perf script', we get the following output:
# perf record -e cs_etm/@tmc_etr0/ -C 9 taskset -c 9 sleep 1 # perf script --itrace=i1ns --ns -Fcomm,tid,pid,time,cpu,event,ip,sym,addr,symoff,flags,callindent ... perf 9024/9024 [009] 2690.650470551: instructions: call 0 ffffb305591aed54 coresight_timeout+0x28 perf 9024/9024 [009] 2690.650470551: instructions: call 0 ffffb305591aed58 coresight_timeout+0x2c perf 9024/9024 [009] 2690.650470551: instructions: call 0 ffffb305591aed5c coresight_timeout+0x30 perf 9024/9024 [009] 2690.650470551: instructions: call 0 ffffb305591aed60 coresight_timeout+0x34 perf 9024/9024 [009] 2690.650470551: instructions: jmp 0 ffffb305591aed7c coresight_timeout+0x50 perf 9024/9024 [009] 2690.650470551: instructions: jmp 0 ffffb305591aed80 coresight_timeout+0x54 perf 9024/9024 [009] 2690.650470551: instructions: jmp 0 ffffb305591aed84 coresight_timeout+0x58 perf 9024/9024 [009] 2690.650470552: instructions: jcc 0 ffffb305591aede4 coresight_timeout+0xb8 perf 9024/9024 [009] 2690.650470552: instructions: jcc 0 ffffb305591aede8 coresight_timeout+0xbc perf 9024/9024 [009] 2690.650470552: instructions: jcc 0 ffffb305591aedec coresight_timeout+0xc0 perf 9024/9024 [009] 2690.650470552: instructions: jcc 0 ffffb305591aedf0 coresight_timeout+0xc4 perf 9024/9024 [009] 2690.650470557: instructions: call 0 ffffb305591bccec ete_sysreg_read+0x0 perf 9024/9024 [009] 2690.650470557: instructions: call 0 ffffb305591bccf0 ete_sysreg_read+0x4 perf 9024/9024 [009] 2690.650470557: instructions: call 0 ffffb305591bccf4 ete_sysreg_read+0x8 perf 9024/9024 [009] 2690.650470557: instructions: call 0 ffffb305591bccf8 ete_sysreg_read+0xc perf 9024/9024 [009] 2690.650470557: instructions: call 0 ffffb305591bccfc ete_sysreg_read+0x10 perf 9024/9024 [009] 2690.650470557: instructions: call 0 ffffb305591bcd00 ete_sysreg_read+0x14
This output has the following issues: 1. Non-branch instructions have mnemonics of branch instructions (Column 6) 2. Branch target address is missing (Column 7)
This patch fixes these issues by changing the logic of instruction syntehsis for the Coresight trace queues.
Output after applying the patch: ... perf 6111/6111 [008] 457.332794461: instructions: 0 ffffb305591aed54 coresight_timeout+0x28 perf 6111/6111 [008] 457.332794461: instructions: 0 ffffb305591aed58 coresight_timeout+0x2c perf 6111/6111 [008] 457.332794461: instructions: 0 ffffb305591aed5c coresight_timeout+0x30 perf 6111/6111 [008] 457.332794461: instructions: jmp ffffb305591aed7c ffffb305591aed60 coresight_timeout+0x34 perf 6111/6111 [008] 457.332794461: instructions: 0 ffffb305591aed7c coresight_timeout+0x50 perf 6111/6111 [008] 457.332794461: instructions: 0 ffffb305591aed80 coresight_timeout+0x54 perf 6111/6111 [008] 457.332794461: instructions: jcc ffffb305591aede4 ffffb305591aed84 coresight_timeout+0x58 perf 6111/6111 [008] 457.332794462: instructions: 0 ffffb305591aede4 coresight_timeout+0xb8 perf 6111/6111 [008] 457.332794462: instructions: 0 ffffb305591aede8 coresight_timeout+0xbc perf 6111/6111 [008] 457.332794462: instructions: 0 ffffb305591aedec coresight_timeout+0xc0 perf 6111/6111 [008] 457.332794462: instructions: call ffffb305591bccec ffffb305591aedf0 coresight_timeout+0xc4 perf 6111/6111 [008] 457.332794462: instructions: 0 ffffb305591bccec ete_sysreg_read+0x0 perf 6111/6111 [008] 457.332794462: instructions: 0 ffffb305591bccf0 ete_sysreg_read+0x4 perf 6111/6111 [008] 457.332794462: instructions: 0 ffffb305591bccf4 ete_sysreg_read+0x8 perf 6111/6111 [008] 457.332794462: instructions: 0 ffffb305591bccf8 ete_sysreg_read+0xc perf 6111/6111 [008] 457.332794462: instructions: 0 ffffb305591bccfc ete_sysreg_read+0x10 perf 6111/6111 [008] 457.332794462: instructions: 0 ffffb305591bcd00 ete_sysreg_read+0x14
Changes in V2 - Updated commit message of Patch 1 - As discussed in the previous version [1], there were differences in instruction trace output before and after the patch. The timestamps for the instructions were not in sync. Added a patch 2 which fixes this issue.
Changes in V1 - https://lkml.org/lkml/2023/6/23/912
[1] https://lkml.org/lkml/2023/6/28/506
Tanmay Jagdale (2): perf: cs-etm: Fixes in instruction sample synthesis perf: cs-etm: Store previous timestamp in packet queue
tools/perf/util/cs-etm.c | 49 +++++++++++++++++++++++++++++++--------- tools/perf/util/cs-etm.h | 1 + 2 files changed, 39 insertions(+), 11 deletions(-)
The existing method of synthesizing instruction samples has the following issues: 1. Branch target address is missing. 2. Non-branch instructions have mnemonics of branch instructions.
To fix issue 1), start synthesizing the instructions from the previous packet (tidq->prev_packet) instead of current packet (tidq->packet). This way, it is easy to figure out the target address of the branch instruction in tidq->prev_packet which is the current packet's (tidq->packet) first executed instruction.
After the switch to processing the previous packet first, we no longer need to swap the packets during cs_etm__flush()
Fix for issue 2) is to set the sample flags only when we reach the last instruction in the tidq (which would be a branch instruction).
Signed-off-by: Tanmay Jagdale tanmay@marvell.com --- tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++------- 1 file changed, 25 insertions(+), 7 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index d65d7485886c..55db1932f785 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1493,10 +1493,26 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.stream_id = etmq->etm->instructions_id; sample.period = period; sample.cpu = tidq->packet->cpu; - sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
- cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample); + cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet, &sample); + + /* Populate branch target information only when we encounter + * branch instruction, which is at the end of tidq->prev_packet. + */ + if (addr == (tidq->prev_packet->end_addr - 4)) { + /* Update the perf_sample flags using the prev_packet + * since that is the queue we are synthesizing. + */ + sample.flags = tidq->prev_packet->flags; + + /* The last instruction of the previous queue would be a + * branch operation. Get the target of that branch by looking + * into the first executed instruction of the current packet + * queue. + */ + sample.addr = cs_etm__first_executed_instr(tidq->packet); + }
if (etm->synth_opts.last_branch) sample.branch_stack = tidq->last_branch; @@ -1717,7 +1733,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, /* Get instructions remainder from previous packet */ instrs_prev = tidq->period_instructions;
- tidq->period_instructions += tidq->packet->instr_count; + tidq->period_instructions += tidq->prev_packet->instr_count;
/* * Record a branch when the last instruction in @@ -1797,8 +1813,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, * been executed, but PC has not advanced to next * instruction) */ + /* Get address from prev_packet since we are synthesizing + * that in cs_etm__synth_instruction_sample() + */ addr = cs_etm__instr_addr(etmq, trace_chan_id, - tidq->packet, offset - 1); + tidq->prev_packet, offset - 1); ret = cs_etm__synth_instruction_sample( etmq, tidq, addr, etm->instructions_sample_period); @@ -1862,7 +1881,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
/* Handle start tracing packet */ if (tidq->prev_packet->sample_type == CS_ETM_EMPTY) - goto swap_packet; + goto reset_last_br;
if (etmq->etm->synth_opts.last_branch && etmq->etm->synth_opts.instructions && @@ -1898,8 +1917,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq, return err; }
-swap_packet: - cs_etm__packet_swap(etm, tidq); +reset_last_br:
/* Reset last branches after flush the trace */ if (etm->synth_opts.last_branch)
Since logic in cs_etm__sample is changed to synthesizing the previous packet (tidq->prev_packet) instead of current packet (tidq->packet), the first time this function is called, tidq->prev_packet is NULL and we return without processing anything.
This is as expected but, in the first call, we would have a valid timestamp (stored in tidq->packet_queue.cs_timestamp) which belongs to tidq->packet. This would be lost due to no processing.
Losing this timestamp results in all the synthesized packets being associated with the next timestamp and not their corresponding one.
To fix this, introduce a new variable (prev_cs_timestamp) to store the queue's timestamp in cs_etm__sample(). When we start synthesizing the prev_packet, use this cached value instead of the current timestamp (cs_timestamp).
Signed-off-by: Tanmay Jagdale tanmay@marvell.com --- tools/perf/util/cs-etm.c | 17 +++++++++++++---- tools/perf/util/cs-etm.h | 1 + 2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 55db1932f785..d5072c16fcd8 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1459,13 +1459,15 @@ u64 cs_etm__convert_sample_time(struct cs_etm_queue *etmq, u64 cs_timestamp) }
static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq, - struct cs_etm_traceid_queue *tidq) + struct cs_etm_traceid_queue *tidq, + bool instruction_sample) { struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet_queue *packet_queue = &tidq->packet_queue;
if (!etm->timeless_decoding && etm->has_virtual_ts) - return packet_queue->cs_timestamp; + return instruction_sample ? packet_queue->prev_cs_timestamp : + packet_queue->cs_timestamp; else return etm->latest_kernel_timestamp; } @@ -1484,7 +1486,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, event->sample.header.size = sizeof(struct perf_event_header);
/* Set time field based on etm auxtrace config. */ - sample.time = cs_etm__resolve_sample_time(etmq, tidq); + sample.time = cs_etm__resolve_sample_time(etmq, tidq, true);
sample.ip = addr; sample.pid = thread__pid(tidq->thread); @@ -1560,7 +1562,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, event->sample.header.size = sizeof(struct perf_event_header);
/* Set time field based on etm auxtrace config. */ - sample.time = cs_etm__resolve_sample_time(etmq, tidq); + sample.time = cs_etm__resolve_sample_time(etmq, tidq, false);
sample.ip = ip; sample.pid = thread__pid(tidq->prev_packet_thread); @@ -1849,6 +1851,13 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, } }
+ /* + * Since we synthesize the prev_packet, store the current timestamp + * here in prev_cs_timestamp so that when we *actually* synthesize + * the prev_packet, we use this timestamp and not the future one. + */ + tidq->packet_queue.prev_cs_timestamp = tidq->packet_queue.cs_timestamp; + cs_etm__packet_swap(etm, tidq);
return 0; diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 4696267a32f0..333eeb2c06a0 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -205,6 +205,7 @@ struct cs_etm_packet_queue { u32 instr_count; u64 cs_timestamp; /* Timestamp from trace data, converted to ns if possible */ u64 next_cs_timestamp; + u64 prev_cs_timestamp; struct cs_etm_packet packet_buffer[CS_ETM_PACKET_MAX_BUFFER]; };
On Thu, Apr 04, 2024 at 11:37:29PM +0530, Tanmay Jagdale wrote:
When we use perf to catpure Coresight trace and generate instruction trace using 'perf script', we get the following output:
# perf record -e cs_etm/@tmc_etr0/ -C 9 taskset -c 9 sleep 1 # perf script --itrace=i1ns --ns -Fcomm,tid,pid,time,cpu,event,ip,sym,addr,symoff,flags,callindent
Applies cleanly, can some Coresight people review this and provide a Reviewed-by?
Thanks!
- Arnaldo
⬢[acme@toolbox perf-tools-next]$ b4 am -ctsl --cc-trailers 20240404180731.7006-3-tanmay@marvell.com Grabbing thread from lore.kernel.org/all/20240404180731.7006-3-tanmay@marvell.com/t.mbox.gz Checking for newer revisions Grabbing search results from lore.kernel.org Analyzing 3 messages in the thread Checking attestation on all messages, may take a moment... --- ✓ [PATCH v2 1/2] perf: cs-etm: Fixes in instruction sample synthesis + Link: https://lore.kernel.org/r/20240404180731.7006-2-tanmay@marvell.com + Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com ✓ [PATCH v2 2/2] perf: cs-etm: Store previous timestamp in packet queue + Link: https://lore.kernel.org/r/20240404180731.7006-3-tanmay@marvell.com + Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com --- ✓ Signed: DKIM/marvell.com --- Total patches: 2 --- Cover: ./v2_20240404_tanmay_fix_coresight_instruction_synthesis_logic.cover Link: https://lore.kernel.org/r/20240404180731.7006-1-tanmay@marvell.com Base: applies clean to current tree git checkout -b v2_20240404_tanmay_marvell_com HEAD git am ./v2_20240404_tanmay_fix_coresight_instruction_synthesis_logic.mbx ⬢[acme@toolbox perf-tools-next]$
On 25/04/2024 19:54, Arnaldo Carvalho de Melo wrote:
On Thu, Apr 04, 2024 at 11:37:29PM +0530, Tanmay Jagdale wrote:
When we use perf to catpure Coresight trace and generate instruction trace using 'perf script', we get the following output:
# perf record -e cs_etm/@tmc_etr0/ -C 9 taskset -c 9 sleep 1 # perf script --itrace=i1ns --ns -Fcomm,tid,pid,time,cpu,event,ip,sym,addr,symoff,flags,callindent
Applies cleanly, can some Coresight people review this and provide a Reviewed-by?
Thanks!
Reviewed-by: James Clark james.clark@arm.com
Hi Arnaldo, James, On 2024-04-26 at 08:07:56, James Clark (james.clark@arm.com) wrote:
On 25/04/2024 19:54, Arnaldo Carvalho de Melo wrote:
On Thu, Apr 04, 2024 at 11:37:29PM +0530, Tanmay Jagdale wrote:
When we use perf to catpure Coresight trace and generate instruction trace using 'perf script', we get the following output:
# perf record -e cs_etm/@tmc_etr0/ -C 9 taskset -c 9 sleep 1 # perf script --itrace=i1ns --ns -Fcomm,tid,pid,time,cpu,event,ip,sym,addr,symoff,flags,callindent
Applies cleanly, can some Coresight people review this and provide a Reviewed-by?
Thanks!
Reviewed-by: James Clark <james.clark(a)arm.com>
Did this patch series get picked up the perf-tool-next branch ?
With Regards, Tanmay