This patchset adds more information about the final instruction in the
Instruction Range generic packet.
i) A flag is set if the last instruction is conditional [last_instr_cond].
ii) For A32/T32 ISA, the instruction subtype will be set to 'Implied Return'
[OCSD_S_INSTR_V7_IMPLIED_RET] if it is one of the instructions:
mov pc,lr
bx r14
pop {...,pc}
ldr pc,[sp], #offset
These are used by the CPU return predictor and in general by compilers
when a return is required.
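As a rough illustration of the A32 cases listed above, the sketch below
matches the four instruction forms by opcode bit pattern. The function name
a32_is_implied_ret and the masks were written for this note from the A32
encodings; they are not taken from the OpenCSD sources, and the decoder's own
matching may differ. T32 encodings are not covered, and the unconditional
(cond = 0b1111) encoding space is not excluded, for brevity.

```c
#include <stdint.h>

/* Hypothetical helper, not the OpenCSD implementation: returns 1 if the
 * A32 opcode is one of the return-like forms listed above.
 * The condition field (bits 31:28) is ignored throughout. */
static int a32_is_implied_ret(uint32_t inst)
{
    /* mov pc, lr  (0xE1A0F00E when the condition is "always") */
    if ((inst & 0x0FFFFFFFu) == 0x01A0F00Eu)
        return 1;
    /* bx r14      (0xE12FFF1E when the condition is "always") */
    if ((inst & 0x0FFFFFFFu) == 0x012FFF1Eu)
        return 1;
    /* pop {...,pc}: LDMIA sp!, {reglist} with bit 15 (pc) set */
    if ((inst & 0x0FFF8000u) == 0x08BD8000u)
        return 1;
    /* ldr pc, [sp], #offset: post-indexed load with Rn = sp, Rt = pc;
     * the U bit (23) and the 12-bit offset are ignored */
    if ((inst & 0x0F7FF000u) == 0x041DF000u)
        return 1;
    return 0;
}
```

For example, 0xE8BD8010 (pop {r4, pc}) matches the third pattern, while
0xE12FFF13 (bx r3) matches none of them.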
The patchset also removes the uint32_t casts in the version #define
OCSD_VER_NUM to enable correct use with the pre-processor.
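To illustrate why the casts matter: the pre-processor evaluates #if
expressions as integer constant expressions and has no notion of type names,
so a cast such as (uint32_t) inside a macro makes the #if ill-formed. The
sketch below uses hypothetical DEMO_VER_* macros as stand-ins for the real
definitions in ocsd_if_version.h, whose exact layout is not reproduced here.

```c
/* Hypothetical stand-ins for the library version macros. */
#define DEMO_VER_MAJOR 0x0
#define DEMO_VER_MINOR 0xA
#define DEMO_VER_PATCH 0x0

/* No casts: usable both in C expressions and in #if directives.
 * A definition such as ((uint32_t)DEMO_VER_MAJOR << 16) would be rejected
 * in #if, because the pre-processor cannot parse the cast. */
#define DEMO_VER_NUM \
    ((DEMO_VER_MAJOR << 16) | (DEMO_VER_MINOR << 8) | DEMO_VER_PATCH)

/* Assumption for illustration: the new packet field arrives with 0.10.0. */
#if DEMO_VER_NUM >= 0x000A00
#define DEMO_HAS_LAST_INSTR_COND 1
#else
#define DEMO_HAS_LAST_INSTR_COND 0
#endif
```

Client code can then test the library version at compile time before
touching fields that older headers do not provide.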
updates for v2:
i) Instruction matching for implied return tightened to focus correctly on
specific instructions
ii) Typos in help / printed text corrected.
Mike Leach (4):
opencsd: Generic output packet - add additional instruction info
opencsd: Typo Fixes.
opencsd: docs: Update documents for new generic packet field
opencsd: Update README etc for version 0.10.0
README.md | 6 ++-
decoder/docs/doxygen_config.dox | 2 +-
decoder/docs/prog_guide/prog_guide_generic_pkts.md | 3 +-
decoder/include/common/trc_gen_elem.h | 1 +
.../include/opencsd/etmv4/trc_pkt_decode_etmv4i.h | 2 +
decoder/include/opencsd/ocsd_if_types.h | 1 +
decoder/include/opencsd/ocsd_if_version.h | 8 ++--
decoder/include/opencsd/trc_gen_elem_types.h | 1 +
decoder/source/etmv3/trc_pkt_decode_etmv3.cpp | 1 +
decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp | 29 ++++++--------
decoder/source/i_dec/trc_i_decode.cpp | 1 +
decoder/source/i_dec/trc_idec_arminst.cpp | 46 +++++++++++-----------
decoder/source/ocsd_dcd_tree.cpp | 1 -
decoder/source/ptm/trc_pkt_decode_ptm.cpp | 2 +
decoder/source/trc_gen_elem.cpp | 7 +++-
decoder/tests/source/trc_pkt_lister.cpp | 2 +-
16 files changed, 61 insertions(+), 52 deletions(-)
--
2.14.2
Greetings,
I'm trying to find an ARM server suited to high-performance computing and
parallelism that also supports CoreSight.
The Cavium ThunderX2 fits the performance requirements perfectly, but
I'm not sure whether CoreSight is supported. This isn't documented anywhere,
so I thought it best to ask here.
Is CoreSight supported on ThunderX2?
Thank you,
Mike.
The perf sample data contains flags that indicate which type of branch
instruction the hardware trace data belongs to, so the flags can be used
to print a human-readable string. The Arm CoreSight ETM sample data does
not set these flags and always leaves them as zero, so the perf tool
skips printing the string for the instruction type.
Arm CoreSight ETM supports several instruction sets: A64, A32 and
T32; as a first step, this patch sets sample flags for A64
instructions.
The implementation is briefly described below:
- An element of type OCSD_GEN_TRC_ELEM_TRACE_ON is taken as the trace
beginning packet; elements of type OCSD_GEN_TRC_ELEM_NO_SYNC or
OCSD_GEN_TRC_ELEM_EO_TRACE are used to mark the
trace end;
- For an instruction range packet, the branch instruction type is decided
mainly from three fields:
elem->last_i_type
elem->last_i_subtype
elem->last_instr_cond
If the instruction is an immediate branch without the link or return
flag, we consider it a branch within the function. In fact an
immediate branch can also be used to jump to a function entry;
usually this is only done in assembly code that calls a symbol directly
and does not expect to return. After reviewing normal kernel
functions and user space programs, both very seldom use an
immediate branch for a function call. On the other hand, to
decide whether an immediate branch is a jump within a function or a
function call, we would need to rely on the start address of the next
packet and check the symbol offset for that address, which would
add much complexity to the implementation. So for this version
we simply treat an immediate branch as a function-internal branch.
If the instruction is an immediate branch with link, it is the instruction
'BL', which is used for a function call.
If the instruction is an indirect branch without link, it
corresponds to the instruction 'BR'. This instruction is usually used
by dynamically linked libraries as shown below, so we treat it as a return
instruction.
0000000000000680 <.plt>:
680: a9bf7bf0 stp x16, x30, [sp, #-16]!
684: 90000090 adrp x16, 10000 <__FRAME_END__+0xf630>
688: f947fe11 ldr x17, [x16, #4088]
68c: 913fe210 add x16, x16, #0xff8
690: d61f0220 br x17
If the instruction is an indirect branch with link, e.g. BLR, we treat
it as a function call.
For function return, ARMv8 introduces a dedicated instruction 'ret',
which has the flag OCSD_S_INSTR_V8_RET.
- For exception packets, this patch divides them into three types:
The first type of exception is caused by external logic such as the bus,
interrupt controller, debug module, or PE reset or halt; this
corresponds to the flags "bcyi" defined in the doc perf-script.txt;
The second type is the system call, which is set as "bcs" following
the definition in the doc;
The third type is for CPU trap, data and instruction prefetch abort, and
alignment abort; these exceptions are usually synchronous for the CPU, so
they are set as the "bci" type.
I am not certain that this patch sets the right flags for
them. The instruction trace decoding flags originally came
from Intel PT and are only briefly defined in the document
tools/perf/Documentation/perf-script.txt; there is no more
detailed information explaining these flags and their corresponding
instructions.
This patch sets the exception-related flags based on the literal meaning
described in the doc, and they should be refined according to review.
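For reference while reviewing, the flag strings quoted above can be
reproduced with a small sketch. The bit positions and the character table
are written from memory of perf's PERF_IP_FLAG_* definitions and
PERF_IP_FLAG_CHARS, so treat them as an assumption to be checked against
tools/perf/util/event.h; the DEMO_ names and demo_flags_to_chars are
hypothetical and not part of this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the low PERF_IP_FLAG_* bits; verify against perf. */
#define DEMO_FLAG_BRANCH      (1u << 0)  /* 'b' */
#define DEMO_FLAG_CALL        (1u << 1)  /* 'c' */
#define DEMO_FLAG_RETURN      (1u << 2)  /* 'r' */
#define DEMO_FLAG_CONDITIONAL (1u << 3)  /* 'o' */
#define DEMO_FLAG_SYSCALLRET  (1u << 4)  /* 's' */
#define DEMO_FLAG_ASYNC       (1u << 5)  /* 'y' */
#define DEMO_FLAG_INTERRUPT   (1u << 6)  /* 'i' */

/* Writes one character per set flag into out, in bit order, and always
 * NUL-terminates when len is non-zero. */
static void demo_flags_to_chars(uint32_t flags, char *out, size_t len)
{
    static const char chars[] = "bcrosyi";
    size_t i, pos = 0;

    if (len == 0)
        return;
    for (i = 0; chars[i] != '\0' && pos + 1 < len; i++) {
        if (flags & (1u << i))
            out[pos++] = chars[i];
    }
    out[pos] = '\0';
}
```

With these values, branch | call | async | interrupt renders as "bcyi",
branch | call | syscallret as "bcs", and branch | call | interrupt as "bci",
which matches the three exception classes described above.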
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Cc: Mike Leach <mike.leach(a)linaro.org>
Cc: Robert Walker <robert.walker(a)arm.com>
Cc: Al Grant <Al.Grant(a)arm.com>
Cc: Andi Kleen <andi(a)firstfloor.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 158 ++++++++++++++++++++++++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 1 +
tools/perf/util/cs-etm.c | 4 +-
3 files changed, 161 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 938def6..b7cb962 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -347,6 +347,162 @@ cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
CS_ETM_TRACE_ON);
}
+static void cs_etm_decoder__set_sample_flags(
+ const void *context,
+ const ocsd_generic_trace_elem *elem)
+{
+ struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
+ struct cs_etm_packet *packet;
+ static u32 exc_num;
+
+ packet = &decoder->packet_buffer[decoder->tail];
+
+ switch (elem->elem_type) {
+ case OCSD_GEN_TRC_ELEM_TRACE_ON:
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_TRACE_BEGIN;
+ break;
+
+ case OCSD_GEN_TRC_ELEM_NO_SYNC:
+ case OCSD_GEN_TRC_ELEM_EO_TRACE:
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_TRACE_END;
+ break;
+
+ case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
+ /*
+ * Immediate branch instruction without neither link nor
+ * return flag, it's normal branch instruction within
+ * the function.
+ */
+ if (elem->last_i_type == OCSD_INSTR_BR &&
+ elem->last_i_subtype == OCSD_S_INSTR_NONE) {
+ packet->flags = PERF_IP_FLAG_BRANCH;
+
+ if (elem->last_instr_cond)
+ packet->flags |= PERF_IP_FLAG_CONDITIONAL;
+ }
+
+ /*
+ * Immediate branch instruction with link (e.g. BL), this is
+ * branch instruction for function call.
+ */
+ if (elem->last_i_type == OCSD_INSTR_BR &&
+ elem->last_i_subtype == OCSD_S_INSTR_BR_LINK)
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_CALL;
+
+ /*
+ * Indirect branch instruction without link (e.g. BR), usually
+ * this is used for function return, especially for functions
+ * within dynamic link lib.
+ */
+ if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT &&
+ elem->last_i_subtype == OCSD_S_INSTR_NONE)
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_RETURN;
+
+ /*
+ * Indirect branch instruction with link (e.g. BLR), this is
+ * branch instruction for function call.
+ */
+ if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT &&
+ elem->last_i_subtype == OCSD_S_INSTR_BR_LINK)
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_CALL;
+
+ /* Return instruction for function return. */
+ if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT &&
+ elem->last_i_subtype == OCSD_S_INSTR_V8_RET)
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_RETURN;
+
+ break;
+
+ case OCSD_GEN_TRC_ELEM_EXCEPTION:
+
+#define OCSD_EXC_RESET 0
+#define OCSD_EXC_DEBUG_HALT 1
+#define OCSD_EXC_CALL 2
+#define OCSD_EXC_TRAP 3
+#define OCSD_EXC_SYSTEM_ERROR 4
+#define OCSD_EXC_INST_DEBUG 6
+#define OCSD_EXC_DATA_DEBUG 7
+#define OCSD_EXC_ALIGNMENT 10
+#define OCSD_EXC_INST_FAULT 11
+#define OCSD_EXC_DATA_FAULT 12
+#define OCSD_EXC_IRQ 14
+#define OCSD_EXC_FIQ 15
+
+ /*
+ * Exception number is saved and can be used for return
+ * instruction analysis.
+ */
+ exc_num = elem->exception_number;
+
+ /*
+ * The exceptions are triggered by external signals
+ * from bus, interrupt controller, debug module,
+ * PE reset or halt.
+ */
+ if (exc_num == OCSD_EXC_RESET ||
+ exc_num == OCSD_EXC_DEBUG_HALT ||
+ exc_num == OCSD_EXC_SYSTEM_ERROR ||
+ exc_num == OCSD_EXC_INST_DEBUG ||
+ exc_num == OCSD_EXC_DATA_DEBUG ||
+ exc_num == OCSD_EXC_IRQ ||
+ exc_num == OCSD_EXC_FIQ)
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_CALL |
+ PERF_IP_FLAG_ASYNC |
+ PERF_IP_FLAG_INTERRUPT;
+
+ /* The exception is for system call. */
+ if (exc_num == OCSD_EXC_CALL)
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_CALL |
+ PERF_IP_FLAG_SYSCALLRET;
+
+ /*
+ * The exception is introduced by trap, instruction &
+ * data fault or alignment errors.
+ */
+ if (exc_num == OCSD_EXC_TRAP ||
+ exc_num == OCSD_EXC_ALIGNMENT ||
+ exc_num == OCSD_EXC_INST_FAULT ||
+ exc_num == OCSD_EXC_DATA_FAULT)
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_CALL |
+ PERF_IP_FLAG_INTERRUPT;
+
+ break;
+
+ case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
+ if (exc_num == OCSD_EXC_CALL)
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_RETURN |
+ PERF_IP_FLAG_SYSCALLRET;
+ else
+ packet->flags = PERF_IP_FLAG_BRANCH |
+ PERF_IP_FLAG_RETURN |
+ PERF_IP_FLAG_INTERRUPT;
+ exc_num = -1;
+ break;
+
+ case OCSD_GEN_TRC_ELEM_UNKNOWN:
+ case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
+ case OCSD_GEN_TRC_ELEM_ADDR_NACC:
+ case OCSD_GEN_TRC_ELEM_TIMESTAMP:
+ case OCSD_GEN_TRC_ELEM_CYCLE_COUNT:
+ case OCSD_GEN_TRC_ELEM_ADDR_UNKNOWN:
+ case OCSD_GEN_TRC_ELEM_EVENT:
+ case OCSD_GEN_TRC_ELEM_SWTRACE:
+ case OCSD_GEN_TRC_ELEM_CUSTOM:
+ default:
+ break;
+ }
+}
+
static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
const void *context,
const ocsd_trc_index_t indx __maybe_unused,
@@ -390,6 +546,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
break;
}
+ cs_etm_decoder__set_sample_flags(context, elem);
+
return resp;
}
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 612b575..9d5f65a 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -36,6 +36,7 @@ struct cs_etm_packet {
u8 exc;
u8 exc_ret;
int cpu;
+ u32 flags;
};
struct cs_etm_queue;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 3b37d66..bf66eb6 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -663,7 +663,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
sample.stream_id = etmq->etm->instructions_id;
sample.period = period;
sample.cpu = etmq->packet->cpu;
- sample.flags = 0;
+ sample.flags = etmq->prev_packet->flags;
sample.insn_len = 1;
sample.cpumode = event->header.misc;
@@ -719,7 +719,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
sample.stream_id = etmq->etm->branches_id;
sample.period = 1;
sample.cpu = etmq->packet->cpu;
- sample.flags = 0;
+ sample.flags = etmq->prev_packet->flags;
sample.cpumode = PERF_RECORD_MISC_USER;
/*
--
2.7.4
-Moves ocsd_if_version.h into include/opencsd and adds it to the headers used
as part of installation.
-Fixes bug in snapshot reader library used by test programs to handle
'offset' parameter in [dump] sections correctly.
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
This patchset adds more information about the final instruction in the
Instruction Range generic packet.
i) A flag is set if the last instruction is conditional [last_instr_cond].
ii) For A32/T32 ISA, the instruction subtype will be set to 'Implied Return'
[OCSD_S_INSTR_V7_IMPLIED_RET] if it is one of the instructions:
mov pc,lr
bx r14
pop {...,pc}
ldr pc,[sp], #offset
These are used by the CPU return predictor and in general by compilers
when a return is required.
The patchset also removes the uint32_t casts in the version #define
OCSD_VER_NUM to enable correct use with the pre-processor.
Mike Leach (3):
opencsd: Generic output packet - add additional instruction info
opencsd: docs: Update documents for new generic packet field
opencsd: Update README etc for version 0.10.0
README.md | 6 +++--
decoder/docs/doxygen_config.dox | 2 +-
decoder/docs/prog_guide/prog_guide_generic_pkts.md | 3 ++-
decoder/include/common/trc_gen_elem.h | 1 +
.../include/opencsd/etmv4/trc_pkt_decode_etmv4i.h | 2 ++
decoder/include/opencsd/ocsd_if_types.h | 1 +
decoder/include/opencsd/ocsd_if_version.h | 8 +++---
decoder/include/opencsd/trc_gen_elem_types.h | 1 +
decoder/source/etmv3/trc_pkt_decode_etmv3.cpp | 1 +
decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp | 29 ++++++++++------------
decoder/source/i_dec/trc_i_decode.cpp | 1 +
decoder/source/i_dec/trc_idec_arminst.cpp | 16 ++++++++++++
decoder/source/ocsd_dcd_tree.cpp | 1 -
decoder/source/ptm/trc_pkt_decode_ptm.cpp | 2 ++
decoder/source/trc_gen_elem.cpp | 5 +++-
15 files changed, 53 insertions(+), 26 deletions(-)
--
2.14.2
Hi Mathieu, Mike,
[ + CoreSight list ]
While reviewing Andi's patch 'tools, perf, script: Add --call-trace and
--call-ret-trace' [1], I found that after applying this patch with
CoreSight, perf fails to output two fields: the sample flags (e.g. 'call',
'return', 'jmp', etc.) [2] are missing, and the symbols [3] are
not printed. The cause of both issues
is that CoreSight doesn't set the sample flags and simply sets them to
zero [4].
If I understand correctly (Mathieu also mentioned this to me at Connect),
you are already aware of this. So I want to check whether you
have an existing fix for it, in case I am duplicating work
here :) Or has there been some discussion of known issues that prevent
enabling sample flags?
I did some investigation: CoreSight can set some of the sample
flags for A64 branch instructions from the packet info:
- ocsd_generic_trace_elem::last_i_type can be used to check whether the
instruction is an immediate branch instruction or an indirect branch
instruction;
- ocsd_generic_trace_elem::last_i_subtype can be used to distinguish
the kind of branch: if it's a link branch instruction
(OCSD_S_INSTR_BR_LINK), the branch instruction is a
function call;
if it's a return branch instruction (OCSD_S_INSTR_V8_RET), it's a
return instruction;
otherwise (OCSD_S_INSTR_NONE), it's a simple branch instruction
within the function.
But several things are still not clear to me:
- How can we distinguish whether an exception is a system call or an
interrupt? Intel-pt sets different sample flags
for different exception types:
PERF_IP_FLAG_INTERRUPT
PERF_IP_FLAG_SYSCALLRET
I think we might be able to use ocsd_generic_trace_elem::exception_number
to distinguish the different exception types, but I can't find any
doc that clearly defines this value.
- I don't know if there is any info or hint in the CoreSight packet that
can be used to indicate whether the branch is conditional. 
Intel-pt marks a conditional branch instruction with the flag
PERF_IP_FLAG_CONDITIONAL.
- I understand it's low priority to support A32 and T32 instructions,
but note that I also have no idea how to handle this part.
Could you give some suggestions and guidance on this? Thanks in
advance.
Thanks,
Leo Yan
[1] https://lore.kernel.org/patchwork/patch/987916/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/too…
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/too…
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/too…
Coresight DT bindings have been updated to obey the DTS rules
for label/address matching for graph nodes. The changes are in the
coresight/next tree scheduled for v4.20. This series updates the
in-kernel dts to match the new bindings, along with updating a couple
of new examples (e.g. CATU) in the Documentation (which were missed
as they were still in flight when we created the series).
Please note that this should not be pulled for v4.19, which I think
is a safe assumption. But please do pull it for v4.20.
The dt updates for the Juno boards were sent earlier with the original
DT update series and have been queued for v4.20.
Applies on coresight/next (which is based on v4.19) and should apply
cleanly on v4.19-rc3.
Changes since V1:
- Avoid "avoid_unnecessary_addr_size" warnings by removing
#address-cells/#size-cells for single port with address 0.
- Fix TPIU inport for qcom msm8916. (Leo Yan)
- Fix documentation example for TPIU (Leo Yan)
- Fix subject tags (as pointed out by Leo and Shawn)
- Drop patch for TC2, which has been queued by Sudeep
Cc: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Cc: Andy Gross <andy.gross(a)linaro.org>
Cc: Benoît Cousson <bcousson(a)baylibre.com>
Cc: David Brown <david.brown(a)linaro.org>
Cc: Fabio Estevam <fabio.estevam(a)nxp.com>
Cc: Frank Rowand <frowand.list(a)gmail.com>
Cc: Ivan T. Ivanov <ivan.ivanov(a)linaro.org>
Cc: Linus Walleij <linus.walleij(a)linaro.org>
Cc: linux-omap(a)vger.kernel.org
Cc: lipengcheng8(a)huawei.com
Cc: Liviu Dudau <liviu.dudau(a)arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Cc: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Cc: orsonzhai(a)gmail.com
Cc: Pengutronix Kernel Team <kernel(a)pengutronix.de>
Cc: Rob Herring <robh(a)kernel.org>
Cc: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: Shawn Guo <shawnguo(a)kernel.org>
Cc: Sudeep Holla <sudeep.holla(a)arm.com>
Cc: Tony Lindgren <tony(a)atomide.com>
Cc: Wei Xu <xuwei5(a)hisilicon.com>
Cc: xuwei5(a)hisilicon.com
Cc: zhang.lyra(a)gmail.com
Cc: arm(a)kernel.org
Suzuki K Poulose (11):
coresight: dts: binding: Fix example for TPIU component
coresight: dts: binding: Update coresight binding examples
arm64: dts: hi6220: Update coresight bindings for hardware ports
arm64: dts: sc9836/sc9860: Update coresight bindings for hardware
ports
arm64: dts: msm8916: Update coresight bindings for hardware ports
arm: dts: hip04: Update coresight bindings for hardware ports
arm: dts: imx7: Update coresight binding for hardware ports
arm: dts: omap: Update coresight bindings for hardware ports
arm: dts: qcom: Update coresight bindings for hardware ports
arm: dts: sama5d2: Update coresight bindings for hardware ports
arm: dts: ste-dbx5x0: Update coresight bindings for hardware port
.../devicetree/bindings/arm/coresight.txt | 27 +-
arch/arm/boot/dts/hip04.dtsi | 346 +++++++++---------
arch/arm/boot/dts/imx7d.dtsi | 14 +-
arch/arm/boot/dts/imx7s.dtsi | 82 ++---
arch/arm/boot/dts/omap3-beagle-xm.dts | 17 +-
arch/arm/boot/dts/omap3-beagle.dts | 17 +-
arch/arm/boot/dts/qcom-apq8064.dtsi | 71 ++--
arch/arm/boot/dts/qcom-msm8974.dtsi | 104 +++---
arch/arm/boot/dts/sama5d2.dtsi | 17 +-
arch/arm/boot/dts/ste-dbx5x0.dtsi | 65 ++--
.../boot/dts/hisilicon/hi6220-coresight.dtsi | 181 +++++----
arch/arm64/boot/dts/qcom/msm8916.dtsi | 95 ++---
arch/arm64/boot/dts/sprd/sc9836.dtsi | 82 +++--
arch/arm64/boot/dts/sprd/sc9860.dtsi | 215 +++++------
14 files changed, 682 insertions(+), 651 deletions(-)
--
2.19.0
Hello,
As promised at the recent Linaro Connect, the patch to enable ETM
strobing for AutoFDO is now available on github/Linaro/perf-opencsd,
branch master-4.19-rc1-afdo-etm-strobe.
https://github.com/Linaro/perf-opencsd/commits/master-4.19-rc1-afdo-etm-str…
Regards
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
This set _should_ add support for ETMv3/PTM1.1 trace decoding. It was
produced close to two years ago on top of a code base that no longer
exists. At the time though, it did work.
So I've rebased the work onto the current coresight next branch. It applies
and compiles cleanly but other than that, I can't offer any guarantee of
proper operation. I am currently traveling and don't have access to a
platform where it can be tested. Even if I did, I do not have the
bandwidth to work on the feature.
As such I am releasing it on this list, in the hope that it can help
someone get started with trace decoding on ETMv3/PTM1.1.
Let me know how badly it crashes.
Mathieu
Mathieu Poirier (3):
perf tools: Add configuration for ETMv3 trace protocol
perf tools: Add support for ETMv3 trace decoding
perf tools: Add support for PTMv1.1 decoding
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 31 +++++++++++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 9 +++
tools/perf/util/cs-etm.c | 73 ++++++++++++++++++++-----
3 files changed, 99 insertions(+), 14 deletions(-)
--
2.7.4
Hello,
Sorry for the delay. I appreciate your posts.
I have now recorded a different program ("ping 8.8.8.8"), and decoding
the trace using the "ping" ELF file gives no issues. I cannot explain
why "ls" is the only corrupt trace (I re-recorded, same results). Perhaps
the image is indeed wrong.
I will check it further.
Thank you very much!
On Thu, Sep 20, 2018 at 10:42 AM, Mike Bazov <mike(a)perception-point.io>
wrote:
> Hello,
>
> Sorry for the delay. I appreciate your posts.
>
> I have now recorded a different program ("ping 8.8.8.8"), and decoding
> the trace using the "ping" ELF file gives no issues. I cannot explain
> why "ls" is the only corrupt trace (I re-recorded, same results). Perhaps
> the image is indeed wrong.
> I will check it further.
>
> Thank you very much!
>
> Mike.
>
>
> On Thu, Sep 20, 2018 at 1:28 AM, Mike Leach <mike.leach(a)linaro.org> wrote:
>
>> Hi Mike,
>>
>> I have looked into this issue further, found my previous assumption to
>> be wrong, and unfortunately have come to the conclusion that the
>> generated trace is somehow wrong / corrupt, or the supplied image is
>> not what was run when the trace was generated.
>>
>> If you look at the attached analysis of the trace generated from the
>> ls_api.cs data [analysis001.txt] This is at the very start of the
>> traced image.
>>
>> The first few packets [raw packets (0)] show the sync and start at
>> 00000000004003f0 <_start>:
>> followed by the first 'E' atom that marks the branch to 0x41a158. The
>> next two 'E' atoms get us to 0x41a028.
>>
>> At this point we get an exception packet, followed by a preferred
>> return address packet [ raw packets (2)].
>> This return address is 0x400630.
>>
>> The rules from the ETM architecture specification 4.0-4.4 p6-242 state:-
>>
>> "The Exception packet contains an address. This means that execution
>> has continued from the target of the most
>> recent P0 element, up to, but not including, that address, and a trace
>> analyzer must analyze each instruction in this
>> range."
>>
>> Thus the decoder is required to analyze from the previous P0 element -
>> the 'E' atom that marked the branch to 0x41a028, until the preferred
>> return address.
>> This is actually lower than the start address, which results in a huge
>> range seen here, and also seen by you in the example you described.
>> The decoder effectively runs off the end of the memory image before it
>> stops.
>>
>> The trace should be indicating an address after but relatively close
>> to 0x41a028 - as otherwise an atom would have been emitted by the cbnz
>> 41a054.
>>
>> If I examine the start of the perf_ls.cs decode, I see the same 3 'E'
>> atoms followed by the odd data fault exception.
>>
>> So for the first few branches at least, the perf and api captures go
>> in the same direction.
>>
>> Given that it is unlikely that the generated trace packets are
>> incorrect - it seems more likely that the 'ls' image being used for
>> decode is not what is generating this trace. Since we have to analyze
>> opcodes to follow the 'E' and 'N' atoms, decode relies on accurate
>> memory images being fed into the decoder. The only actual addresses we
>> have explicitly stated in the trace are the start: 0x4003f0, and the
>> exception return address 0x400360. The others are synthesized from the
>> supplied image.
>>
>> There may be a case for checking when decoding the exception packet
>> that the address is not behind the current location and throwing an
>> error, but beyond that I do not at this point believe that the decoder
>> is at fault.
>>
>> Regards
>>
>> Mike
>>
>>
>>
>> On 18 September 2018 at 19:32, Mike Leach <mike.leach(a)linaro.org> wrote:
>> > Hi Mike,
>> >
>> > I've looked further at this today, and can see a location where a
>> > large block appears in both the api and perf trace data on decode
>> > using the library test program.
>> >
>> > There does appear to be an issue if the decoder is in a "waiting for
>> > address state" i.e. it has lost track usually because an area of
>> > memory is unavailable, and an exception packet is seen - the exception
>> > address appears to be used twice - both to complete an address range
>> > and as an exception return - hence in this case the improbable large
>> > block. I need to look into this in more detail and fix it up.
>> >
>> > However - I am seeing before this the api and perf decodes have
>> > diverged, which suggests an issue elsewhere too perhaps. I do need to
>> > look deeper into this as well.
>> > I am not 100% certain that using the ls.bin as a full memory image at
>> > 0x400000 is necessarily working in the snapshot tests - there might be
>> > another offset needed to access the correct opcodes for the trace.
>> >
>> > I'll let you know if I make further progress.
>> >
>> >
>> > On 17 September 2018 at 16:53, Mike Leach <mike.leach(a)linaro.org> wrote:
>> >> Hi Mike,
>> >>
>> >> I've looked at the data you supplied.
>> >>
>> >> I created test snapshot directories so that I could run each of the
>> >> trace data files through the trc_pkt_lister test program (the attached
>> >> .tgz file contains these, plus the results).
>> >>
>> >> Now the two trace files are different sizes - this is explained by the
>> >> fact that the api trace run had cycle counts switched on, whereas the
>> >> perf run did not - plus the perf run turned off the trace while in
>> >> kernel calls - the api left the trace on, though filtering out the
>> >> kernel - but a certain amount of sync packets have come through adding
>> >> to the size.
>> >>
>> >> Now looking at the results I cannot see the 0x4148f4 location in
>> >> either trace dump (perf_ls2.ppl and api_ls2.ppl in the .tgz).
>> >>
>> >> There are no obvious differences I could detect in the results, though
>> >> they are difficult to compare given the difference in output.
>> >>
>> >> The effect you are seeing does look like some sort of runaway - with
>> >> the decoder searching for opcodes - possibly in a section of the ls
>> >> binary file that does not contain executable code - till it happens
>> >> upon something that looks like an opcode.
>> >>
>> >> At this point I cannot explain the difference you and I are seeing
>> >> given the data provided. Can you look at the snapshot results, and see
>> >> if there is anything there? You can re-run the tests I ran if you
>> >> rename ls to ls.bin and put it one level up from the ss-perf or ss-api
>> >> snapshot directories where the file is referenced.
>> >>
>> >> Regards
>> >>
>> >> Mike
>> >>
>> >>
>> >>
>> >>
>> >> On 17 September 2018 at 13:44, Mike Bazov <mike(a)perception-point.io> wrote:
>> >>> Greetings,
>> >>>
>> >>> I recorded the program "ls" (statically linked to provide a single
>> >>> executable as a memory accesses file).
>> >>>
>> >>> I recorded the program using perf, and then extracted the actual raw
>> >>> trace data from the perf.data file using a little tool I wrote. I can
>> >>> use OpenCSD to fully decode the trace produced by perf.
>> >>>
>> >>> I also recorded the "ls" util using an API I wrote from kernel mode. I
>> >>> published the API here as an [RFC]. Basically, I start recording and
>> >>> stop recording whenever the __process__ of my interest is scheduling in.
>> >>> This post is not much about requesting a review for my API, but I do
>> >>> have some issues with the trace that is produced by this API, and I'm
>> >>> not quite sure why.
>> >>>
>> >>> I use OpenCSD directly in my code, and register a decoder callback
>> >>> for every generic trace element. When my callback is called, I simply
>> >>> print the element string representation (e.g.
>> >>> OCSD_GEN_TRC_ELEM_INSTR_RANGE).
>> >>>
>> >>> Now, the weird thing is that perf and the API produce the same generic
>> >>> elements until a certain element:
>> >>>
>> >>> OCSD_GEN_TRC_ELEM_TRACE_ON()
>> >>> ...
>> >>> ...
>> >>> ... same elements...
>> >>> ... same elements...
>> >>> ... same elements...
>> >>> ...
>> >>> ...
>> >>>
>> >>> And eventually diverge from each other. I assume the perf trace is
>> >>> going in the right direction, but my trace simply starts going nuts.
>> >>> The last __common__ generic element is the following:
>> >>>
>> >>> OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4148f4:[0x414910] (ISA=A64) E iBR A64:ret )
>> >>>
>> >>> After this element, the perf trace goes a different route, and the API
>> >>> right afterwards produces a very weird instruction range element:
>> >>>
>> >>> OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x414910:[0x498a20] (ISA=A64) E --- )
>> >>>
>> >>> There is no way this 0x498a20 address was reached, and I cannot see
>> >>> any proof for it in the trace itself (using ptm2human). It seems that
>> >>> the decoder keeps decoding and disassembling opcodes until it reaches
>> >>> 0x498a20... my memory callback (the callback that is called if the
>> >>> decoder needs memory that isn't present) is called for the address
>> >>> 0x498a20. From then on, the trace just goes into a very weird path. I
>> >>> can't explain the address branches that are taken from here on.
>> >>>
>> >>>
>> >>> Any ideas on how to approach this? Input from OpenCSD experts would be
>> >>> appreciated.
>> >>> I have attached the perf and API trace, and the "ls" executable which
>> >>> is loaded into address 0x400000. I also attached the ETMv4 config for
>> >>> every trace (trace id, etc.). There is no need to create multiple
>> >>> decoders for different trace ids; there's only a single ID for a
>> >>> single decoder.
>> >>>
>> >>> Thanks,
>> >>> Mike.
>> >>>
>> >>> _______________________________________________
>> >>> CoreSight mailing list
>> >>> CoreSight(a)lists.linaro.org
>> >>> https://lists.linaro.org/mailman/listinfo/coresight
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Mike Leach
>> >> Principal Engineer, ARM Ltd.
>> >> Manchester Design Centre. UK
>> >
>> >
>> >
>> > --
>> > Mike Leach
>> > Principal Engineer, ARM Ltd.
>> > Manchester Design Centre. UK
>>
>>
>>
>> --
>> Mike Leach
>> Principal Engineer, ARM Ltd.
>> Manchester Design Centre. UK
>>
>
>
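The extra check suggested in the quoted analysis (rejecting an exception
address that lies behind the current decode location) could look roughly
like the sketch below. The function name is hypothetical and this is not
OpenCSD code; the addresses in the usage note are the ones quoted in the
thread.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the exception address can terminate a
 * forward instruction range starting at range_start, 0 if it lies behind
 * the current location and should be reported as an error instead. */
static int demo_excep_addr_ok(uint64_t range_start, uint64_t excep_addr)
{
    return excep_addr >= range_start;
}
```

With the values from the analysis, a range starting at 0x41a028 and an
exception address of 0x400630 would be rejected rather than decoded as one
huge range.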