The first patch fixes a minor typo to allow printing the trace info.
The second patch enables translation of perf.data files into last branch events
that are then processed by the autoFDO tool to extract a coverage file.
Sebastian Pop (2):
perf tools: fix printing of auxtrace_info
perf tools: new inject capability for CoreSight traces
tools/perf/Documentation/cs-etm.txt | 38 +++++++++
tools/perf/util/cs-etm.c | 163 ++++++++++++++++++++++++++++++++++--
2 files changed, 196 insertions(+), 5 deletions(-)
create mode 100644 tools/perf/Documentation/cs-etm.txt
--
2.6.3
On 11 November 2016 at 06:45, Yan Lin Aung <yan_lin_aung(a)yahoo.com> wrote:
> Hi Mathieu,
>
> I have made progress with getting things up.
>
> I now switched to another platform with quad-core A53 processors because
> the Juno r2 environment is a bit difficult for me to work with.
>
> The following describes the steps taken:
>
> 1) I have Linux 4.4.23 running on quad-core A53. CoreSight is enabled.
> I can see CoreSight components under "/sys/bus/coresight/devices"
>
> linaro@linaro-alip:~/OpenCSD-perf-opencsd-4.9-rc1/tools/perf$ ls
> /sys/bus/coresight/devices/
> 820000.tpiu 821000.funnel 824000.replicator 825000.etf 826000.etr
> 841000.funnel 85c000.etm 85d000.etm 85e000.etm 85f000.etm
>
>
> I can enable/disable ETM and ETR (e.g. echo 1 > 85c000.etm/enable_source,
> cat 85c000.etm/enable_source).
Ok, but that is not required since identification of the sink to use
is now done from the perf cmd line.
>
> 2) I am able to build the OpenCSD library. Then, the shared libraries
> (libcstraced_c_api.so, libcstraced.so) are copied to "/usr/lib".
> Then, I tried testing the library by running "c_api_pkt_print_test". Below
> shows the sample outputs.
>
> C-API packet print test
> Library Version 0.4.2
>
> Idx:86; I_NOT_SYNC : I Stream not synchronised
> Idx:1650; I_ASYNC : Alignment Synchronisation.
> Idx:1662; I_TRACE_INFO : Trace Info.
> Idx:1666; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.;
> Addr=0xFFFFFFC000096A00;
> Idx:1675; I_TRACE_ON : Trace On.
> Idx:1676; I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.;
> Addr=0xFFFFFFC000096A00; Ctxt: AArch64,EL1, NS; CID=0x00000000; VMID=0x0000;
> Idx:1692; I_ATOM_F1 : Atom format 1.; E
> Idx:1693; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.;
> Addr=0xFFFFFFC000594AC0;
> Idx:1703; I_ATOM_F1 : Atom format 1.; E
> Idx:1704; I_ADDR_S_IS0 : Address, Short, IS0.; Addr=0xFFFFFFC000592B58
> ~[0x12B58]
> Idx:1707; I_ATOM_F3 : Atom format 3.; ENN
> Idx:1708; I_ATOM_F1 : Atom format 1.; E
> Idx:1709; I_ADDR_L_32IS0 : Address, Long, 32 bit, IS0.; Addr=0x005AC4C8;
> Idx:1715; I_ATOM_F2 : Atom format 2.; EE
> Idx:1716; I_ADDR_L_32IS0 : Address, Long, 32 bit, IS0.; Addr=0x000EA588;
> Idx:1721; I_ATOM_F3 : Atom format 3.; NNE
> Idx:1722; I_ADDR_L_32IS0 : Address, Long, 32 bit, IS0.; Addr=0x00592B60;
>
>
> 3) Then, perf tool under the "OpenCSD-perf-opencsd-4.9-rc1" branch is
> compiled on target.
> The following is the compilation log. It seems to me there is no issue with
> compilation ("CC util/cs-etm-decoder/cs-etm-decoder.o" is compiled
> properly). I also exported "CSTRACE_PATH=/home/linaro/OpenCSD-0.4.2/decoder"
> before compilation.
>
> linaro@linaro-alip:~/OpenCSD-perf-opencsd-4.9-rc1$ make -C tools/perf/
> make: Entering directory
> '/home/linaro/OpenCSD-perf-opencsd-4.9-rc1/tools/perf'
> BUILD: Doing 'make -j4' parallel build
>
> Auto-detecting system features:
> ... dwarf: [ on ]
> ... dwarf_getlocations: [ on ]
> ... glibc: [ on ]
> ... gtk2: [ on ]
> ... libaudit: [ on ]
> ... libbfd: [ on ]
> ... libelf: [ on ]
> ... libnuma: [ on ]
> ... numa_num_possible_cpus: [ on ]
> ... libperl: [ on ]
> ... libpython: [ on ]
> ... libslang: [ on ]
> ... libcrypto: [ on ]
> ... libunwind: [ on ]
> ... libdw-dwarf-unwind: [ on ]
> ... zlib: [ on ]
> ... lzma: [ on ]
> ... get_cpuid: [ OFF ]
> ... bpf: [ on ]
>
> Makefile.config:349: BPF prologue is not supported by architecture arm64,
> missing regs_query_register_offset()
> Makefile.config:400: No debug_frame support found in libunwind-aarch64
> Makefile.config:459: No debug_frame support found in libunwind
> GEN common-cmds.h
> HOSTCC fixdep.o
> HOSTCC pmu-events/json.o
> HOSTLD fixdep-in.o
> LINK fixdep
> HOSTCC pmu-events/jsmn.o
> HOSTCC pmu-events/jevents.o
> CC fd/array.o
> CC fs/fs.o
> HOSTLD pmu-events/jevents-in.o
> CC event-parse.o
> LD fd/libapi-in.o
> CC cpu.o
> CC debug.o
> CC str_error_r.o
> CC fs/tracing_path.o
> CC exec-cmd.o
> PERF_VERSION = 4.9.0-rc1
> Warning: tools/include/uapi/linux/bpf.h differs from kernel
> CC libbpf.o
> LD fs/libapi-in.o
> LD libapi-in.o
> AR libapi.a
> CC bpf.o
> CC help.o
> CC pager.o
> LD libbpf-in.o
> LINK libbpf.a
> LINK pmu-events/jevents
> CC event-plugin.o
> CC plugin_jbd2.o
> CC parse-options.o
> LD plugin_jbd2-in.o
> CC plugin_hrtimer.o
> LD plugin_hrtimer-in.o
> CC trace-seq.o
> CC plugin_kmem.o
> CC run-command.o
> LD plugin_kmem-in.o
> CC plugin_kvm.o
> CC parse-filter.o
> CC sigchain.o
> LD plugin_kvm-in.o
> CC plugin_mac80211.o
> LD plugin_mac80211-in.o
> CC plugin_sched_switch.o
> CC plugin_function.o
> CC parse-utils.o
> LD plugin_sched_switch-in.o
> CC plugin_xen.o
> LD plugin_xen-in.o
> CC plugin_scsi.o
> LD plugin_function-in.o
> CC plugin_cfg80211.o
> CC kbuffer-parse.o
> LD plugin_scsi-in.o
> LD plugin_cfg80211-in.o
> LINK plugin_jbd2.so
> LINK plugin_hrtimer.so
> LINK plugin_kmem.so
> LINK plugin_kvm.so
> LINK plugin_mac80211.so
> LD libtraceevent-in.o
> LINK libtraceevent.a
> LINK plugin_sched_switch.so
> GEN perf-archive
> LINK plugin_function.so
> GEN perf-with-kcore
> CC ui/gtk/browser.o
> LINK plugin_xen.so
> LINK plugin_scsi.so
> LINK plugin_cfg80211.so
> CC subcmd-config.o
> CC ui/gtk/hists.o
> LD libsubcmd-in.o
> CC util/alias.o
> AR libsubcmd.a
> Warning: tools/arch/x86/lib/memcpy_64.S differs from kernel
> Warning: tools/arch/x86/lib/memset_64.S differs from kernel
> Warning: tools/arch/arm/include/uapi/asm/kvm.h differs from kernel
> CC util/annotate.o
> Warning: tools/include/uapi/asm-generic/mman-common.h differs from kernel
> CC builtin-bench.o
> CC builtin-annotate.o
> CC ui/gtk/setup.o
> CC ui/gtk/util.o
> CC builtin-config.o
> CC builtin-diff.o
> CC util/block-range.o
> CC ui/gtk/helpline.o
> CC arch/common.o
> CC util/build-id.o
> CC arch/arm64/util/dwarf-regs.o
> CC arch/arm64/util/unwind-libunwind.o
> CC ui/gtk/progress.o
> CC builtin-evlist.o
> CC arch/arm64/util/../../arm/util/pmu.o
> CC util/config.o
> CC arch/arm64/util/../../arm/util/auxtrace.o
> CC builtin-help.o
> CC ui/gtk/annotate.o
> CC arch/arm64/util/../../arm/util/cs-etm.o
> CC builtin-sched.o
> CC util/ctype.o
> LD arch/arm64/util/libperf-in.o
> CC arch/arm64/tests/regs_load.o
> CC arch/arm64/tests/dwarf-unwind.o
> CC util/db-export.o
> LD ui/gtk/gtk-in.o
> LD arch/arm64/tests/libperf-in.o
> LD arch/arm64/libperf-in.o
> LD arch/libperf-in.o
> CC ui/setup.o
> CC util/env.o
> CC ui/helpline.o
> LD gtk-in.o
> GEN pmu-events/pmu-events.c
> CC pmu-events/pmu-events.o
> LD pmu-events/pmu-events-in.o
> CC ui/progress.o
> CC util/event.o
> CC ui/util.o
> GEN libtraceevent-dynamic-list
> CC ui/hist.o
> CC ui/stdio/hist.o
> CC builtin-buildid-list.o
> CC builtin-buildid-cache.o
> CC builtin-list.o
> CC ui/browser.o
> CC util/evlist.o
> CC builtin-record.o
> CC builtin-report.o
> CC ui/browsers/annotate.o
> CC ui/browsers/hists.o
> CC util/evsel.o
> CC builtin-stat.o
> CC builtin-timechart.o
> CC builtin-top.o
> CC util/evsel_fprintf.o
> CC builtin-script.o
> CC util/find_bit.o
> CC util/kallsyms.o
> CC util/levenshtein.o
> CC util/llvm-utils.o
> BISON util/parse-events-bison.c
> CC builtin-kmem.o
> CC ui/browsers/map.o
> CC util/perf_regs.o
> CC util/path.o
> CC ui/browsers/scripts.o
> CC util/rbtree.o
> CC ui/browsers/header.o
> CC util/libstring.o
> CC builtin-lock.o
> CC util/bitmap.o
> LD ui/browsers/libperf-in.o
> CC util/hweight.o
> CC ui/tui/setup.o
> CC util/quote.o
> CC util/strbuf.o
> CC ui/tui/util.o
> CC util/string.o
> CC builtin-kvm.o
> CC builtin-inject.o
> CC ui/tui/helpline.o
> CC util/strlist.o
> CC ui/tui/progress.o
> CC util/strfilter.o
> LD ui/tui/libperf-in.o
> LD ui/libperf-in.o
> GEN python/perf.so
> CC scripts/perl/Perf-Trace-Util/Context.o
> CC builtin-mem.o
> CC util/top.o
> LD scripts/perl/Perf-Trace-Util/libperf-in.o
> CC scripts/python/Perf-Trace-Util/Context.o
> CC builtin-data.o
> CC util/usage.o
> LD scripts/python/Perf-Trace-Util/libperf-in.o
> LD scripts/libperf-in.o
> CC builtin-version.o
> CC builtin-trace.o
> CC util/dso.o
> CC builtin-probe.o
> CC bench/sched-messaging.o
> CC bench/sched-pipe.o
> CC util/symbol.o
> CC bench/mem-functions.o
> CC bench/futex-hash.o
> CC bench/futex-wake.o
> CC bench/futex-wake-parallel.o
> CC bench/futex-requeue.o
> CC util/symbol_fprintf.o
> CC util/color.o
> CC bench/futex-lock-pi.o
> CC bench/numa.o
> CC util/header.o
> CC util/callchain.o
> CC util/values.o
> LD bench/perf-in.o
> CC tests/builtin-test.o
> CC tests/parse-events.o
> CC perf.o
> CC util/debug.o
> CC util/machine.o
> CC util/map.o
> CC util/pstack.o
> CC tests/dso-data.o
> CC util/session.o
> CC tests/attr.o
> CC util/syscalltbl.o
> CC tests/vmlinux-kallsyms.o
> CC util/ordered-events.o
> CC tests/openat-syscall.o
> CC tests/openat-syscall-all-cpus.o
> CC tests/openat-syscall-tp-fields.o
> CC tests/mmap-basic.o
> CC util/comm.o
> CC tests/perf-record.o
> CC util/thread.o
> CC tests/evsel-roundtrip-name.o
> CC tests/evsel-tp-sched.o
> CC tests/fdarray.o
> CC util/thread_map.o
> CC tests/pmu.o
> CC tests/hists_common.o
> CC util/trace-event-parse.o
> CC tests/hists_link.o
> CC util/parse-events-bison.o
> BISON util/pmu-bison.c
> CC util/trace-event-read.o
> CC util/trace-event-info.o
> CC tests/hists_filter.o
> CC util/trace-event-scripting.o
> CC util/trace-event.o
> CC tests/hists_output.o
> CC tests/hists_cumulate.o
> CC util/svghelper.o
> CC tests/python-use.o
> CC tests/bp_signal.o
> CC tests/bp_signal_overflow.o
> CC tests/task-exit.o
> CC tests/sw-clock.o
> CC tests/mmap-thread-lookup.o
> CC util/sort.o
> CC util/hist.o
> CC tests/thread-mg-share.o
> CC tests/switch-tracking.o
> CC util/util.o
> CC tests/keep-tracking.o
> CC util/xyarray.o
> CC tests/code-reading.o
> CC tests/sample-parsing.o
> CC tests/parse-no-sample-id-all.o
> CC tests/kmod-path.o
> CC tests/thread-map.o
> CC util/cpumap.o
> CC util/cgroup.o
> CC tests/llvm.o
> CC util/target.o
> CC tests/bpf.o
> CC tests/topology.o
> CC util/rblist.o
> CC util/intlist.o
> CC util/vdso.o
> CC util/counts.o
> CC tests/cpumap.o
> CC tests/stat.o
> CC tests/event_update.o
> CC tests/event-times.o
> CC util/stat.o
> CC util/stat-shadow.o
> CC tests/backward-ring-buffer.o
> CC tests/sdt.o
> CC tests/is_printable_array.o
> CC tests/bitmap.o
> CC tests/dwarf-unwind.o
> CC tests/llvm-src-base.o
> CC util/record.o
> CC tests/llvm-src-kbuild.o
> CC tests/llvm-src-prologue.o
> CC tests/llvm-src-relocation.o
> CC util/srcline.o
> CC util/data.o
> LD tests/perf-in.o
> CC util/tsc.o
> CC util/cloexec.o
> CC util/call-path.o
> CC util/thread-stack.o
> CC util/auxtrace.o
> CC util/intel-pt-decoder/intel-pt-pkt-decoder.o
> LD perf-in.o
> GEN util/intel-pt-decoder/inat-tables.c
> CC util/intel-pt-decoder/intel-pt-log.o
> CC util/intel-pt-decoder/intel-pt-decoder.o
> CC util/intel-pt-decoder/intel-pt-insn-decoder.o
> CC util/cs-etm-decoder/cs-etm-decoder.o
> LD util/cs-etm-decoder/libperf-in.o
> CC util/scripting-engines/trace-event-perl.o
> CC util/scripting-engines/trace-event-python.o
> CC util/intel-pt.o
> LD util/intel-pt-decoder/libperf-in.o
> CC util/intel-bts.o
> CC util/cs-etm.o
> LD util/scripting-engines/libperf-in.o
> CC util/parse-branch-options.o
> CC util/parse-regs-options.o
> CC util/term.o
> CC util/help-unknown-cmd.o
> CC util/mem-events.o
> CC util/vsprintf.o
> CC util/drv_configs.o
> CC util/bpf-loader.o
> CC util/symbol-elf.o
> CC util/probe-file.o
> CC util/probe-event.o
> CC util/probe-finder.o
> CC util/dwarf-aux.o
> CC util/dwarf-regs.o
> CC util/unwind-libunwind-local.o
> CC util/unwind-libunwind.o
> CC util/libunwind/arm64.o
> CC util/zlib.o
> CC util/lzma.o
> CC util/demangle-java.o
> CC util/demangle-rust.o
> CC util/jitdump.o
> CC util/genelf.o
> CC util/genelf_debug.o
> FLEX util/parse-events-flex.c
> FLEX util/pmu-flex.c
> CC util/parse-events.o
> CC util/pmu-bison.o
> CC util/parse-events-flex.o
> CC util/pmu.o
> CC util/pmu-flex.o
> LD util/libperf-in.o
> LD libperf-in.o
> AR libperf.a
> LINK perf
> LINK libperf-gtk.so
> make: Leaving directory
> '/home/linaro/OpenCSD-perf-opencsd-4.9-rc1/tools/perf'
>
> 4) Then, I test the perf as follows:
>
> linaro@linaro-alip:~/OpenCSD-perf-opencsd-4.9-rc1/tools/perf$ ./perf record
> -e cs_etm/@826000.etr/ --per-thread uname
> invalid or unsupported event: 'cs_etm/@826000.etr/'
> Run 'perf list' for a list of valid events
>
> Usage: perf record [<options>] [<command>]
> or: perf record [<options>] -- <command> [<options>]
>
> -e, --event <event> event selector. use 'perf list' to list available
> events
>
> It seems that "perf" is not able to populate the cs_etm events. "perf list"
> output shows:
>
> linaro@linaro-alip:~/OpenCSD-perf-opencsd-4.9-rc1/tools/perf$ ./perf list
>
> List of pre-defined events (to be used in -e):
>
> branch-misses [Hardware event]
> cache-misses [Hardware event]
> cache-references [Hardware event]
> cpu-cycles OR cycles [Hardware event]
> instructions [Hardware event]
>
> alignment-faults [Software event]
> bpf-output [Software event]
> context-switches OR cs [Software event]
> cpu-clock [Software event]
> cpu-migrations OR migrations [Software event]
> dummy [Software event]
> emulation-faults [Software event]
> major-faults [Software event]
> minor-faults [Software event]
> page-faults OR faults [Software event]
> task-clock [Software event]
>
> L1-dcache-load-misses [Hardware cache event]
> L1-dcache-loads [Hardware cache event]
> L1-dcache-store-misses [Hardware cache event]
> L1-dcache-stores [Hardware cache event]
> branch-load-misses [Hardware cache event]
> branch-loads [Hardware cache event]
>
> rNNN [Raw hardware event
> descriptor]
> cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event
> descriptor]
> (see 'man perf-list' on how to encode it)
>
> mem:<addr>[/len][:access] [Hardware breakpoint]
>
> I think I am very close to getting things up properly. Also, I need to get
> this working for a research work here.
>
> Any idea on why "perf" is not showing the events corresponding to "cs_etm"?
> Where possibly is the issue?
> Your kind help on this will be very much appreciated.
Did you get the kernel from github as per my previous email? Kernel
4.4.23 is very old and doesn't have all of the CoreSight features
required to integrate with perf.
Also, keep in mind that since you are not working with a Juno, CPUIdle
_needs_ to be disabled and you have to make sure all power domains
and clocks for the CoreSight IP blocks are managed properly. Out of
curiosity, what platform is this?
Mathieu
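
The situation above (CoreSight devices visible in sysfs, but no cs_etm
event in "perf list") can be checked with a small script. This is only a
sketch: it assumes the cs_etm PMU, once registered by the kernel, shows up
under /sys/bus/event_source/devices/cs_etm like other perf PMUs, and the
sysfs_root parameter is a hypothetical hook added here so the function can
be exercised against a fake tree.

```python
import os


def cs_etm_pmu_status(sysfs_root="/sys"):
    """Report whether the cs_etm PMU and any CoreSight devices are visible.

    sysfs_root is an illustrative parameter (not part of any perf API) that
    lets the check run against a directory other than the real /sys.
    """
    pmu_dir = os.path.join(sysfs_root, "bus", "event_source", "devices", "cs_etm")
    cs_dir = os.path.join(sysfs_root, "bus", "coresight", "devices")
    devices = sorted(os.listdir(cs_dir)) if os.path.isdir(cs_dir) else []
    return {
        # True only when the kernel has registered the PMU that perf needs.
        "pmu_registered": os.path.isdir(pmu_dir),
        # CoreSight components can be present without the PMU (older kernels).
        "coresight_devices": devices,
    }


if __name__ == "__main__":
    status = cs_etm_pmu_status()
    print("cs_etm PMU registered:", status["pmu_registered"])
    print("CoreSight devices:", " ".join(status["coresight_devices"]) or "(none)")
```

If the devices are listed but the PMU is not registered, the kernel most
likely lacks the perf integration, which matches the advice to move off 4.4.23.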
>
> Looking forward to hear from you and thanks.
>
>
> Regards,
> Yan Lin Aung
>
> On Thursday, November 10, 2016 12:23 AM, Mathieu Poirier
> <mathieu.poirier(a)linaro.org> wrote:
>
>
> On 9 November 2016 at 08:06, Yan Lin Aung <yan_lin_aung(a)yahoo.com> wrote:
>> Hi Mathieu,
>>
>> Thanks for your reply. Sorry for a bit of delay on my response.
>>
>> Just a bit of intro on myself. I am a research staff from Nanyang
>> Technological University, Singapore.
>>
>> Basically, I used the build scripts provided with Linaro deliverables for
>> Juno and TC2 from ARM
>> at this link: https://community.arm.com/docs/DOC-10803
>>
>> I am able to get the system running either with prebuilt binaries or
>> building from source.
>> For Linaro release 16.09 with built from source option, the Linux 4.8.x
>> runs
>> on Juno r2.
>> In the default configuration, coresight was not activated.
>> I tried to update the config file to enable coresight drive and
>> recompiled.
>> However, the coresight devices are not populated somehow.
>>
>> I would like to have a setup with which I will be able to do some
>> experiments as demonstrated in your presentation at the very minimum.
>> So, my specific question will be that how shall I proceed to get
>> coresight,
>> perf with coresight and OpenCSD working properly
>> using the Linaro release 16.09.
>
> You won't have the required pieces in 16.09. To replicate the
> examples shown in the presentation you will have to use the kernel
> found on github [1]. Since you have a Juno R2 I suggest to use branch
> perf-opencsd-4.9-rc1 - that way you won't have to deal with power
> domain management. Note that CoreSight is not part of the default V8
> configuration and needs to be explicitly enabled.
>
> Thanks,
> Mathieu
>
> [1]. https://github.com/Linaro/OpenCSD
>
>
>>
>> If you are not using Linaro release 16.09 and have other means of getting
>> things up with coresight, perf and OpenCSD on Juno, please kindly share
>> with
>> me. I am quite keen to follow your steps and try it out at my side here.
>>
>> Looking forward to hear from you and thanks.
>>
>> Regards,
>> Yan Lin Aung
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Monday, November 7, 2016 11:32 PM, Mathieu Poirier
>> <mathieu.poirier(a)linaro.org> wrote:
>>
>>
>>> ---------- Forwarded message ----------
>>> From: Yan Lin Aung <yan_lin_aung(a)yahoo.com>
>>> To: "coresight(a)lists.linaro.org" <coresight(a)lists.linaro.org>
>>> Cc:
>>> Date: Mon, 7 Nov 2016 03:45:45 +0000 (UTC)
>>> Subject: perf with CoreSight and OpenCSD on TC2 and Juno r2
>>> Hi Linaro Coresight Team,
>>>
>>> I came to know of "Hardware Assisted Tracing on ARM with CoreSight and
>>> OpenCSD" by Mathieu Poirier.
>>> In his presentation, he mentioned the reference platforms to evaluate
>>> perf
>>> with CoreSight and OpenCSD are Vexpress TC2 and Juno (Page 7 on his
>>> slide).
>>>
>>> I just checked the "HOWTO.MD" at OpenCSD github site.
>>> However, there is very limited info on how to get started with Vexpress
>>> TC2 and Juno.
>>>
>>> I have access to the TC2 and Juno r2 platforms.
>>> Please provide a rather detailed version of getting started guide to try
>>> out perf with CoreSight and OpenCSD on either TC2 or Juno r2.
>>
>> Hello Yan Lin,
>>
>> You are correct, the HOWTO.md on github concentrates on CoreSight and
>> doesn't address platform specifics - something like this would be out
>> of scope. I'm not exactly sure of what you are looking for in a
>> "getting started guide"... Both Juno and TC2 are well supported
>> upstream and can be booted with a mainline kernel. The choice of
>> bootloader and user space are entirely up to users and don't affect
>> the CoreSight suite nor its integration with the perf subsystem.
>>
>> The fact that you have access to both platforms leads me to believe you
>> are part of a large organisation. As such there are definitely people
>> around you with experience on how to set-up the platforms.
>>
>> I can try to answer specific questions if you have any.
>>
>> Thanks,
>> Mathieu
>>
>>>
>>> Thanx.
>>>
>>> Regards,
>>> Yan Lin Aung
>>>
>>
>>
>
>
On 19 December 2016 at 07:45, Yehuda Yitschak <yehuday(a)marvell.com> wrote:
> Hi Mathieu
>
>
>
> After some more debug I was able to resolve the trace issue I had on
> Linux-4.9-rc1
>
> If you remember I only got trace for CPU2 out of 4 CPUs which was really
> strange
>
>
>
> Turns out the issue comes from some quirk in our busses
>
> Our internal fabric is not able to write 64bit data to registers, only to
> memory
>
> So the address comparators in the ETM got corrupted values and there wasn’t
> any match on address for most CPUs.
>
> For some cryptic reason only CPU2 got somewhat reasonable comparator value
> (still not the intended, but a working one) and so it could generate trace
>
>
>
> Now I am able to generate proper trace consistently.
>
Very good.
>
>
> I was wondering how can I add latency or timing information to the trace
>
> I noticed the cs_etm event can accept an option of “cycacc” and “timestamp“
>
> How can I view this information later ?
>
> Should I use perf script -f ?
That information, when configured on the cmd line, will end up in the
perf.data file. From there it will be decoded and rendered by the
openCSD library.
Mike, can you comment on the format of the information that will be
found in the packet? Perhaps you have an example somewhere of traces
generated by the "cycacc" and "timestamp" option?
You will definitely need to create your own scripts as nothing we have
done so far uses those configuration parameters.
Thanks,
Mathieu
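
For reference, the event string with those options could be assembled as
below. This is a sketch under assumptions: it follows the generic
"pmu/t1,t2/modifier" term syntax that "perf list" prints and the "@sink"
form used elsewhere in this thread. Whether a given perf build accepts
"cycacc" and "timestamp" together with the sink term is not confirmed
here, and cs_etm_event() is a hypothetical helper, not part of perf.

```python
def cs_etm_event(sink, options=(), modifier=""):
    """Build a cs_etm event selector for 'perf record -e'.

    sink     -- CoreSight sink name, e.g. "20070000.etr"
    options  -- extra terms such as "cycacc" or "timestamp"
    modifier -- perf event modifier, e.g. "u" for user space only
    """
    # Terms are comma-separated; the sink is passed as an '@'-prefixed term.
    terms = list(options) + ["@" + sink]
    return "cs_etm/" + ",".join(terms) + "/" + modifier


if __name__ == "__main__":
    event = cs_etm_event("20070000.etr", ["cycacc", "timestamp"], "u")
    print("perf record -e", event, "--per-thread ./sort")
```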
>
>
>
> Thanks a lot
>
>
>
> -------------------
>
> Yehuda Yitschak
>
> Marvell Semiconductor Ltd.
>
>
The patch adds documentation to HOWTO.md on how to use CoreSight ETM to perform
Feedback Directed Optimization.
Sebastian Pop (1):
HOWTO: add example of how to extract coverage files for autoFDO
HOWTO.md | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
--
2.6.3
On 8 December 2016 at 02:04, Chunyan Zhang <zhang.chunyan(a)linaro.org> wrote:
>
> Hi Nicolas,
>
> On 8 December 2016 at 16:07, Nicolas GUION <nicolas.guion(a)st.com> wrote:
>
>> Chunyan,
>>
>> No problem and it offers me the opportunity to inform you that this last
>> months in ST I worked on ARM coresight trace.
>>
>> Several month ago I contacted Mathieu about ARM STM coresight feature.
>> Actually this year we started a new SOC project Accorod5, around A7ss and
>> of course with integration of ARM coresight components. Mathieu described
>> me the status in january, the next steps and especially added me in the
>> group for all patch dedicated to this topic.
>>
>>
>> So I followed the progression of the patch set delivery in official linux
>> stream, and in october I started the integration of this topic in our BSP
>> (based form 4.1)
>>
>> -update the both components (stm_class/coresight) of hwtracing from
>> recent kernel in our old kernel.
>> -integrate the on-going ftrace patch (it was the version 6)
>>
>>
> So happy to know you have been following the progress of this patch
> series, Steven Rostedt has included these for next merge window, it's
> supposed to be merged into 4.10.
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> for-next
>
>>
>> One difference with the Linaro usage that your team usually describes is
>> the capture way, instead to use the target itself
>> we configure the stm to tpiu directly (skip ETF path) and use an external
>> probe to capture the trace (Lauterbach tool),
>> (to cover a long trace session, get the trace for kernel
>> crash/deadlock...)
>>
>
> As an assignee from Spreadtrum, I believe that Spreadtrum also would need
> this functionality.
>
Hello Nicolas,
First and foremost congratulations on the very good integration work. I
have been adamant on that point many times before and today won't be
different - ST has really good tracing technology and knowledge. You guys
have been working on this for a very long time and the results are there.
Au plaisir,
Mathieu
>
>> Here is a view of the T32 output with 2 masters (Cortex A7 and Cortex
>> M3), and 2 STM client for A7 part (Kernel log and FTRACE)
>>
> That's amazing, but I haven't seen the snapshot you mentioned here :)
>
>
>> this snapshot is not the last version, now the Timestamp are correctly
>> handled and the differentiation between the both A7 CPUs has been deported
>> on STMchannel due to a regression of our SOC
>> (our SOC didn't implement correctly the AHB link between the both A7
>> master to STM, so I used the even channel for A7_0 and odd channel for
>> A7_1, it was more or less the only modification from your patch)
>>
>>
>> Thanks for sharing,
> Chunyan
>
>>
>> *Great Job for all this coresight trace development!*
>>
>>
>> br
>>
>> Nicolas
>>
>>
>>
>> On 12/08/2016 08:30 AM, Chunyan Zhang wrote:
>>
>>
>>
>> On 8 December 2016 at 15:04, Nicolas GUION <nicolas.guion(a)st.com> wrote:
>>
>>> Hi Chunyan,
>>>
>>> Are you sure that you pointed the correct Nicolas, cause I'm really far
>>> to know the Dragonboard 410c board?
>>>
>>
>> Ah, my mistake, thanks for telling me :)
>>
>> Chunyan
>>
>>
>>> I'm working in STMicroelectronics and not usual with other boards than
>>> ST ones.
>>>
>>> br
>>> Nicolas
>>>
>>>
>>> On 12/08/2016 07:24 AM, Chunyan Zhang wrote:
>>>
>>> Hi Nicolas,
>>>
>>> I noticed on 96boards forum, some person reported a similar problem "*Dragonboard
>>> not working after failed linux instalation*" [1] which has been
>>> annoying me recently.
>>>
>>> I posted some details on that page the day before yesterday. Could you
>>> give me some suggestion on how to retrieve my Dragon board?
>>>
>>> Many thanks,
>>> Chunyan
>>>
>>>
>>> [1] http://www.96boards.org/forums/topic/dragonboard-not-working-after-failed-linux-instalation/#post-18901&gsc.tab=0
>>>
>>>
>>>
>>
>>
>
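
The even/odd STM channel split Nicolas describes earlier in this thread
(even channels for A7_0, odd channels for A7_1) can be illustrated as
follows. This is only a sketch of one possible mapping: the thread does
not show the actual driver change, and both function names and the
"logical channel" notion are assumptions made for the example.

```python
def stm_channel(cpu, logical_channel):
    """Map a (CPU, logical channel) pair onto a hardware STM channel.

    Even hardware channels carry traffic from CPU 0 (A7_0) and odd
    channels carry traffic from CPU 1 (A7_1).
    """
    if cpu not in (0, 1):
        raise ValueError("this scheme only distinguishes two CPUs")
    if logical_channel < 0:
        raise ValueError("logical channel must be non-negative")
    return 2 * logical_channel + cpu


def decode_stm_channel(channel):
    """Recover (cpu, logical_channel) from a hardware STM channel number."""
    return channel & 1, channel >> 1


if __name__ == "__main__":
    for cpu in (0, 1):
        hw = stm_channel(cpu, 3)
        print("cpu", cpu, "logical 3 -> hw channel", hw, "->", decode_stm_channel(hw))
```

The cost of such a scheme is that each CPU sees only half of the channel
space, which is the trade-off for not relying on the master ID.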
Hi,
could somebody help me understand why the total size of the
recorded ETM trace differs from run to run?
Is this something in my Juno machine setup, or do you also see this?
The maximum length that has been recorded is about 6MB on my setup.
When I am recording the trace of the same program on intel-pt:
$ perf record -e intel_pt//u ./sort
the amount of captured data is deterministic (around 296MB +/- a few KB.)
Thanks,
Sebastian
sort.c is from https://gcc.gnu.org/wiki/AutoFDO/Tutorial
+ gcc sort.c -o sort -O3
++ seq 1 10
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread ./sort
Bubble sorting array of 30000 elements
7779 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.610 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread ./sort
Bubble sorting array of 30000 elements
7789 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.545 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread ./sort
Bubble sorting array of 30000 elements
7797 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.353 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread ./sort
Bubble sorting array of 30000 elements
5949 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.353 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread ./sort
Bubble sorting array of 30000 elements
7807 ms
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 3.287 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread ./sort
Bubble sorting array of 30000 elements
7772 ms
[ perf record: Woken up 3 times to write data ]
Warning:
AUX data lost 2 times out of 4!
[ perf record: Captured and wrote 0.126 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread ./sort
Bubble sorting array of 30000 elements
7811 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread ./sort
Bubble sorting array of 30000 elements
7784 ms
[ perf record: Woken up 3 times to write data ]
Warning:
AUX data lost 2 times out of 4!
[ perf record: Captured and wrote 0.619 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread ./sort
Bubble sorting array of 30000 elements
7770 ms
[ perf record: Woken up 2 times to write data ]
Warning:
AUX data lost 1 times out of 1!
[ perf record: Captured and wrote 0.002 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread ./sort
Bubble sorting array of 30000 elements
5934 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.236 MB perf.data ]
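
To compare runs like the ones above, the "perf record" summary lines can
be parsed and reduced to a few numbers. The sketch below assumes the
output format stays exactly as shown in this log; summarize_runs() is a
hypothetical helper and the SAMPLE text is a three-run excerpt of the
output above.

```python
import re

# Patterns taken from the perf record output shown in this message.
SIZE_RE = re.compile(r"Captured and wrote ([0-9.]+) MB")
LOST_RE = re.compile(r"AUX data lost (\d+) times out of (\d+)")

SAMPLE = """\
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.610 MB perf.data ]
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 3.287 MB perf.data ]
[ perf record: Woken up 3 times to write data ]
Warning:
AUX data lost 2 times out of 4!
[ perf record: Captured and wrote 0.126 MB perf.data ]
"""


def summarize_runs(log_text):
    """Summarize captured sizes and AUX-loss warnings from a perf record log."""
    sizes = [float(s) for s in SIZE_RE.findall(log_text)]
    return {
        "runs": len(sizes),
        "min_mb": min(sizes) if sizes else None,
        "max_mb": max(sizes) if sizes else None,
        # Number of runs that printed an "AUX data lost" warning.
        "runs_with_loss": len(LOST_RE.findall(log_text)),
    }


if __name__ == "__main__":
    print(summarize_runs(SAMPLE))
```

Collecting these numbers over many runs makes it easier to see whether
the small captures coincide with the "AUX data lost" warnings.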
Hi Mathieu,
perf/Documentation/intel-pt.txt describes how to make autoFDO work
with Intel-PT recorded traces:
# perf record -e intel_pt//u ./sort
# perf inject -i perf.data -o inj --itrace=i100usl --strip
# create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
# gcc -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
On the ARM side, I was able to get an ETM trace, and I started working
with my colleague Brian Rzycki on the second step that translates the trace
into branch events.
Attached is the current state of the patch that adds functionality from
intel-pt.c to cs-etm.c. We are still trying to get more than one branch
recorded in the branch stack before emitting an event, and it looks like what
we need is to decode more than a packet at a time in cs-etm-decoder.c like in
cs_etm_decoder__buffer_packet().
Comments on the early version of the patch are welcome.
Thanks,
Sebastian
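
The four Intel-PT steps quoted above can be written down as a small
command builder, which makes it easy to swap in another event selector.
This is a sketch only: the commands are copied from the message, the
function name is made up for the example, and whether the same
"perf inject" options work for a cs_etm trace is exactly what the
attached patch is still working towards.

```python
import shlex


def autofdo_commands(event, binary="./sort", source="sort.c", itrace="i100usl"):
    """Return the record / inject / create_gcov / gcc steps as argv lists."""
    return [
        # 1) record a hardware trace of the program
        ["perf", "record", "-e", event, binary],
        # 2) turn the trace into samples with last-branch records
        ["perf", "inject", "-i", "perf.data", "-o", "inj",
         "--itrace=" + itrace, "--strip"],
        # 3) convert the injected profile into a gcov file for GCC
        ["create_gcov", "--binary=" + binary, "--profile=inj",
         "--gcov=sort.gcov", "-gcov_version=1"],
        # 4) rebuild with the profile
        ["gcc", "-O3", "-fauto-profile=sort.gcov", source, "-o", "sort_autofdo"],
    ]


if __name__ == "__main__":
    for argv in autofdo_commands("intel_pt//u"):
        print("# " + " ".join(shlex.quote(a) for a in argv))
```

The steps are only printed, not executed, so the script can be run on a
machine without perf or AutoFDO installed.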
Changed openCSD library version number from v0.4.2 to v0.5 and bumped
the kernel version to v4.9.
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
---
HOWTO.md | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/HOWTO.md b/HOWTO.md
index 239a2cd194c3..6f768ed8a72c 100644
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -7,7 +7,7 @@ This HOWTO explains how to use the perf cmd line tools and the openCSD
library to collect and extract program flow traces generated by the
CoreSight IP blocks on a Linux system. The examples have been generated using
an aarch64 Juno-r0 platform. All information is considered accurate and tested
-using library version v0.4.2 and the latest perf branch `perf-opencsd-4.8`
+using library version v0.5 and the latest perf branch `perf-opencsd-4.9`
on the [OpenCSD github repository][1].
@@ -15,8 +15,8 @@ On Target Trace Acquisition - Perf Record
-----------------------------------------
All the enhancement to the Perf tools that support the new `cs_etm` pmu have
not been upstreamed yet. To get the required functionality branch
-`perf-opencsd-4.8` needs to be downloaded to the target system where
-traces are to be collected. This branch is an upstream v4.8 kernel
+`perf-opencsd-4.9` needs to be downloaded to the target system where
+traces are to be collected. This branch is an upstream v4.9 kernel
supplemented with modifications to the CoreSight framework and drivers to be
usable by the Perf core. The remaining out of tree patches are being
upstreamed incrementally.
@@ -261,14 +261,14 @@ the host's (which has nothing to do with the target) architecture:
Off Target Perf Tools Compilation
---------------------------------
As stated above not all the pieces of the solution have been upstreamed. To
-get all the components branch `perf-opencsd-4.8` needs to be
+get all the components branch `perf-opencsd-4.9` needs to be
obtained:
- linaro@t430:~/linaro/coresight$ git clone -b perf-opencsd-4.8 https://github.com/Linaro/OpenCSD.git perf-opencsd-4.8
+ linaro@t430:~/linaro/coresight$ git clone -b perf-opencsd-4.9 https://github.com/Linaro/OpenCSD.git perf-opencsd-4.9
...
...
- linaro@t430:~/linaro/coresight$ ls perf-opencsd-4.8/
+ linaro@t430:~/linaro/coresight$ ls perf-opencsd-4.9/
arch certs CREDITS Documentation firmware include ipc Kconfig lib Makefile net REPORTING-BUGS scripts sound usr
block COPYING crypto drivers fs init Kbuild kernel MAINTAINERS mm README samples security tools virt
@@ -279,12 +279,12 @@ successful, but handling of CoreSight trace data won't be supported.
**See perf-test-scripts below for assistance in creating a build and test enviroment.**
- linaro@t430:~/linaro/coresight$ cd perf-opencsd-4.8
- linaro@t430:~/linaro/coresight/perf-opencsd-4.8$ export CSTRACE_PATH=~/linaro/coresight/my-opencsd/decoder
- linaro@t430:~/linaro/coresight/perf-opencsd-4.8$ make -C tools/perf
+ linaro@t430:~/linaro/coresight$ cd perf-opencsd-4.9
+ linaro@t430:~/linaro/coresight/perf-opencsd-4.9$ export CSTRACE_PATH=~/linaro/coresight/my-opencsd/decoder
+ linaro@t430:~/linaro/coresight/perf-opencsd-4.9$ make -C tools/perf
...
...
- linaro@t430:~/linaro/coresight/perf-opencsd-4.8$ ls -l tools/perf/perf
+ linaro@t430:~/linaro/coresight/perf-opencsd-4.9$ ls -l tools/perf/perf
-rwxrwxr-x 1 linaro linaro 6276360 Mar 3 10:05 tools/perf/perf
@@ -323,7 +323,7 @@ to be sure everything is clean.
linaro@t430:~/linaro/coresight/sept20$ rm -rf ~/.debug
linaro@t430:~/linaro/coresight/sept20$ cp -dpR .debug ~/
linaro@t430:~/linaro/coresight/sept20$ export LD_LIBRARY_PATH=~/linaro/coresight/my-opencsd/decoder/lib/linux64/dbg/
- linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.8/tools/perf/perf report --stdio
+ linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.9/tools/perf/perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
@@ -367,7 +367,7 @@ to be sure everything is clean.
Additional data can be obtained, which contains a dump of the trace packets received using the command
- mjl@ubuntu-vbox:./perf-opencsd-4.8/coresight/tools/perf/perf report --stdio --dump
+ mjl@ubuntu-vbox:./perf-opencsd-4.9/coresight/tools/perf/perf report --stdio --dump
resulting a large amount of data, trace looking like:-
@@ -416,10 +416,10 @@ Trace Decoding with Perf Script
Working with perf scripts needs more command line options but yields
interesting results.
- linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-4.8/tools/perf/
+ linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-4.9/tools/perf/
linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/
linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
- linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.8/tools/perf/perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump
+ linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.9/tools/perf/perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump
7f89f24d80: 910003e0 mov x0, sp
7f89f24d84: 94000d53 bl 7f89f282d0 <free@plt+0x3790>
@@ -451,18 +451,18 @@ Kernel Trace Decoding
When dealing with kernel space traces the vmlinux file has to be communicated
explicitely to perf using the "--vmlinux" command line option:
- linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.8/tools/perf/perf report --stdio --vmlinux=./vmlinux
+ linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.9/tools/perf/perf report --stdio --vmlinux=./vmlinux
...
...
- linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.8/tools/perf/perf script --vmlinux=./vmlinux
+ linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.9/tools/perf/perf script --vmlinux=./vmlinux
When using scripts things get a little more convoluted. Using the same example
an above but for traces but for kernel traces, the command line becomes:
- linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-4.8/tools/perf/
+ linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-4.9/tools/perf/
linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/
linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
- linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.8/tools/perf/perf --exec-path=${EXEC_PATH} script \
+ linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.9/tools/perf/perf --exec-path=${EXEC_PATH} script \
--vmlinux=./vmlinux \
--script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- \
-d ${XTOOLS_PATH}/aarch64-linux-gnu-objdump \
--
2.7.4