This patchset adds support for CPU-wide trace scenarios and as such, it is
now possible to issue the following commands:
# perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND
# perf record -e cs_etm/@20070000.etr/ -a $COMMAND
The above will trace all instructions executed on the selected processors
(CPUs 2 and 3 in the first case, every CPU in the second) for as long as
$COMMAND hasn't returned. The solution is designed to work for both 1:1
and N:1 source/sink topologies, though the former hasn't been tested for
lack of access to HW.
Most of the changes revolve around allowing more than one event to use
a sink when operated from perf. More specifically, the first event to
use a sink switches it on, while the last one is tasked with aggregating
traces and switching off the device.
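
The first-on/last-off behaviour described above can be sketched as a
reference count on the sink. This is only an illustration: struct
demo_sink and the demo_sink_get()/demo_sink_put() helpers are hypothetical
names, not the driver code, and the locking a real sink driver needs is
left out.

```c
/* Hypothetical sink state; not the kernel's struct coresight_device. */
struct demo_sink {
    int refcnt;      /* number of events currently using the sink */
    int enabled;     /* 1 while the device is switched on */
    int aggregated;  /* number of times trace data was collected */
};

/* Called when an event starts using the sink. */
void demo_sink_get(struct demo_sink *sink)
{
    if (sink->refcnt++ == 0)
        sink->enabled = 1;      /* first user switches the sink on */
}

/*
 * Called when an event stops using the sink. Returns 1 if this call
 * switched the sink off, 0 if other events still hold a reference.
 */
int demo_sink_put(struct demo_sink *sink)
{
    if (sink->refcnt == 0)
        return 0;               /* unbalanced put: nothing to do */
    if (--sink->refcnt > 0)
        return 0;               /* sink still in use by another event */

    sink->aggregated++;         /* last user collects the trace data */
    sink->enabled = 0;          /* ... and switches the device off */
    return 1;
}
```

With two events sharing the sink, only the second demo_sink_put() call
switches the device off.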
This is the kernel part of the solution, with the user space portion to be
released in a separate set. All the patches have been rebased on
yesterday's linux-next and are hosted here[1]. Everything has been tested on
Juno. I have not CC'ed the kernel mailing list because of the ongoing
merge window.
Review and comments would be most appreciated.
Regards,
Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git/log/?h=next-201…
Mathieu Poirier (20):
coresight: pmu: Adding ITRACE property to cs_etm PMU
coresight: etm4x: Add kernel configuration for CONTEXTID
coresight: etm4x: Configure tracers to emit timestamps
coresight: Adding return code to sink::disable() operation
coresight: Move reference counting inside sink drivers
coresight: Refactor sink::disable() functions
coresight: Refactor sink::update() functions
coresight: perf: Refactor function etm_setup_aux()
coresight: perf: Refactor function free_event_data()
coresight: Introduce the notion of process ID to the framework
coresight: tmc-etr: Refactor function tmc_etr_setup_perf_buf()
coresight: tmc-etr: Introduce the notion of process ID to ETR devices
coresight: tmc-etr: Allow events to use the same ETR buffer
coresight: tmc-etr: Add support for CPU-wide trace scenarios
coresight: tmc-etf: Add support for CPU-wide trace scenarios
coresight: etb10: Add support for CPU-wide trace scenarios
coresight: Refactor sink::alloc_buffer() functions
coresight: Add function coresight_sink_is_shared()
coresight: tmc-etr: Make ETR aware of topology
coresight: Use event->cpu to determine session type
drivers/hwtracing/coresight/coresight-etb10.c | 79 +++++-
.../hwtracing/coresight/coresight-etm-perf.c | 47 +++-
drivers/hwtracing/coresight/coresight-etm4x.c | 114 +++++++-
drivers/hwtracing/coresight/coresight-priv.h | 1 +
.../hwtracing/coresight/coresight-tmc-etf.c | 84 ++++--
.../hwtracing/coresight/coresight-tmc-etr.c | 265 +++++++++++++++---
drivers/hwtracing/coresight/coresight-tmc.c | 4 +
drivers/hwtracing/coresight/coresight-tmc.h | 11 +
drivers/hwtracing/coresight/coresight-tpiu.c | 9 +-
drivers/hwtracing/coresight/coresight.c | 53 +++-
include/linux/coresight-pmu.h | 2 +
include/linux/coresight.h | 8 +-
tools/include/linux/coresight-pmu.h | 2 +
13 files changed, 568 insertions(+), 111 deletions(-)
--
2.17.1
Starting with the v5.1 kernel cycle, compiling the perf tools (on and off
target) requires the addition of a new CORESIGHT=1 command line flag.
See the following commit for details:
1c3b28fd7ae8 ("perf coresight: Do not test for libopencsd by default")
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
---
HOWTO.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/HOWTO.md b/HOWTO.md
index 551b085c9c78..3b93b3d392aa 100644
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -21,10 +21,10 @@ supplemented with modifications to the CoreSight framework and drivers to be
usable by the Perf core. The remaining out of tree patches are being
upstreamed incrementally.
-From there compiling the perf tools with `make -C tools/perf` will yield a
-`perf` executable that will support CoreSight trace collection. Note that if
-traces are to be decompressed *off* target, there is no need to download and
-compile the openCSD library (on the target).
+From there compiling the perf tools with `make -C tools/perf CORESIGHT=1` will
+yield a `perf` executable that will support CoreSight trace collection. Note
+that if traces are to be decompressed *off* target, there is no need to download
+and compile the openCSD library (on the target).
Before launching a trace run a sink that will collect trace data needs to be
identified. All CoreSight blocks identified by the framework are registed in
@@ -306,7 +306,7 @@ and needs to be installed on a system prior to compilation. Information about
the status of the openCSD library on a system is given at compile time by the
perf tools build script:
- linaro@t430:~/linaro/linux-kernel$ make VF=1 -C tools/perf
+ linaro@t430:~/linaro/linux-kernel$ make CORESIGHT=1 VF=1 -C tools/perf
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
--
2.17.1
Hi,
The OCSD_INSTR_WFI_WFE instruction sub-type was added to the library
headers in version 0.11.0 of openCSD to support later ETMv4
versions.
This requires an update to the perf code in cs-etm-decoder.c so that
the new value is handled alongside the default part of the case
statement, e.g.:
case OCSD_INSTR_ISB:
case OCSD_INSTR_DSB_DMB:
+ case OCSD_INSTR_WFI_WFE:
case OCSD_INSTR_OTHER:
default:
The perf-opencsd master branch has not had an update to cover this yet.
The present perf decode does not use this value. If you do not
specifically need ETMv4.3 support for authenticated pointer trace then
it is safe to use the latest v0.10.x decoder with the perf-opencsd.
Otherwise you will need to patch locally till a patch is made
available in the repository, or the upstream perf supports the later
OpenCSD.
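
As a stand-alone illustration of why the extra case label is needed, here
is a minimal sketch. The demo_ enum and function are hypothetical
stand-ins, not the openCSD headers or the perf source. When built with
-Wswitch-enum, every enumerator must have its own case label even if a
default label is present, so a value added to the enum by a newer library
breaks a -Werror build until it is listed.

```c
/* Hypothetical stand-in for the library's instruction sub-type enum. */
enum demo_instr_type {
    DEMO_INSTR_OTHER,
    DEMO_INSTR_BR,
    DEMO_INSTR_BR_INDIRECT,
    DEMO_INSTR_ISB,
    DEMO_INSTR_DSB_DMB,
    DEMO_INSTR_WFI_WFE      /* value added by the newer library */
};

/* Returns 1 if the instruction sub-type is a branch, 0 otherwise. */
int demo_last_instr_is_branch(enum demo_instr_type type)
{
    switch (type) {
    case DEMO_INSTR_BR:
    case DEMO_INSTR_BR_INDIRECT:
        return 1;
    case DEMO_INSTR_ISB:
    case DEMO_INSTR_DSB_DMB:
    case DEMO_INSTR_WFI_WFE:    /* omitting this triggers -Wswitch-enum */
    case DEMO_INSTR_OTHER:
    default:
        return 0;
    }
}
```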
Regards
Mike
On Tue, 19 Mar 2019 at 07:46, Solomon <notifications(a)github.com> wrote:
>
> After installing the OpenCSD library, I tried to compile perf from perf-opencsd. I used the command make VF=1 -C tools/perf. However, I got the following error:
>
> CC util/intel-pt-decoder/intel-pt-log.o
>
> CC util/cs-etm-decoder/cs-etm-decoder.o
>
> util/cs-etm-decoder/cs-etm-decoder.c: In function ‘cs_etm_decoder__buffer_range’:
>
> util/cs-etm-decoder/cs-etm-decoder.c:370:2: error: enumeration value ‘OCSD_INSTR_WFI_WFE’ not handled in switch [-Werror=switch-enum]
>
> switch (elem->last_i_type) {
>
> ^~~~~~
>
> CC util/intel-pt-decoder/intel-pt-decoder.o
>
> cc1: all warnings being treated as errors
>
> Has anyone had the same issue before?
>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Minor update to library released - build fixes.
Fixes issue with Debian build on Sparc. See README for details.
Regards
Mike
Hi Mathieu,
Apologies if mailman does not see this as a reply. I'm not sure if Outlook handles In-Reply-To properly.
I'm testing CPU-wide tracing on Zynq Ultrascale+ MPSoC and I have some comments I'd like to share.
Some background first:
- I'm using mainline Linux from a couple of days ago (12ad143e1b80 Merge branch 'perf-urgent-for-linus'...)
- on top of it I have a couple of my changes introducing CoreSight support on US+
- on top of this I cherry-picked your two patch sets with CPU-wide tracing
I prepared a test program that's supposed to generate a deterministic trace. I created a function that should,
depending on the argument, create either continuous E atoms or E/N atoms alternately. In main() I spawn
two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
The body of the atom generator function is shown below. If *atom == 'n', the branch is not taken, so an N atom
should be generated; if *atom == 'e', the branch is taken and an E atom should be generated. After that, another
E atom is expected, since the while loop branches back to the start. This is counter-intuitive when looking at the
C code, but the if-condition is compiled to a b.ne instruction, so the body of the if executes when the branch
is not taken.
    volatile int sum = 0;
    while (1) {
        // Reference by pointer, so it's not optimized out.
        if (*atom == 'n') // compiler creates b.ne here
            sum += 0xdeadbeef * (*atom + 1);
    }
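
To make the expected output concrete, the helper below writes out the atom
sequence the description above predicts for a number of loop iterations.
It is only a sketch of that reasoning: demo_expected_atoms() is a
hypothetical name, and it models the description rather than the decoder
or the hardware.

```c
#include <stddef.h>

/*
 * Fill buf with the atoms expected from `iterations` passes through the
 * loop, following the description above: the conditional branch yields
 * N when *atom == 'n' (not taken) and E otherwise (taken), and the
 * branch back to the top of the loop always yields E.
 * buf must hold at least 2 * iterations + 1 bytes.
 */
void demo_expected_atoms(char atom, size_t iterations, char *buf)
{
    size_t i;

    for (i = 0; i < iterations; i++) {
        buf[2 * i] = (atom == 'n') ? 'N' : 'E';    /* the b.ne branch */
        buf[2 * i + 1] = 'E';                      /* loop back-edge */
    }
    buf[2 * iterations] = '\0';
}
```

Under this model, the thread started with 'e' produces only E atoms and
the thread started with 'n' alternates N and E.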
Here are my observations:
1. -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
In perf report I can see lots of E atoms in packets marked with ID:12. If I collect trace with -C2 instead,
I see E/N atoms in packets with ID:14. Everything works as expected each time I trace this application.
2. -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is a perf.data containing output similar to what I got with -C1 plus what I got with -C2, i.e. ID:12
atom E packets and ID:14 atom E/N packets. What actually happens differs each time I try this command.
Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get
ID:14 atoms but no ID:12. Sometimes I get the expected trace but still some noise in ID:16 packets, which I would
not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my
understanding of CoreSight. Is this behaviour expected?
3. I'm not able to use filters.
I'd like to narrow down tracing to the while(1) loop in the trace generator, to filter out noise from other instructions.
However, I find it impossible to use the --filter flag along with -C or -a:
# perf record -e cs_etm/@fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' -a ./atom_gen
failed to set filter "filter 0x90c/0x8c@/root/atom_gen" on event cs_etm/@fe940000.etf1/u with 95 (Operation not supported)
It works fine with --per-thread. Is the behaviour expected, or is this a bug?
4. Kernel crashes if perf is used without -a, -C or --per-thread.
If I call perf with:
# perf record -e cs_etm/@fe940000.etf1/u ./atom_gen
I can see some printfs from the program, but the kernel immediately hits a NULL pointer dereference.
Please find a log below. My serial connection drops characters sometimes, sorry for that.
The crash happens in tmc_enable_etf_sink+0x90, which is:
    /* Get a handle on the pid of the process to monitor */
    if (handle->event->owner)
        pid = task_pid_nr(handle->event->owner);
The handle->event->owner seems to be NULL.
[ 1313.650726] Unable to handle kernel NULL pointer dereference at virtual address 00000000000003b8
[ 1313.659501] Mem abort info:
[ 1313.662281] ESR = 0x96000006
[ 1313.665320] Exception class = DABT (current EL), IL = 32 bits
[ 1313.671232] SET = 0, FnV = 0
[ 1313.674277] EA = 0, S1PTW = 0
[ 1313.677401] Data abort info:
[ 1313.680266] ISV = 0, ISS = 0x00000006
[ 1313.684085] CM = 0, WnR = 0
[ 1313.687039] user pgtable: 4k pages, 39-bit VAs, pgdp = 000000003b61a770
[ 1313.693644] [00000000000003b8] pgd=000000006c6da003, pud=0000006c6da003, pmd=0000000000000000
[ 1313.702336] Internal error: Oops: 96000006 [#1] SMP
[ 1313.707201] Modules linked in:
[ 1313.710250] CPU: 1 PID: 3255 Comm: multithread-two Not tainted 5.0.0-10411-g66431e6376c4-dirty #26
[ 1313.719200] Hardware name: ZynqMP ZCU104 RevA (DT)
[ 1313.723981] pstate: 20000085 (nzCv daIf -PAN -UAO)
[ 1313.728770] pc : tmc_enable_etf_sink+0x90/0x3b0
[ 1313.733286] lr : tmc_enable_etf_sink+0x64/0x3b0
[ 1313.737806] sp : ffffff8011263b40
[ 1313.741104] x29: ffffff8011263b40 x28: 0000000000000000
[ 1313.6409] x27: 0000000000000000 x26: ffffffc06d4ce180
[ 1313.7512] x25: 0000000000000001 x24: ffffffc06faa4ce0
[ 1313.757015] x23: 0000000000000002 x22: 0000000000000080
[ 1313.7319] x21: ffffffc06faa4ce0 x20: ffffffc06cf07c00
[ 1313.7676] x19: ffffffc06d560e80 x18: 0000000000000000
[ 1313.772926] x17: 0000000000000000 x16: 0000000000000000
[ 1313.7729] x15: 0000000000000000 x14: ffffff8010879388
[ 1313.78353] x13: 0000000000000000 x12: 0000000002e8fc00
[ 1313.788836] x11: 0000000000000000 x10: 00000000000007f0
[ 1313.7940] x9 : 0000000000000000 x8 : 0000000000000000
[ 1313.799443] x7 : 0000000000000030 x6 : ffffffc06c279030
[ 1313.804747] x5 : 0000000000000030 x4 : 0000000000000002
[ 1313.8100] x3 : ffffffc06d560ee8 x2 : 0000000000000001
[ 1313.815354] x1 : 0000000000000000 x0 : 0000000000000000
[ 1313.820659] Process multithread-two (pid: 3255, stack limit = 0x00000073629f1e)
[ 1313.828133] Call trace:
[ 1313.830571] tmc_enable_etf_sink+0x90/0x3b0
[ 1313.834748] coresight_enable_path+0xe4/0x1f8
[ 1313.839096] etm_event_start+0x8c/0x120
[ 1313.842923] etm_event_add+0x38/0x58
[ 1313.846492] event_sched_in.isra.61.part.62+0x94/0x1b0
[ 1313.851620] group_sched_in+0xa0/0x1c8
[ 1313.855360] flexible_sched_in+0xac/0x1
[ 1313.859364] visit_groups_merge+0x144/0x1f8
[ 1313.86353] ctx_sched_in.isra.39+0x128/0x138
[ 1313.867887] perf_event_sched_in.isra.41+0x54/0x80
[ 1313.872669] __perf_event_task_sched_in+0x16c/0x180
[ 1313.877540] finish_task_switch+0x104/0x1d8
[ 1313.881715] schedule_tail+0xc/0x98
[ 1313.885195] ret_from_fork+0x4/0x18
[ 1313.888677] Code: 540016 f9001bb7 f94002a0 f9414400 (b943b817)
[ 1313.894761] ---[ end trace 99bb09dc83a83a1a ]---
Best regards,
Wojciech
Hi,
(+coresight mailing lists.)
Looked at this: -fpic is supposed to generate smaller code than -fPIC.
That said, I've tried both variants for x86_64 and aarch64 builds:
x86_64 showed no change, (gcc 5.4)
cross compiled aarch64 code was 0.45% smaller using -fpic rather than
-fPIC. (gcc 6.2)
native compiled aarch64 code showed no change (gcc 4.9)
While we could add some code to the makefile to dynamically change the
-fPIC/pic option when building on sparc architectures, unless there are
objections on the mailing list, I propose to change to -fPIC across the
board at this point.
This will be released as a 0.11.1 patch (along with another minor build
fix.)
Regards
Mike
On Wed, 13 Mar 2019 at 08:50, John Paul Adrian Glaubitz <
notifications(a)github.com> wrote:
> I have just tested this on sparc64 and can confirm that replacing -fpic
> with -fPIC fixes the issue for me.
>
> <https://github.com/Linaro/OpenCSD/issues/16#issuecomment-472332457>
>
Version v0.11.0 release of OpenCSD library.
* ETMv4 support updated to cover all ETM versions up to v4.4 (from
v4.1 previously). This covers the latest v8.4 arch cores.
* Updated memory callback function to pass trace ID to client when
requesting memory data.
This allows the client to determine the source CPU for the request and
return memory images accordingly (required for upcoming work on perf
support).
* Other minor fixes and updates.
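
The idea behind passing the trace ID to the memory callback can be
sketched as follows. All names here (struct demo_image, struct demo_ctx,
demo_mem_access()) are hypothetical and the signature is simplified; this
is not the openCSD API, just an illustration of a client selecting a
memory image by the trace ID it is handed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-CPU memory image; not an openCSD type. */
struct demo_image {
    uint8_t trace_id;       /* trace ID of the CPU this image belongs to */
    uint64_t base;          /* first address covered by the image */
    const uint8_t *data;    /* image contents */
    uint32_t size;          /* number of bytes in data */
};

struct demo_ctx {
    const struct demo_image *images;
    size_t count;
};

/*
 * Memory callback sketch: copy up to req_bytes from the image that
 * matches trace_id and contains address. Returns the number of bytes
 * copied, or 0 if no image matches.
 */
uint32_t demo_mem_access(const void *context, uint64_t address,
                         uint8_t trace_id, uint32_t req_bytes,
                         uint8_t *buffer)
{
    const struct demo_ctx *ctx = context;
    size_t i;

    for (i = 0; i < ctx->count; i++) {
        const struct demo_image *img = &ctx->images[i];
        uint64_t offset;
        uint32_t avail;

        if (img->trace_id != trace_id)
            continue;       /* image belongs to a different CPU */
        if (address < img->base || address - img->base >= img->size)
            continue;       /* address outside this image */

        offset = address - img->base;
        avail = img->size - (uint32_t)offset;
        if (req_bytes > avail)
            req_bytes = avail;  /* clamp to the end of the image */
        memcpy(buffer, img->data + offset, req_bytes);
        return req_bytes;
    }
    return 0;
}
```

Two CPUs can then map different images at the same address, and the
callback returns the bytes that belong to the CPU that generated the
trace.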
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK