This patchset adds support for CoreSight CPU-wide trace scenarios. More
specifically, it extends the work that was done for per-thread scenarios to
handle more than a single trace ID. It also temporally correlates traces
based on the timestamps generated by the tracers so that rendering by the perf
mechanic is ordered.
Everything is based on Arnaldo's perf/core branch (46d4c9a05285). I will
send another revision when it is rebased to a 5.2 rc candidate.
Before this set:
# root@juno:/home/linaro# perf record -e cs_etm/@20070000.etr/ -C 2,3 sleep 1
failed to mmap with 12 (Cannot allocate memory)
After this set:
# root@juno:/home/linaro# perf record -e cs_etm/@20070000.etr/ -C 2,3 sleep 1
[ perf record: Captured and wrote 1.352 MB perf.data ]
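The timestamp-based correlation mentioned above can be pictured as a merge of per-trace-ID queues. What follows is only a minimal Python sketch of that idea, not the perf implementation: the queue layout, the merge_by_timestamp() name and the packet tuples are assumptions made for illustration.

```python
import heapq


def _tagged(trace_id, packets):
    """Yield (timestamp, trace_id, payload) for one trace ID queue."""
    for timestamp, payload in packets:
        yield timestamp, trace_id, payload


def merge_by_timestamp(queues):
    """Merge per-trace-ID packet lists into one timestamp-ordered list.

    `queues` maps a trace ID to a list of (timestamp, payload) tuples that
    is already sorted by timestamp (hypothetical layout, for illustration).
    """
    streams = [_tagged(tid, pkts) for tid, pkts in queues.items()]
    # heapq.merge() always yields the stream head with the smallest key.
    return list(heapq.merge(*streams, key=lambda item: item[0]))


queues = {
    0x10: [(100, "range A"), (300, "range C")],
    0x12: [(200, "range B"), (400, "range D")],
}
for timestamp, trace_id, payload in merge_by_timestamp(queues):
    print("{0} id=0x{1:x} {2}".format(timestamp, trace_id, payload))
```

Each queue only needs to be ordered internally; the merge picks the oldest pending packet across all trace IDs, which is what keeps the rendered output ordered.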
Regards,
Mathieu
Changes for V2:
* Fixed error condition in function cs_etm_set_option() (Leo)
* Fixed changelog spelling error (Leo).
* Moved from calloc() to malloc() in cs_etm__etmq_get_traceid_queue()
* Got rid of CS_ETM_PACKET_QUEUE_NR macro
* Fixed indentation problem in function cs_etm__process_traceid_queue() (Leo).
Mathieu Poirier (17):
perf tools: Configure contextID tracing in CPU-wide mode
perf tools: Configure timestamp generation in CPU-wide mode
perf tools: Configure SWITCH_EVENTS in CPU-wide mode
perf tools: Add handling of itrace start events
perf tools: Add handling of switch-CPU-wide events
perf tools: Refactor error path in cs_etm_decoder__new()
perf tools: Move packet queue out of decoder structure
perf tools: Fix indentation in function
cs_etm__process_decoder_queue()
perf tools: Introduce the concept of trace ID queues
perf tools: Get rid of unused cpu in struct cs_etm_queue
perf tools: Move thread to traceid_queue
perf tools: Move tid/pid to traceid_queue
perf tools: Use traceID aware memory callback API
perf tools: Add support for multiple traceID queues
perf tools: Linking PE contextID with perf thread mechanic
perf tools: Add notion of time to decoding code
perf tools: Add support for CPU-wide trace scenarios
tools/perf/Makefile.config | 3 +
tools/perf/arch/arm/util/cs-etm.c | 186 ++-
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 269 +++--
.../perf/util/cs-etm-decoder/cs-etm-decoder.h | 39 +-
tools/perf/util/cs-etm.c | 1026 +++++++++++++----
tools/perf/util/cs-etm.h | 103 ++
6 files changed, 1252 insertions(+), 374 deletions(-)
--
2.17.1
Update the documentation to reflect the new naming scheme with
latest changes.
Reported-by: Leo Yan <leo.yan(a)linaro.org>
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
---
Documentation/trace/coresight.txt | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index efbc832..7b427cf 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -326,16 +326,20 @@ amount of processor cores), the "cs_etm" PMU will be listed only once.
A Coresight PMU works the same way as any other PMU, i.e the name of the PMU is
listed along with configuration options within forward slashes '/'. Since a
Coresight system will typically have more than one sink, the name of the sink to
-work with needs to be specified as an event option. Names for sink to choose
-from are listed in sysFS under ($SYSFS)/bus/coresight/devices:
+work with needs to be specified as an event option.
+On newer kernels the available sinks are listed in sysFS under:
+($SYSFS)/bus/event_source/devices/cs_etm/sinks/
- root@linaro-nano:~# ls /sys/bus/coresight/devices/
- 20010000.etf 20040000.funnel 20100000.stm 22040000.etm
- 22140000.etm 230c0000.funnel 23240000.etm 20030000.tpiu
- 20070000.etr 20120000.replicator 220c0000.funnel
- 23040000.etm 23140000.etm 23340000.etm
+ root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls
+ tmc_etf0 tmc_etr0 tpiu0
- root@linaro-nano:~# perf record -e cs_etm/@20070000.etr/u --per-thread program
+On older kernels, this may need to be found from the list of coresight devices,
+available under ($SYSFS)/bus/coresight/devices/:
+
+ root@localhost:/sys/bus/coresight/devices# ls
+ etm0 etm1 etm2 etm3 etm4 etm5 funnel0 funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0
+
+ root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program
The syntax within the forward slashes '/' is important. The '@' character
tells the parser that a sink is about to be specified and that this is the sink
@@ -352,7 +356,7 @@ perf can be used to record and analyze trace of programs.
Execution can be recorded using 'perf record' with the cs_etm event,
specifying the name of the sink to record to, e.g:
- perf record -e cs_etm/@20070000.etr/u --per-thread
+ perf record -e cs_etm/@tmc_etr0/u --per-thread
The 'perf report' and 'perf script' commands can be used to analyze execution,
synthesizing instruction and branch events from the instruction trace.
@@ -381,7 +385,7 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto
Bubble sorting array of 30000 elements
5910 ms
- $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
+ $ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort
Bubble sorting array of 30000 elements
12543 ms
[ perf record: Woken up 35 times to write data ]
@@ -405,7 +409,7 @@ than the program flow through the code.
As with any other CoreSight component, specifics about the STM tracer can be
found in sysfs with more information on each entry being found in [1]:
-root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm
+root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0
enable_source hwevent_select port_enable subsystem uevent
hwevent_enable mgmt port_select traceid
root@genericarmv8:~#
@@ -413,14 +417,14 @@ root@genericarmv8:~#
Like any other source a sink needs to be identified and the STM enabled before
being used:
-root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
-root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source
+root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink
+root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source
From there user space applications can request and use channels using the devfs
interface provided for that purpose by the generic STM API:
-root@genericarmv8:~# ls -l /dev/20100000.stm
-crw------- 1 root root 10, 61 Jan 3 18:11 /dev/20100000.stm
+root@genericarmv8:~# ls -l /dev/stm0
+crw------- 1 root root 10, 61 Jan 3 18:11 /dev/stm0
root@genericarmv8:~#
Details on how to use the generic STM API can be found here [2].
--
2.7.4
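The sink lookup described in the documentation update above can be sketched in Python. This is an illustration only: list_sinks() and perf_event() are hypothetical helpers, and the name filter used for the older-kernel fallback is an assumption of the sketch, not something the kernel provides.

```python
import os


def list_sinks(sysfs="/sys"):
    """Return sink names, preferring the cs_etm PMU 'sinks' directory."""
    pmu_sinks = os.path.join(sysfs, "bus/event_source/devices/cs_etm/sinks")
    if os.path.isdir(pmu_sinks):
        return sorted(os.listdir(pmu_sinks))
    # Older kernels: fall back to the full CoreSight device list. Filtering
    # by name is an assumption of this sketch.
    devices = os.path.join(sysfs, "bus/coresight/devices")
    if not os.path.isdir(devices):
        return []
    hints = ("etr", "etf", "etb", "tpiu")
    return sorted(d for d in os.listdir(devices) if any(h in d for h in hints))


def perf_event(sink):
    """Build the event string used in the examples, e.g. cs_etm/@tmc_etr0/u."""
    return "cs_etm/@{0}/u".format(sink)


for sink in list_sinks():
    print(perf_event(sink))
```

On a system without CoreSight the loop prints nothing, since neither directory exists.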
CTIs are defined in the device tree and associated with other CoreSight
devices. The core CoreSight code has been modified to enable the registration
of the CTI devices on the same bus as the other CoreSight components,
but as these are not actually trace generation / capture devices, they
are not part of the Coresight path when generating trace.
However, the definition of the standard CoreSight device has been extended
to include a reference to an associated CTI device, and the enable / disable
trace path operations will auto enable/disable any associated CTI devices at
the same time.
Programming is at present via sysfs - a full API is provided to utilise the
hardware capabilities. As CTI devices are unprogrammed by default, the auto
enable described above will have no effect until explicit programming takes
place.
A set of device tree bindings specific to the CTI topology has been defined.
Documentation has been updated to describe both the CTI hardware, its use and
programming in sysfs, and the new dts bindings required.
Tested on DB410 board, 5.1-rc5
Changes since v1:
1) Significant restructuring of the source code. Adds cti-sysfs file and
cti device tree file. Patches are now split per feature rather than per
source file.
2) CPU type power event handling for hotplug moved to CoreSight core,
with generic registration interface provided for all CPU bound CS devices
to use.
3) CTI signal interconnection details in sysfs are now generated dynamically
from connection lists in the driver. This fixes an issue with multi-line sysfs
output in the previous version.
4) Full device tree bindings for DB410 and Juno provided (to the extent
that CTI information is available).
5) The AMBA driver update for UCI IDs is now upstream, so it is no longer
included in this set.
Mike Leach (13):
drivers: coresight: cti: Initial CoreSight CTI Driver
drivers: coresight: cti: Adds sysfs functionality to CTI driver.
drivers: coresight: cti: Add device tree support for v8 arch CTI
drivers: coresight: cti: Add device tree support for impdef CTI.
drivers: coresight: cti: Enable CTI associated with devices.
drivers: coresight: cti: Add connection information to sysfs
drivers: coresight: cti: Add CoreSight cpu power notifications.
devicetree: bindings: Documentation for CTI bindings.
devicetree: bindings: Add header file with CTI trigger signal type
constants.
drivers: dts: Add CTI options for qcom msm8916
drivers: dts: Juno platform - add CTI entries to device tree.
docs: coresight: Update documentation for CoreSight to cover CTI.
docs: sysfs: coresight: Add sysfs documentation for CTI
.../testing/sysfs-bus-coresight-devices-cti | 225 +++
.../bindings/arm/coresight-ect-cti.txt | 203 +++
.../devicetree/bindings/arm/coresight.txt | 7 +
Documentation/trace/coresight.txt | 139 ++
arch/arm64/boot/dts/arm/juno-base.dtsi | 149 +-
arch/arm64/boot/dts/arm/juno-cs-r1r2.dtsi | 31 +-
arch/arm64/boot/dts/arm/juno-r1.dts | 25 +
arch/arm64/boot/dts/arm/juno-r2.dts | 25 +
arch/arm64/boot/dts/arm/juno.dts | 25 +
arch/arm64/boot/dts/qcom/msm8916.dtsi | 102 +-
drivers/hwtracing/coresight/Kconfig | 13 +
drivers/hwtracing/coresight/Makefile | 4 +
.../hwtracing/coresight/coresight-cti-sysfs.c | 1250 +++++++++++++++++
drivers/hwtracing/coresight/coresight-cti.c | 853 +++++++++++
drivers/hwtracing/coresight/coresight-cti.h | 280 ++++
drivers/hwtracing/coresight/coresight-priv.h | 37 +
drivers/hwtracing/coresight/coresight.c | 185 ++-
.../hwtracing/coresight/of_coresight-cti.c | 447 ++++++
include/dt-bindings/arm/coresight-cti-dt.h | 36 +
include/linux/coresight.h | 30 +
20 files changed, 4056 insertions(+), 10 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-coresight-devices-cti
create mode 100644 Documentation/devicetree/bindings/arm/coresight-ect-cti.txt
create mode 100644 drivers/hwtracing/coresight/coresight-cti-sysfs.c
create mode 100644 drivers/hwtracing/coresight/coresight-cti.c
create mode 100644 drivers/hwtracing/coresight/coresight-cti.h
create mode 100644 drivers/hwtracing/coresight/of_coresight-cti.c
create mode 100644 include/dt-bindings/arm/coresight-cti-dt.h
--
2.20.1
We have a few places where we call smp_processor_id() from preemptible
contexts during the perf buffer handling. We do this to figure out the
numa node for the allocation in case the event is not CPU bound. Use
numa_node_id() instead in such cases to avoid a splat.
Changes since V2:
- Use NUMA_NO_NODE instead of numa_node_id() for event->cpu == -1. (Robin Murphy)
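The node selection after this change can be modelled in a few lines of Python. This is a sketch of the decision only, not the driver code: buffer_node() and the cpu_to_node mapping are hypothetical stand-ins for the kernel helpers.

```python
NUMA_NO_NODE = -1  # the kernel's "no node preference" value


def buffer_node(event_cpu, cpu_to_node):
    """Pick the NUMA node for a perf buffer allocation.

    `cpu_to_node` is a hypothetical CPU -> node mapping standing in for the
    kernel's cpu_to_node() helper.
    """
    if event_cpu == -1:
        # Event not bound to a CPU: express no preference instead of asking
        # which CPU we happen to be running on (unsafe when preemptible).
        return NUMA_NO_NODE
    return cpu_to_node[event_cpu]


print(buffer_node(-1, {0: 0, 1: 0, 2: 1}))
print(buffer_node(2, {0: 0, 1: 0, 2: 1}))
```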
Suzuki K Poulose (4):
coresight: tmc-etr: Do not call smp_processor_id() from preemptible
coresight: tmc-etr: alloc_perf_buf: Do not call smp_processor_id from
preemptible
coresight: tmc-etf: Do not call smp_processor_id from preemptible
coresight: etb10: Do not call smp_processor_id from preemptible
drivers/hwtracing/coresight/coresight-etb10.c | 6 ++----
drivers/hwtracing/coresight/coresight-tmc-etf.c | 6 ++----
drivers/hwtracing/coresight/coresight-tmc-etr.c | 13 ++++---------
3 files changed, 8 insertions(+), 17 deletions(-)
--
2.7.4
This series adds the support for CoreSight devices on ACPI based
platforms. The device connections are encoded as _DSD graph property[0],
with CoreSight specific extensions to indicate the direction of data
flow as described in [1]. Components attached to CPUs are listed
as child devices of the corresponding CPU, removing explicit links
to the CPU like we do in the DT.
The majority of the series cleans up the driver and prepares the subsystem
for platform agnostic firmware probing, naming scheme, searching etc.
We introduce platform independent helpers to parse the platform supplied
information. Thus we rename the platform handling code from:
of_coresight.c => coresight-platform.c
The CoreSight driver creates shadow devices that appear on the CoreSight
bus, in addition to the real devices (e.g., AMBA bus devices). The names
of these devices match those of the real devices, which makes them
a bit cryptic on ACPI platforms. So this series also introduces a generic,
platform agnostic device naming scheme for the shadow CoreSight devices.
Towards this we also change the way we look up devices to resolve
the connections, as we can no longer use the names to identify the devices. So,
we use the "fwnode_handle" of the real device for the device lookups.
Towards that we clean up the drivers to keep track of the "CoreSight"
device rather than the "real" device. However, all real operations,
like DMA allocation, Power management etc. must be performed on
the real device which is the parent of the shadow device.
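The naming and lookup scheme described above can be illustrated with a small Python model. It is a sketch under assumptions, not the kernel code: the NameAllocator class, its methods and the string stand-ins for fwnode handles are all hypothetical.

```python
from collections import defaultdict


class NameAllocator:
    """Hand out '<prefix><index>' names keyed by a firmware node handle."""

    def __init__(self):
        self._by_prefix = defaultdict(list)  # prefix -> fwnodes, in order seen
        self._devices = {}                   # fwnode -> allocated name

    def name_for(self, prefix, fwnode):
        """Return a stable name such as 'tmc_etr0' for this fwnode."""
        known = self._by_prefix[prefix]
        if fwnode not in known:
            known.append(fwnode)
        name = "{0}{1}".format(prefix, known.index(fwnode))
        self._devices[fwnode] = name
        return name

    def find_by_fwnode(self, fwnode):
        """Resolve a connection endpoint by handle rather than by name."""
        return self._devices.get(fwnode)


names = NameAllocator()
print(names.name_for("funnel", "fwnode-a"))
print(names.name_for("funnel", "fwnode-b"))
print(names.find_by_fwnode("fwnode-a"))
```

The point of the model is that names become a presentation detail: connections are resolved through the handle, so the name can follow any platform agnostic convention.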
Finally we add the support for parsing the ACPI platform data. The power
management support is missing in the ACPI (and this is not specific to
CoreSight). The firmware must ensure that the respective power domains
are turned on.
Applies on v5.2-rc1
Tested on a Juno-r0 board with ACPI bindings patch (Patch 31/30) added on
top of [2]. You would need to make sure that the debug power domain is
turned on before the Linux kernel boots. (e.g, connect the DS-5 to the
Juno board while at UEFI). arm32 code is only compile tested.
[0] ACPI Device Graphs using _DSD (not available online yet; approved but
awaiting publication, and should eventually be available at):
https://uefi.org/sites/default/files/resources/_DSD-implementation-guide-to…
[1] https://developer.arm.com/docs/den0067/latest/acpi-for-coresighttm-10-platf…
[2] https://github.com/tianocore/edk2-platforms.git
Changes since v3:
- Add tags from Mathieu
Changes since v2:
- Fix the symlink name for ETM devices under cs_etm PMU (Patch by Mathieu)
- Drop patches merged already in the tree.
- Add the tags from Mathieu
- More documentation with examples of ACPI graph in ACPI bindings support.
- Fix ETM4 error return path (Mathieu)
- Drop the patches exposing device links via sysfs, to be posted as separate
series.
- Drop the generic helper for device search by fwnode for a better cleanup
later.
- Split the ACPI bindings support patch for AMBA and platform devices.
- Return integer error for <platform>_get_platform_data() helpers.
- Fix comment about the return code for acpi_get_coresight_cpu().
- Ensure we don't have devices part of multiple graphs (Mathieu).
Changes since v1:
[ http://lists.infradead.org/pipermail/linux-arm-kernel/2019-March/639963.html ]
- Dropped the replicator driver merge changes as they were pulled already.
- Cleanups for Power management in the drivers.
- Reuse platform description for connection information. Also introduce
routines to clean up the platform description to make sure we drop
the references (fwnode_handle).
- Add RFC patches for exposing the device-links via sysfs.
- Drop tracking the device in favour of coresight_device.
- Name etb10 as "etb"
- Fix other comments in v1.
- Use a generic helper for searching with fwnode_handle rather than adding
one for CoreSight.
Mathieu Poirier (1):
coresight: Use coresight device names for sinks in PMU attribute
Suzuki K Poulose (29):
coresight: funnel: Clean up device book keeping
coresight: replicator: Cleanup device tracking
coresight: tmc: Clean up device specific data
coresight: catu: Cleanup device specific data
coresight: tpiu: Clean up device specific data
coresight: stm: Cleanup device specific data
coresight: etm: Clean up device specific data
coresight: etb10: Clean up device specific data
coresight: Rename of_coresight to coresight-platform
coresight: etm3x: Rearrange cp14 access detection
coresight: stm: Rearrange probing the stimulus area
coresight: tmc-etr: Rearrange probing default buffer size
coresight: platform: Make memory allocation helper generic
coresight: Make sure device uses DT for obsolete compatible check
coresight: Introduce generic platform data helper
coresight: Make device to CPU mapping generic
coresight: Remove cpu field from platform data
coresight: Remove name from platform description
coresight: Cleanup coresight_remove_conns
coresight: Reuse platform data structure for connection tracking
coresight: Rearrange platform data probing
coresight: Add support for releasing platform specific data
coresight: platform: Use fwnode handle for device search
coresight: Use fwnode handle instead of device names
coresight: Use platform agnostic names
coresight: stm: ACPI support for parsing stimulus base
coresight: Support for ACPI bindings
coresight: acpi: Support for AMBA components
coresight: acpi: Support for platform devices
drivers/acpi/acpi_amba.c | 9 +
drivers/hwtracing/coresight/Makefile | 3 +-
drivers/hwtracing/coresight/coresight-catu.c | 40 +-
drivers/hwtracing/coresight/coresight-catu.h | 1 -
drivers/hwtracing/coresight/coresight-cpu-debug.c | 3 +-
drivers/hwtracing/coresight/coresight-etb10.c | 51 +-
drivers/hwtracing/coresight/coresight-etm-perf.c | 8 +-
drivers/hwtracing/coresight/coresight-etm.h | 6 +-
.../hwtracing/coresight/coresight-etm3x-sysfs.c | 12 +-
drivers/hwtracing/coresight/coresight-etm3x.c | 45 +-
drivers/hwtracing/coresight/coresight-etm4x.c | 37 +-
drivers/hwtracing/coresight/coresight-etm4x.h | 2 -
drivers/hwtracing/coresight/coresight-funnel.c | 35 +-
drivers/hwtracing/coresight/coresight-platform.c | 810 +++++++++++++++++++++
drivers/hwtracing/coresight/coresight-priv.h | 4 +
drivers/hwtracing/coresight/coresight-replicator.c | 42 +-
drivers/hwtracing/coresight/coresight-stm.c | 118 ++-
drivers/hwtracing/coresight/coresight-tmc-etf.c | 9 +-
drivers/hwtracing/coresight/coresight-tmc-etr.c | 44 +-
drivers/hwtracing/coresight/coresight-tmc.c | 96 +--
drivers/hwtracing/coresight/coresight-tmc.h | 2 -
drivers/hwtracing/coresight/coresight-tpiu.c | 24 +-
drivers/hwtracing/coresight/coresight.c | 164 ++++-
drivers/hwtracing/coresight/of_coresight.c | 297 --------
include/linux/coresight.h | 61 +-
25 files changed, 1332 insertions(+), 591 deletions(-)
create mode 100644 drivers/hwtracing/coresight/coresight-platform.c
delete mode 100644 drivers/hwtracing/coresight/of_coresight.c
ACPI bindings for Juno-r0 (applies on [2] above)
Suzuki K Poulose (1):
edk2-platform: juno: Update ACPI CoreSight Bindings
Platform/ARM/JunoPkg/AcpiTables/Dsdt.asl | 241 +++++++++++++++++++++++++++++++
1 file changed, 241 insertions(+)
--
2.7.4
From: Wojciech Zmuda <wzmuda(a)n7space.com>
This patchset adds a notion of time to perf instruction and branch samples to allow
coarse time measurement of code block execution.
The simplest verification is visibility of the time field in 'perf script' output:
root@zynq:~# perf record -e cs_etm/timestamp,@fe970000.etr/u -a sleep 1
Couldn't synthesize bpf events.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.262 MB perf.data ]
root@zynq:~# perf script --ns -F cpu,comm,time
perf [002] 9546.053455325:
perf [002] 9546.053455340:
perf [002] 9546.053455344:
(...)
sleep [003] 9546.060163742:
sleep [003] 9546.060163754:
sleep [003] 9546.060163766:
(...)
ntpd [001] 9546.389083194:
ntpd [001] 9546.389083400:
ntpd [001] 9546.389086319:
(...)
The step above works only if trace has been collected in CPU-wide mode because of some
perf event flags mismatch I'm working on fixing.
Timestamps in subsequent samples are monotonically increasing. The only exceptions
are discontinuities in the trace. From my understanding, we can't timestamp discontinuities
reasonably: once the decoder resynchronizes after a trace loss, it needs to wait for
another timestamp packet. Thus, the time value of such samples stays at 0x0.
Another way to access these values is to use the perf script engine, which I used for validation
of the feature. The script below calculates timestamp differences of two consecutive branches
sharing the same branch address. This is a simple example of an execution-time fluctuation detector.
from __future__ import print_function
import os
import sys

sys.path.append(os.environ['PERF_EXEC_PATH'] + \
        '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')

from perf_trace_context import *

target_start_addr = int('4005e4', 16) # 0x4005e4 is func() from listing below
branch = dict()
branch['from'] = 0
branches = []

def process_event(s):
    global branch
    global branches

    sample = s['sample']
    branch['cpu'] = sample['cpu']

    if not branch['from']:
        branch['from'] = sample['addr']
        branch['ts'] = sample['time']
        return

    branch['to'] = sample['ip']
    if not branch['to']:
        branch['from'] = 0
        branch['ts'] = 0
        return

    if branch['from'] and branch['to']:
        branches.append(branch.copy())
        branch['from'] = 0
        return

def trace_end():
    global branches

    count = 0
    timestamp_start = 0
    print("Got {0} samples:".format(len(branches)))
    for b in branches:
        if b['from'] == target_start_addr:
            if not timestamp_start:
                timestamp_start = b['ts']
                continue

            print("[{0}]: ts diff = 0x{1:x} - 0x{2:x} = {3:d}".format(count,
                  b['ts'], timestamp_start, b['ts'] - timestamp_start))
            count = count + 1
            timestamp_start = b['ts']
The following function was traced:
static int func(int cnt)
{
    volatile int x = 0;
    static int i;

    x += cnt + 0xdeadbeefcafeb00b;
    (...) /* repeats ~100 times */

    if (i++ % 3 == 0) // Every third execution is longer
        usleep(1000);

    return x;
}
root@zynq:~# perf record -m,16K -e cs_etm/timestamp,@fe970000.etr/u \
--filter 'filter func @./program' \
--per-thread ./program
Couldn't synthesize bpf events.
CTRL+C me when you find appropriate.
^C[ perf record: Woken up 12 times to write data ]
[ perf record: Captured and wrote 0.135 MB perf.data ]
root@zynq:~# perf script -s exectime.py
Got 2469 samples:
[0]: ts diff = 0x92f2752e512 - 0x92f274a7ae9 = 551465
[1]: ts diff = 0x92f2752e694 - 0x92f2752e512 = 386
[2]: ts diff = 0x92f2752e817 - 0x92f2752e694 = 387
[3]: ts diff = 0x92f275bef12 - 0x92f2752e817 = 591611
[4]: ts diff = 0x92f275bf093 - 0x92f275bef12 = 385
[5]: ts diff = 0x92f275bf211 - 0x92f275bf093 = 382
[6]: ts diff = 0x92f276451d7 - 0x92f275bf211 = 548806
[7]: ts diff = 0x92f2764535a - 0x92f276451d7 = 387
[8]: ts diff = 0x92f276454d7 - 0x92f2764535a = 381
[9]: ts diff = 0x92f276cb256 - 0x92f276454d7 = 548223
[10]: ts diff = 0x92f276cb3d9 - 0x92f276cb256 = 387
[11]: ts diff = 0x92f276cb556 - 0x92f276cb3d9 = 381
(...)
The listing above shows that every third execution of the function lasted longer
than the other two. It is a naive example and could be enhanced to point to the area that
caused the disruption by examining events 'in the middle' of the traced code range.
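The long executions in the listing above can also be picked out automatically. The sketch below is not part of the patch; it reuses the twelve differences printed above, and the slow_runs() helper and its factor of 10 over the median are arbitrary assumptions chosen for illustration.

```python
from statistics import median


def slow_runs(deltas, factor=10):
    """Return indices whose duration exceeds `factor` times the median."""
    threshold = factor * median(deltas)
    return [i for i, d in enumerate(deltas) if d > threshold]


# Timestamp differences copied from the 'perf script -s exectime.py' output.
deltas = [551465, 386, 387, 591611, 385, 382,
          548806, 387, 381, 548223, 387, 381]
print(slow_runs(deltas))
```

With this data the median is 387, so the threshold is 3870 and the script prints [0, 3, 6, 9], i.e. every third execution.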
Applies cleanly on Mathieu's 5.1-rc3-cpu-wide-v3 branch.
Changes for V2:
- move packet timestamping logic to the decoder. The front end only uses this information
to timestamp samples (as suggested by Mathieu).
- leave the original behaviour of CPU-wide mode, where the decoder is stopped
and the front end is notified of a pending queue with a timestamp packet.
At the same time, always adjust the next and current timestamps in both CPU-wide
and per-thread modes (as suggested by Mathieu).
- when a timestamp packet is encountered, timestamp the range and discontinuity packets
waiting in the queue that are not yet consumed by the front end (as suggested by Mathieu).
- don't timestamp exceptions, since they are turned into neither branch nor instruction
samples.
- fix timestamping of the last branch sample before a discontinuity appears (as suggested by Leo).
Wojciech Zmuda (1):
perf cs-etm: Set time value for samples
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 70 ++++++++++++++++++++-----
tools/perf/util/cs-etm.c | 3 ++
tools/perf/util/cs-etm.h | 1 +
3 files changed, 61 insertions(+), 13 deletions(-)
--
2.11.0
We need a simple method to test perf with the Arm CoreSight drivers. It
can be used for smoke testing when new patches arrive for perf or the
CoreSight drivers, and also to confirm that CoreSight has been enabled
successfully on new platforms.
This patch introduces the shell script test_arm_coresight.sh, which runs
under the 'perf test' framework. The testing is source oriented: the
script traverses every source (currently only ETM devices) and tests all
of its possible sinks. To find the complete paths from a specific source
to its sinks, the script relies on the sysfs entries
'/sys/bus/coresight/devices/devX/out:Y' and performs a depth-first search
(DFS) over the connected device nodes. If an output device is detected to
be of type ETR, ETF, or ETB, the script tests trace data recording and
decoding with it.
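The traversal described above can be sketched in Python for readers who find it easier to follow than the shell version. This is an illustration, not part of the patch: find_sinks() is a hypothetical helper, and the `seen` set is an addition of the sketch, so a sink reachable over two paths is reported only once.

```python
import os
import re

# Same name test as the script: the device name contains etr, etb or etf.
SINK_RE = re.compile(r"etr|etb|etf")


def find_sinks(dev, seen=None):
    """Depth-first walk of the 'out:*' links below `dev`, returning sink names."""
    seen = set() if seen is None else seen
    sinks = []
    for entry in sorted(os.listdir(dev)):
        link = os.path.join(dev, entry)
        # Only follow output connections that point at a device directory.
        if not entry.startswith("out:") or not os.path.isdir(link):
            continue
        target = os.path.realpath(link)
        if target in seen:
            continue  # sketch-only guard against revisiting a device
        seen.add(target)
        name = os.path.basename(target)
        if SINK_RE.search(name):
            sinks.append(name)
        # Keep descending: a sink such as an ETF may have outputs of its own.
        sinks.extend(find_sinks(target, seen))
    return sinks
```

Calling find_sinks("/sys/bus/coresight/devices/etm0") on a CoreSight system would return the sinks reachable from that ETM, in the order the depth-first search meets them.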
The script runs three output tests on every recorded trace:
- Test branch sample dumping with the 'perf script' command;
- Test branch sample reporting with the 'perf report' command;
- Use the option '--itrace=i1000i' to insert synthesized instruction events
and check that perf can output the percentage value
successfully based on the instruction samples.
If the test fails for any device, the script reports the failure and
exits immediately with an error. The test only runs on platforms
with the PMU event 'cs_etm//'; otherwise it is skipped.
Below is the detailed usage:
# cd $linux/tools/perf -> This is important so that the shell script can be used
# perf test list
[...]
61: Check Arm CoreSight trace data recording and branch samples
62: Check open filename arg using perf trace + vfs_getname
63: Zstd perf.data compression/decompression
64: Add vfs_getname probe to get syscall args filenames
# perf test 61
61: Check Arm CoreSight trace data recording and branch samples: Ok
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
tools/perf/tests/shell/test_arm_coresight.sh | 120 +++++++++++++++++++
1 file changed, 120 insertions(+)
create mode 100755 tools/perf/tests/shell/test_arm_coresight.sh
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
new file mode 100755
index 000000000000..7b1fa17a4512
--- /dev/null
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -0,0 +1,120 @@
+#!/bin/sh
+# Check Arm CoreSight trace data recording and branch samples
+
+# Uses the 'perf record' to record trace data with Arm CoreSight sinks;
+# then verify if there have any branch samples and instruction samples
+# are generated by CoreSight with 'perf script' and 'perf report'
+# commands.
+
+# Leo Yan <leo.yan(a)linaro.org>, 2019
+
+perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
+file=$(mktemp /tmp/temporary_file.XXXXX)
+
+skip_if_no_cs_etm_event() {
+	perf list | grep -q 'cs_etm//' && return 0
+
+	# cs_etm event doesn't exist
+	return 2
+}
+
+skip_if_no_cs_etm_event || exit 2
+
+record_touch_file() {
+	echo "Recording trace (only user mode) with path: CPU$2 => $1"
+	perf record -o ${perfdata} -e cs_etm/@$1/u --per-thread \
+		-- taskset -c $2 touch $file
+}
+
+perf_script_branch_samples() {
+	echo "Looking at perf.data file for dumping branch samples:"
+
+	# Below is an example of the branch samples dumping:
+	#   touch  6512          1         branches:u:      ffffb220824c strcmp+0xc (/lib/aarch64-linux-gnu/ld-2.27.so)
+	#   touch  6512          1         branches:u:      ffffb22082e0 strcmp+0xa0 (/lib/aarch64-linux-gnu/ld-2.27.so)
+	#   touch  6512          1         branches:u:      ffffb2208320 strcmp+0xe0 (/lib/aarch64-linux-gnu/ld-2.27.so)
+	perf script -F,-time -i ${perfdata} | \
+		egrep " +touch +[0-9]+ .* +branches:([u|k]:)? +"
+}
+
+perf_report_branch_samples() {
+	echo "Looking at perf.data file for reporting branch samples:"
+
+	# Below is an example of the branch samples reporting:
+	#   73.04%    73.04%  touch    libc-2.27.so      [.] _dl_addr
+	#    7.71%     7.71%  touch    libc-2.27.so      [.] getenv
+	#    2.59%     2.59%  touch    ld-2.27.so        [.] strcmp
+	perf report --stdio -i ${perfdata} | \
+		egrep " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +touch "
+}
+
+perf_report_instruction_samples() {
+	echo "Looking at perf.data file for instruction samples:"
+
+	# Below is an example of the instruction samples reporting:
+	#   68.12%  touch    libc-2.27.so   [.] _dl_addr
+	#    5.80%  touch    libc-2.27.so   [.] getenv
+	#    4.35%  touch    ld-2.27.so     [.] _dl_fixup
+	perf report --itrace=i1000i --stdio -i ${perfdata} | \
+		egrep " +[0-9]+\.[0-9]+% +touch"
+}
+
+arm_cs_iterate_devices() {
+	for dev in $1/out\:*; do
+
+		# Skip testing if it's not a directory
+		! [ -d $dev ] && continue;
+
+		# Read out its symbol link file name
+		path=`readlink -f $dev`
+
+		# Extract device name from path, e.g.
+		#   path = '/sys/devices/platform/20010000.etf/tmc_etf0'
+		#     `> device_name = 'tmc_etf0'
+		device_name=`echo $path | awk -F/ '{print $(NF)}'`
+
+		echo $device_name | egrep -q "etr|etb|etf"
+
+		# Only test if the output device is ETR/ETB/ETF
+		if [ $? -eq 0 ]; then
+
+			pmu_dev="/sys/bus/event_source/devices/cs_etm/sinks/$device_name"
+
+			# Exit if PMU device node doesn't exist
+			if ! [ -f $pmu_dev ]; then
+				echo "PMU device $pmu_dev doesn't exist"
+				exit 1
+			fi
+
+			record_touch_file $device_name $2 &&
+				perf_script_branch_samples &&
+				perf_report_branch_samples &&
+				perf_report_instruction_samples
+
+			err=$?
+
+			# Exit when find failure
+			[ $err != 0 ] && exit $err
+
+			rm -f ${perfdata}
+			rm -f ${file}
+		fi
+
+		arm_cs_iterate_devices $dev $2
+	done
+}
+
+arm_cs_etm_test() {
+	# Iterate for every ETM device
+	for dev in /sys/bus/coresight/devices/etm*; do
+
+		# Find the ETM device belonging to which CPU
+		cpu=`cat $dev/cpu`
+
+		# Use depth-first search (DFS) to iterate outputs
+		arm_cs_iterate_devices $dev $cpu
+	done
+}
+
+arm_cs_etm_test
+exit 0
--
2.17.1
CoreSight device connections are a bit complicated and are not
currently exposed to the user. One has to look at the platform
descriptions (DT bindings or ACPI bindings) to understand them.
Given the new naming scheme, it will be helpful to have this information
to choose the appropriate devices for tracing. This patch exposes
the device connections via links in the sysfs directories.
e.g., a connection devA[OutputPort_X] -> devB[InputPort_Y]
is represented as two symlinks:
/sys/bus/coresight/.../devA/out:X -> /sys/bus/coresight/.../devB
/sys/bus/coresight/.../devB/in:Y -> /sys/bus/coresight/.../devA
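The pairing of the two links can be written down as a small Python helper. This is a sketch for illustration only: connection_links() is a hypothetical function, and the base directory passed in the example is an assumption, since the paths above are elided.

```python
def connection_links(src, out_port, dst, in_port, base):
    """Return the (link, target) pairs describing one connection.

    `base` is the directory holding the device entries; it is a parameter
    because the exact location is not spelled out here.
    """
    src_dir = "{0}/{1}".format(base, src)
    dst_dir = "{0}/{1}".format(base, dst)
    return [
        ("{0}/out:{1}".format(src_dir, out_port), dst_dir),  # devA/out:X -> devB
        ("{0}/in:{1}".format(dst_dir, in_port), src_dir),    # devB/in:Y -> devA
    ]


# Assumed base path and device names, for illustration only.
for link, target in connection_links("devA", 0, "devB", 1,
                                     "/sys/bus/coresight/devices"):
    print("{0} -> {1}".format(link, target))
```

Because each connection yields one link on each side, a tool can walk the topology in either direction, from source to sink or back.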
Applies on coresight/next tree.
This is split from the ACPI bindings series. No functional changes.
Suzuki K Poulose (3):
coresight: Pass coresight_device for coresight_release_platform_data
coresight: add return value for fixup connections
coresight: Expose device connections via sysfs
drivers/hwtracing/coresight/coresight-platform.c | 2 +-
drivers/hwtracing/coresight/coresight-priv.h | 3 +-
drivers/hwtracing/coresight/coresight.c | 148 +++++++++++++++++++----
include/linux/coresight.h | 4 +
4 files changed, 132 insertions(+), 25 deletions(-)
--
2.7.4
We have a few places where we call smp_processor_id() from preemptible
contexts during the perf buffer handling. We do this to figure out the
numa node for the allocation in case the event is not CPU bound. Use
numa_node_id() instead in such cases to avoid a splat.
Suzuki K Poulose (4):
coresight: tmc-etr: Do not call smp_processor_id() from preemptible
coresight: tmc-etr: alloc_perf_buf: Do not call smp_processor_id from
preemptible
coresight: tmc-etf: Do not call smp_processor_id from preemptible
coresight: etb10: Do not call smp_processor_id from preemptible
drivers/hwtracing/coresight/coresight-etb10.c | 6 ++----
drivers/hwtracing/coresight/coresight-tmc-etf.c | 6 ++----
drivers/hwtracing/coresight/coresight-tmc-etr.c | 13 ++++---------
3 files changed, 8 insertions(+), 17 deletions(-)
--
2.7.4