The current method for allocating trace source ID values to sources is to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10). The STM is allocated ID 0x1.
This fixed algorithm is used in both the CoreSight driver code, and by perf when writing the trace metadata in the AUXTRACE_INFO record.
The method needs replacing because:
1. It uses the available IDs inefficiently.
2. It does not scale to larger systems with many cores. The algorithm has no upper limit, so it generates invalid trace IDs for CPU numbers > 47 (CPU 48 maps to 0x70, the start of the reserved range).
Additionally, some systems have been seen to require the allocation of additional system IDs.
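As a quick check of the scaling limit described above, the following userspace C sketch evaluates the fixed formula against the valid ID range. It assumes, as the patch itself defines, that ID 0 and IDs 0x70 to 0x7F are reserved. The helper names are illustrative only and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Legacy fixed mapping for CPU-bound sources: ID = cpu * 2 + 0x10. */
static int legacy_trace_id(int cpu)
{
	return cpu * 2 + 0x10;
}

/* Valid IDs per the patch: 0 is reserved, and so are 0x70..0x7F. */
static bool trace_id_is_valid(int id)
{
	return id > 0x00 && id < 0x70;
}

/* Lowest CPU number whose legacy ID is no longer a valid trace ID. */
static int first_invalid_cpu(void)
{
	int cpu = 0;

	while (trace_id_is_valid(legacy_trace_id(cpu)))
		cpu++;
	return cpu;
}
```

Under these assumptions CPU 47 still gets a valid ID (0x6E), while CPU 48 maps to 0x70, the first reserved value. Only 48 of the 111 valid IDs (0x01 to 0x6F) can ever be handed to a CPU, which illustrates the inefficiency point.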
This patch set introduces an API that allows the allocation of trace IDs in a dynamic manner.
Architecturally reserved IDs are never allocated, and the system is limited to allocating only valid IDs.
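The allocation idea can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the driver code: a plain bool array stands in for the kernel bitmap, the reserved values (0 and 0x70 to 0x7F) are taken from the patch, and the names trace_id_map_init, trace_id_get and trace_id_put are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TRACE_IDS_MAX   128   /* architectural ID space, as in the patch */
#define TRACE_ID_RES_0  0x00  /* ID 0 is reserved */
#define TRACE_ID_RES_LO 0x70  /* IDs 0x70..0x7F are reserved */

/* Userspace stand-in for the kernel bitmap: in_use[id] is true when taken. */
struct trace_id_map {
	bool in_use[TRACE_IDS_MAX];
};

/* Mark the architecturally reserved IDs as permanently in use. */
static void trace_id_map_init(struct trace_id_map *map)
{
	int id;

	for (id = 0; id < TRACE_IDS_MAX; id++)
		map->in_use[id] = (id == TRACE_ID_RES_0 || id >= TRACE_ID_RES_LO);
}

/* Return the lowest free ID and mark it in use, or -1 if none is left. */
static int trace_id_get(struct trace_id_map *map)
{
	int id;

	for (id = 0; id < TRACE_IDS_MAX; id++) {
		if (!map->in_use[id]) {
			map->in_use[id] = true;
			return id;
		}
	}
	return -1;
}

/* Release an ID; reserved or out-of-range values are ignored. */
static void trace_id_put(struct trace_id_map *map, int id)
{
	if (id > TRACE_ID_RES_0 && id < TRACE_ID_RES_LO)
		map->in_use[id] = false;
}
```

Because the reserved IDs are marked as in use at initialisation and never released, the search can only ever return values in the range 0x01 to 0x6F.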
Each of the current trace sources ETM3.x, ETM4.x and STM is updated to use the new API.
For the ETMx.x devices, IDs are allocated on certain events:
a) When using sysfs, an ID is allocated on hardware enable or on a read of the sysfs TRCTRACEID register, and freed when the sysfs reset is written.
b) When using perf, an ID is allocated on hardware enable and freed on hardware disable. IDs are communicated using the AUX_OUTPUT_HW_ID packet. The ID allocator is notified when perf sessions start and stop, so CPU-based IDs are kept constant throughout any perf session.
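The perf behaviour in b) can be illustrated with a small userspace model. This is only a sketch: it has no locking, the four-CPU limit is an arbitrary assumption, and the function names are shortened stand-ins for the API introduced by the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS        4     /* hypothetical CPU count for the model */
#define TRACE_ID_FIRST 1     /* ID 0 is reserved */
#define TRACE_ID_LIMIT 0x70  /* IDs 0x70 and above are reserved */

static bool id_in_use[TRACE_ID_LIMIT];
static int  cpu_id[NR_CPUS];        /* 0 means "no ID assigned" */
static bool cpu_pend_rel[NR_CPUS];  /* release deferred until perf stops */
static int  perf_sessions;          /* number of active perf sessions */

/* Return the ID already bound to this CPU, or bind the lowest free one. */
static int get_cpu_id(int cpu)
{
	int id;

	if (cpu_id[cpu])
		return cpu_id[cpu];

	for (id = TRACE_ID_FIRST; id < TRACE_ID_LIMIT; id++) {
		if (!id_in_use[id]) {
			id_in_use[id] = true;
			cpu_id[cpu] = id;
			cpu_pend_rel[cpu] = false;
			return id;
		}
	}
	return -1;
}

/* Release the CPU's ID now, or defer the release while perf is active. */
static void put_cpu_id(int cpu)
{
	int id = cpu_id[cpu];

	if (!id)
		return;
	if (perf_sessions > 0) {
		cpu_pend_rel[cpu] = true;
	} else {
		id_in_use[id] = false;
		cpu_id[cpu] = 0;
	}
}

static void perf_start(void)
{
	perf_sessions++;
}

/* When the last session ends, free every ID whose release was deferred. */
static void perf_stop(void)
{
	int cpu;

	if (--perf_sessions > 0)
		return;
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu_pend_rel[cpu]) {
			id_in_use[cpu_id[cpu]] = false;
			cpu_id[cpu] = 0;
			cpu_pend_rel[cpu] = false;
		}
	}
}
```

In this model a CPU that is disabled and re-enabled within a session gets its previous ID back, and the ID only becomes available to other sources once the last session has stopped.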
Note: This patchset breaks backward compatibility for perf record and perf report.
Because the method for generating the AUXTRACE_INFO metadata has changed, using an older perf record will result in metadata that does not match the trace IDs used in the recorded trace data. This mismatch will cause subsequent decode to fail.
The version of the AUXTRACE_INFO has been updated to reflect the fact that the trace source IDs are no longer present in the metadata. This will mean older versions of perf report cannot decode the file.
Applies to coresight/next [c06475910b52]
Tested on DB410c
Changes since v1: (after feedback & discussion with Mathieu & Suzuki).
1) API has changed. The global trace ID map is managed internally, so it is no longer passed in to the API functions.
2) perf record does not use sysfs to find the trace IDs. These are now output as AUX_OUTPUT_HW_ID events. The drivers, perf record, and perf report have been updated accordingly to generate and handle these events.
Mike Leach (13):
  coresight: trace-id: Add API to dynamically assign Trace ID values
  coresight: trace-id: update CoreSight core to use Trace ID API
  coresight: stm: Update STM driver to use Trace ID API
  coresight: etm4x: Update ETM4 driver to use Trace ID API
  coresight: etm3x: Update ETM3 driver to use Trace ID API
  coresight: etmX.X: stm: Remove unused legacy source Trace ID ops
  coresight: perf: traceid: Add perf notifiers for Trace ID
  perf: cs-etm: Move mapping of Trace ID and cpu into helper function
  perf: cs-etm: Update record event to use new Trace ID protocol
  kernel: events: Export perf_report_aux_output_id()
  perf: cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet
  coresight: events: PERF_RECORD_AUX_OUTPUT_HW_ID used for Trace ID
  coresight: trace-id: Add debug & test macros to Trace ID allocation
 drivers/hwtracing/coresight/Makefile | 2 +-
 drivers/hwtracing/coresight/coresight-core.c | 49 +---
 .../hwtracing/coresight/coresight-etm-perf.c | 17 ++
 drivers/hwtracing/coresight/coresight-etm.h | 3 +-
 .../coresight/coresight-etm3x-core.c | 85 +++---
 .../coresight/coresight-etm3x-sysfs.c | 28 +-
 .../coresight/coresight-etm4x-core.c | 65 ++++-
 .../coresight/coresight-etm4x-sysfs.c | 32 ++-
 drivers/hwtracing/coresight/coresight-etm4x.h | 3 +
 drivers/hwtracing/coresight/coresight-stm.c | 49 +---
 .../hwtracing/coresight/coresight-trace-id.c | 263 ++++++++++++++++++
 .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
 include/linux/coresight-pmu.h | 31 ++-
 include/linux/coresight.h | 3 -
 kernel/events/core.c | 1 +
 tools/include/linux/coresight-pmu.h | 31 ++-
 tools/perf/arch/arm/util/cs-etm.c | 21 +-
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 9 +
 tools/perf/util/cs-etm.c | 220 +++++++++++++--
 tools/perf/util/cs-etm.h | 14 +-
 20 files changed, 784 insertions(+), 207 deletions(-)
 create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
The existing mechanism to assign Trace ID values to sources is limited and does not scale for larger multicore / multi trace source systems.
The API introduces functions that reserve IDs based on availability, represented by a coresight_trace_id_map structure. This records the used and free IDs in a bitmap.
CPU bound sources such as ETMs use the coresight_trace_id_get_cpu_id / coresight_trace_id_put_cpu_id pair of functions. The API will record the ID associated with the CPU. This ensures that the same ID will be re-used while perf events are active on the CPU. The put_cpu_id function will pend release of the ID until all perf cs_etm sessions are complete.
Non-cpu sources, such as the STM can use coresight_trace_id_get_system_id / coresight_trace_id_put_system_id.
Signed-off-by: Mike Leach <mike.leach@linaro.org>
---
 drivers/hwtracing/coresight/Makefile | 2 +-
 .../hwtracing/coresight/coresight-trace-id.c | 230 ++++++++++++++++++
 .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
 3 files changed, 296 insertions(+), 1 deletion(-)
 create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index b6c4a48140ec..329a0c704b87 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -6,7 +6,7 @@ obj-$(CONFIG_CORESIGHT) += coresight.o
 coresight-y := coresight-core.o coresight-etm-perf.o coresight-platform.o \
 		coresight-sysfs.o coresight-syscfg.o coresight-config.o \
 		coresight-cfg-preload.o coresight-cfg-afdo.o \
-		coresight-syscfg-configfs.o
+		coresight-syscfg-configfs.o coresight-trace-id.o
 obj-$(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) += coresight-tmc.o
 coresight-tmc-y := coresight-tmc-core.o coresight-tmc-etf.o \
 		      coresight-tmc-etr.o
diff --git a/drivers/hwtracing/coresight/coresight-trace-id.c b/drivers/hwtracing/coresight/coresight-trace-id.c
new file mode 100644
index 000000000000..dac9c89ae00d
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trace-id.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022, Linaro Limited, All rights reserved.
+ * Author: Mike Leach <mike.leach@linaro.org>
+ */
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+
+#include "coresight-trace-id.h"
+
+/* need to keep data on ids & association with cpus. */
+struct cpu_id_info {
+	int id;
+	bool pend_rel;
+};
+
+/* default trace ID map. Used for systems that do not require per sink mappings */
+static struct coresight_trace_id_map id_map_default;
+
+/* maintain a record of the current mapping of cpu IDs */
+static DEFINE_PER_CPU(struct cpu_id_info, cpu_ids);
+
+/* perf session active flag */
+static int perf_cs_etm_session_active;
+
+/* lock to protect id_map and cpu data */
+static DEFINE_SPINLOCK(id_map_lock);
+
+/* ID 0 is reserved */
+#define CORESIGHT_TRACE_ID_RES_0 0
+
+/* ID 0x70 onwards are reserved */
+#define CORESIGHT_TRACE_ID_RES_RANGE_LO 0x70
+#define CORESIGHT_TRACE_ID_RES_RANGE_HI 0x7F
+
+#define IS_VALID_ID(id)	\
+	((id > CORESIGHT_TRACE_ID_RES_0) && (id < CORESIGHT_TRACE_ID_RES_RANGE_LO))
+
+static void coresight_trace_id_set_inuse(int id, struct coresight_trace_id_map *id_map)
+{
+	if (IS_VALID_ID(id))
+		set_bit(id, id_map->avail_ids);
+}
+
+static void coresight_trace_id_clear_inuse(int id, struct coresight_trace_id_map *id_map)
+{
+	if (IS_VALID_ID(id))
+		clear_bit(id, id_map->avail_ids);
+}
+
+static void coresight_trace_id_set_pend_rel(int id, struct coresight_trace_id_map *id_map)
+{
+	if (IS_VALID_ID(id))
+		set_bit(id, id_map->pend_rel_ids);
+}
+
+static void coresight_trace_id_clear_pend_rel(int id, struct coresight_trace_id_map *id_map)
+{
+	if (IS_VALID_ID(id))
+		clear_bit(id, id_map->pend_rel_ids);
+}
+
+static int coresight_trace_id_find_new_id(struct coresight_trace_id_map *id_map)
+{
+	int id;
+
+	id = find_first_zero_bit(id_map->avail_ids, CORESIGHT_TRACE_IDS_MAX);
+	if (id >= CORESIGHT_TRACE_IDS_MAX)
+		id = -EINVAL;
+	return id;
+}
+
+/* release all pending IDs for all current maps & clear CPU associations */
+static void coresight_trace_id_release_all_pending(void)
+{
+	struct coresight_trace_id_map *id_map = &id_map_default;
+	int cpu, bit;
+
+	for_each_set_bit(bit, id_map->pend_rel_ids, CORESIGHT_TRACE_IDS_MAX) {
+		clear_bit(bit, id_map->avail_ids);
+		clear_bit(bit, id_map->pend_rel_ids);
+	}
+
+	for_each_possible_cpu(cpu) {
+		if (per_cpu(cpu_ids, cpu).pend_rel) {
+			per_cpu(cpu_ids, cpu).pend_rel = false;
+			per_cpu(cpu_ids, cpu).id = 0;
+		}
+	}
+}
+
+static void coresight_trace_id_init_id_map(struct coresight_trace_id_map *id_map)
+{
+	int bit;
+
+	/* set all reserved bits as in-use */
+	set_bit(CORESIGHT_TRACE_ID_RES_0, id_map->avail_ids);
+	for (bit = CORESIGHT_TRACE_ID_RES_RANGE_LO;
+	     bit <= CORESIGHT_TRACE_ID_RES_RANGE_HI; bit++)
+		set_bit(bit, id_map->avail_ids);
+}
+
+static int coresight_trace_id_map_get_cpu_id(int cpu, struct coresight_trace_id_map *id_map)
+{
+	unsigned long flags;
+	int id;
+
+	spin_lock_irqsave(&id_map_lock, flags);
+
+	/* check for existing allocation for this CPU */
+	id = per_cpu(cpu_ids, cpu).id;
+	if (id)
+		goto get_cpu_id_out;
+
+	/* find a new ID */
+	id = coresight_trace_id_find_new_id(id_map);
+	if (id < 0)
+		goto get_cpu_id_out;
+
+	/* got a valid new ID - save details */
+	per_cpu(cpu_ids, cpu).id = id;
+	per_cpu(cpu_ids, cpu).pend_rel = false;
+	coresight_trace_id_set_inuse(id, id_map);
+	coresight_trace_id_clear_pend_rel(id, id_map);
+
+get_cpu_id_out:
+	spin_unlock_irqrestore(&id_map_lock, flags);
+	return id;
+}
+
+static void coresight_trace_id_map_put_cpu_id(int cpu, struct coresight_trace_id_map *id_map)
+{
+	unsigned long flags;
+	int id;
+
+	spin_lock_irqsave(&id_map_lock, flags);
+	id = per_cpu(cpu_ids, cpu).id;
+	if (!id)
+		goto put_cpu_id_out;
+
+	if (perf_cs_etm_session_active) {
+		/* set release at pending if perf still active */
+		coresight_trace_id_set_pend_rel(id, id_map);
+		per_cpu(cpu_ids, cpu).pend_rel = true;
+	} else {
+		/* otherwise clear id */
+		coresight_trace_id_clear_inuse(id, id_map);
+		per_cpu(cpu_ids, cpu).id = 0;
+	}
+
+ put_cpu_id_out:
+	spin_unlock_irqrestore(&id_map_lock, flags);
+}
+
+static int coresight_trace_id_map_get_system_id(struct coresight_trace_id_map *id_map)
+{
+	unsigned long flags;
+	int id;
+
+	spin_lock_irqsave(&id_map_lock, flags);
+	id = coresight_trace_id_find_new_id(id_map);
+	if (id > 0)
+		coresight_trace_id_set_inuse(id, id_map);
+	spin_unlock_irqrestore(&id_map_lock, flags);
+
+	return id;
+}
+
+static void coresight_trace_id_map_put_system_id(struct coresight_trace_id_map *id_map, int id)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&id_map_lock, flags);
+	coresight_trace_id_clear_inuse(id, id_map);
+	spin_unlock_irqrestore(&id_map_lock, flags);
+}
+
+/* API functions */
+int coresight_trace_id_get_cpu_id(int cpu)
+{
+	return coresight_trace_id_map_get_cpu_id(cpu, &id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_get_cpu_id);
+
+void coresight_trace_id_put_cpu_id(int cpu)
+{
+	coresight_trace_id_map_put_cpu_id(cpu, &id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_put_cpu_id);
+
+int coresight_trace_id_get_system_id(void)
+{
+	return coresight_trace_id_map_get_system_id(&id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_get_system_id);
+
+void coresight_trace_id_put_system_id(int id)
+{
+	coresight_trace_id_map_put_system_id(&id_map_default, id);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_put_system_id);
+
+void coresight_trace_id_perf_start(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&id_map_lock, flags);
+	perf_cs_etm_session_active++;
+	spin_unlock_irqrestore(&id_map_lock, flags);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_perf_start);
+
+void coresight_trace_id_perf_stop(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&id_map_lock, flags);
+	perf_cs_etm_session_active--;
+	if (!perf_cs_etm_session_active)
+		coresight_trace_id_release_all_pending();
+	spin_unlock_irqrestore(&id_map_lock, flags);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_perf_stop);
+
+void coresight_trace_id_init_default_map(void)
+{
+	coresight_trace_id_init_id_map(&id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_init_default_map);
diff --git a/drivers/hwtracing/coresight/coresight-trace-id.h b/drivers/hwtracing/coresight/coresight-trace-id.h
new file mode 100644
index 000000000000..63950087edf6
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trace-id.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright(C) 2022 Linaro Limited. All rights reserved.
+ * Author: Mike Leach <mike.leach@linaro.org>
+ */
+
+#ifndef _CORESIGHT_TRACE_ID_H
+#define _CORESIGHT_TRACE_ID_H
+
+/*
+ * Coresight trace ID allocation API
+ *
+ * With multi cpu systems, and more additional trace sources a scalable
+ * trace ID reservation system is required.
+ *
+ * The system will allocate Ids on a demand basis, and allow them to be
+ * released when done.
+ *
+ * In order to ensure that a consistent cpu / ID matching is maintained
+ * throughout a perf cs_etm event session - a session in progress flag will
+ * be maintained, and released IDs not cleared until the perf session is
+ * complete. This allows the same CPU to be re-allocated its prior ID.
+ *
+ *
+ * Trace ID maps will be created and initialised to prevent architecturally
+ * reserved IDs from being allocated.
+ *
+ * API permits multiple maps to be maintained - for large systems where
+ * different sets of cpus trace into different independent sinks.
+ */
+
+#include <linux/bitops.h>
+#include <linux/types.h>
+
+
+/* architecturally we have 128 IDs some of which are reserved */
+#define CORESIGHT_TRACE_IDS_MAX 128
+
+/**
+ * Trace ID map.
+ *
+ * @avail_ids:	Bitmap to register available (bit = 0) and in use (bit = 1) IDs.
+ *		Initialised so that the reserved IDs are permanently marked as in use.
+ * @pend_rel_ids: CPU IDs that have been released by the trace source but not yet marked
+ *                as available, to allow re-allocation to the same CPU during a perf session.
+ */
+struct coresight_trace_id_map {
+	DECLARE_BITMAP(avail_ids, CORESIGHT_TRACE_IDS_MAX);
+	DECLARE_BITMAP(pend_rel_ids, CORESIGHT_TRACE_IDS_MAX);
+};
+
+/* Allocate and release IDs for a single default trace ID map */
+int coresight_trace_id_get_cpu_id(int cpu);
+int coresight_trace_id_get_system_id(void);
+void coresight_trace_id_put_cpu_id(int cpu);
+void coresight_trace_id_put_system_id(int id);
+
+/* notifiers for perf session start and stop */
+void coresight_trace_id_perf_start(void);
+void coresight_trace_id_perf_stop(void);
+
+/* initialise the default ID map */
+void coresight_trace_id_init_default_map(void);
+
+#endif /* _CORESIGHT_TRACE_ID_H */
Hi Mike,
Thanks for the patch, please find my comments inline.
On 04/07/2022 09:11, Mike Leach wrote:
The existing mechanism to assign Trace ID values to sources is limited and does not scale for larger multicore / multi trace source systems.
The API introduces functions that reserve IDs based on availability, represented by a coresight_trace_id_map structure. This records the used and free IDs in a bitmap.
CPU bound sources such as ETMs use the coresight_trace_id_get_cpu_id / coresight_trace_id_put_cpu_id pair of functions. The API will record the ID associated with the CPU. This ensures that the same ID will be re-used while perf events are active on the CPU. The put_cpu_id function will pend release of the ID until all perf cs_etm sessions are complete.
Non-cpu sources, such as the STM can use coresight_trace_id_get_system_id / coresight_trace_id_put_system_id.
Signed-off-by: Mike Leach <mike.leach@linaro.org>
---
 drivers/hwtracing/coresight/Makefile | 2 +-
 .../hwtracing/coresight/coresight-trace-id.c | 230 ++++++++++++++++++
 .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
 3 files changed, 296 insertions(+), 1 deletion(-)
 create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index b6c4a48140ec..329a0c704b87 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -6,7 +6,7 @@ obj-$(CONFIG_CORESIGHT) += coresight.o
 coresight-y := coresight-core.o coresight-etm-perf.o coresight-platform.o \
 		coresight-sysfs.o coresight-syscfg.o coresight-config.o \
 		coresight-cfg-preload.o coresight-cfg-afdo.o \
-		coresight-syscfg-configfs.o
+		coresight-syscfg-configfs.o coresight-trace-id.o
 obj-$(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) += coresight-tmc.o
 coresight-tmc-y := coresight-tmc-core.o coresight-tmc-etf.o \
 		      coresight-tmc-etr.o
diff --git a/drivers/hwtracing/coresight/coresight-trace-id.c b/drivers/hwtracing/coresight/coresight-trace-id.c new file mode 100644 index 000000000000..dac9c89ae00d --- /dev/null +++ b/drivers/hwtracing/coresight/coresight-trace-id.c @@ -0,0 +1,230 @@ +// SPDX-License-Identifier: GPL-2.0 +/*
- Copyright (c) 2022, Linaro Limited, All rights reserved.
- Author: Mike Leach mike.leach@linaro.org
- */
+#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/spinlock.h>
+#include "coresight-trace-id.h"
+/* need to keep data on ids & association with cpus. */ +struct cpu_id_info {
- int id;
- bool pend_rel;
+};
+/* default trace ID map. Used for systems that do not require per sink mappings */ +static struct coresight_trace_id_map id_map_default;
+/* maintain a record of the current mapping of cpu IDs */ +static DEFINE_PER_CPU(struct cpu_id_info, cpu_ids);
+/* perf session active flag */ +static int perf_cs_etm_session_active;
+/* lock to protect id_map and cpu data */ +static DEFINE_SPINLOCK(id_map_lock);
+/* ID 0 is reserved */ +#define CORESIGHT_TRACE_ID_RES_0 0
+/* ID 0x70 onwards are reserved */ +#define CORESIGHT_TRACE_ID_RES_RANGE_LO 0x70 +#define CORESIGHT_TRACE_ID_RES_RANGE_HI 0x7F
Since this range is at the top end of the ID space, we could clip the MAX_IDS to 0x70 and skip all these unnecessary checks and reservations. Also, by modifying the find_bit and for_each_bit calls slightly, we could do away with this reservation scheme and the IS_VALID_ID(id) checks.
+#define IS_VALID_ID(id) \
- ((id > CORESIGHT_TRACE_ID_RES_0) && (id < CORESIGHT_TRACE_ID_RES_RANGE_LO))
+static void coresight_trace_id_set_inuse(int id, struct coresight_trace_id_map *id_map) +{
- if (IS_VALID_ID(id))
set_bit(id, id_map->avail_ids);
+}
Please see my comment around the definition of avail_ids.
+static void coresight_trace_id_clear_inuse(int id, struct coresight_trace_id_map *id_map) +{
- if (IS_VALID_ID(id))
clear_bit(id, id_map->avail_ids);
+}
This could be :
coresight_trace_id_free_id()
+static void coresight_trace_id_set_pend_rel(int id, struct coresight_trace_id_map *id_map) +{
- if (IS_VALID_ID(id))
set_bit(id, id_map->pend_rel_ids);
+}
+static void coresight_trace_id_clear_pend_rel(int id, struct coresight_trace_id_map *id_map) +{
- if (IS_VALID_ID(id))
clear_bit(id, id_map->pend_rel_ids);
+}
+static int coresight_trace_id_find_new_id(struct coresight_trace_id_map *id_map)

minor nit: Could we call this:

  coresight_trace_id_alloc_new_id(id_map)

and

+{
+	int id;
+
+	id = find_first_zero_bit(id_map->avail_ids, CORESIGHT_TRACE_IDS_MAX);

minor nit: You could also do the following, to explicitly skip 0:

	id = find_next_zero_bit(id_map->avail_ids, 1, CORESIGHT_TRACE_IDS_MAX);

+	if (id >= CORESIGHT_TRACE_IDS_MAX)
+		id = -EINVAL;

Could we also mark the id as in use here itself? All callers of this function have to do that explicitly anyway.

+	return id;
+}
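Taken together, the review comments on this function (search from bit 1, clip the maximum to 0x70, mark the ID as used inside the helper, and read a set bit as "in use") could look roughly like the userspace sketch below. It is an assumption-laden illustration rather than kernel code: a plain loop over an unsigned long array replaces find_next_zero_bit(), there is no locking, and the shortened names trace_id_alloc_new_id, trace_id_free_id and used_ids only echo the names suggested in the review.

```c
#include <assert.h>
#include <limits.h>

#define TRACE_IDS_MAX 0x70  /* clipped: IDs 0x70..0x7F can never be returned */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS     ((TRACE_IDS_MAX + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct trace_id_map {
	unsigned long used_ids[MAP_WORDS];  /* bit = 1 means the ID is in use */
};

static int test_id(const struct trace_id_map *map, int id)
{
	return (int)((map->used_ids[id / BITS_PER_WORD] >> (id % BITS_PER_WORD)) & 1UL);
}

static void set_id(struct trace_id_map *map, int id)
{
	map->used_ids[id / BITS_PER_WORD] |= 1UL << (id % BITS_PER_WORD);
}

static void clear_id(struct trace_id_map *map, int id)
{
	map->used_ids[id / BITS_PER_WORD] &= ~(1UL << (id % BITS_PER_WORD));
}

/* Find the lowest free ID in [1, TRACE_IDS_MAX) and mark it used in one step. */
static int trace_id_alloc_new_id(struct trace_id_map *map)
{
	int id;

	for (id = 1; id < TRACE_IDS_MAX; id++) {
		if (!test_id(map, id)) {
			set_id(map, id);
			return id;
		}
	}
	return -1;  /* no free ID left */
}

/* Release an ID; values outside [1, TRACE_IDS_MAX) are ignored. */
static void trace_id_free_id(struct trace_id_map *map, int id)
{
	if (id > 0 && id < TRACE_IDS_MAX)
		clear_id(map, id);
}
```

With the search range limited to 0x01 to 0x6F, a zero-initialised map is already valid in this model, so no reserved bits need to be set up before first use.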
+/* release all pending IDs for all current maps & clear CPU associations */
+static void coresight_trace_id_release_all_pending(void)
+{
+	struct coresight_trace_id_map *id_map = &id_map_default;
+	int cpu, bit;

	int cpu, bit = 1;

+	for_each_set_bit(bit, id_map->pend_rel_ids, CORESIGHT_TRACE_IDS_MAX) {

	for_each_set_bit_from(bit, id_map...)

+		clear_bit(bit, id_map->avail_ids);
+		clear_bit(bit, id_map->pend_rel_ids);
+	}
+
+	for_each_possible_cpu(cpu) {
+		if (per_cpu(cpu_ids, cpu).pend_rel) {
+			per_cpu(cpu_ids, cpu).pend_rel = false;
+			per_cpu(cpu_ids, cpu).id = 0;
+		}
+	}
+}
+static void coresight_trace_id_init_id_map(struct coresight_trace_id_map *id_map) +{
- int bit;
- /* set all reserved bits as in-use */
- set_bit(CORESIGHT_TRACE_ID_RES_0, id_map->avail_ids);
- for (bit = CORESIGHT_TRACE_ID_RES_RANGE_LO;
bit <= CORESIGHT_TRACE_ID_RES_RANGE_HI; bit++)
set_bit(bit, id_map->avail_ids);
+}
+static int coresight_trace_id_map_get_cpu_id(int cpu, struct coresight_trace_id_map *id_map) +{
- unsigned long flags;
- int id;
- spin_lock_irqsave(&id_map_lock, flags);
- /* check for existing allocation for this CPU */
- id = per_cpu(cpu_ids, cpu).id;
- if (id)
goto get_cpu_id_out;
- /* find a new ID */
- id = coresight_trace_id_find_new_id(id_map);
- if (id < 0)
goto get_cpu_id_out;
- /* got a valid new ID - save details */
- per_cpu(cpu_ids, cpu).id = id;
- per_cpu(cpu_ids, cpu).pend_rel = false;
- coresight_trace_id_set_inuse(id, id_map);
- coresight_trace_id_clear_pend_rel(id, id_map);
+get_cpu_id_out:
- spin_unlock_irqrestore(&id_map_lock, flags);
- return id;
+}
+static void coresight_trace_id_map_put_cpu_id(int cpu, struct coresight_trace_id_map *id_map) +{
- unsigned long flags;
- int id;
- spin_lock_irqsave(&id_map_lock, flags);
- id = per_cpu(cpu_ids, cpu).id;
- if (!id)
goto put_cpu_id_out;
- if (perf_cs_etm_session_active) {
/* set release at pending if perf still active */
coresight_trace_id_set_pend_rel(id, id_map);
per_cpu(cpu_ids, cpu).pend_rel = true;
- } else {
/* otherwise clear id */
coresight_trace_id_clear_inuse(id, id_map);
per_cpu(cpu_ids, cpu).id = 0;
- }
- put_cpu_id_out:
- spin_unlock_irqrestore(&id_map_lock, flags);
+}
+static int coresight_trace_id_map_get_system_id(struct coresight_trace_id_map *id_map) +{
- unsigned long flags;
- int id;
- spin_lock_irqsave(&id_map_lock, flags);
- id = coresight_trace_id_find_new_id(id_map);
- if (id > 0)
coresight_trace_id_set_inuse(id, id_map);
Please see my suggestion above on moving this to the place where we find the bit.
- spin_unlock_irqrestore(&id_map_lock, flags);
- return id;
+}
+static void coresight_trace_id_map_put_system_id(struct coresight_trace_id_map *id_map, int id) +{
- unsigned long flags;
- spin_lock_irqsave(&id_map_lock, flags);
- coresight_trace_id_clear_inuse(id, id_map);
- spin_unlock_irqrestore(&id_map_lock, flags);
+}
+/* API functions */ +int coresight_trace_id_get_cpu_id(int cpu) +{
- return coresight_trace_id_map_get_cpu_id(cpu, &id_map_default);
+} +EXPORT_SYMBOL_GPL(coresight_trace_id_get_cpu_id);
+void coresight_trace_id_put_cpu_id(int cpu) +{
- coresight_trace_id_map_put_cpu_id(cpu, &id_map_default);
+} +EXPORT_SYMBOL_GPL(coresight_trace_id_put_cpu_id);
+int coresight_trace_id_get_system_id(void) +{
- return coresight_trace_id_map_get_system_id(&id_map_default);
+} +EXPORT_SYMBOL_GPL(coresight_trace_id_get_system_id);
+void coresight_trace_id_put_system_id(int id) +{
- coresight_trace_id_map_put_system_id(&id_map_default, id);
+} +EXPORT_SYMBOL_GPL(coresight_trace_id_put_system_id);
+void coresight_trace_id_perf_start(void) +{
- unsigned long flags;
- spin_lock_irqsave(&id_map_lock, flags);
- perf_cs_etm_session_active++;
- spin_unlock_irqrestore(&id_map_lock, flags);
+} +EXPORT_SYMBOL_GPL(coresight_trace_id_perf_start);
+void coresight_trace_id_perf_stop(void) +{
- unsigned long flags;
- spin_lock_irqsave(&id_map_lock, flags);
- perf_cs_etm_session_active--;
- if (!perf_cs_etm_session_active)
coresight_trace_id_release_all_pending();
- spin_unlock_irqrestore(&id_map_lock, flags);
+} +EXPORT_SYMBOL_GPL(coresight_trace_id_perf_stop);
+void coresight_trace_id_init_default_map(void) +{
- coresight_trace_id_init_id_map(&id_map_default);
+} +EXPORT_SYMBOL_GPL(coresight_trace_id_init_default_map);
We may be able to get rid of this init. Otherwise we may convert this to a module_initcall() in the worst case. No need to export this.
diff --git a/drivers/hwtracing/coresight/coresight-trace-id.h b/drivers/hwtracing/coresight/coresight-trace-id.h new file mode 100644 index 000000000000..63950087edf6 --- /dev/null +++ b/drivers/hwtracing/coresight/coresight-trace-id.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/*
- Copyright(C) 2022 Linaro Limited. All rights reserved.
- Author: Mike Leach mike.leach@linaro.org
- */
+#ifndef _CORESIGHT_TRACE_ID_H +#define _CORESIGHT_TRACE_ID_H
+/*
- Coresight trace ID allocation API
- With multi cpu systems, and more additional trace sources a scalable
- trace ID reservation system is required.
- The system will allocate Ids on a demand basis, and allow them to be
- released when done.
- In order to ensure that a consistent cpu / ID matching is maintained
- throughout a perf cs_etm event session - a session in progress flag will
- be maintained, and released IDs not cleared until the perf session is
- complete. This allows the same CPU to be re-allocated its prior ID.
- Trace ID maps will be created and initialised to prevent architecturally
- reserved IDs from being allocated.
- API permits multiple maps to be maintained - for large systems where
- different sets of cpus trace into different independent sinks.
- */
Thanks for the detailed comment above.
+#include <linux/bitops.h> +#include <linux/types.h>
+/* architecturally we have 128 IDs some of which are reserved */ +#define CORESIGHT_TRACE_IDS_MAX 128
Could we restrict the CORESIGHT_TRACE_IDS_MAX to 0x70, clipping the upper range of reserved ids ? That way, we could skip bothering about checking it everywhere.
+/**
- Trace ID map.
- @avail_ids: Bitmap to register available (bit = 0) and in use (bit = 1) IDs.
Initialised so that the reserved IDs are permanently marked as in use.
To be honest, this inverts the intuition. Could we instead name this used_ids?
i.e BIT(i) = 1 => implies trace id is in use.
- @pend_rel_ids: CPU IDs that have been released by the trace source but not yet marked
as available, to allow re-allocation to the same CPU during a perf session.
- */
+struct coresight_trace_id_map {
- DECLARE_BITMAP(avail_ids, CORESIGHT_TRACE_IDS_MAX);
- DECLARE_BITMAP(pend_rel_ids, CORESIGHT_TRACE_IDS_MAX);
+};
Also, the definitions are split between the .c and .h files. Could we keep all of them in one place, preferably the .h? Or, if this is not needed at all by the consumers of the API, we should keep all of it in the .c file.
I guess in the future, with the sink-specific scheme, we may need to expose the helpers which accept an id_map. So maybe even move it here.
Thanks Suzuki
Hi Suzuki
On Tue, 19 Jul 2022 at 18:30, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mike,
Thanks for the patch, please find my comments inline.
On 04/07/2022 09:11, Mike Leach wrote:
The existing mechanism to assign Trace ID values to sources is limited and does not scale for larger multicore / multi trace source systems.
The API introduces functions that reserve IDs based on availability, represented by a coresight_trace_id_map structure. This records the used and free IDs in a bitmap.
CPU bound sources such as ETMs use the coresight_trace_id_get_cpu_id / coresight_trace_id_put_cpu_id pair of functions. The API will record the ID associated with the CPU. This ensures that the same ID will be re-used while perf events are active on the CPU. The put_cpu_id function will pend release of the ID until all perf cs_etm sessions are complete.
Non-cpu sources, such as the STM can use coresight_trace_id_get_system_id / coresight_trace_id_put_system_id.
Signed-off-by: Mike Leach <mike.leach@linaro.org>
---
 drivers/hwtracing/coresight/Makefile | 2 +-
 .../hwtracing/coresight/coresight-trace-id.c | 230 ++++++++++++++++++
 .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
 3 files changed, 296 insertions(+), 1 deletion(-)
 create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index b6c4a48140ec..329a0c704b87 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -6,7 +6,7 @@ obj-$(CONFIG_CORESIGHT) += coresight.o
 coresight-y := coresight-core.o coresight-etm-perf.o coresight-platform.o \
 		coresight-sysfs.o coresight-syscfg.o coresight-config.o \
 		coresight-cfg-preload.o coresight-cfg-afdo.o \
-		coresight-syscfg-configfs.o
+		coresight-syscfg-configfs.o coresight-trace-id.o
 obj-$(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) += coresight-tmc.o
 coresight-tmc-y := coresight-tmc-core.o coresight-tmc-etf.o \
 		      coresight-tmc-etr.o
diff --git a/drivers/hwtracing/coresight/coresight-trace-id.c b/drivers/hwtracing/coresight/coresight-trace-id.c new file mode 100644 index 000000000000..dac9c89ae00d --- /dev/null +++ b/drivers/hwtracing/coresight/coresight-trace-id.c @@ -0,0 +1,230 @@ +// SPDX-License-Identifier: GPL-2.0 +/*
- Copyright (c) 2022, Linaro Limited, All rights reserved.
- Author: Mike Leach mike.leach@linaro.org
- */
+#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/spinlock.h>
+#include "coresight-trace-id.h"
+/* need to keep data on ids & association with cpus. */ +struct cpu_id_info {
int id;
bool pend_rel;
+};
+/* default trace ID map. Used for systems that do not require per sink mappings */ +static struct coresight_trace_id_map id_map_default;
+/* maintain a record of the current mapping of cpu IDs */ +static DEFINE_PER_CPU(struct cpu_id_info, cpu_ids);
+/* perf session active flag */ +static int perf_cs_etm_session_active;
+/* lock to protect id_map and cpu data */ +static DEFINE_SPINLOCK(id_map_lock);
+/* ID 0 is reserved */ +#define CORESIGHT_TRACE_ID_RES_0 0
+/* ID 0x70 onwards are reserved */ +#define CORESIGHT_TRACE_ID_RES_RANGE_LO 0x70 +#define CORESIGHT_TRACE_ID_RES_RANGE_HI 0x7F
Since this range is at the top end of the ID space, we could clip the MAX_IDS to 0x70 and skip all these unnecessary checks and reservations. Also, by modifying the find_bit and for_each_bit calls slightly, we could do away with this reservation scheme and the IS_VALID_ID(id) checks.
+#define IS_VALID_ID(id) \
((id > CORESIGHT_TRACE_ID_RES_0) && (id < CORESIGHT_TRACE_ID_RES_RANGE_LO))
+static void coresight_trace_id_set_inuse(int id, struct coresight_trace_id_map *id_map) +{
if (IS_VALID_ID(id))
set_bit(id, id_map->avail_ids);
+}
Please see my comment around the definition of avail_ids.
+static void coresight_trace_id_clear_inuse(int id, struct coresight_trace_id_map *id_map) +{
if (IS_VALID_ID(id))
clear_bit(id, id_map->avail_ids);
+}
This could be :
coresight_trace_id_free_id()
+static void coresight_trace_id_set_pend_rel(int id, struct coresight_trace_id_map *id_map) +{
if (IS_VALID_ID(id))
set_bit(id, id_map->pend_rel_ids);
+}
+static void coresight_trace_id_clear_pend_rel(int id, struct coresight_trace_id_map *id_map) +{
if (IS_VALID_ID(id))
clear_bit(id, id_map->pend_rel_ids);
+}
+static int coresight_trace_id_find_new_id(struct coresight_trace_id_map *id_map)
minor nit: Could we call this :
coresight_trace_id_alloc_new_id(id_map) and
+{
int id;
id = find_first_zero_bit(id_map->avail_ids, CORESIGHT_TRACE_IDS_MAX);
minor nit: You could also do, to explicitly skip 0.
id = find_next_zero_bit(id_map->avail_ids, 1, CORESIGHT_TRACE_IDS_MAX);
if (id >= CORESIGHT_TRACE_IDS_MAX)
id = -EINVAL;
Could we also mark the id as in use here itself ? All callers of this function have to do that explicitly, anyways.
return id;
+}
+/* release all pending IDs for all current maps & clear CPU associations */
+static void coresight_trace_id_release_all_pending(void)
+{
+	struct coresight_trace_id_map *id_map = &id_map_default;
+	int cpu, bit;
int cpu, bit = 1;
+	for_each_set_bit(bit, id_map->pend_rel_ids, CORESIGHT_TRACE_IDS_MAX) {
for_each_set_bit_from(bit, id_map...)
+		clear_bit(bit, id_map->avail_ids);
+		clear_bit(bit, id_map->pend_rel_ids);
+	}
+	for_each_possible_cpu(cpu) {
+		if (per_cpu(cpu_ids, cpu).pend_rel) {
+			per_cpu(cpu_ids, cpu).pend_rel = false;
+			per_cpu(cpu_ids, cpu).id = 0;
+		}
+	}
+}
+static void coresight_trace_id_init_id_map(struct coresight_trace_id_map *id_map)
+{
+	int bit;
+	/* set all reserved bits as in-use */
+	set_bit(CORESIGHT_TRACE_ID_RES_0, id_map->avail_ids);
+	for (bit = CORESIGHT_TRACE_ID_RES_RANGE_LO;
+	     bit <= CORESIGHT_TRACE_ID_RES_RANGE_HI; bit++)
+		set_bit(bit, id_map->avail_ids);
+}
+static int coresight_trace_id_map_get_cpu_id(int cpu, struct coresight_trace_id_map *id_map)
+{
+	unsigned long flags;
+	int id;
+	spin_lock_irqsave(&id_map_lock, flags);
+	/* check for existing allocation for this CPU */
+	id = per_cpu(cpu_ids, cpu).id;
+	if (id)
+		goto get_cpu_id_out;
+	/* find a new ID */
+	id = coresight_trace_id_find_new_id(id_map);
+	if (id < 0)
+		goto get_cpu_id_out;
+	/* got a valid new ID - save details */
+	per_cpu(cpu_ids, cpu).id = id;
+	per_cpu(cpu_ids, cpu).pend_rel = false;
+	coresight_trace_id_set_inuse(id, id_map);
+	coresight_trace_id_clear_pend_rel(id, id_map);
+get_cpu_id_out:
+	spin_unlock_irqrestore(&id_map_lock, flags);
+	return id;
+}
+static void coresight_trace_id_map_put_cpu_id(int cpu, struct coresight_trace_id_map *id_map)
+{
+	unsigned long flags;
+	int id;
+	spin_lock_irqsave(&id_map_lock, flags);
+	id = per_cpu(cpu_ids, cpu).id;
+	if (!id)
+		goto put_cpu_id_out;
+	if (perf_cs_etm_session_active) {
+		/* set release at pending if perf still active */
+		coresight_trace_id_set_pend_rel(id, id_map);
+		per_cpu(cpu_ids, cpu).pend_rel = true;
+	} else {
+		/* otherwise clear id */
+		coresight_trace_id_clear_inuse(id, id_map);
+		per_cpu(cpu_ids, cpu).id = 0;
+	}
+put_cpu_id_out:
+	spin_unlock_irqrestore(&id_map_lock, flags);
+}
+static int coresight_trace_id_map_get_system_id(struct coresight_trace_id_map *id_map)
+{
+	unsigned long flags;
+	int id;
+	spin_lock_irqsave(&id_map_lock, flags);
+	id = coresight_trace_id_find_new_id(id_map);
+	if (id > 0)
+		coresight_trace_id_set_inuse(id, id_map);
Please see my suggestion above on moving this to the place where we find the bit.
+	spin_unlock_irqrestore(&id_map_lock, flags);
+	return id;
+}
+static void coresight_trace_id_map_put_system_id(struct coresight_trace_id_map *id_map, int id)
+{
+	unsigned long flags;
+	spin_lock_irqsave(&id_map_lock, flags);
+	coresight_trace_id_clear_inuse(id, id_map);
+	spin_unlock_irqrestore(&id_map_lock, flags);
+}
+/* API functions */
+int coresight_trace_id_get_cpu_id(int cpu)
+{
+	return coresight_trace_id_map_get_cpu_id(cpu, &id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_get_cpu_id);
+void coresight_trace_id_put_cpu_id(int cpu)
+{
+	coresight_trace_id_map_put_cpu_id(cpu, &id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_put_cpu_id);
+int coresight_trace_id_get_system_id(void)
+{
+	return coresight_trace_id_map_get_system_id(&id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_get_system_id);
+void coresight_trace_id_put_system_id(int id)
+{
+	coresight_trace_id_map_put_system_id(&id_map_default, id);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_put_system_id);
+void coresight_trace_id_perf_start(void)
+{
+	unsigned long flags;
+	spin_lock_irqsave(&id_map_lock, flags);
+	perf_cs_etm_session_active++;
+	spin_unlock_irqrestore(&id_map_lock, flags);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_perf_start);
+void coresight_trace_id_perf_stop(void)
+{
+	unsigned long flags;
+	spin_lock_irqsave(&id_map_lock, flags);
+	perf_cs_etm_session_active--;
+	if (!perf_cs_etm_session_active)
+		coresight_trace_id_release_all_pending();
+	spin_unlock_irqrestore(&id_map_lock, flags);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_perf_stop);
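The interaction between put_cpu_id(), the pending-release bitmap and perf_stop() may be easier to follow as a small state model. The following is a user-space sketch only: the sketch_* names, the fixed four-CPU array and the bool arrays standing in for bitmaps are assumptions, and locking is left out. It also makes one deliberate change that is my assumption about the intent: the pending mark is cleared when a CPU re-acquires its ID during a session, which the posted get_cpu_id() path does not do when it finds an existing ID.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4      /* hypothetical CPU count for the model */
#define SKETCH_IDS_MAX 0x70   /* valid IDs are 0x01..0x6F */

struct sketch_state {
	bool used[SKETCH_IDS_MAX];     /* ID is allocated */
	bool pend_rel[SKETCH_IDS_MAX]; /* released by the source, held for the session */
	int cpu_id[SKETCH_NR_CPUS];    /* 0 means no ID assigned to this CPU */
	int perf_sessions;             /* number of active perf sessions */
};

static int sketch_get_cpu_id(struct sketch_state *s, int cpu)
{
	if (s->cpu_id[cpu]) {
		/* re-use the prior ID and cancel any pending release */
		s->pend_rel[s->cpu_id[cpu]] = false;
		return s->cpu_id[cpu];
	}
	for (int id = 1; id < SKETCH_IDS_MAX; id++) {
		if (!s->used[id]) {
			s->used[id] = true;
			s->cpu_id[cpu] = id;
			return id;
		}
	}
	return -1; /* no free ID */
}

static void sketch_put_cpu_id(struct sketch_state *s, int cpu)
{
	int id = s->cpu_id[cpu];

	if (!id)
		return;
	if (s->perf_sessions > 0) {
		/* keep the CPU/ID pairing stable until the session ends */
		s->pend_rel[id] = true;
	} else {
		s->used[id] = false;
		s->cpu_id[cpu] = 0;
	}
}

static void sketch_perf_start(struct sketch_state *s)
{
	s->perf_sessions++;
}

static void sketch_perf_stop(struct sketch_state *s)
{
	if (--s->perf_sessions > 0)
		return;
	/* last session gone: drop every ID that was only pending release */
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		int id = s->cpu_id[cpu];

		if (id && s->pend_rel[id]) {
			s->pend_rel[id] = false;
			s->used[id] = false;
			s->cpu_id[cpu] = 0;
		}
	}
}
```

In this model a CPU that is disabled and re-enabled inside one perf session gets the same ID back, and an ID that is still in active use when the session ends is left alone.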
+void coresight_trace_id_init_default_map(void)
+{
+	coresight_trace_id_init_id_map(&id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_init_default_map);
We may be able to get rid of this init. Otherwise we may convert this to a module_initcall() in the worst case. No need to export this.
diff --git a/drivers/hwtracing/coresight/coresight-trace-id.h b/drivers/hwtracing/coresight/coresight-trace-id.h
new file mode 100644
index 000000000000..63950087edf6
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trace-id.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright(C) 2022 Linaro Limited. All rights reserved.
+ * Author: Mike Leach <mike.leach@linaro.org>
+ */
+#ifndef _CORESIGHT_TRACE_ID_H
+#define _CORESIGHT_TRACE_ID_H
+/*
+ * Coresight trace ID allocation API
+ * With multi cpu systems, and more additional trace sources a scalable
+ * trace ID reservation system is required.
+ * The system will allocate Ids on a demand basis, and allow them to be
+ * released when done.
+ * In order to ensure that a consistent cpu / ID matching is maintained
+ * throughout a perf cs_etm event session - a session in progress flag will
+ * be maintained, and released IDs not cleared until the perf session is
+ * complete. This allows the same CPU to be re-allocated its prior ID.
+ * Trace ID maps will be created and initialised to prevent architecturally
+ * reserved IDs from being allocated.
+ * API permits multiple maps to be maintained - for large systems where
+ * different sets of cpus trace into different independent sinks.
+ */
Thanks for the detailed comment above.
+#include <linux/bitops.h>
+#include <linux/types.h>
+/* architecturally we have 128 IDs some of which are reserved */
+#define CORESIGHT_TRACE_IDS_MAX 128
Could we restrict the CORESIGHT_TRACE_IDS_MAX to 0x70, clipping the upper range of reserved ids ? That way, we could skip bothering about checking it everywhere.
+/**
+ * Trace ID map.
+ * @avail_ids: Bitmap to register available (bit = 0) and in use (bit = 1) IDs.
+ *             Initialised so that the reserved IDs are permanently marked as in use.
To be honest this inverts the intuition. Could we instead name this used_ids ?
i.e. BIT(i) = 1 implies the trace id is in use.
+ * @pend_rel_ids: CPU IDs that have been released by the trace source but not yet marked
+ *                as available, to allow re-allocation to the same CPU during a perf session.
+ */
+struct coresight_trace_id_map {
+	DECLARE_BITMAP(avail_ids, CORESIGHT_TRACE_IDS_MAX);
+	DECLARE_BITMAP(pend_rel_ids, CORESIGHT_TRACE_IDS_MAX);
+};
Also, the definitions are split between the .c and .h. Could we keep all of them in one place, preferably the .h ? Or if this is not needed at all by the consumers of the API, we should keep all of this in the .c file.
I guess in the future, with the sink specific scheme, we may need to expose the helpers which accept an id_map. So maybe even move it here.
I have updated the set pretty much along the lines you suggested. However, there have been some changes to cope with issues thrown up by lockdep, as ever, so the new set takes a slightly different approach depending on whether perf or sysfs is in use.
Thanks for the review. New set to follow shortly.
Mike
Thanks Suzuki
Initialises the default trace ID map.
This will be used by all source drivers to be allocated their trace IDs.
The checks for sources to have unique IDs have been removed - this is now guaranteed by the ID allocation mechanisms, and the checks are inappropriate where multiple ID maps are in use in larger systems.
Signed-off-by: Mike Leach <mike.leach@linaro.org>
---
 drivers/hwtracing/coresight/coresight-core.c | 49 ++------------------
 1 file changed, 4 insertions(+), 45 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
index 1edfec1e9d18..be69e05fde1f 100644
--- a/drivers/hwtracing/coresight/coresight-core.c
+++ b/drivers/hwtracing/coresight/coresight-core.c
@@ -22,6 +22,7 @@
 #include "coresight-etm-perf.h"
 #include "coresight-priv.h"
 #include "coresight-syscfg.h"
+#include "coresight-trace-id.h"
 
 static DEFINE_MUTEX(coresight_mutex);
 static DEFINE_PER_CPU(struct coresight_device *, csdev_sink);
@@ -84,45 +85,6 @@ struct coresight_device *coresight_get_percpu_sink(int cpu)
 }
 EXPORT_SYMBOL_GPL(coresight_get_percpu_sink);
 
-static int coresight_id_match(struct device *dev, void *data)
-{
-	int trace_id, i_trace_id;
-	struct coresight_device *csdev, *i_csdev;
-
-	csdev = data;
-	i_csdev = to_coresight_device(dev);
-
-	/*
-	 * No need to care about oneself and components that are not
-	 * sources or not enabled
-	 */
-	if (i_csdev == csdev || !i_csdev->enable ||
-	    i_csdev->type != CORESIGHT_DEV_TYPE_SOURCE)
-		return 0;
-
-	/* Get the source ID for both components */
-	trace_id = source_ops(csdev)->trace_id(csdev);
-	i_trace_id = source_ops(i_csdev)->trace_id(i_csdev);
-
-	/* All you need is one */
-	if (trace_id == i_trace_id)
-		return 1;
-
-	return 0;
-}
-
-static int coresight_source_is_unique(struct coresight_device *csdev)
-{
-	int trace_id = source_ops(csdev)->trace_id(csdev);
-
-	/* this shouldn't happen */
-	if (trace_id < 0)
-		return 0;
-
-	return !bus_for_each_dev(&coresight_bustype, NULL,
-				 csdev, coresight_id_match);
-}
-
 static int coresight_find_link_inport(struct coresight_device *csdev,
 				      struct coresight_device *parent)
 {
@@ -431,12 +393,6 @@ static int coresight_enable_source(struct coresight_device *csdev, u32 mode)
 {
 	int ret;
 
-	if (!coresight_source_is_unique(csdev)) {
-		dev_warn(&csdev->dev, "traceID %d not unique\n",
-			 source_ops(csdev)->trace_id(csdev));
-		return -EINVAL;
-	}
-
 	if (!csdev->enable) {
 		if (source_ops(csdev)->enable) {
 			ret = coresight_control_assoc_ectdev(csdev, true);
@@ -1775,6 +1731,9 @@ static int __init coresight_init(void)
 	if (ret)
 		goto exit_bus_unregister;
 
+	/* initialise the default trace ID map */
+	coresight_trace_id_init_default_map();
+
 	/* initialise the coresight syscfg API */
 	ret = cscfg_init();
 	if (!ret)
Hi Mike
On 04/07/2022 09:11, Mike Leach wrote:
Initialises the default trace ID map.
This will be used by all source drivers to be allocated their trace IDs.
As per previous patch, we may not need an explicit call from here.
The checks for sources to have unique IDs have been removed - this is now guaranteed by the ID allocation mechanisms, and the checks are inappropriate where multiple ID maps are in use in larger systems.
And this looks like a candidate for a separate patch, as the sources do not use the new API yet. Once they do, in the following patches, we could remove this code.
All said, this patch could be renamed and moved to the bottom of the series, with :
"coresight: Remove obsolete trace-id uniqueness checks"
Otherwise, looks good to me.
Suzuki
Updates the STM driver to use the trace ID allocation API. This uses the _system_id calls to allocate an ID on device probe, and release it on device remove.
The sysfs access to the STMTRACEIDR register has been changed from RW to RO. Having this value as writable is not appropriate for the new Trace ID scheme - and had potential to cause errors in the previous scheme if values clashed with other sources.
Signed-off-by: Mike Leach <mike.leach@linaro.org>
---
 drivers/hwtracing/coresight/coresight-stm.c | 41 +++++++--------------
 1 file changed, 14 insertions(+), 27 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-stm.c b/drivers/hwtracing/coresight/coresight-stm.c
index bb14a3a8a921..9ef3e923a930 100644
--- a/drivers/hwtracing/coresight/coresight-stm.c
+++ b/drivers/hwtracing/coresight/coresight-stm.c
@@ -31,6 +31,7 @@
 #include <linux/stm.h>
 
 #include "coresight-priv.h"
+#include "coresight-trace-id.h"
 
 #define STMDMASTARTR 0xc04
 #define STMDMASTOPR 0xc08
@@ -615,24 +616,7 @@ static ssize_t traceid_show(struct device *dev,
 	val = drvdata->traceid;
 	return sprintf(buf, "%#lx\n", val);
 }
-
-static ssize_t traceid_store(struct device *dev,
-			     struct device_attribute *attr,
-			     const char *buf, size_t size)
-{
-	int ret;
-	unsigned long val;
-	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
-
-	ret = kstrtoul(buf, 16, &val);
-	if (ret)
-		return ret;
-
-	/* traceid field is 7bit wide on STM32 */
-	drvdata->traceid = val & 0x7f;
-	return size;
-}
-static DEVICE_ATTR_RW(traceid);
+static DEVICE_ATTR_RO(traceid);
 
 #define coresight_stm_reg(name, offset) \
 	coresight_simple_reg32(struct stm_drvdata, name, offset)
@@ -819,14 +803,6 @@ static void stm_init_default_data(struct stm_drvdata *drvdata)
 	 */
 	drvdata->stmsper = ~0x0;
 
-	/*
-	 * The trace ID value for *ETM* tracers start at CPU_ID * 2 + 0x10 and
-	 * anything equal to or higher than 0x70 is reserved. Since 0x00 is
-	 * also reserved the STM trace ID needs to be higher than 0x00 and
-	 * lowner than 0x10.
-	 */
-	drvdata->traceid = 0x1;
-
 	/* Set invariant transaction timing on all channels */
 	bitmap_clear(drvdata->chs.guaranteed, 0, drvdata->numsp);
 }
@@ -854,7 +830,7 @@ static void stm_init_generic_data(struct stm_drvdata *drvdata,
 
 static int stm_probe(struct amba_device *adev, const struct amba_id *id)
 {
-	int ret;
+	int ret, trace_id;
 	void __iomem *base;
 	struct device *dev = &adev->dev;
 	struct coresight_platform_data *pdata = NULL;
@@ -938,12 +914,22 @@ static int stm_probe(struct amba_device *adev, const struct amba_id *id)
 		goto stm_unregister;
 	}
 
+	trace_id = coresight_trace_id_get_system_id();
+	if (trace_id < 0) {
+		ret = trace_id;
+		goto cs_unregister;
+	}
+	drvdata->traceid = (u8)trace_id;
+
 	pm_runtime_put(&adev->dev);
 
 	dev_info(&drvdata->csdev->dev, "%s initialized\n",
 		 (char *)coresight_get_uci_data(id));
 	return 0;
 
+cs_unregister:
+	coresight_unregister(drvdata->csdev);
+
 stm_unregister:
 	stm_unregister_device(&drvdata->stm);
 	return ret;
@@ -953,6 +939,7 @@ static void stm_remove(struct amba_device *adev)
 {
 	struct stm_drvdata *drvdata = dev_get_drvdata(&adev->dev);
 
+	coresight_trace_id_put_system_id(drvdata->traceid);
 	coresight_unregister(drvdata->csdev);
 
 	stm_unregister_device(&drvdata->stm);
On 04/07/2022 09:11, Mike Leach wrote:
@@ -953,6 +939,7 @@ static void stm_remove(struct amba_device *adev)
 {
 	struct stm_drvdata *drvdata = dev_get_drvdata(&adev->dev);
 
+	coresight_trace_id_put_system_id(drvdata->traceid);
This makes me think that we should add a WARN_ON() in
coresight_trace_id_put_system_id(id) {
WARN_ON(!coresight_trace_id_is_used(id));
}
Anyways, for this patch:
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
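As a rough illustration of the WARN_ON() idea above, here is a user-space sketch of a put that refuses to release an ID that is not currently in use. Everything here is an assumption for illustration: the sketch_* names, the bool array in place of the bitmap, and fprintf() standing in for WARN_ON(). The is_used helper mirrors the one named in the comment above, which does not exist in the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_IDS_MAX 0x70 /* valid IDs are 0x01..0x6F */

static bool sketch_used[SKETCH_IDS_MAX]; /* true means the ID is allocated */

/* Hypothetical helper: range-checks the ID before looking it up. */
static bool sketch_trace_id_is_used(int id)
{
	return id > 0 && id < SKETCH_IDS_MAX && sketch_used[id];
}

/*
 * Release a system ID. Returns false, after printing a warning, when the
 * caller hands back an ID that was never allocated or is out of range.
 */
static bool sketch_put_system_id(int id)
{
	if (!sketch_trace_id_is_used(id)) {
		fprintf(stderr, "warning: trace ID %#x released but not in use\n",
			(unsigned int)id);
		return false;
	}
	sketch_used[id] = false;
	return true;
}
```

A double release, or a release of a reserved value, then shows up as a warning instead of silently clearing a bit.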
The trace ID API is now used to allocate trace IDs for ETM4.x / ETE devices.
For perf sessions, these will be allocated on enable, and released on disable.
For sysfs sessions, these will be allocated on enable, but only released on reset. This allows the sysfs session to interrogate the Trace ID used after the session is over - maintaining functional consistency with the previous allocation scheme.
The trace ID will also be allocated on read of the mgmt/trctraceid file. This ensures that if perf or sysfs read this before enabling trace, the value will be the one used for the trace session.
Trace ID initialisation is removed from the _probe() function.
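The lifecycle described above (allocate on enable or on a read of the ID, release on disable for perf but only on reset for sysfs) can be modelled in a few lines. This is a hedged user-space sketch, not the driver code: the sketch_* names, the global bool array used as the allocator and the empty sysfs disable hook are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_IDS_MAX 0x70 /* valid IDs are 0x01..0x6F */

static bool sketch_used[SKETCH_IDS_MAX]; /* stand-in for the ID map */

struct sketch_etm {
	int trcid; /* 0 means no trace ID currently held */
};

/* Return the ID already held, or allocate one on first use. */
static int sketch_read_alloc_trace_id(struct sketch_etm *etm)
{
	if (etm->trcid)
		return etm->trcid;
	for (int id = 1; id < SKETCH_IDS_MAX; id++) {
		if (!sketch_used[id]) {
			sketch_used[id] = true;
			etm->trcid = id;
			return id;
		}
	}
	return -1; /* no free ID */
}

static void sketch_release_trace_id(struct sketch_etm *etm)
{
	if (etm->trcid) {
		sketch_used[etm->trcid] = false;
		etm->trcid = 0;
	}
}

/* perf: the ID is handed back as soon as the source is disabled. */
static void sketch_perf_disable(struct sketch_etm *etm)
{
	sketch_release_trace_id(etm);
}

/* sysfs: disable keeps the ID so it can still be read afterwards. */
static void sketch_sysfs_disable(struct sketch_etm *etm)
{
	(void)etm;
}

/* sysfs: only an explicit reset hands the ID back. */
static void sketch_sysfs_reset(struct sketch_etm *etm)
{
	sketch_release_trace_id(etm);
}
```

Because the read path and the enable path share one read-or-allocate helper, a value read before enable is the value used for the session.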
Signed-off-by: Mike Leach <mike.leach@linaro.org>
---
 .../coresight/coresight-etm4x-core.c          | 65 +++++++++++++++++--
 .../coresight/coresight-etm4x-sysfs.c         | 32 ++++++++-
 drivers/hwtracing/coresight/coresight-etm4x.h |  3 +
 3 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 87299e99dabb..3f4f7ddd14ec 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -42,6 +42,7 @@
 #include "coresight-etm4x-cfg.h"
 #include "coresight-self-hosted-trace.h"
 #include "coresight-syscfg.h"
+#include "coresight-trace-id.h"
 
 static int boot_enable;
 module_param(boot_enable, int, 0444);
@@ -234,6 +235,38 @@ static int etm4_trace_id(struct coresight_device *csdev)
 	return drvdata->trcid;
 }
 
+int etm4_read_alloc_trace_id(struct etmv4_drvdata *drvdata)
+{
+	int trace_id;
+
+	/*
+	 * This will allocate a trace ID to the cpu,
+	 * or return the one currently allocated.
+	 */
+	spin_lock(&drvdata->spinlock);
+	trace_id = drvdata->trcid;
+	if (!trace_id) {
+		trace_id = coresight_trace_id_get_cpu_id(drvdata->cpu);
+		if (trace_id > 0)
+			drvdata->trcid = (u8)trace_id;
+	}
+	spin_unlock(&drvdata->spinlock);
+
+	if (trace_id <= 0)
+		pr_err("Failed to allocate trace ID for %s on CPU%d\n",
+		       dev_name(&drvdata->csdev->dev), drvdata->cpu);
+
+	return trace_id;
+}
+
+void etm4_release_trace_id(struct etmv4_drvdata *drvdata)
+{
+	spin_lock(&drvdata->spinlock);
+	coresight_trace_id_put_cpu_id(drvdata->cpu);
+	drvdata->trcid = 0;
+	spin_unlock(&drvdata->spinlock);
+}
+
 struct etm4_enable_arg {
 	struct etmv4_drvdata *drvdata;
 	int rc;
@@ -715,9 +748,18 @@ static int etm4_enable_perf(struct coresight_device *csdev,
 	ret = etm4_parse_event_config(csdev, event);
 	if (ret)
 		goto out;
+
+	/* allocate a trace ID */
+	ret = etm4_read_alloc_trace_id(drvdata);
+	if (ret < 0)
+		goto out;
+
 	/* And enable it */
 	ret = etm4_enable_hw(drvdata);
 
+	/* failed to enable */
+	if (ret)
+		etm4_release_trace_id(drvdata);
 out:
 	return ret;
 }
@@ -737,6 +779,11 @@ static int etm4_enable_sysfs(struct coresight_device *csdev)
 			return ret;
 	}
 
+	/* allocate a trace ID */
+	ret = etm4_read_alloc_trace_id(drvdata);
+	if (ret < 0)
+		return ret;
+
 	spin_lock(&drvdata->spinlock);
 
 	/*
@@ -754,6 +801,8 @@ static int etm4_enable_sysfs(struct coresight_device *csdev)
 
 	if (!ret)
 		dev_dbg(&csdev->dev, "ETM tracing enabled\n");
+	else
+		etm4_release_trace_id(drvdata);
 	return ret;
 }
 
@@ -881,6 +930,9 @@ static int etm4_disable_perf(struct coresight_device *csdev,
 	/* TRCVICTLR::SSSTATUS, bit[9] */
 	filters->ssstatus = (control & BIT(9));
 
+	/* release trace ID - this may pend release if perf session is still active */
+	etm4_release_trace_id(drvdata);
+
 	return 0;
 }
 
@@ -906,6 +958,13 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
 
+	/*
+	 * unlike for perf session - we only release trace IDs when resetting
+	 * sysfs. This permits sysfs users to read the trace ID after the trace
+	 * session has completed. This maintains operational behaviour with
+	 * prior trace id allocation method
+	 */
+
 	dev_dbg(&csdev->dev, "ETM tracing disabled\n");
 }
 
@@ -1548,11 +1607,6 @@ static int etm4_dying_cpu(unsigned int cpu)
 	return 0;
 }
 
-static void etm4_init_trace_id(struct etmv4_drvdata *drvdata)
-{
-	drvdata->trcid = coresight_get_trace_id(drvdata->cpu);
-}
-
 static int __etm4_cpu_save(struct etmv4_drvdata *drvdata)
 {
 	int i, ret = 0;
@@ -1957,7 +2011,6 @@ static int etm4_probe(struct device *dev, void __iomem *base, u32 etm_pid)
 	if (!desc.name)
 		return -ENOMEM;
 
-	etm4_init_trace_id(drvdata);
 	etm4_set_default(&drvdata->config);
 
 	pdata = coresight_get_platform_data(dev);
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
index 6ea8181816fc..c7f896a020d9 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
@@ -266,10 +266,11 @@ static ssize_t reset_store(struct device *dev,
 	config->vmid_mask0 = 0x0;
 	config->vmid_mask1 = 0x0;
 
-	drvdata->trcid = drvdata->cpu + 1;
-
 	spin_unlock(&drvdata->spinlock);
 
+	/* for sysfs - only release trace id when resetting */
+	etm4_release_trace_id(drvdata);
+
 	cscfg_csdev_reset_feats(to_coresight_device(dev));
 
 	return size;
@@ -2363,6 +2364,31 @@ static struct attribute *coresight_etmv4_attrs[] = {
 	NULL,
 };
 
+/*
+ * Trace ID allocated dynamically on enable - but also allocate on read
+ * in case sysfs or perf read before enable to ensure consistent metadata
+ * information for trace decode
+ */
+static ssize_t trctraceid_show(struct device *dev,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	int trace_id;
+	struct etmv4_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	trace_id = etm4_read_alloc_trace_id(drvdata);
+	if (trace_id < 0)
+		return trace_id;
+
+	return scnprintf(buf, PAGE_SIZE, "0x%x\n", trace_id);
+}
+
+/* mgmt group uses extended attributes - no standard macro available */
+static struct dev_ext_attribute dev_attr_trctraceid = {
+	__ATTR(trctraceid, 0444, trctraceid_show, NULL),
+	(void *)(unsigned long)TRCTRACEIDR
+};
+
 struct etmv4_reg {
 	struct coresight_device *csdev;
 	u32 offset;
@@ -2499,7 +2525,7 @@ static struct attribute *coresight_etmv4_mgmt_attrs[] = {
 	coresight_etm4x_reg(trcpidr3, TRCPIDR3),
 	coresight_etm4x_reg(trcoslsr, TRCOSLSR),
 	coresight_etm4x_reg(trcconfig, TRCCONFIGR),
-	coresight_etm4x_reg(trctraceid, TRCTRACEIDR),
+	&dev_attr_trctraceid.attr.attr,
 	coresight_etm4x_reg(trcdevarch, TRCDEVARCH),
 	NULL,
 };
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index 33869c1d20c3..e0a9d334375d 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -1094,4 +1094,7 @@ static inline bool etm4x_is_ete(struct etmv4_drvdata *drvdata)
 {
 	return drvdata->arch >= ETM_ARCH_ETE;
 }
+
+int etm4_read_alloc_trace_id(struct etmv4_drvdata *drvdata);
+void etm4_release_trace_id(struct etmv4_drvdata *drvdata);
 #endif
On 04/07/2022 09:11, Mike Leach wrote:
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 87299e99dabb..3f4f7ddd14ec 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -42,6 +42,7 @@
 #include "coresight-etm4x-cfg.h"
 #include "coresight-self-hosted-trace.h"
 #include "coresight-syscfg.h"
+#include "coresight-trace-id.h"
 
 static int boot_enable;
 module_param(boot_enable, int, 0444);
@@ -234,6 +235,38 @@ static int etm4_trace_id(struct coresight_device *csdev)
 	return drvdata->trcid;
 }
 
+int etm4_read_alloc_trace_id(struct etmv4_drvdata *drvdata)
+{
+	int trace_id;
+	/*
+	 * This will allocate a trace ID to the cpu,
+	 * or return the one currently allocated.
+	 */
+	spin_lock(&drvdata->spinlock);
+	trace_id = drvdata->trcid;
+	if (!trace_id) {
+		trace_id = coresight_trace_id_get_cpu_id(drvdata->cpu);
+		if (trace_id > 0)
+			drvdata->trcid = (u8)trace_id;
+	}
+	spin_unlock(&drvdata->spinlock);
+	if (trace_id <= 0)
+		pr_err("Failed to allocate trace ID for %s on CPU%d\n",
+		       dev_name(&drvdata->csdev->dev), drvdata->cpu);
dev_err(&drvdata->csdev->dev, ....);
+	return trace_id;
+}
@@ -1957,7 +2011,6 @@ static int etm4_probe(struct device *dev, void __iomem *base, u32 etm_pid)
 	if (!desc.name)
 		return -ENOMEM;
 
-	etm4_init_trace_id(drvdata);
 	etm4_set_default(&drvdata->config);
 
 	pdata = coresight_get_platform_data(dev);
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
index 6ea8181816fc..c7f896a020d9 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
@@ -266,10 +266,11 @@ static ssize_t reset_store(struct device *dev,
 	config->vmid_mask0 = 0x0;
 	config->vmid_mask1 = 0x0;
 
-	drvdata->trcid = drvdata->cpu + 1;
-
 	spin_unlock(&drvdata->spinlock);
 
+	/* for sysfs - only release trace id when resetting */
+	etm4_release_trace_id(drvdata);
+
 	cscfg_csdev_reset_feats(to_coresight_device(dev));
 
 	return size;
@@ -2363,6 +2364,31 @@ static struct attribute *coresight_etmv4_attrs[] = {
 	NULL,
 };
 
+/*
+ * Trace ID allocated dynamically on enable - but also allocate on read
+ * in case sysfs or perf read before enable to ensure consistent metadata
+ * information for trace decode
+ */
+static ssize_t trctraceid_show(struct device *dev,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	int trace_id;
+	struct etmv4_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	trace_id = etm4_read_alloc_trace_id(drvdata);
+	if (trace_id < 0)
+		return trace_id;
+	return scnprintf(buf, PAGE_SIZE, "0x%x\n", trace_id);
nit: sysfs_emit(buf, "0x%x\n", trace_id);
+}
+/* mgmt group uses extended attributes - no standard macro available */
That doesn't prevent us from using dev_attribute for traceid. In the end, the mgmt group is a collection of "struct attribute *". All that matters is that the "show" function knows how to print the value from the "attribute".
You should be able to use DEVICE_ATTR_RO here ...
+static struct dev_ext_attribute dev_attr_trctraceid = {
+	__ATTR(trctraceid, 0444, trctraceid_show, NULL),
+	(void *)(unsigned long)TRCTRACEIDR
+};
... and get rid of this. Otherwise looks fine to me.
Suzuki
struct etmv4_reg { struct coresight_device *csdev; u32 offset; @@ -2499,7 +2525,7 @@ static struct attribute *coresight_etmv4_mgmt_attrs[] = { coresight_etm4x_reg(trcpidr3, TRCPIDR3), coresight_etm4x_reg(trcoslsr, TRCOSLSR), coresight_etm4x_reg(trcconfig, TRCCONFIGR),
-	coresight_etm4x_reg(trctraceid, TRCTRACEIDR),
+	&dev_attr_trctraceid.attr.attr,
 	coresight_etm4x_reg(trcdevarch, TRCDEVARCH),
 	NULL,
 };
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h index 33869c1d20c3..e0a9d334375d 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.h +++ b/drivers/hwtracing/coresight/coresight-etm4x.h @@ -1094,4 +1094,7 @@ static inline bool etm4x_is_ete(struct etmv4_drvdata *drvdata) { return drvdata->arch >= ETM_ARCH_ETE; }
+int etm4_read_alloc_trace_id(struct etmv4_drvdata *drvdata); +void etm4_release_trace_id(struct etmv4_drvdata *drvdata); #endif
Use the TraceID API to allocate ETM trace IDs dynamically.
As with the etm4x we allocate on enable / disable for perf, allocate on enable / reset for sysfs.
Additionally we allocate on sysfs file read as both perf and sysfs can read the ID before enabling the hardware.
Remove sysfs option to write trace ID - which is inconsistent with both the dynamic allocation method and the fixed allocation method previously used.
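For readers following the thread, the "allocate on first use, otherwise return the ID already held" behaviour described above can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: the `model_*` names, the lowest-free-ID policy and the fixed CPU count are all hypothetical and are not the kernel allocator; the valid range 0x01..0x6F follows the reserved-ID rule in the cover letter. Locking, which the driver does with `drvdata->spinlock`, is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: valid CoreSight trace IDs are 0x01..0x6F. */
#define MODEL_ID_MIN  0x01
#define MODEL_ID_MAX  0x6F
#define MODEL_NR_CPUS 8

/* Stand-in for the global ID map kept by the real allocator. */
static bool model_id_used[MODEL_ID_MAX + 1];

/* Stand-in for drvdata->traceid, one per CPU; 0 means "none allocated". */
static uint8_t model_cpu_traceid[MODEL_NR_CPUS];

/* Pick the lowest free ID (an assumed policy); -1 when none is left. */
static int model_alloc_id(void)
{
	for (int id = MODEL_ID_MIN; id <= MODEL_ID_MAX; id++) {
		if (!model_id_used[id]) {
			model_id_used[id] = true;
			return id;
		}
	}
	return -1;
}

/* Allocate on first use, otherwise return the ID this CPU already holds. */
int model_read_alloc_trace_id(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	if (!model_cpu_traceid[cpu]) {
		int id = model_alloc_id();

		if (id < 0)
			return id;
		model_cpu_traceid[cpu] = (uint8_t)id;
	}
	return model_cpu_traceid[cpu];
}

/* Called on perf disable or sysfs reset; a later read allocates afresh. */
void model_release_trace_id(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	if (model_cpu_traceid[cpu]) {
		model_id_used[model_cpu_traceid[cpu]] = false;
		model_cpu_traceid[cpu] = 0;
	}
}
```

In this model a sysfs read followed by an enable sees the same ID, because the second call finds the cached value; only the release path makes the ID available to another CPU.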
Signed-off-by: Mike Leach mike.leach@linaro.org --- drivers/hwtracing/coresight/coresight-etm.h | 2 + .../coresight/coresight-etm3x-core.c | 68 +++++++++++++++++-- .../coresight/coresight-etm3x-sysfs.c | 28 +++----- 3 files changed, 71 insertions(+), 27 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm.h b/drivers/hwtracing/coresight/coresight-etm.h index f3ab96eaf44e..3667428d38b6 100644 --- a/drivers/hwtracing/coresight/coresight-etm.h +++ b/drivers/hwtracing/coresight/coresight-etm.h @@ -287,4 +287,6 @@ int etm_get_trace_id(struct etm_drvdata *drvdata); void etm_set_default(struct etm_config *config); void etm_config_trace_mode(struct etm_config *config); struct etm_config *get_etm_config(struct etm_drvdata *drvdata); +int etm_read_alloc_trace_id(struct etm_drvdata *drvdata); +void etm_release_trace_id(struct etm_drvdata *drvdata); #endif diff --git a/drivers/hwtracing/coresight/coresight-etm3x-core.c b/drivers/hwtracing/coresight/coresight-etm3x-core.c index d0ab9933472b..273f37be322b 100644 --- a/drivers/hwtracing/coresight/coresight-etm3x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm3x-core.c @@ -32,6 +32,7 @@
#include "coresight-etm.h" #include "coresight-etm-perf.h" +#include "coresight-trace-id.h"
/* * Not really modular but using module_param is the easiest way to @@ -490,18 +491,61 @@ static int etm_trace_id(struct coresight_device *csdev) return etm_get_trace_id(drvdata); }
+int etm_read_alloc_trace_id(struct etm_drvdata *drvdata) +{ + int trace_id; + + /* + * This will allocate a trace ID to the cpu, + * or return the one currently allocated. + */ + spin_lock(&drvdata->spinlock); + trace_id = drvdata->traceid; + if (!trace_id) { + trace_id = coresight_trace_id_get_cpu_id(drvdata->cpu); + if (trace_id > 0) + drvdata->traceid = (u8)trace_id; + } + spin_unlock(&drvdata->spinlock); + + if (trace_id <= 0) + pr_err("Failed to allocate trace ID for %s on CPU%d\n", + dev_name(&drvdata->csdev->dev), drvdata->cpu); + + return trace_id; +} + +void etm_release_trace_id(struct etm_drvdata *drvdata) +{ + spin_lock(&drvdata->spinlock); + coresight_trace_id_put_cpu_id(drvdata->cpu); + drvdata->traceid = 0; + spin_unlock(&drvdata->spinlock); +} + static int etm_enable_perf(struct coresight_device *csdev, struct perf_event *event) { struct etm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + int ret;
if (WARN_ON_ONCE(drvdata->cpu != smp_processor_id())) return -EINVAL;
/* Configure the tracer based on the session's specifics */ etm_parse_event_config(drvdata, event); + + /* allocate a trace ID */ + ret = etm_read_alloc_trace_id(drvdata); + if (ret < 0) + return ret; + /* And enable it */ - return etm_enable_hw(drvdata); + ret = etm_enable_hw(drvdata); + + if (ret) + etm_release_trace_id(drvdata); + return ret; }
static int etm_enable_sysfs(struct coresight_device *csdev) @@ -510,6 +554,11 @@ static int etm_enable_sysfs(struct coresight_device *csdev) struct etm_enable_arg arg = { }; int ret;
+ /* allocate a trace ID */ + ret = etm_read_alloc_trace_id(drvdata); + if (ret < 0) + return ret; + spin_lock(&drvdata->spinlock);
/* @@ -532,6 +581,8 @@ static int etm_enable_sysfs(struct coresight_device *csdev)
if (!ret) dev_dbg(&csdev->dev, "ETM tracing enabled\n"); + else + etm_release_trace_id(drvdata); return ret; }
@@ -611,6 +662,8 @@ static void etm_disable_perf(struct coresight_device *csdev) coresight_disclaim_device_unlocked(csdev);
CS_LOCK(drvdata->base); + + etm_release_trace_id(drvdata); }
static void etm_disable_sysfs(struct coresight_device *csdev) @@ -635,6 +688,13 @@ static void etm_disable_sysfs(struct coresight_device *csdev) spin_unlock(&drvdata->spinlock); cpus_read_unlock();
+ /* + * unlike for perf session - we only release trace IDs when resetting + * sysfs. This permits sysfs users to read the trace ID after the trace + * session has completed. This maintains operational behaviour with + * prior trace id allocation method + */ + dev_dbg(&csdev->dev, "ETM tracing disabled\n"); }
@@ -781,11 +841,6 @@ static void etm_init_arch_data(void *info) CS_LOCK(drvdata->base); }
-static void etm_init_trace_id(struct etm_drvdata *drvdata) -{ - drvdata->traceid = coresight_get_trace_id(drvdata->cpu); -} - static int __init etm_hp_setup(void) { int ret; @@ -871,7 +926,6 @@ static int etm_probe(struct amba_device *adev, const struct amba_id *id) if (etm_arch_supported(drvdata->arch) == false) return -EINVAL;
- etm_init_trace_id(drvdata); etm_set_default(&drvdata->config);
pdata = coresight_get_platform_data(dev); diff --git a/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c index 68fcbf4ce7a8..962d6ac96d64 100644 --- a/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c +++ b/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c @@ -86,6 +86,8 @@ static ssize_t reset_store(struct device *dev,
etm_set_default(config); spin_unlock(&drvdata->spinlock); + /* release trace id outside the spinlock as this fn uses it */ + etm_release_trace_id(drvdata); }
return size; @@ -1189,30 +1191,16 @@ static DEVICE_ATTR_RO(cpu); static ssize_t traceid_show(struct device *dev, struct device_attribute *attr, char *buf) { - unsigned long val; - struct etm_drvdata *drvdata = dev_get_drvdata(dev->parent); - - val = etm_get_trace_id(drvdata); - - return sprintf(buf, "%#lx\n", val); -} - -static ssize_t traceid_store(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t size) -{ - int ret; - unsigned long val; + int trace_id; struct etm_drvdata *drvdata = dev_get_drvdata(dev->parent);
- ret = kstrtoul(buf, 16, &val); - if (ret) - return ret; + trace_id = etm_read_alloc_trace_id(drvdata); + if (trace_id < 0) + return trace_id;
- drvdata->traceid = val & ETM_TRACEID_MASK; - return size; + return sprintf(buf, "%#x\n", trace_id); } -static DEVICE_ATTR_RW(traceid); +static DEVICE_ATTR_RO(traceid);
static struct attribute *coresight_etm_attrs[] = { &dev_attr_nr_addr_cmp.attr,
On 04/07/2022 09:11, Mike Leach wrote:
Use the TraceID API to allocate ETM trace IDs dynamically.
As with the etm4x we allocate on enable / disable for perf, allocate on enable / reset for sysfs.
Additionally we allocate on sysfs file read as both perf and sysfs can read the ID before enabling the hardware.
Remove sysfs option to write trace ID - which is inconsistent with both the dynamic allocation method and the fixed allocation method previously used.
Signed-off-by: Mike Leach <mike.leach@linaro.org>
drivers/hwtracing/coresight/coresight-etm.h | 2 + .../coresight/coresight-etm3x-core.c | 68 +++++++++++++++++-- .../coresight/coresight-etm3x-sysfs.c | 28 +++----- 3 files changed, 71 insertions(+), 27 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm.h b/drivers/hwtracing/coresight/coresight-etm.h index f3ab96eaf44e..3667428d38b6 100644 --- a/drivers/hwtracing/coresight/coresight-etm.h +++ b/drivers/hwtracing/coresight/coresight-etm.h @@ -287,4 +287,6 @@ int etm_get_trace_id(struct etm_drvdata *drvdata); void etm_set_default(struct etm_config *config); void etm_config_trace_mode(struct etm_config *config); struct etm_config *get_etm_config(struct etm_drvdata *drvdata); +int etm_read_alloc_trace_id(struct etm_drvdata *drvdata); +void etm_release_trace_id(struct etm_drvdata *drvdata); #endif diff --git a/drivers/hwtracing/coresight/coresight-etm3x-core.c b/drivers/hwtracing/coresight/coresight-etm3x-core.c index d0ab9933472b..273f37be322b 100644 --- a/drivers/hwtracing/coresight/coresight-etm3x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm3x-core.c @@ -32,6 +32,7 @@ #include "coresight-etm.h" #include "coresight-etm-perf.h" +#include "coresight-trace-id.h" /*
 * Not really modular but using module_param is the easiest way to
@@ -490,18 +491,61 @@ static int etm_trace_id(struct coresight_device *csdev)
 	return etm_get_trace_id(drvdata);
 }
 
+int etm_read_alloc_trace_id(struct etm_drvdata *drvdata)
+{
+	int trace_id;
+
+	/*
+	 * This will allocate a trace ID to the cpu,
+	 * or return the one currently allocated.
+	 */
+	spin_lock(&drvdata->spinlock);
+	trace_id = drvdata->traceid;
+	if (!trace_id) {
+		trace_id = coresight_trace_id_get_cpu_id(drvdata->cpu);
+		if (trace_id > 0)
+			drvdata->traceid = (u8)trace_id;
+	}
+	spin_unlock(&drvdata->spinlock);
+
+	if (trace_id <= 0)
+		pr_err("Failed to allocate trace ID for %s on CPU%d\n",
+		       dev_name(&drvdata->csdev->dev), drvdata->cpu);

dev_err(&drvdata->csdev->dev, ....)

+
+	return trace_id;
+}
+
+void etm_release_trace_id(struct etm_drvdata *drvdata)
+{
+	spin_lock(&drvdata->spinlock);
+	coresight_trace_id_put_cpu_id(drvdata->cpu);
+	drvdata->traceid = 0;
+	spin_unlock(&drvdata->spinlock);
+}
+
 static int etm_enable_perf(struct coresight_device *csdev,
 			   struct perf_event *event)
 {
 	struct etm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	int ret;
 
 	if (WARN_ON_ONCE(drvdata->cpu != smp_processor_id()))
 		return -EINVAL;
 
 	/* Configure the tracer based on the session's specifics */
 	etm_parse_event_config(drvdata, event);
+
+	/* allocate a trace ID */
+	ret = etm_read_alloc_trace_id(drvdata);
+	if (ret < 0)
+		return ret;
+
 	/* And enable it */
-	return etm_enable_hw(drvdata);
+	ret = etm_enable_hw(drvdata);
+
+	if (ret)
+		etm_release_trace_id(drvdata);
+	return ret;
 }
 static int etm_enable_sysfs(struct coresight_device *csdev)
@@ -510,6 +554,11 @@ static int etm_enable_sysfs(struct coresight_device *csdev)
 	struct etm_enable_arg arg = { };
 	int ret;
 
+	/* allocate a trace ID */
+	ret = etm_read_alloc_trace_id(drvdata);
+	if (ret < 0)
+		return ret;
+
 	spin_lock(&drvdata->spinlock);
 
 	/*
@@ -532,6 +581,8 @@ static int etm_enable_sysfs(struct coresight_device *csdev)
 
 	if (!ret)
 		dev_dbg(&csdev->dev, "ETM tracing enabled\n");
+	else
+		etm_release_trace_id(drvdata);
 	return ret;
 }
@@ -611,6 +662,8 @@ static void etm_disable_perf(struct coresight_device *csdev)
 	coresight_disclaim_device_unlocked(csdev);
 
 	CS_LOCK(drvdata->base);
+
+	etm_release_trace_id(drvdata);
 }
 
 static void etm_disable_sysfs(struct coresight_device *csdev)
@@ -635,6 +688,13 @@ static void etm_disable_sysfs(struct coresight_device *csdev)
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
 
+	/*
+	 * unlike for perf session - we only release trace IDs when resetting
+	 * sysfs. This permits sysfs users to read the trace ID after the trace
+	 * session has completed. This maintains operational behaviour with
+	 * prior trace id allocation method
+	 */
+
 	dev_dbg(&csdev->dev, "ETM tracing disabled\n");
 }
@@ -781,11 +841,6 @@ static void etm_init_arch_data(void *info)
 	CS_LOCK(drvdata->base);
 }
 
-static void etm_init_trace_id(struct etm_drvdata *drvdata)
-{
-	drvdata->traceid = coresight_get_trace_id(drvdata->cpu);
-}
-
 static int __init etm_hp_setup(void)
 {
 	int ret;
@@ -871,7 +926,6 @@ static int etm_probe(struct amba_device *adev, const struct amba_id *id)
 	if (etm_arch_supported(drvdata->arch) == false)
 		return -EINVAL;
 
-	etm_init_trace_id(drvdata);
 	etm_set_default(&drvdata->config);
pdata = coresight_get_platform_data(dev); diff --git a/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c index 68fcbf4ce7a8..962d6ac96d64 100644 --- a/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c +++ b/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c @@ -86,6 +86,8 @@ static ssize_t reset_store(struct device *dev, etm_set_default(config); spin_unlock(&drvdata->spinlock);
+		/* release trace id outside the spinlock as this fn uses it */
+		etm_release_trace_id(drvdata);
 	}
return size; @@ -1189,30 +1191,16 @@ static DEVICE_ATTR_RO(cpu); static ssize_t traceid_show(struct device *dev, struct device_attribute *attr, char *buf) {
-	unsigned long val;
-	struct etm_drvdata *drvdata = dev_get_drvdata(dev->parent);
-
-	val = etm_get_trace_id(drvdata);
-
-	return sprintf(buf, "%#lx\n", val);
-}
-
-static ssize_t traceid_store(struct device *dev,
-			     struct device_attribute *attr,
-			     const char *buf, size_t size)
-{
-	int ret;
-	unsigned long val;
+	int trace_id;
 	struct etm_drvdata *drvdata = dev_get_drvdata(dev->parent);
 
-	ret = kstrtoul(buf, 16, &val);
-	if (ret)
-		return ret;
+	trace_id = etm_read_alloc_trace_id(drvdata);
+	if (trace_id < 0)
+		return trace_id;
 
-	drvdata->traceid = val & ETM_TRACEID_MASK;
-	return size;
+	return sprintf(buf, "%#x\n", trace_id);
nit: while at it, could we please switch to sysfs_emit()? The rest looks fine to me.
Suzuki
CoreSight sources provide a callback (.trace_id) in the standard source ops which returns the ID to the core code. This was used to check that sources all had a unique Trace ID.
Uniqueness is now guaranteed by the Trace ID allocation system, and the check code has been removed from the core.
This patch removes the unneeded and unused .trace_id source ops from the ops structure and implementations in etm3x, etm4x and stm.
Signed-off-by: Mike Leach mike.leach@linaro.org --- drivers/hwtracing/coresight/coresight-etm.h | 1 - .../coresight/coresight-etm3x-core.c | 37 ------------------- .../coresight/coresight-etm4x-core.c | 8 ---- drivers/hwtracing/coresight/coresight-stm.c | 8 ---- include/linux/coresight.h | 3 -- 5 files changed, 57 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm.h b/drivers/hwtracing/coresight/coresight-etm.h index 3667428d38b6..9a0d08b092ae 100644 --- a/drivers/hwtracing/coresight/coresight-etm.h +++ b/drivers/hwtracing/coresight/coresight-etm.h @@ -283,7 +283,6 @@ static inline unsigned int etm_readl(struct etm_drvdata *drvdata, u32 off) }
extern const struct attribute_group *coresight_etm_groups[]; -int etm_get_trace_id(struct etm_drvdata *drvdata); void etm_set_default(struct etm_config *config); void etm_config_trace_mode(struct etm_config *config); struct etm_config *get_etm_config(struct etm_drvdata *drvdata); diff --git a/drivers/hwtracing/coresight/coresight-etm3x-core.c b/drivers/hwtracing/coresight/coresight-etm3x-core.c index 273f37be322b..911d961dd736 100644 --- a/drivers/hwtracing/coresight/coresight-etm3x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm3x-core.c @@ -455,42 +455,6 @@ static int etm_cpu_id(struct coresight_device *csdev) return drvdata->cpu; }
-int etm_get_trace_id(struct etm_drvdata *drvdata) -{ - unsigned long flags; - int trace_id = -1; - struct device *etm_dev; - - if (!drvdata) - goto out; - - etm_dev = drvdata->csdev->dev.parent; - if (!local_read(&drvdata->mode)) - return drvdata->traceid; - - pm_runtime_get_sync(etm_dev); - - spin_lock_irqsave(&drvdata->spinlock, flags); - - CS_UNLOCK(drvdata->base); - trace_id = (etm_readl(drvdata, ETMTRACEIDR) & ETM_TRACEID_MASK); - CS_LOCK(drvdata->base); - - spin_unlock_irqrestore(&drvdata->spinlock, flags); - pm_runtime_put(etm_dev); - -out: - return trace_id; - -} - -static int etm_trace_id(struct coresight_device *csdev) -{ - struct etm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); - - return etm_get_trace_id(drvdata); -} - int etm_read_alloc_trace_id(struct etm_drvdata *drvdata) { int trace_id; @@ -731,7 +695,6 @@ static void etm_disable(struct coresight_device *csdev,
static const struct coresight_ops_source etm_source_ops = { .cpu_id = etm_cpu_id, - .trace_id = etm_trace_id, .enable = etm_enable, .disable = etm_disable, }; diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 3f4f7ddd14ec..b7c7980cc71c 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -228,13 +228,6 @@ static int etm4_cpu_id(struct coresight_device *csdev) return drvdata->cpu; }
-static int etm4_trace_id(struct coresight_device *csdev) -{ - struct etmv4_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); - - return drvdata->trcid; -} - int etm4_read_alloc_trace_id(struct etmv4_drvdata *drvdata) { int trace_id; @@ -998,7 +991,6 @@ static void etm4_disable(struct coresight_device *csdev,
static const struct coresight_ops_source etm4_source_ops = { .cpu_id = etm4_cpu_id, - .trace_id = etm4_trace_id, .enable = etm4_enable, .disable = etm4_disable, }; diff --git a/drivers/hwtracing/coresight/coresight-stm.c b/drivers/hwtracing/coresight/coresight-stm.c index 9ef3e923a930..f4b4232614b0 100644 --- a/drivers/hwtracing/coresight/coresight-stm.c +++ b/drivers/hwtracing/coresight/coresight-stm.c @@ -281,15 +281,7 @@ static void stm_disable(struct coresight_device *csdev, } }
-static int stm_trace_id(struct coresight_device *csdev) -{ - struct stm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); - - return drvdata->traceid; -} - static const struct coresight_ops_source stm_source_ops = { - .trace_id = stm_trace_id, .enable = stm_enable, .disable = stm_disable, }; diff --git a/include/linux/coresight.h b/include/linux/coresight.h index 9f445f09fcfe..247147c11231 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -314,14 +314,11 @@ struct coresight_ops_link { * Operations available for sources. * @cpu_id: returns the value of the CPU number this component * is associated to. - * @trace_id: returns the value of the component's trace ID as known - * to the HW. * @enable: enables tracing for a source. * @disable: disables tracing for a source. */ struct coresight_ops_source { int (*cpu_id)(struct coresight_device *csdev); - int (*trace_id)(struct coresight_device *csdev); int (*enable)(struct coresight_device *csdev, struct perf_event *event, u32 mode); void (*disable)(struct coresight_device *csdev,
Hi Mike
Nice diff stat !
Also minor nit on subject:
coresight: source: Remove trace_id() call back
On 04/07/2022 09:11, Mike Leach wrote:
CoreSight sources provide a callback (.trace_id) in the standard source ops which returns the ID to the core code. This was used to check that sources all had a unique Trace ID.
Uniqueness is now guaranteed by the Trace ID allocation system, and the check code has been removed from the core.
This patch removes the unneeded and unused .trace_id source ops from the ops structure and implementations in etm3x, etm4x and stm.
Signed-off-by: Mike Leach <mike.leach@linaro.org>
drivers/hwtracing/coresight/coresight-etm.h | 1 - .../coresight/coresight-etm3x-core.c | 37 ------------------- .../coresight/coresight-etm4x-core.c | 8 ---- drivers/hwtracing/coresight/coresight-stm.c | 8 ---- include/linux/coresight.h | 3 -- 5 files changed, 57 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm.h b/drivers/hwtracing/coresight/coresight-etm.h index 3667428d38b6..9a0d08b092ae 100644 --- a/drivers/hwtracing/coresight/coresight-etm.h +++ b/drivers/hwtracing/coresight/coresight-etm.h @@ -283,7 +283,6 @@ static inline unsigned int etm_readl(struct etm_drvdata *drvdata, u32 off) } extern const struct attribute_group *coresight_etm_groups[]; -int etm_get_trace_id(struct etm_drvdata *drvdata); void etm_set_default(struct etm_config *config); void etm_config_trace_mode(struct etm_config *config); struct etm_config *get_etm_config(struct etm_drvdata *drvdata); diff --git a/drivers/hwtracing/coresight/coresight-etm3x-core.c b/drivers/hwtracing/coresight/coresight-etm3x-core.c index 273f37be322b..911d961dd736 100644 --- a/drivers/hwtracing/coresight/coresight-etm3x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm3x-core.c @@ -455,42 +455,6 @@ static int etm_cpu_id(struct coresight_device *csdev) return drvdata->cpu; } -int etm_get_trace_id(struct etm_drvdata *drvdata) -{
-	unsigned long flags;
-	int trace_id = -1;
-	struct device *etm_dev;
-
-	if (!drvdata)
-		goto out;
-
-	etm_dev = drvdata->csdev->dev.parent;
-	if (!local_read(&drvdata->mode))
-		return drvdata->traceid;
-
-	pm_runtime_get_sync(etm_dev);
-
-	spin_lock_irqsave(&drvdata->spinlock, flags);
-
-	CS_UNLOCK(drvdata->base);
-	trace_id = (etm_readl(drvdata, ETMTRACEIDR) & ETM_TRACEID_MASK);
-	CS_LOCK(drvdata->base);
-
-	spin_unlock_irqrestore(&drvdata->spinlock, flags);
-	pm_runtime_put(etm_dev);
-
-out:
-	return trace_id;
-
-}
-
-static int etm_trace_id(struct coresight_device *csdev)
-{
-	struct etm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
-
-	return etm_get_trace_id(drvdata);
-}
-
 int etm_read_alloc_trace_id(struct etm_drvdata *drvdata)
 {
 	int trace_id;
@@ -731,7 +695,6 @@ static void etm_disable(struct coresight_device *csdev, static const struct coresight_ops_source etm_source_ops = { .cpu_id = etm_cpu_id,
- .trace_id = etm_trace_id, .enable = etm_enable, .disable = etm_disable, };
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 3f4f7ddd14ec..b7c7980cc71c 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -228,13 +228,6 @@ static int etm4_cpu_id(struct coresight_device *csdev) return drvdata->cpu; } -static int etm4_trace_id(struct coresight_device *csdev) -{
-	struct etmv4_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
-
-	return drvdata->trcid;
-}
-
 int etm4_read_alloc_trace_id(struct etmv4_drvdata *drvdata)
 {
 	int trace_id;
@@ -998,7 +991,6 @@ static void etm4_disable(struct coresight_device *csdev, static const struct coresight_ops_source etm4_source_ops = { .cpu_id = etm4_cpu_id,
- .trace_id = etm4_trace_id, .enable = etm4_enable, .disable = etm4_disable, };
diff --git a/drivers/hwtracing/coresight/coresight-stm.c b/drivers/hwtracing/coresight/coresight-stm.c index 9ef3e923a930..f4b4232614b0 100644 --- a/drivers/hwtracing/coresight/coresight-stm.c +++ b/drivers/hwtracing/coresight/coresight-stm.c @@ -281,15 +281,7 @@ static void stm_disable(struct coresight_device *csdev, } } -static int stm_trace_id(struct coresight_device *csdev) -{
-	struct stm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
-
-	return drvdata->traceid;
-}
-
 static const struct coresight_ops_source stm_source_ops = {
-	.trace_id = stm_trace_id,
 	.enable = stm_enable,
 	.disable = stm_disable,
 };
diff --git a/include/linux/coresight.h b/include/linux/coresight.h index 9f445f09fcfe..247147c11231 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -314,14 +314,11 @@ struct coresight_ops_link {
 * Operations available for sources.
 * @cpu_id: returns the value of the CPU number this component
 *          is associated to.
- * @trace_id: returns the value of the component's trace ID as known
- *            to the HW.
 * @enable: enables tracing for a source.
 * @disable: disables tracing for a source.
 */
 struct coresight_ops_source {
 	int (*cpu_id)(struct coresight_device *csdev);
-	int (*trace_id)(struct coresight_device *csdev);
 	int (*enable)(struct coresight_device *csdev,
 		      struct perf_event *event, u32 mode);
 	void (*disable)(struct coresight_device *csdev,
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Add notifier calls that tell the trace ID allocator when perf events start and stop.
This ensures that Trace IDs associated with CPUs remain the same throughout the perf session, and are only released when all perf sessions are complete.
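One way to picture "only released when all perf sessions are complete" is a session counter with deferred release. The sketch below is a userspace model built on that assumption. The allocator's real implementation is not part of this patch, and every `model_*` name here is hypothetical. While at least one session is active, a put only marks the ID as pending, and the last stop frees whatever is pending.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ID_MAX 0x6F	/* highest valid CoreSight trace ID */

static bool model_used[MODEL_ID_MAX + 1];	/* IDs currently owned */
static bool model_pending[MODEL_ID_MAX + 1];	/* puts deferred by a session */
static int model_perf_sessions;			/* active perf sessions */

static bool model_id_ok(int id)
{
	return id > 0 && id <= MODEL_ID_MAX;
}

/* A perf event is starting up: from now on, puts are deferred. */
void model_perf_start(void)
{
	model_perf_sessions++;
}

/* Take (or re-take) an ID; a re-take cancels any deferred release. */
void model_hold_id(int id)
{
	assert(model_id_ok(id));
	model_used[id] = true;
	model_pending[id] = false;
}

/* Give an ID back: immediate when idle, deferred while perf is running. */
void model_put_id(int id)
{
	assert(model_id_ok(id));
	if (model_perf_sessions > 0)
		model_pending[id] = true;
	else
		model_used[id] = false;
}

/* A perf event is done: the last one out frees every deferred ID. */
void model_perf_stop(void)
{
	assert(model_perf_sessions > 0);
	if (--model_perf_sessions > 0)
		return;

	for (int id = 1; id <= MODEL_ID_MAX; id++) {
		if (model_pending[id]) {
			model_used[id] = false;
			model_pending[id] = false;
		}
	}
}

bool model_id_in_use(int id)
{
	return model_id_ok(id) && model_used[id];
}
```

With this shape, an ETM that is disabled and re-enabled in the middle of a session gets the same ID back, since the ID never returned to the free pool in between.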
Signed-off-by: Mike Leach mike.leach@linaro.org --- drivers/hwtracing/coresight/coresight-etm-perf.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..ad3fdc07c60b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -22,6 +22,7 @@ #include "coresight-etm-perf.h" #include "coresight-priv.h" #include "coresight-syscfg.h" +#include "coresight-trace-id.h"
static struct pmu etm_pmu; static bool etm_perf_up; @@ -228,6 +229,9 @@ static void free_event_data(struct work_struct *work) *ppath = NULL; }
+ /* mark perf event as done for trace id allocator */ + coresight_trace_id_perf_stop(); + free_percpu(event_data->path); kfree(event_data); } @@ -314,6 +318,9 @@ static void *etm_setup_aux(struct perf_event *event, void **pages, sink = user_sink = coresight_get_sink_by_id(id); }
+ /* tell the trace ID allocator that a perf event is starting up */ + coresight_trace_id_perf_start(); + /* check if user wants a coresight configuration selected */ cfg_hash = (u32)((event->attr.config2 & GENMASK_ULL(63, 32)) >> 32); if (cfg_hash) {
Hi Mike,
On 04/07/2022 09:11, Mike Leach wrote:
Add notifier calls that tell the trace ID allocator when perf events start and stop.
This ensures that Trace IDs associated with CPUs remain the same throughout the perf session, and are only released when all perf sessions are complete.
The patch looks fine to me. I think it would be good to add the definition of coresight_trace_id_perf_{stop,start}() in this patch.
Signed-off-by: Mike Leach <mike.leach@linaro.org>
drivers/hwtracing/coresight/coresight-etm-perf.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..ad3fdc07c60b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -22,6 +22,7 @@ #include "coresight-etm-perf.h" #include "coresight-priv.h" #include "coresight-syscfg.h" +#include "coresight-trace-id.h" static struct pmu etm_pmu; static bool etm_perf_up; @@ -228,6 +229,9 @@ static void free_event_data(struct work_struct *work) *ppath = NULL; }
+	/* mark perf event as done for trace id allocator */
+	coresight_trace_id_perf_stop();
+
 	free_percpu(event_data->path);
 	kfree(event_data);
 }
@@ -314,6 +318,9 @@ static void *etm_setup_aux(struct perf_event *event, void **pages, sink = user_sink = coresight_get_sink_by_id(id); }
+	/* tell the trace ID allocator that a perf event is starting up */
+	coresight_trace_id_perf_start();
+
 	/* check if user wants a coresight configuration selected */
 	cfg_hash = (u32)((event->attr.config2 & GENMASK_ULL(63, 32)) >> 32);
 	if (cfg_hash) {
Suzuki
The information to associate Trace ID and CPU will be changing. Drivers will start outputting this as a hardware ID packet in the data file and setting the value in AUXINFO to an unused value.
To prepare for this, we only map Trace ID and CPU data from AUXINFO if the header version and values are valid, and move the mapping into a helper function.
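The gating described above (map from AUXINFO only for older header versions, and only when the stored ID is valid) can be sketched as a small standalone program. The two macros mirror the values in the cs-etm.h hunk of this patch, with the macro argument parenthesised. The flat array and the `model_*` functions are hypothetical stand-ins for perf's `traceid_list` intlist and are not the tool's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the cs-etm.h hunk in this patch. */
#define CS_AUX_HW_ID_VERSION_MIN 2
#define CS_IS_VALID_TRACE_ID(id) (((id) > 0) && ((id) < 0x70))

/* Hypothetical stand-in for traceid_list: one metadata pointer per ID. */
static uint64_t *model_traceid_map[0x70];

/*
 * Map a trace ID to its per-CPU metadata, but only when AUXINFO is the
 * authoritative source. Returns 0 when mapped or deliberately skipped,
 * -EINVAL when the ID is already associated with other metadata.
 */
int model_map_from_auxinfo(uint64_t hdr_version, uint8_t trace_chan_id,
			   uint64_t *cpu_metadata)
{
	/* NULL would be indistinguishable from an unmapped slot. */
	assert(cpu_metadata != NULL);

	/* Newer files carry the ID in a hardware ID packet instead. */
	if (hdr_version >= CS_AUX_HW_ID_VERSION_MIN)
		return 0;

	/* An unused or reserved value in the metadata: nothing to map. */
	if (!CS_IS_VALID_TRACE_ID(trace_chan_id))
		return 0;

	/* The slot for this ID should not be taken already. */
	if (model_traceid_map[trace_chan_id])
		return -EINVAL;

	model_traceid_map[trace_chan_id] = cpu_metadata;
	return 0;
}

/* Look up the metadata for an ID; NULL when unmapped or invalid. */
uint64_t *model_lookup(uint8_t trace_chan_id)
{
	if (!CS_IS_VALID_TRACE_ID(trace_chan_id))
		return NULL;
	return model_traceid_map[trace_chan_id];
}
```

Treating "skipped" as success keeps the caller's loop over CPUs simple: only a genuine conflict aborts processing, which matches the `goto err_free_metadata` path in the hunk.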
Signed-off-by: Mike Leach mike.leach@linaro.org --- tools/perf/util/cs-etm.c | 53 +++++++++++++++++++++++++++------------- tools/perf/util/cs-etm.h | 14 +++++++++-- 2 files changed, 48 insertions(+), 19 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8b95fb3c4d7b..df9d67901f8d 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -193,6 +193,30 @@ int cs_etm__get_pid_fmt(u8 trace_chan_id, u64 *pid_fmt) return 0; }
+static int cs_etm__map_trace_id(u8 trace_chan_id, u64 *cpu_metadata) +{ + struct int_node *inode; + + /* Get an RB node for this CPU */ + inode = intlist__findnew(traceid_list, trace_chan_id); + + /* Something went wrong, no need to continue */ + if (!inode) + return -ENOMEM; + + /* + * The node for that CPU should not be taken. + * Back out if that's the case. + */ + if (inode->priv) + return -EINVAL; + + /* All good, associate the traceID with the metadata pointer */ + inode->priv = cpu_metadata; + + return 0; +} + void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq, u8 trace_chan_id) { @@ -2886,7 +2910,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event, { struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info; struct cs_etm_auxtrace *etm = NULL; - struct int_node *inode; unsigned int pmu_type; int event_header_size = sizeof(struct perf_event_header); int info_header_size; @@ -2898,6 +2921,7 @@ int cs_etm__process_auxtrace_info(union perf_event *event, u64 *ptr, *hdr = NULL; u64 **metadata = NULL; u64 hdr_version; + u8 trace_chan_id;
/* * sizeof(auxtrace_info_event::type) + @@ -2991,25 +3015,20 @@ int cs_etm__process_auxtrace_info(union perf_event *event, goto err_free_metadata; }
- /* Get an RB node for this CPU */ - inode = intlist__findnew(traceid_list, metadata[j][trcidr_idx]); - - /* Something went wrong, no need to continue */ - if (!inode) { - err = -ENOMEM; - goto err_free_metadata; - } - /* - * The node for that CPU should not be taken. - * Back out if that's the case. + * Associate a trace ID with metadata. + * Later versions of the drivers will make this association using a + * hardware ID packet in the data file, setting the value in AUXINFO to an + * invalid trace ID value. Only map here if the value is valid. */ - if (inode->priv) { - err = -EINVAL; - goto err_free_metadata; + if (hdr_version < CS_AUX_HW_ID_VERSION_MIN) { + trace_chan_id = metadata[j][trcidr_idx]; + if (CS_IS_VALID_TRACE_ID(trace_chan_id)) { + err = cs_etm__map_trace_id(trace_chan_id, metadata[j]); + if (err) + goto err_free_metadata; + } } - /* All good, associate the traceID with the metadata pointer */ - inode->priv = metadata[j]; }
/* diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 90c83f932d9a..712a6f855f0e 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -28,13 +28,17 @@ enum { /* * Update the version for new format. * - * New version 1 format adds a param count to the per cpu metadata. + * Version 1: format adds a param count to the per cpu metadata. * This allows easy adding of new metadata parameters. * Requires that new params always added after current ones. * Also allows client reader to handle file versions that are different by * checking the number of params in the file vs the number expected. + * + * Version 2: Drivers will use PERF_RECORD_AUX_OUTPUT_HW_ID to output + * CoreSight Trace ID. ...TRACEIDR metadata will be set to unused ID. */ -#define CS_HEADER_CURRENT_VERSION 1 +#define CS_HEADER_CURRENT_VERSION 2 +#define CS_AUX_HW_ID_VERSION_MIN 2
/* Beginning of header common to both ETMv3 and V4 */ enum { @@ -85,6 +89,12 @@ enum { CS_ETE_PRIV_MAX };
+/* + * Check for valid CoreSight trace ID. If an invalid value is present in the metadata, + * then IDs are present in the hardware ID packet in the data file. + */ +#define CS_IS_VALID_TRACE_ID(id) ((id > 0) && (id < 0x70)) + /* * ETMv3 exception encoding number: * See Embedded Trace Macrocell specification (ARM IHI 0014Q)
On 04/07/2022 09:11, Mike Leach wrote:
The information to associate Trace ID and CPU will be changing. Drivers will start outputting this as a hardware ID packet in the data file and setting the value in AUXINFO to an unused value.
To prepare for this, we only map Trace ID and CPU data from AUXINFO if the header version and values are valid, and move the mapping into a helper function.
Signed-off-by: Mike Leach <mike.leach@linaro.org>
tools/perf/util/cs-etm.c | 53 +++++++++++++++++++++++++++------------- tools/perf/util/cs-etm.h | 14 +++++++++-- 2 files changed, 48 insertions(+), 19 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8b95fb3c4d7b..df9d67901f8d 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -193,6 +193,30 @@ int cs_etm__get_pid_fmt(u8 trace_chan_id, u64 *pid_fmt) return 0; } +static int cs_etm__map_trace_id(u8 trace_chan_id, u64 *cpu_metadata) +{
+	struct int_node *inode;
+
+	/* Get an RB node for this CPU */
+	inode = intlist__findnew(traceid_list, trace_chan_id);
+
+	/* Something went wrong, no need to continue */
+	if (!inode)
+		return -ENOMEM;
+
+	/*
+	 * The node for that CPU should not be taken.
+	 * Back out if that's the case.
+	 */
+	if (inode->priv)
+		return -EINVAL;
+
+	/* All good, associate the traceID with the metadata pointer */
+	inode->priv = cpu_metadata;
+
+	return 0;
+}
void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq, u8 trace_chan_id) { @@ -2886,7 +2910,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event, { struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info; struct cs_etm_auxtrace *etm = NULL;
- struct int_node *inode; unsigned int pmu_type; int event_header_size = sizeof(struct perf_event_header); int info_header_size;
@@ -2898,6 +2921,7 @@ int cs_etm__process_auxtrace_info(union perf_event *event, u64 *ptr, *hdr = NULL; u64 **metadata = NULL; u64 hdr_version;
+	u8 trace_chan_id;
/* * sizeof(auxtrace_info_event::type) + @@ -2991,25 +3015,20 @@ int cs_etm__process_auxtrace_info(union perf_event *event, goto err_free_metadata; }
-	/* Get an RB node for this CPU */
-	inode = intlist__findnew(traceid_list, metadata[j][trcidr_idx]);
-
-	/* Something went wrong, no need to continue */
-	if (!inode) {
-		err = -ENOMEM;
-		goto err_free_metadata;
-	}
-
 	/*
-	 * The node for that CPU should not be taken.
-	 * Back out if that's the case.
+	 * Associate a trace ID with metadata.
+	 * Later versions of the drivers will make this association using a
+	 * hardware ID packet in the data file, setting the value in AUXINFO to an
+	 * invalid trace ID value. Only map here if the value is valid.
 	 */
-	if (inode->priv) {
-		err = -EINVAL;
-		goto err_free_metadata;
+	if (hdr_version < CS_AUX_HW_ID_VERSION_MIN) {
+		trace_chan_id = metadata[j][trcidr_idx];
+		if (CS_IS_VALID_TRACE_ID(trace_chan_id)) {
+			err = cs_etm__map_trace_id(trace_chan_id, metadata[j]);
+			if (err)
+				goto err_free_metadata;
+		}
 	}
-	/* All good, associate the traceID with the metadata pointer */
-	inode->priv = metadata[j];
 }
/* diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 90c83f932d9a..712a6f855f0e 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -28,13 +28,17 @@ enum { /*
  * Update the version for new format.
  *
- * New version 1 format adds a param count to the per cpu metadata.
+ * Version 1: format adds a param count to the per cpu metadata.
  * This allows easy adding of new metadata parameters.
  * Requires that new params always added after current ones.
  * Also allows client reader to handle file versions that are different by
  * checking the number of params in the file vs the number expected.
+ *
+ * Version 2: Drivers will use PERF_RECORD_AUX_OUTPUT_HW_ID to output
+ * CoreSight Trace ID. ...TRACEIDR metadata will be set to unused ID.
  */
-#define CS_HEADER_CURRENT_VERSION 1
+#define CS_HEADER_CURRENT_VERSION 2
+#define CS_AUX_HW_ID_VERSION_MIN 2
Hi Mike,
I'm starting to look at this set now.
Am I right in thinking that this hard coded value means that new versions of Perf won't work with older drivers? Does this need to be highlighted somewhere in a warning that it's not the Perf version that's the issue but both the Perf and driver version together?
I thought the idea was to search through the file to look for PERF_RECORD_AUX_OUTPUT_HW_ID records (or lack of) and then choose the appropriate decode method. But maybe that's too complicated and there is no requirement for backwards compatibility?
From experience it can be inconvenient when you can't just throw any build of Perf on a system and it supports everything that it knows about. Now we will have Perf builds that know about Coresight but don't work with older drivers.
But then as you say the ID allocation is already broken for some people. It's hard to decide.
James
/* Beginning of header common to both ETMv3 and V4 */ enum { @@ -85,6 +89,12 @@ enum { CS_ETE_PRIV_MAX }; +/*
+ * Check for valid CoreSight trace ID. If an invalid value is present in the metadata,
+ * then IDs are present in the hardware ID packet in the data file.
+ */
+#define CS_IS_VALID_TRACE_ID(id) ((id > 0) && (id < 0x70))
+
 /*
  * ETMv3 exception encoding number:
  * See Embedded Trace Macrocell specification (ARM IHI 0014Q)
Hi James
On Tue, 19 Jul 2022 at 15:54, James Clark james.clark@arm.com wrote:
-#define CS_HEADER_CURRENT_VERSION 1 +#define CS_HEADER_CURRENT_VERSION 2 +#define CS_AUX_HW_ID_VERSION_MIN 2
Hi Mike,
I'm starting to look at this set now.
Am I right in thinking that this hard coded value means that new versions of Perf won't work with older drivers? Does this need to be highlighted somewhere in a warning that it's not the Perf version that's the issue but both the Perf and driver version together?
Need to differentiate here between perf record, and perf report.
My understanding is that perf record must always match the version of your kernel. If you use an old version of perf record on a newer kernel then you are asking for trouble. Indeed, if I run perf on my x86 dev machine at the moment it whinges: WARNING: perf not found for kernel 5.4.0-122 because the last version of perf I have is for 5.4.0-120.
The new perf report will differentiate between the new and old versions of the perf.data file and act accordingly. For version 1 it will take the IDs from the metadata, for version 2 it will search for the IDs in the packet data. An older perf report will not be able to decode the newer files - though that has always been the case.
Were we to permit an old version of perf record to be used to generate a file using the new drivers, and then attempt to process that file with an older perf report, it would fail miserably.
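The reader-side behaviour described above can be summarised as a small decision function. This is only a sketch under stated assumptions: the enum and function names are hypothetical, and rejecting a file whose header version is newer than the reader understands is an illustration of "an older perf report will not be able to decode the newer files", not a claim about how any particular perf build reports that failure.

```c
#include <stdint.h>

/* Hypothetical decode plans; names are illustrative only. */
enum sketch_decode_plan {
	SKETCH_IDS_FROM_METADATA,	/* older files: trace ID stored in AUXTRACE_INFO */
	SKETCH_IDS_FROM_HW_ID,		/* version 2 files: ID in AUX_OUTPUT_HW_ID records */
	SKETCH_UNSUPPORTED,		/* file is newer than this reader understands */
};

/* Choose where trace IDs come from, given file and reader header versions. */
static enum sketch_decode_plan sketch_plan(uint64_t file_version,
					   uint64_t reader_max_version)
{
	if (file_version > reader_max_version)
		return SKETCH_UNSUPPORTED;
	if (file_version >= 2)		/* CS_AUX_HW_ID_VERSION_MIN in the patch */
		return SKETCH_IDS_FROM_HW_ID;
	return SKETCH_IDS_FROM_METADATA;
}
```

A reader built with this series would call it with `reader_max_version` set to 2, while a reader from before the series corresponds to a value of 1.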
Regards
Mike
I thought the idea was to search through the file to look for PERF_RECORD_AUX_OUTPUT_HW_ID records (or lack of) and then choose the appropriate decode method. But maybe that's too complicated and there is no requirement for backwards compatibility?
From experience it can be inconvenient when you can't just throw any build of Perf on a system and it supports everything that it knows about. Now we will have Perf builds that know about Coresight but don't work with older drivers.
But then as you say the ID allocation is already broken for some people. It's hard to decide.
James
On 20/07/2022 11:22, Mike Leach wrote:
Hi James
On Tue, 19 Jul 2022 at 15:54, James Clark james.clark@arm.com wrote:
-#define CS_HEADER_CURRENT_VERSION 1 +#define CS_HEADER_CURRENT_VERSION 2 +#define CS_AUX_HW_ID_VERSION_MIN 2
Hi Mike,
I'm starting to look at this set now.
Am I right in thinking that this hard coded value means that new versions of Perf won't work with older drivers? Does this need to be highlighted somewhere in a warning that it's not the Perf version that's the issue but both the Perf and driver version together?
Need to differentiate here between perf record, and perf report.
My understanding is that perf record must always match the version of your kernel. If you use an old version of perf record on a newer kernel then you are asking for trouble.
In that case it's probably ok then. Although there are some users using a mainline version of Perf for all the decode fixes, but running on a platform with an older production kernel. I suppose they'd have to backport the new Coresight driver. In this case having Perf support both older and new drivers would simplify this workflow. But if it's not supported then it's not supported.
Indeed, if I run perf on my x86 dev machine at the moment it whinges: WARNING: perf not found for kernel 5.4.0-122 because the last version of perf I have is for 5.4.0-120.
These are printed for the package manager installed version, but not for dev builds. If we know there is some incompatibility I wonder if adding a warning would be easy. Otherwise you'd get the obscure "This file has no samples!" message. Or just leave it to the wrapper script to warn only non-devbuild users?
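One way such a warning could look is sketched below. This is purely hypothetical: neither the function nor the message exists in perf, and it assumes the decoder already counts the PERF_RECORD_AUX_OUTPUT_HW_ID records it has seen before it starts decoding.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: warn when the header promises hardware ID packets
 * (version >= 2) but none were found, which would otherwise surface as an
 * obscure "no samples" result. Returns true if a warning was printed.
 */
static bool sketch_warn_if_ids_missing(uint64_t hdr_version,
				       unsigned int hw_id_records, FILE *out)
{
	if (hdr_version >= 2 && hw_id_records == 0) {
		fprintf(out,
			"CoreSight: no trace ID records found; the CoreSight driver may be older than the perf used to record\n");
		return true;
	}
	return false;
}
```

Passing the output stream as a parameter keeps the sketch easy to exercise in isolation; a real implementation would presumably go through perf's own warning helpers.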
The new perf report will differentiate between the new and old versions of the perf.data file and act accordingly. For version 1 it will take the IDs from the metadata, for version 2 it will search for the IDs in the packet data. An older perf report will not be able to decode the newer files - though that has always been the case.
Were we to permit an old version of perf record to be used to generate a file using the new drivers, and then attempt to process that file with an older perf report, it would fail miserably.
Regards
Mike
I thought the idea was to search through the file to look for PERF_RECORD_AUX_OUTPUT_HW_ID records (or lack of) and then choose the appropriate decode method. But maybe that's too complicated and there is no requirement for backwards compatibility?
From experience it can be inconvenient when you can't just throw any build of Perf on a system and it supports everything that it knows about. Now we will have Perf builds that know about Coresight but don't work with older drivers.
But then as you say the ID allocation is already broken for some people. It's hard to decide.
James
Em Wed, Jul 20, 2022 at 11:22:37AM +0100, Mike Leach escreveu:
On Tue, 19 Jul 2022 at 15:54, James Clark james.clark@arm.com wrote:
I'm starting to look at this set now.
Am I right in thinking that this hard coded value means that new versions of Perf won't work with older drivers? Does this need to be highlighted somewhere in a warning that it's not the Perf version that's the issue but both the Perf and driver version together?
Need to differentiate here between perf record, and perf report.
My understanding is that perf record must always match the version of your kernel. If you use an old version of perf record on a newer
No, that is not what is intended, one should be able to use whatever perf (record or otherwise) with whatever kernel version.
perf tries to cope with this, and if it is not possible to record the way the user asks, it should emit a helpful error message stating why it is not possible. See:
evsel__disable_missing_features() evsel__detect_missing_features()
which are used during evsel__open().
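The general shape of that approach (try to open with the new feature, and on rejection disable it, tell the user, and retry) can be sketched as follows. This is not the evsel code: every name here is hypothetical, and the "kernel" is simulated by a boolean so the sketch stays self-contained.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical event attributes: one optional feature that may be missing. */
struct sketch_attr {
	bool want_hw_id;
};

/* Simulated open: a kernel without the feature rejects the request. */
static int sketch_open(const struct sketch_attr *attr, bool kernel_has_hw_id)
{
	if (attr->want_hw_id && !kernel_has_hw_id)
		return -EINVAL;
	return 0;
}

/* Try with the feature; if rejected, explain why, disable it and retry. */
static int sketch_open_with_fallback(struct sketch_attr *attr,
				     bool kernel_has_hw_id)
{
	int err = sketch_open(attr, kernel_has_hw_id);

	if (err == -EINVAL && attr->want_hw_id) {
		fprintf(stderr,
			"kernel does not support hardware ID records, falling back\n");
		attr->want_hw_id = false;
		err = sketch_open(attr, kernel_has_hw_id);
	}
	return err;
}
```

The caller can inspect `want_hw_id` afterwards to learn which mode was actually opened.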
- Arnaldo
kernel then you are asking for trouble. Indeed, if I run perf on my x86 dev machine at the moment it whinges: WARNING: perf not found for kernel 5.4.0-122 because the last version of perf I have is for 5.4.0-120.
The new perf report will differentiate between the new and old versions of the perf.data file and act accordingly. For version 1 it will take the IDs from the metadata, for version 2 it will search for the IDs in the packet data. An older perf report will not be able to decode the newer files - though that has always been the case.
Were we to permit an old version of perf record to be used to generate a file using the new drivers, and then attempt to process that file with an older perf report, it would fail miserably.
Regards
Mike
I thought the idea was to search through the file to look for PERF_RECORD_AUX_OUTPUT_HW_ID records (or lack of) and then choose the appropriate decode method. But maybe that's too complicated and there is no requirement for backwards compatibility?
From experience it can be inconvenient when you can't just throw any build of Perf on a system and it supports everything that it knows about. Now we will have Perf builds that know about Coresight but don't work with older drivers.
But then as you say the ID allocation is already broken for some people. It's hard to decide.
James
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK
Trace IDs are now dynamically allocated.
Previously, a static association algorithm was used: 'cpu * 2 + seed'. It did not scale and was broken for systems with high core counts (>46).
The Trace ID is now set as unknown in the AUXINFO record, and the ID / CPU association is sent in a PERF_RECORD_AUX_OUTPUT_HW_ID record.
Remove legacy Trace ID allocation algorithm.
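For reference, the legacy formula and its limit can be sketched as below. The seed and formula come from the removed coresight_get_trace_id(); using the 0x01..0x6F range from CS_IS_VALID_TRACE_ID as the validity bound is an assumption made for this sketch, and the `sketch_*` names are hypothetical. The helper searches for the point where the formula leaves that range instead of hard-coding a CPU threshold.

```c
#include <stdbool.h>

/* Same value as the removed CORESIGHT_ETM_PMU_SEED. */
#define SKETCH_ETM_PMU_SEED 0x10

/* Legacy static allocation: one even ID per CPU, starting at the seed. */
static int sketch_legacy_trace_id(int cpu)
{
	return SKETCH_ETM_PMU_SEED + (cpu * 2);
}

/* Assumed validity range, matching CS_IS_VALID_TRACE_ID: 0x01..0x6F. */
static bool sketch_is_valid_trace_id(int id)
{
	return id > 0 && id < 0x70;
}

/* First CPU number whose legacy ID falls outside the assumed valid range. */
static int sketch_first_invalid_cpu(void)
{
	int cpu = 0;

	while (sketch_is_valid_trace_id(sketch_legacy_trace_id(cpu)))
		cpu++;
	return cpu;
}
```

Because the formula consumes two IDs per CPU, only half of the valid ID space is ever reachable, which is the inefficiency the dynamic allocator addresses.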
Signed-off-by: Mike Leach mike.leach@linaro.org --- include/linux/coresight-pmu.h | 19 +++++++------------ tools/include/linux/coresight-pmu.h | 19 +++++++------------ tools/perf/arch/arm/util/cs-etm.c | 21 ++++++++++++--------- 3 files changed, 26 insertions(+), 33 deletions(-)
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..9f7ee380266b 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -8,7 +8,13 @@ #define _LINUX_CORESIGHT_PMU_H
#define CORESIGHT_ETM_PMU_NAME "cs_etm" -#define CORESIGHT_ETM_PMU_SEED 0x10 + +/* + * Metadata now contains an unused trace ID - IDs are transmitted using a + * PERF_RECORD_AUX_OUTPUT_HW_ID record. + * Value architecturally defined as reserved in CoreSight. + */ +#define CS_UNUSED_TRACE_ID 0x7F
/* * Below are the definition of bit offsets for perf option, and works as @@ -32,15 +38,4 @@ #define ETM4_CFG_BIT_RETSTK 12 #define ETM4_CFG_BIT_VMID_OPT 15
-static inline int coresight_get_trace_id(int cpu) -{ - /* - * A trace ID of value 0 is invalid, so let's start at some - * random value that fits in 7 bits and go from there. Since - * the common convention is to have data trace IDs be I(N) + 1, - * set instruction trace IDs as a function of the CPU number. - */ - return (CORESIGHT_ETM_PMU_SEED + (cpu * 2)); -} - #endif diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index 6c2fd6cc5a98..31d007fab3a6 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -8,7 +8,13 @@ #define _LINUX_CORESIGHT_PMU_H
#define CORESIGHT_ETM_PMU_NAME "cs_etm" -#define CORESIGHT_ETM_PMU_SEED 0x10 + +/* + * Metadata now contains an unused trace ID - IDs are transmitted using a + * PERF_RECORD_AUX_OUTPUT_HW_ID record. + * Value architecturally defined as reserved in CoreSight. + */ +#define CS_UNUSED_TRACE_ID 0x7F
/* * Below are the definition of bit offsets for perf option, and works as @@ -34,15 +40,4 @@ #define ETM4_CFG_BIT_RETSTK 12 #define ETM4_CFG_BIT_VMID_OPT 15
-static inline int coresight_get_trace_id(int cpu) -{ - /* - * A trace ID of value 0 is invalid, so let's start at some - * random value that fits in 7 bits and go from there. Since - * the common convention is to have data trace IDs be I(N) + 1, - * set instruction trace IDs as a function of the CPU number. - */ - return (CORESIGHT_ETM_PMU_SEED + (cpu * 2)); -} - #endif diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c index 1b54638d53b0..2d68e6a722ed 100644 --- a/tools/perf/arch/arm/util/cs-etm.c +++ b/tools/perf/arch/arm/util/cs-etm.c @@ -421,13 +421,16 @@ static int cs_etm_recording_options(struct auxtrace_record *itr, evlist__to_front(evlist, cs_etm_evsel);
/* - * In the case of per-cpu mmaps, we need the CPU on the - * AUX event. We also need the contextID in order to be notified + * get the CPU on the sample - need it to associate trace ID in the + * AUX_OUTPUT_HW_ID event, and the AUX event for per-cpu mmaps. + */ + evsel__set_sample_bit(cs_etm_evsel, CPU); + + /* + * Also the case of per-cpu mmaps, need the contextID in order to be notified * when a context switch happened. */ if (!perf_cpu_map__empty(cpus)) { - evsel__set_sample_bit(cs_etm_evsel, CPU); - err = cs_etm_set_option(itr, cs_etm_evsel, BIT(ETM_OPT_CTXTID) | BIT(ETM_OPT_TS)); if (err) @@ -633,8 +636,9 @@ static void cs_etm_save_etmv4_header(__u64 data[], struct auxtrace_record *itr,
/* Get trace configuration register */ data[CS_ETMV4_TRCCONFIGR] = cs_etmv4_get_config(itr); - /* Get traceID from the framework */ - data[CS_ETMV4_TRCTRACEIDR] = coresight_get_trace_id(cpu); + /* traceID set to unused */ + data[CS_ETMV4_TRCTRACEIDR] = CS_UNUSED_TRACE_ID; + /* Get read-only information from sysFS */ data[CS_ETMV4_TRCIDR0] = cs_etm_get_ro(cs_etm_pmu, cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR0]); @@ -681,9 +685,8 @@ static void cs_etm_get_metadata(int cpu, u32 *offset, magic = __perf_cs_etmv3_magic; /* Get configuration register */ info->priv[*offset + CS_ETM_ETMCR] = cs_etm_get_config(itr); - /* Get traceID from the framework */ - info->priv[*offset + CS_ETM_ETMTRACEIDR] = - coresight_get_trace_id(cpu); + /* traceID set to unused */ + info->priv[*offset + CS_ETM_ETMTRACEIDR] = CS_UNUSED_TRACE_ID; /* Get read-only information from sysFS */ info->priv[*offset + CS_ETM_ETMCCER] = cs_etm_get_ro(cs_etm_pmu, cpu,
On 04/07/2022 09:11, Mike Leach wrote:
Trace IDs are now dynamically allocated.
Previously, a static association algorithm was used: 'cpu * 2 + seed'. It did not scale and was broken for systems with high core counts (>46).
The Trace ID is now set as unknown in the AUXINFO record, and the ID / CPU association is sent in a PERF_RECORD_AUX_OUTPUT_HW_ID record.
Remove legacy Trace ID allocation algorithm.
Signed-off-by: Mike Leach mike.leach@linaro.org
include/linux/coresight-pmu.h | 19 +++++++------------ tools/include/linux/coresight-pmu.h | 19 +++++++------------
I usually see mentions that these header updates need to be separate commits because they are merged through different trees.
tools/perf/arch/arm/util/cs-etm.c | 21 ++++++++++++--------- 3 files changed, 26 insertions(+), 33 deletions(-)
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..9f7ee380266b 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -8,7 +8,13 @@ #define _LINUX_CORESIGHT_PMU_H #define CORESIGHT_ETM_PMU_NAME "cs_etm" -#define CORESIGHT_ETM_PMU_SEED 0x10
+
+/*
+ * Metadata now contains an unused trace ID - IDs are transmitted using a
+ * PERF_RECORD_AUX_OUTPUT_HW_ID record.
+ * Value architecturally defined as reserved in CoreSight.
+ */
+#define CS_UNUSED_TRACE_ID 0x7F
 /*
  * Below are the definition of bit offsets for perf option, and works as
@@ -32,15 +38,4 @@
 #define ETM4_CFG_BIT_RETSTK 12
 #define ETM4_CFG_BIT_VMID_OPT 15
-
-static inline int coresight_get_trace_id(int cpu)
-{
-	/*
-	 * A trace ID of value 0 is invalid, so let's start at some
-	 * random value that fits in 7 bits and go from there. Since
-	 * the common convention is to have data trace IDs be I(N) + 1,
-	 * set instruction trace IDs as a function of the CPU number.
-	 */
-	return (CORESIGHT_ETM_PMU_SEED + (cpu * 2));
-}
-
#endif diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index 6c2fd6cc5a98..31d007fab3a6 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -8,7 +8,13 @@ #define _LINUX_CORESIGHT_PMU_H #define CORESIGHT_ETM_PMU_NAME "cs_etm" -#define CORESIGHT_ETM_PMU_SEED 0x10
+
+/*
+ * Metadata now contains an unused trace ID - IDs are transmitted using a
+ * PERF_RECORD_AUX_OUTPUT_HW_ID record.
+ * Value architecturally defined as reserved in CoreSight.
+ */
+#define CS_UNUSED_TRACE_ID 0x7F
minor nit: this isn't used in the kernel so only needs to be defined on the tools side.
 /*
  * Below are the definition of bit offsets for perf option, and works as
@@ -34,15 +40,4 @@
 #define ETM4_CFG_BIT_RETSTK 12
 #define ETM4_CFG_BIT_VMID_OPT 15
-
-static inline int coresight_get_trace_id(int cpu)
-{
-	/*
-	 * A trace ID of value 0 is invalid, so let's start at some
-	 * random value that fits in 7 bits and go from there. Since
-	 * the common convention is to have data trace IDs be I(N) + 1,
-	 * set instruction trace IDs as a function of the CPU number.
-	 */
-	return (CORESIGHT_ETM_PMU_SEED + (cpu * 2));
-}
-
#endif diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c index 1b54638d53b0..2d68e6a722ed 100644 --- a/tools/perf/arch/arm/util/cs-etm.c +++ b/tools/perf/arch/arm/util/cs-etm.c @@ -421,13 +421,16 @@ static int cs_etm_recording_options(struct auxtrace_record *itr, evlist__to_front(evlist, cs_etm_evsel); /*
-	 * In the case of per-cpu mmaps, we need the CPU on the
-	 * AUX event. We also need the contextID in order to be notified
+	 * get the CPU on the sample - need it to associate trace ID in the
+	 * AUX_OUTPUT_HW_ID event, and the AUX event for per-cpu mmaps.
+	 */
+	evsel__set_sample_bit(cs_etm_evsel, CPU);
+
+	/*
+	 * Also the case of per-cpu mmaps, need the contextID in order to be notified
 	 * when a context switch happened.
 	 */
 	if (!perf_cpu_map__empty(cpus)) {
-		evsel__set_sample_bit(cs_etm_evsel, CPU);
-
 		err = cs_etm_set_option(itr, cs_etm_evsel, BIT(ETM_OPT_CTXTID) | BIT(ETM_OPT_TS));
 		if (err)
@@ -633,8 +636,9 @@ static void cs_etm_save_etmv4_header(__u64 data[], struct auxtrace_record *itr,
 	/* Get trace configuration register */
 	data[CS_ETMV4_TRCCONFIGR] = cs_etmv4_get_config(itr);
-	/* Get traceID from the framework */
-	data[CS_ETMV4_TRCTRACEIDR] = coresight_get_trace_id(cpu);
+	/* traceID set to unused */
+	data[CS_ETMV4_TRCTRACEIDR] = CS_UNUSED_TRACE_ID;
+
 	/* Get read-only information from sysFS */
 	data[CS_ETMV4_TRCIDR0] = cs_etm_get_ro(cs_etm_pmu, cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR0]);
@@ -681,9 +685,8 @@ static void cs_etm_get_metadata(int cpu, u32 *offset,
 	magic = __perf_cs_etmv3_magic;
 	/* Get configuration register */
 	info->priv[*offset + CS_ETM_ETMCR] = cs_etm_get_config(itr);
-	/* Get traceID from the framework */
-	info->priv[*offset + CS_ETM_ETMTRACEIDR] =
-		coresight_get_trace_id(cpu);
+	/* traceID set to unused */
+	info->priv[*offset + CS_ETM_ETMTRACEIDR] = CS_UNUSED_TRACE_ID;
 	/* Get read-only information from sysFS */
 	info->priv[*offset + CS_ETM_ETMCCER] = cs_etm_get_ro(cs_etm_pmu, cpu,
Hi James
On Wed, 20 Jul 2022 at 15:41, James Clark james.clark@arm.com wrote:
On 04/07/2022 09:11, Mike Leach wrote:
Trace IDs are now dynamically allocated.
Previously, a static association algorithm was used: 'cpu * 2 + seed'. It did not scale and was broken for systems with high core counts (>46).
The Trace ID is now set as unknown in the AUXINFO record, and the ID / CPU association is sent in a PERF_RECORD_AUX_OUTPUT_HW_ID record.
Remove legacy Trace ID allocation algorithm.
Signed-off-by: Mike Leach mike.leach@linaro.org
include/linux/coresight-pmu.h | 19 +++++++------------ tools/include/linux/coresight-pmu.h | 19 +++++++------------
I usually see mentions that these header updates need to be separate commits because they are merged through different trees.
tools/perf/arch/arm/util/cs-etm.c | 21 ++++++++++++--------- 3 files changed, 26 insertions(+), 33 deletions(-)
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..9f7ee380266b 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -8,7 +8,13 @@ #define _LINUX_CORESIGHT_PMU_H
#define CORESIGHT_ETM_PMU_NAME "cs_etm" -#define CORESIGHT_ETM_PMU_SEED 0x10
+
+/*
+ * Metadata now contains an unused trace ID - IDs are transmitted using a
+ * PERF_RECORD_AUX_OUTPUT_HW_ID record.
+ * Value architecturally defined as reserved in CoreSight.
+ */
+#define CS_UNUSED_TRACE_ID 0x7F
 /*
  * Below are the definition of bit offsets for perf option, and works as
@@ -32,15 +38,4 @@
 #define ETM4_CFG_BIT_RETSTK 12
 #define ETM4_CFG_BIT_VMID_OPT 15
-
-static inline int coresight_get_trace_id(int cpu)
-{
-	/*
-	 * A trace ID of value 0 is invalid, so let's start at some
-	 * random value that fits in 7 bits and go from there. Since
-	 * the common convention is to have data trace IDs be I(N) + 1,
-	 * set instruction trace IDs as a function of the CPU number.
-	 */
-	return (CORESIGHT_ETM_PMU_SEED + (cpu * 2));
-}
-
#endif diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index 6c2fd6cc5a98..31d007fab3a6 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -8,7 +8,13 @@ #define _LINUX_CORESIGHT_PMU_H
#define CORESIGHT_ETM_PMU_NAME "cs_etm" -#define CORESIGHT_ETM_PMU_SEED 0x10
+/*
- Metadata now contains an unused trace ID - IDs are transmitted using a
- PERF_RECORD_AUX_OUTPUT_HW_ID record.
- Value architecturally defined as reserved in CoreSight.
- */
+#define CS_UNUSED_TRACE_ID 0x7F
minor nit: this isn't used in the kernel so only needs to be defined on the tools side.
Unfortunately if the two versions of coresight-pmu.h are different, the build process for perf throws out a warning. So they have to be identical.
Thanks
Mike

 /*
  * Below are the definition of bit offsets for perf option, and works as
@@ -34,15 +40,4 @@
 #define ETM4_CFG_BIT_RETSTK	12
 #define ETM4_CFG_BIT_VMID_OPT	15
 
-static inline int coresight_get_trace_id(int cpu)
-{
-	/*
-	 * A trace ID of value 0 is invalid, so let's start at some
-	 * random value that fits in 7 bits and go from there.  Since
-	 * the common convention is to have data trace IDs be I(N) + 1,
-	 * set instruction trace IDs as a function of the CPU number.
-	 */
-	return (CORESIGHT_ETM_PMU_SEED + (cpu * 2));
-}
-
 #endif
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 1b54638d53b0..2d68e6a722ed 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -421,13 +421,16 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 	evlist__to_front(evlist, cs_etm_evsel);
 
 	/*
-	 * In the case of per-cpu mmaps, we need the CPU on the
-	 * AUX event.  We also need the contextID in order to be notified
+	 * get the CPU on the sample - need it to associate trace ID in the
+	 * AUX_OUTPUT_HW_ID event, and the AUX event for per-cpu mmaps.
+	 */
+	evsel__set_sample_bit(cs_etm_evsel, CPU);
+
+	/*
+	 * Also the case of per-cpu mmaps, need the contextID in order to be notified
 	 * when a context switch happened.
 	 */
 	if (!perf_cpu_map__empty(cpus)) {
-		evsel__set_sample_bit(cs_etm_evsel, CPU);
-
 		err = cs_etm_set_option(itr, cs_etm_evsel,
 					BIT(ETM_OPT_CTXTID) | BIT(ETM_OPT_TS));
 		if (err)
@@ -633,8 +636,9 @@ static void cs_etm_save_etmv4_header(__u64 data[], struct auxtrace_record *itr,
 
 	/* Get trace configuration register */
 	data[CS_ETMV4_TRCCONFIGR] = cs_etmv4_get_config(itr);
-	/* Get traceID from the framework */
-	data[CS_ETMV4_TRCTRACEIDR] = coresight_get_trace_id(cpu);
+	/* traceID set to unused */
+	data[CS_ETMV4_TRCTRACEIDR] = CS_UNUSED_TRACE_ID;
+
 	/* Get read-only information from sysFS */
 	data[CS_ETMV4_TRCIDR0] = cs_etm_get_ro(cs_etm_pmu, cpu,
 					       metadata_etmv4_ro[CS_ETMV4_TRCIDR0]);
@@ -681,9 +685,8 @@ static void cs_etm_get_metadata(int cpu, u32 *offset,
 		magic = __perf_cs_etmv3_magic;
 		/* Get configuration register */
 		info->priv[*offset + CS_ETM_ETMCR] = cs_etm_get_config(itr);
-		/* Get traceID from the framework */
-		info->priv[*offset + CS_ETM_ETMTRACEIDR] =
-						coresight_get_trace_id(cpu);
+		/* traceID set to unused */
+		info->priv[*offset + CS_ETM_ETMTRACEIDR] = CS_UNUSED_TRACE_ID;
 		/* Get read-only information from sysFS */
 		info->priv[*offset + CS_ETM_ETMCCER] =
 						cs_etm_get_ro(cs_etm_pmu, cpu,
On 09/08/2022 17:13, Mike Leach wrote:
Hi James
On Wed, 20 Jul 2022 at 15:41, James Clark james.clark@arm.com wrote:
On 04/07/2022 09:11, Mike Leach wrote:
Trace IDs are now dynamically allocated.
The static association algorithm used previously is removed. The 'cpu * 2 + seed' scheme was outdated: it did not scale and was broken for systems with high core counts (>46).
The Trace ID is marked as unused in the AUXTRACE_INFO record, and the ID / CPU association is now sent in a PERF_RECORD_AUX_OUTPUT_HW_ID record.
Remove legacy Trace ID allocation algorithm.
Signed-off-by: Mike Leach mike.leach@linaro.org
 include/linux/coresight-pmu.h       | 19 +++++++------------
 tools/include/linux/coresight-pmu.h | 19 +++++++------------

I usually see mentions that these header updates need to be separate commits because they are merged through different trees.

 tools/perf/arch/arm/util/cs-etm.c   | 21 ++++++++++++---------
 3 files changed, 26 insertions(+), 33 deletions(-)
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
index 4ac5c081af93..9f7ee380266b 100644
--- a/include/linux/coresight-pmu.h
+++ b/include/linux/coresight-pmu.h
@@ -8,7 +8,13 @@
 #define _LINUX_CORESIGHT_PMU_H
 
 #define CORESIGHT_ETM_PMU_NAME "cs_etm"
-#define CORESIGHT_ETM_PMU_SEED  0x10
+
+/*
+ * Metadata now contains an unused trace ID - IDs are transmitted using a
+ * PERF_RECORD_AUX_OUTPUT_HW_ID record.
+ * Value architecturally defined as reserved in CoreSight.
+ */
+#define CS_UNUSED_TRACE_ID 0x7F
 
 /*
  * Below are the definition of bit offsets for perf option, and works as
@@ -32,15 +38,4 @@
 #define ETM4_CFG_BIT_RETSTK	12
 #define ETM4_CFG_BIT_VMID_OPT	15
 
-static inline int coresight_get_trace_id(int cpu)
-{
-	/*
-	 * A trace ID of value 0 is invalid, so let's start at some
-	 * random value that fits in 7 bits and go from there.  Since
-	 * the common convention is to have data trace IDs be I(N) + 1,
-	 * set instruction trace IDs as a function of the CPU number.
-	 */
-	return (CORESIGHT_ETM_PMU_SEED + (cpu * 2));
-}
-
 #endif
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h
index 6c2fd6cc5a98..31d007fab3a6 100644
--- a/tools/include/linux/coresight-pmu.h
+++ b/tools/include/linux/coresight-pmu.h
@@ -8,7 +8,13 @@
 #define _LINUX_CORESIGHT_PMU_H
 
 #define CORESIGHT_ETM_PMU_NAME "cs_etm"
-#define CORESIGHT_ETM_PMU_SEED  0x10
+
+/*
+ * Metadata now contains an unused trace ID - IDs are transmitted using a
+ * PERF_RECORD_AUX_OUTPUT_HW_ID record.
+ * Value architecturally defined as reserved in CoreSight.
+ */
+#define CS_UNUSED_TRACE_ID 0x7F

minor nit: this isn't used in the kernel so only needs to be defined on the tools side.

Unfortunately if the two versions of coresight-pmu.h are different, the build process for perf throws out a warning. So they have to be identical.

I was thinking more along the lines of putting it in a header that is only present on the perf side, rather than only having it in one version of a shared header.

Thanks

Mike

 /*
  * Below are the definition of bit offsets for perf option, and works as
@@ -34,15 +40,4 @@
 #define ETM4_CFG_BIT_RETSTK	12
 #define ETM4_CFG_BIT_VMID_OPT	15
 
-static inline int coresight_get_trace_id(int cpu)
-{
-	/*
-	 * A trace ID of value 0 is invalid, so let's start at some
-	 * random value that fits in 7 bits and go from there.  Since
-	 * the common convention is to have data trace IDs be I(N) + 1,
-	 * set instruction trace IDs as a function of the CPU number.
-	 */
-	return (CORESIGHT_ETM_PMU_SEED + (cpu * 2));
-}
-
 #endif
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 1b54638d53b0..2d68e6a722ed 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -421,13 +421,16 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 	evlist__to_front(evlist, cs_etm_evsel);
 
 	/*
-	 * In the case of per-cpu mmaps, we need the CPU on the
-	 * AUX event.  We also need the contextID in order to be notified
+	 * get the CPU on the sample - need it to associate trace ID in the
+	 * AUX_OUTPUT_HW_ID event, and the AUX event for per-cpu mmaps.
+	 */
+	evsel__set_sample_bit(cs_etm_evsel, CPU);
+
+	/*
+	 * Also the case of per-cpu mmaps, need the contextID in order to be notified
 	 * when a context switch happened.
 	 */
 	if (!perf_cpu_map__empty(cpus)) {
-		evsel__set_sample_bit(cs_etm_evsel, CPU);
-
 		err = cs_etm_set_option(itr, cs_etm_evsel,
 					BIT(ETM_OPT_CTXTID) | BIT(ETM_OPT_TS));
 		if (err)
@@ -633,8 +636,9 @@ static void cs_etm_save_etmv4_header(__u64 data[], struct auxtrace_record *itr,
 
 	/* Get trace configuration register */
 	data[CS_ETMV4_TRCCONFIGR] = cs_etmv4_get_config(itr);
-	/* Get traceID from the framework */
-	data[CS_ETMV4_TRCTRACEIDR] = coresight_get_trace_id(cpu);
+	/* traceID set to unused */
+	data[CS_ETMV4_TRCTRACEIDR] = CS_UNUSED_TRACE_ID;
+
 	/* Get read-only information from sysFS */
 	data[CS_ETMV4_TRCIDR0] = cs_etm_get_ro(cs_etm_pmu, cpu,
 					       metadata_etmv4_ro[CS_ETMV4_TRCIDR0]);
@@ -681,9 +685,8 @@ static void cs_etm_get_metadata(int cpu, u32 *offset,
 		magic = __perf_cs_etmv3_magic;
 		/* Get configuration register */
 		info->priv[*offset + CS_ETM_ETMCR] = cs_etm_get_config(itr);
-		/* Get traceID from the framework */
-		info->priv[*offset + CS_ETM_ETMTRACEIDR] =
-						coresight_get_trace_id(cpu);
+		/* traceID set to unused */
+		info->priv[*offset + CS_ETM_ETMTRACEIDR] = CS_UNUSED_TRACE_ID;
 		/* Get read-only information from sysFS */
 		info->priv[*offset + CS_ETM_ETMCCER] =
 						cs_etm_get_ro(cs_etm_pmu, cpu,
CoreSight trace is being updated to use perf_report_aux_output_id() in a similar way to intel-pt.
This function needs export visibility so that it can be called from loadable kernel modules, which CoreSight may be configured to be built as.
Signed-off-by: Mike Leach mike.leach@linaro.org
---
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 80782cddb1da..f5835e5833cd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9117,6 +9117,7 @@ void perf_report_aux_output_id(struct perf_event *event, u64 hw_id)
 
 	perf_output_end(&handle);
 }
+EXPORT_SYMBOL_GPL(perf_report_aux_output_id);
 
 static int
 __perf_event_account_interrupt(struct perf_event *event, int throttle)
On 04/07/2022 09:11, Mike Leach wrote:
CoreSight trace is being updated to use perf_report_aux_output_id() in a similar way to intel-pt.
This function needs export visibility so that it can be called from loadable kernel modules, which CoreSight may be configured to be built as.
Signed-off-by: Mike Leach mike.leach@linaro.org
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 80782cddb1da..f5835e5833cd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9117,6 +9117,7 @@ void perf_report_aux_output_id(struct perf_event *event, u64 hw_id)
 
 	perf_output_end(&handle);
 }
+EXPORT_SYMBOL_GPL(perf_report_aux_output_id);
 
 static int
 __perf_event_account_interrupt(struct perf_event *event, int throttle)

Acked-by: Suzuki K Poulose suzuki.poulose@arm.com
When using dynamically assigned CoreSight trace IDs the drivers can output the ID / CPU association as a PERF_RECORD_AUX_OUTPUT_HW_ID packet.
Update cs-etm decoder to handle this packet by setting the CPU/Trace ID mapping.
Signed-off-by: Mike Leach mike.leach@linaro.org
---
 tools/include/linux/coresight-pmu.h           |  14 ++
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |   9 +
 tools/perf/util/cs-etm.c                      | 167 +++++++++++++++++-
 3 files changed, 185 insertions(+), 5 deletions(-)

diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h
index 31d007fab3a6..4e8b3148f939 100644
--- a/tools/include/linux/coresight-pmu.h
+++ b/tools/include/linux/coresight-pmu.h
@@ -7,6 +7,8 @@
 #ifndef _LINUX_CORESIGHT_PMU_H
 #define _LINUX_CORESIGHT_PMU_H
 
+#include <linux/bits.h>
+
 #define CORESIGHT_ETM_PMU_NAME "cs_etm"
 
 /*
@@ -40,4 +42,16 @@
 #define ETM4_CFG_BIT_RETSTK	12
 #define ETM4_CFG_BIT_VMID_OPT	15
 
+/*
+ * Interpretation of the PERF_RECORD_AUX_OUTPUT_HW_ID payload.
+ * Used to associate a CPU with the CoreSight Trace ID.
+ * [63:16] - unused SBZ
+ * [15:08] - Trace ID
+ * [07:00] - Version
+ */
+#define CS_AUX_HW_ID_VERSION_MASK	GENMASK_ULL(7, 0)
+#define CS_AUX_HW_ID_TRACE_ID_MASK	GENMASK_ULL(15, 8)
+
+#define CS_AUX_HW_ID_CURR_VERSION 0
+
 #endif
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 31fa3b45134a..d1dd73310707 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -611,6 +611,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 	return resp;
 }
 
+#define CS_TRACE_ID_MASK GENMASK(6, 0)
+
 static int
 cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
 				   struct cs_etm_trace_params *t_params,
@@ -625,6 +627,7 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
 	switch (t_params->protocol) {
 	case CS_ETM_PROTO_ETMV3:
 	case CS_ETM_PROTO_PTM:
+		csid = (t_params->etmv3.reg_idr & CS_TRACE_ID_MASK);
 		cs_etm_decoder__gen_etmv3_config(t_params, &config_etmv3);
 		decoder->decoder_name = (t_params->protocol == CS_ETM_PROTO_ETMV3) ?
 							OCSD_BUILTIN_DCD_ETMV3 :
@@ -632,11 +635,13 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
 		trace_config = &config_etmv3;
 		break;
 	case CS_ETM_PROTO_ETMV4i:
+		csid = (t_params->etmv4.reg_traceidr & CS_TRACE_ID_MASK);
 		cs_etm_decoder__gen_etmv4_config(t_params, &trace_config_etmv4);
 		decoder->decoder_name = OCSD_BUILTIN_DCD_ETMV4I;
 		trace_config = &trace_config_etmv4;
 		break;
 	case CS_ETM_PROTO_ETE:
+		csid = (t_params->ete.reg_traceidr & CS_TRACE_ID_MASK);
 		cs_etm_decoder__gen_ete_config(t_params, &trace_config_ete);
 		decoder->decoder_name = OCSD_BUILTIN_DCD_ETE;
 		trace_config = &trace_config_ete;
@@ -645,6 +650,10 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
 		return -1;
 	}
 
+	/* if the CPU has no trace ID associated, no decoder needed */
+	if (csid == CS_UNUSED_TRACE_ID)
+		return 0;
+
 	if (d_params->operation == CS_ETM_OPERATION_DECODE) {
 		if (ocsd_dt_create_decoder(decoder->dcd_tree,
 					   decoder->decoder_name,
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index df9d67901f8d..ffce858f21fd 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -217,6 +217,139 @@ static int cs_etm__map_trace_id(u8 trace_chan_id, u64 *cpu_metadata)
 	return 0;
 }
 
+static int cs_etm__metadata_get_trace_id(u8 *trace_chan_id, u64 *cpu_metadata)
+{
+	u64 cs_etm_magic = cpu_metadata[CS_ETM_MAGIC];
+
+	switch (cs_etm_magic) {
+	case __perf_cs_etmv3_magic:
+		*trace_chan_id = cpu_metadata[CS_ETM_ETMTRACEIDR];
+		break;
+	case __perf_cs_etmv4_magic:
+	case __perf_cs_ete_magic:
+		*trace_chan_id = cpu_metadata[CS_ETMV4_TRCTRACEIDR];
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int cs_etm__metadata_set_trace_id(u8 trace_chan_id, u64 *cpu_metadata)
+{
+	u64 cs_etm_magic = cpu_metadata[CS_ETM_MAGIC];
+
+	switch (cs_etm_magic) {
+	case __perf_cs_etmv3_magic:
+		cpu_metadata[CS_ETM_ETMTRACEIDR] = trace_chan_id;
+		break;
+	case __perf_cs_etmv4_magic:
+	case __perf_cs_ete_magic:
+		cpu_metadata[CS_ETMV4_TRCTRACEIDR] = trace_chan_id;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * FIELD_GET (linux/bitfield.h) not available outside kernel code,
+ * and the header contains too many dependencies to just copy over,
+ * so roll our own based on the original
+ */
+#define __bf_shf(x) (__builtin_ffsll(x) - 1)
+#define FIELD_GET(_mask, _reg)						\
+	({								\
+		(typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask)); \
+	})
+
+/*
+ * Handle the PERF_RECORD_AUX_OUTPUT_HW_ID event.
+ *
+ * The payload associates the Trace ID and the CPU.
+ * The routine is tolerant of seeing multiple packets with the same association,
+ * but a CPU / Trace ID association changing during a session is an error.
+ */
+static int cs_etm__process_aux_output_hw_id(struct perf_session *session,
+					    union perf_event *event)
+{
+	struct cs_etm_auxtrace *etm;
+	struct perf_sample sample;
+	struct int_node *inode;
+	struct evsel *evsel;
+	u64 *cpu_data;
+	u64 hw_id;
+	int cpu, version, err;
+	u8 trace_chan_id, curr_chan_id;
+
+	/* extract and parse the HW ID */
+	hw_id = event->aux_output_hw_id.hw_id;
+	version = FIELD_GET(CS_AUX_HW_ID_VERSION_MASK, hw_id);
+	trace_chan_id = FIELD_GET(CS_AUX_HW_ID_TRACE_ID_MASK, hw_id);
+
+	/* check that we can handle this version */
+	if (version > CS_AUX_HW_ID_CURR_VERSION)
+		return -EINVAL;
+
+	/* get access to the etm metadata */
+	etm = container_of(session->auxtrace, struct cs_etm_auxtrace, auxtrace);
+	if (!etm || !etm->metadata)
+		return -EINVAL;
+
+	/* parse the sample to get the CPU */
+	evsel = evlist__event2evsel(session->evlist, event);
+	if (!evsel)
+		return -EINVAL;
+	err = evsel__parse_sample(evsel, event, &sample);
+	if (err)
+		return err;
+	cpu = sample.cpu;
+	if (cpu == -1) {
+		/* no CPU in the sample - possibly recorded with an old version of perf */
+		pr_err("CS_ETM: no CPU AUX_OUTPUT_HW_ID sample. Use compatible perf to record.");
+		return -EINVAL;
+	}
+
+	/*
+	 * look to see if the metadata contains a valid trace ID.
+	 * if so we mapped it before and it must be the same as the ID in the packet.
+	 */
+	cpu_data = etm->metadata[cpu];
+	err = cs_etm__metadata_get_trace_id(&curr_chan_id, cpu_data);
+	if (err)
+		return err;
+	if (CS_IS_VALID_TRACE_ID(curr_chan_id) && (curr_chan_id != trace_chan_id)) {
+		pr_err("CS_ETM: mismatch between CPU trace ID and HW_ID packet ID\n");
+		return -EINVAL;
+	}
+
+	/* next see if the ID is mapped to a CPU, and it matches the current CPU */
+	inode = intlist__find(traceid_list, trace_chan_id);
+	if (inode) {
+		cpu_data = inode->priv;
+		if ((int)cpu_data[CS_ETM_CPU] != cpu) {
+			pr_err("CS_ETM: map mismatch between HW_ID packet CPU and Trace ID\n");
+			return -EINVAL;
+		}
+		return 0;
+	}
+
+	/* not one we've seen before - lets map it */
+	err = cs_etm__map_trace_id(trace_chan_id, cpu_data);
+	if (err)
+		return err;
+
+	/*
+	 * if we are picking up the association from the packet, need to plug
+	 * the correct trace ID into the metadata for setting up decoders later.
+	 */
+	err = cs_etm__metadata_set_trace_id(trace_chan_id, cpu_data);
+	return err;
+}
+
 void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
 					      u8 trace_chan_id)
 {
@@ -2433,6 +2566,8 @@ static int cs_etm__process_event(struct perf_session *session,
 		return cs_etm__process_itrace_start(etm, event);
 	else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
 		return cs_etm__process_switch_cpu_wide(etm, event);
+	else if (event->header.type == PERF_RECORD_AUX_OUTPUT_HW_ID)
+		return cs_etm__process_aux_output_hw_id(session, event);
 
 	if (!etm->timeless_decoding && event->header.type == PERF_RECORD_AUX) {
 		/*
@@ -2662,7 +2797,7 @@ static void cs_etm__print_auxtrace_info(__u64 *val, int num)
 	for (i = CS_HEADER_VERSION_MAX; cpu < num; cpu++) {
 		if (version == 0)
 			err = cs_etm__print_cpu_metadata_v0(val, &i);
-		else if (version == 1)
+		else if (version == 1 || version == 2)
 			err = cs_etm__print_cpu_metadata_v1(val, &i);
 		if (err)
 			return;
@@ -2774,11 +2909,16 @@ static int cs_etm__queue_aux_fragment(struct perf_session *session, off_t file_o
 	}
 
 	/*
-	 * In per-thread mode, CPU is set to -1, but TID will be set instead. See
-	 * auxtrace_mmap_params__set_idx(). Return 'not found' if neither CPU nor TID match.
+	 * In per-thread mode, auxtrace CPU is set to -1, but TID will be set instead. See
+	 * auxtrace_mmap_params__set_idx(). However, the sample AUX event will contain a
+	 * CPU as we set this always for the AUX_OUTPUT_HW_ID event.
+	 * So now compare only TIDs if auxtrace CPU is -1, and CPUs if auxtrace CPU is not -1.
+	 * Return 'not found' if mismatch.
 	 */
-	if ((auxtrace_event->cpu == (__u32) -1 && auxtrace_event->tid != sample->tid) ||
-			auxtrace_event->cpu != sample->cpu)
+	if (auxtrace_event->cpu == (__u32) -1) {
+		if (auxtrace_event->tid != sample->tid)
+			return 1;
+	} else if (auxtrace_event->cpu != sample->cpu)
 		return 1;
 
 	if (aux_event->flags & PERF_AUX_FLAG_OVERWRITE) {
@@ -2827,6 +2967,15 @@ static int cs_etm__queue_aux_fragment(struct perf_session *session, off_t file_o
 	return 1;
 }
 
+static int cs_etm__process_aux_hw_id_cb(struct perf_session *session, union perf_event *event,
+					u64 offset __maybe_unused, void *data __maybe_unused)
+{
+	/* look to handle PERF_RECORD_AUX_OUTPUT_HW_ID early to ensure decoders can be set up */
+	if (event->header.type == PERF_RECORD_AUX_OUTPUT_HW_ID)
+		return cs_etm__process_aux_output_hw_id(session, event);
+	return 0;
+}
+
 static int cs_etm__queue_aux_records_cb(struct perf_session *session, union perf_event *event,
 					u64 offset __maybe_unused, void *data __maybe_unused)
 {
@@ -3109,6 +3258,14 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	if (err)
 		goto err_delete_thread;
 
+	/* scan for AUX_OUTPUT_HW_ID records */
+	if (hdr_version >= CS_AUX_HW_ID_VERSION_MIN) {
+		err = perf_session__peek_events(session, session->header.data_offset,
+						session->header.data_size,
+						cs_etm__process_aux_hw_id_cb, NULL);
+		if (err)
+			goto err_delete_thread;
+	}
 	err = cs_etm__queue_aux_records(session);
 	if (err)
 		goto err_delete_thread;
On 04/07/2022 09:11, Mike Leach wrote:
When using dynamically assigned CoreSight trace IDs the drivers can output the ID / CPU association as a PERF_RECORD_AUX_OUTPUT_HW_ID packet.
Update cs-etm decoder to handle this packet by setting the CPU/Trace ID mapping.
Signed-off-by: Mike Leach mike.leach@linaro.org
 tools/include/linux/coresight-pmu.h           |  14 ++
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |   9 +
 tools/perf/util/cs-etm.c                      | 167 +++++++++++++++++-
 3 files changed, 185 insertions(+), 5 deletions(-)

diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h
index 31d007fab3a6..4e8b3148f939 100644
--- a/tools/include/linux/coresight-pmu.h
+++ b/tools/include/linux/coresight-pmu.h
@@ -7,6 +7,8 @@
 #ifndef _LINUX_CORESIGHT_PMU_H
 #define _LINUX_CORESIGHT_PMU_H
 
+#include <linux/bits.h>
+
 #define CORESIGHT_ETM_PMU_NAME "cs_etm"
 
 /*
@@ -40,4 +42,16 @@
 #define ETM4_CFG_BIT_RETSTK	12
 #define ETM4_CFG_BIT_VMID_OPT	15
 
+/*
+ * Interpretation of the PERF_RECORD_AUX_OUTPUT_HW_ID payload.
+ * Used to associate a CPU with the CoreSight Trace ID.
+ * [63:16] - unused SBZ
+ * [15:08] - Trace ID
+ * [07:00] - Version
+ */
+#define CS_AUX_HW_ID_VERSION_MASK	GENMASK_ULL(7, 0)
+#define CS_AUX_HW_ID_TRACE_ID_MASK	GENMASK_ULL(15, 8)
+
+#define CS_AUX_HW_ID_CURR_VERSION 0
+
 #endif
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 31fa3b45134a..d1dd73310707 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -611,6 +611,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 	return resp;
 }
 
+#define CS_TRACE_ID_MASK GENMASK(6, 0)
+
 static int
 cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
 				   struct cs_etm_trace_params *t_params,
@@ -625,6 +627,7 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
 	switch (t_params->protocol) {
 	case CS_ETM_PROTO_ETMV3:
 	case CS_ETM_PROTO_PTM:
+		csid = (t_params->etmv3.reg_idr & CS_TRACE_ID_MASK);
 		cs_etm_decoder__gen_etmv3_config(t_params, &config_etmv3);
 		decoder->decoder_name = (t_params->protocol == CS_ETM_PROTO_ETMV3) ?
 							OCSD_BUILTIN_DCD_ETMV3 :
@@ -632,11 +635,13 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
 		trace_config = &config_etmv3;
 		break;
 	case CS_ETM_PROTO_ETMV4i:
+		csid = (t_params->etmv4.reg_traceidr & CS_TRACE_ID_MASK);
 		cs_etm_decoder__gen_etmv4_config(t_params, &trace_config_etmv4);
 		decoder->decoder_name = OCSD_BUILTIN_DCD_ETMV4I;
 		trace_config = &trace_config_etmv4;
 		break;
 	case CS_ETM_PROTO_ETE:
+		csid = (t_params->ete.reg_traceidr & CS_TRACE_ID_MASK);
 		cs_etm_decoder__gen_ete_config(t_params, &trace_config_ete);
 		decoder->decoder_name = OCSD_BUILTIN_DCD_ETE;
 		trace_config = &trace_config_ete;
@@ -645,6 +650,10 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
 		return -1;
 	}
 
+	/* if the CPU has no trace ID associated, no decoder needed */
+	if (csid == CS_UNUSED_TRACE_ID)
+		return 0;
+
 	if (d_params->operation == CS_ETM_OPERATION_DECODE) {
 		if (ocsd_dt_create_decoder(decoder->dcd_tree,
 					   decoder->decoder_name,
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index df9d67901f8d..ffce858f21fd 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -217,6 +217,139 @@ static int cs_etm__map_trace_id(u8 trace_chan_id, u64 *cpu_metadata)
 	return 0;
 }
 
+static int cs_etm__metadata_get_trace_id(u8 *trace_chan_id, u64 *cpu_metadata)
+{
+	u64 cs_etm_magic = cpu_metadata[CS_ETM_MAGIC];
+
+	switch (cs_etm_magic) {
+	case __perf_cs_etmv3_magic:
+		*trace_chan_id = cpu_metadata[CS_ETM_ETMTRACEIDR];
+		break;
+	case __perf_cs_etmv4_magic:
+	case __perf_cs_ete_magic:
+		*trace_chan_id = cpu_metadata[CS_ETMV4_TRCTRACEIDR];
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int cs_etm__metadata_set_trace_id(u8 trace_chan_id, u64 *cpu_metadata)
+{
+	u64 cs_etm_magic = cpu_metadata[CS_ETM_MAGIC];
+
+	switch (cs_etm_magic) {
+	case __perf_cs_etmv3_magic:
+		cpu_metadata[CS_ETM_ETMTRACEIDR] = trace_chan_id;
+		break;
+	case __perf_cs_etmv4_magic:
+	case __perf_cs_ete_magic:
+		cpu_metadata[CS_ETMV4_TRCTRACEIDR] = trace_chan_id;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * FIELD_GET (linux/bitfield.h) not available outside kernel code,
+ * and the header contains too many dependencies to just copy over,
+ * so roll our own based on the original
+ */
+#define __bf_shf(x) (__builtin_ffsll(x) - 1)
+#define FIELD_GET(_mask, _reg)						\
+	({								\
+		(typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask)); \
+	})
+
+/*
+ * Handle the PERF_RECORD_AUX_OUTPUT_HW_ID event.
+ *
+ * The payload associates the Trace ID and the CPU.
+ * The routine is tolerant of seeing multiple packets with the same association,
+ * but a CPU / Trace ID association changing during a session is an error.
+ */
+static int cs_etm__process_aux_output_hw_id(struct perf_session *session,
+					    union perf_event *event)
+{
+	struct cs_etm_auxtrace *etm;
+	struct perf_sample sample;
+	struct int_node *inode;
+	struct evsel *evsel;
+	u64 *cpu_data;
+	u64 hw_id;
+	int cpu, version, err;
+	u8 trace_chan_id, curr_chan_id;
+
+	/* extract and parse the HW ID */
+	hw_id = event->aux_output_hw_id.hw_id;
+	version = FIELD_GET(CS_AUX_HW_ID_VERSION_MASK, hw_id);
+	trace_chan_id = FIELD_GET(CS_AUX_HW_ID_TRACE_ID_MASK, hw_id);
+
+	/* check that we can handle this version */
+	if (version > CS_AUX_HW_ID_CURR_VERSION)
+		return -EINVAL;
+
+	/* get access to the etm metadata */
+	etm = container_of(session->auxtrace, struct cs_etm_auxtrace, auxtrace);
+	if (!etm || !etm->metadata)
+		return -EINVAL;
+
+	/* parse the sample to get the CPU */
+	evsel = evlist__event2evsel(session->evlist, event);
+	if (!evsel)
+		return -EINVAL;
+	err = evsel__parse_sample(evsel, event, &sample);
+	if (err)
+		return err;
+	cpu = sample.cpu;
+	if (cpu == -1) {
+		/* no CPU in the sample - possibly recorded with an old version of perf */
+		pr_err("CS_ETM: no CPU AUX_OUTPUT_HW_ID sample. Use compatible perf to record.");
+		return -EINVAL;
+	}
+
+	/*
+	 * look to see if the metadata contains a valid trace ID.
+	 * if so we mapped it before and it must be the same as the ID in the packet.
+	 */
+	cpu_data = etm->metadata[cpu];
+	err = cs_etm__metadata_get_trace_id(&curr_chan_id, cpu_data);
+	if (err)
+		return err;
+	if (CS_IS_VALID_TRACE_ID(curr_chan_id) && (curr_chan_id != trace_chan_id)) {
+		pr_err("CS_ETM: mismatch between CPU trace ID and HW_ID packet ID\n");
+		return -EINVAL;
+	}
+
+	/* next see if the ID is mapped to a CPU, and it matches the current CPU */
+	inode = intlist__find(traceid_list, trace_chan_id);
+	if (inode) {
+		cpu_data = inode->priv;
+		if ((int)cpu_data[CS_ETM_CPU] != cpu) {
+			pr_err("CS_ETM: map mismatch between HW_ID packet CPU and Trace ID\n");
+			return -EINVAL;
+		}
+		return 0;
+	}
+
+	/* not one we've seen before - lets map it */
+	err = cs_etm__map_trace_id(trace_chan_id, cpu_data);
+	if (err)
+		return err;
+
+	/*
+	 * if we are picking up the association from the packet, need to plug
+	 * the correct trace ID into the metadata for setting up decoders later.
+	 */
+	err = cs_etm__metadata_set_trace_id(trace_chan_id, cpu_data);
+	return err;
+}
+
 void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
 					      u8 trace_chan_id)
 {
@@ -2433,6 +2566,8 @@ static int cs_etm__process_event(struct perf_session *session,
 		return cs_etm__process_itrace_start(etm, event);
 	else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
 		return cs_etm__process_switch_cpu_wide(etm, event);
+	else if (event->header.type == PERF_RECORD_AUX_OUTPUT_HW_ID)
+		return cs_etm__process_aux_output_hw_id(session, event);

This shouldn't need to be handled here because of the peek at the beginning. Although it's probably harmless to do it twice, it can make deciphering the flow quite difficult.

 
 	if (!etm->timeless_decoding && event->header.type == PERF_RECORD_AUX) {
 		/*
@@ -2662,7 +2797,7 @@ static void cs_etm__print_auxtrace_info(__u64 *val, int num)
 	for (i = CS_HEADER_VERSION_MAX; cpu < num; cpu++) {
 		if (version == 0)
 			err = cs_etm__print_cpu_metadata_v0(val, &i);
-		else if (version == 1)
+		else if (version == 1 || version == 2)
 			err = cs_etm__print_cpu_metadata_v1(val, &i);
 		if (err)
 			return;
@@ -2774,11 +2909,16 @@ static int cs_etm__queue_aux_fragment(struct perf_session *session, off_t file_o
 	}
 
 	/*
-	 * In per-thread mode, CPU is set to -1, but TID will be set instead. See
-	 * auxtrace_mmap_params__set_idx(). Return 'not found' if neither CPU nor TID match.
+	 * In per-thread mode, auxtrace CPU is set to -1, but TID will be set instead. See
+	 * auxtrace_mmap_params__set_idx(). However, the sample AUX event will contain a
+	 * CPU as we set this always for the AUX_OUTPUT_HW_ID event.
+	 * So now compare only TIDs if auxtrace CPU is -1, and CPUs if auxtrace CPU is not -1.
+	 * Return 'not found' if mismatch.
 	 */
-	if ((auxtrace_event->cpu == (__u32) -1 && auxtrace_event->tid != sample->tid) ||
-			auxtrace_event->cpu != sample->cpu)
+	if (auxtrace_event->cpu == (__u32) -1) {
+		if (auxtrace_event->tid != sample->tid)
+			return 1;
+	} else if (auxtrace_event->cpu != sample->cpu)
 		return 1;
 
 	if (aux_event->flags & PERF_AUX_FLAG_OVERWRITE) {
@@ -2827,6 +2967,15 @@ static int cs_etm__queue_aux_fragment(struct perf_session *session, off_t file_o
 	return 1;
 }
 
+static int cs_etm__process_aux_hw_id_cb(struct perf_session *session, union perf_event *event,
+					u64 offset __maybe_unused, void *data __maybe_unused)
+{
+	/* look to handle PERF_RECORD_AUX_OUTPUT_HW_ID early to ensure decoders can be set up */
+	if (event->header.type == PERF_RECORD_AUX_OUTPUT_HW_ID)
+		return cs_etm__process_aux_output_hw_id(session, event);
+	return 0;
+}

I couldn't see the relationship between the two peeks and why they couldn't be done together in one pass. I changed it so cs_etm__process_aux_hw_id_cb() is also called on the peek to queue the aux records and it seemed to work. At least just opening the file and glancing.

If there is some dependency though, I don't think two passes is excessive.

+
 static int cs_etm__queue_aux_records_cb(struct perf_session *session, union perf_event *event,
 					u64 offset __maybe_unused, void *data __maybe_unused)
 {
@@ -3109,6 +3258,14 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	if (err)
 		goto err_delete_thread;
 
+	/* scan for AUX_OUTPUT_HW_ID records */
+	if (hdr_version >= CS_AUX_HW_ID_VERSION_MIN) {
+		err = perf_session__peek_events(session, session->header.data_offset,
+						session->header.data_size,
+						cs_etm__process_aux_hw_id_cb, NULL);

This no longer works at all with piping because of this line in peek_events:

	if (perf_data__is_pipe(session->data))
		return -1;

So we should change the warning message to an error and exit earlier:

	if (!etm->data_queued)
		pr_warning("CS ETM warning: Coresight decode and TRBE support requires random file access.\n"
			   "Continuing with best effort decoding in piped mode.\n\n");

And then we can also remove all the now dead code and variables related to piping like:

	etm->data_queued = etm->queues.populated;
	...

	if (!etm->data_queued) {
		...
	}

+		if (err)
+			goto err_delete_thread;
+	}
 	err = cs_etm__queue_aux_records(session);
 	if (err)
 		goto err_delete_thread;
Hi James
On Wed, 20 Jul 2022 at 17:07, James Clark james.clark@arm.com wrote:
On 04/07/2022 09:11, Mike Leach wrote:
When using dynamically assigned CoreSight trace IDs the drivers can output the ID / CPU association as a PERF_RECORD_AUX_OUTPUT_HW_ID packet.
Update cs-etm decoder to handle this packet by setting the CPU/Trace ID mapping.
Signed-off-by: Mike Leach mike.leach@linaro.org
 tools/include/linux/coresight-pmu.h           |  14 ++
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |   9 +
 tools/perf/util/cs-etm.c                      | 167 +++++++++++++++++-
 3 files changed, 185 insertions(+), 5 deletions(-)
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h
index 31d007fab3a6..4e8b3148f939 100644
--- a/tools/include/linux/coresight-pmu.h
+++ b/tools/include/linux/coresight-pmu.h
@@ -7,6 +7,8 @@
 #ifndef _LINUX_CORESIGHT_PMU_H
 #define _LINUX_CORESIGHT_PMU_H

+#include <linux/bits.h>
+
 #define CORESIGHT_ETM_PMU_NAME "cs_etm"

 /*
@@ -40,4 +42,16 @@
 #define ETM4_CFG_BIT_RETSTK	12
 #define ETM4_CFG_BIT_VMID_OPT	15

+/*
+ * Interpretation of the PERF_RECORD_AUX_OUTPUT_HW_ID payload.
+ * Used to associate a CPU with the CoreSight Trace ID.
+ * [63:16] - unused SBZ
+ * [15:08] - Trace ID
+ * [07:00] - Version
+ */
+#define CS_AUX_HW_ID_VERSION_MASK	GENMASK_ULL(7, 0)
+#define CS_AUX_HW_ID_TRACE_ID_MASK	GENMASK_ULL(15, 8)
+
+#define CS_AUX_HW_ID_CURR_VERSION 0
+
#endif diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 31fa3b45134a..d1dd73310707 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -611,6 +611,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( return resp; }
+#define CS_TRACE_ID_MASK GENMASK(6, 0)
static int cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params, struct cs_etm_trace_params *t_params, @@ -625,6 +627,7 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params, switch (t_params->protocol) { case CS_ETM_PROTO_ETMV3: case CS_ETM_PROTO_PTM:
csid = (t_params->etmv3.reg_idr & CS_TRACE_ID_MASK); cs_etm_decoder__gen_etmv3_config(t_params, &config_etmv3); decoder->decoder_name = (t_params->protocol == CS_ETM_PROTO_ETMV3) ? OCSD_BUILTIN_DCD_ETMV3 :
@@ -632,11 +635,13 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params, trace_config = &config_etmv3; break; case CS_ETM_PROTO_ETMV4i:
csid = (t_params->etmv4.reg_traceidr & CS_TRACE_ID_MASK); cs_etm_decoder__gen_etmv4_config(t_params, &trace_config_etmv4); decoder->decoder_name = OCSD_BUILTIN_DCD_ETMV4I; trace_config = &trace_config_etmv4; break; case CS_ETM_PROTO_ETE:
csid = (t_params->ete.reg_traceidr & CS_TRACE_ID_MASK); cs_etm_decoder__gen_ete_config(t_params, &trace_config_ete); decoder->decoder_name = OCSD_BUILTIN_DCD_ETE; trace_config = &trace_config_ete;
@@ -645,6 +650,10 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params, return -1; }
/* if the CPU has no trace ID associated, no decoder needed */
if (csid == CS_UNUSED_TRACE_ID)
return 0;
if (d_params->operation == CS_ETM_OPERATION_DECODE) { if (ocsd_dt_create_decoder(decoder->dcd_tree, decoder->decoder_name,
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index df9d67901f8d..ffce858f21fd 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -217,6 +217,139 @@ static int cs_etm__map_trace_id(u8 trace_chan_id, u64 *cpu_metadata) return 0; }
+static int cs_etm__metadata_get_trace_id(u8 *trace_chan_id, u64 *cpu_metadata) +{
u64 cs_etm_magic = cpu_metadata[CS_ETM_MAGIC];
switch (cs_etm_magic) {
case __perf_cs_etmv3_magic:
*trace_chan_id = cpu_metadata[CS_ETM_ETMTRACEIDR];
break;
case __perf_cs_etmv4_magic:
case __perf_cs_ete_magic:
*trace_chan_id = cpu_metadata[CS_ETMV4_TRCTRACEIDR];
break;
default:
return -EINVAL;
}
return 0;
+}
+static int cs_etm__metadata_set_trace_id(u8 trace_chan_id, u64 *cpu_metadata) +{
u64 cs_etm_magic = cpu_metadata[CS_ETM_MAGIC];
switch (cs_etm_magic) {
case __perf_cs_etmv3_magic:
cpu_metadata[CS_ETM_ETMTRACEIDR] = trace_chan_id;
break;
case __perf_cs_etmv4_magic:
case __perf_cs_ete_magic:
cpu_metadata[CS_ETMV4_TRCTRACEIDR] = trace_chan_id;
break;
default:
return -EINVAL;
}
return 0;
+}
+/*
+ * FIELD_GET (linux/bitfield.h) not available outside kernel code,
+ * and the header contains too many dependencies to just copy over,
+ * so roll our own based on the original
+ */
+#define __bf_shf(x) (__builtin_ffsll(x) - 1)
+#define FIELD_GET(_mask, _reg)						\
+	({								\
+		(typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask)); \
+	})
+
+/*
+ * Handle the PERF_RECORD_AUX_OUTPUT_HW_ID event.
+ * The payload associates the Trace ID and the CPU.
+ * The routine is tolerant of seeing multiple packets with the same association,
+ * but a CPU / Trace ID association changing during a session is an error.
+ */
+static int cs_etm__process_aux_output_hw_id(struct perf_session *session,
union perf_event *event)
+{
struct cs_etm_auxtrace *etm;
struct perf_sample sample;
struct int_node *inode;
struct evsel *evsel;
u64 *cpu_data;
u64 hw_id;
int cpu, version, err;
u8 trace_chan_id, curr_chan_id;
/* extract and parse the HW ID */
hw_id = event->aux_output_hw_id.hw_id;
version = FIELD_GET(CS_AUX_HW_ID_VERSION_MASK, hw_id);
trace_chan_id = FIELD_GET(CS_AUX_HW_ID_TRACE_ID_MASK, hw_id);
/* check that we can handle this version */
if (version > CS_AUX_HW_ID_CURR_VERSION)
return -EINVAL;
/* get access to the etm metadata */
etm = container_of(session->auxtrace, struct cs_etm_auxtrace, auxtrace);
if (!etm || !etm->metadata)
return -EINVAL;
/* parse the sample to get the CPU */
evsel = evlist__event2evsel(session->evlist, event);
if (!evsel)
return -EINVAL;
err = evsel__parse_sample(evsel, event, &sample);
if (err)
return err;
cpu = sample.cpu;
if (cpu == -1) {
/* no CPU in the sample - possibly recorded with an old version of perf */
pr_err("CS_ETM: no CPU AUX_OUTPUT_HW_ID sample. Use compatible perf to record.");
return -EINVAL;
}
/*
* look to see if the metadata contains a valid trace ID.
* if so we mapped it before and it must be the same as the ID in the packet.
*/
cpu_data = etm->metadata[cpu];
err = cs_etm__metadata_get_trace_id(&curr_chan_id, cpu_data);
if (err)
return err;
if (CS_IS_VALID_TRACE_ID(curr_chan_id) && (curr_chan_id != trace_chan_id)) {
pr_err("CS_ETM: mismatch between CPU trace ID and HW_ID packet ID\n");
return -EINVAL;
}
/* next see if the ID is mapped to a CPU, and it matches the current CPU */
inode = intlist__find(traceid_list, trace_chan_id);
if (inode) {
cpu_data = inode->priv;
if ((int)cpu_data[CS_ETM_CPU] != cpu) {
pr_err("CS_ETM: map mismatch between HW_ID packet CPU and Trace ID\n");
return -EINVAL;
}
return 0;
}
/* not one we've seen before - lets map it */
err = cs_etm__map_trace_id(trace_chan_id, cpu_data);
if (err)
return err;
/*
* if we are picking up the association from the packet, need to plug
* the correct trace ID into the metadata for setting up decoders later.
*/
err = cs_etm__metadata_set_trace_id(trace_chan_id, cpu_data);
return err;
+}
void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq, u8 trace_chan_id) { @@ -2433,6 +2566,8 @@ static int cs_etm__process_event(struct perf_session *session, return cs_etm__process_itrace_start(etm, event); else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE) return cs_etm__process_switch_cpu_wide(etm, event);
else if (event->header.type == PERF_RECORD_AUX_OUTPUT_HW_ID)
return cs_etm__process_aux_output_hw_id(session, event);
This shouldn't need to be handled here because of the peek at the beginning. Although it's probably harmless to do it twice, it can make deciphering the flow quite difficult.
Agreed - this was really belt and braces coding while I was testing - and where PT decoded it. Given the peek events this can be dropped next time.
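To illustrate why seeing the same record twice is harmless while a changed association is an error, here is a minimal toy model of the checks described in the comment above cs_etm__process_aux_output_hw_id(). It is a sketch only, not the perf code: struct hw_id_map, hw_id_map_init(), hw_id_map_associate() and MAX_CPUS are hypothetical names, and reserved trace IDs are not modelled.

```c
#define MAX_CPUS	8	/* hypothetical size, for illustration only */
#define MAX_TRACE_IDS	128	/* trace IDs are 7 bits wide (CS_TRACE_ID_MASK) */
#define UNSET		(-1)

/* Toy stand-in for the per-CPU metadata plus the trace ID list. */
struct hw_id_map {
	int id_for_cpu[MAX_CPUS];
	int cpu_for_id[MAX_TRACE_IDS];
};

static void hw_id_map_init(struct hw_id_map *m)
{
	int i;

	for (i = 0; i < MAX_CPUS; i++)
		m->id_for_cpu[i] = UNSET;
	for (i = 0; i < MAX_TRACE_IDS; i++)
		m->cpu_for_id[i] = UNSET;
}

/*
 * Record a CPU / trace ID association.
 * Returns 0 for a new or repeated association, -1 if it conflicts with
 * an earlier one or the arguments are out of range.
 */
static int hw_id_map_associate(struct hw_id_map *m, int cpu, int trace_id)
{
	if (cpu < 0 || cpu >= MAX_CPUS ||
	    trace_id < 0 || trace_id >= MAX_TRACE_IDS)
		return -1;

	/* the CPU already has an ID: it must be the same one */
	if (m->id_for_cpu[cpu] != UNSET && m->id_for_cpu[cpu] != trace_id)
		return -1;

	/* the ID is already mapped: it must belong to this CPU */
	if (m->cpu_for_id[trace_id] != UNSET && m->cpu_for_id[trace_id] != cpu)
		return -1;

	m->id_for_cpu[cpu] = trace_id;
	m->cpu_for_id[trace_id] = cpu;
	return 0;
}
```

In this model, associating CPU 0 with ID 0x10 twice succeeds both times, whereas a later attempt to give CPU 0 a different ID, or to give ID 0x10 to CPU 1, is rejected.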
if (!etm->timeless_decoding && event->header.type == PERF_RECORD_AUX) { /*
@@ -2662,7 +2797,7 @@ static void cs_etm__print_auxtrace_info(__u64 *val, int num)
 	for (i = CS_HEADER_VERSION_MAX; cpu < num; cpu++) {
 		if (version == 0)
 			err = cs_etm__print_cpu_metadata_v0(val, &i);
-		else if (version == 1)
+		else if (version == 1 || version == 2)
 			err = cs_etm__print_cpu_metadata_v1(val, &i);
 		if (err)
 			return;
@@ -2774,11 +2909,16 @@ static int cs_etm__queue_aux_fragment(struct perf_session *session, off_t file_o
 	}

 	/*
-	 * In per-thread mode, CPU is set to -1, but TID will be set instead. See
-	 * auxtrace_mmap_params__set_idx(). Return 'not found' if neither CPU nor TID match.
+	 * In per-thread mode, auxtrace CPU is set to -1, but TID will be set instead. See
+	 * auxtrace_mmap_params__set_idx(). However, the sample AUX event will contain a
+	 * CPU as we set this always for the AUX_OUTPUT_HW_ID event.
+	 * So now compare only TIDs if auxtrace CPU is -1, and CPUs if auxtrace CPU is not -1.
+	 * Return 'not found' if mismatch.
 	 */
-	if ((auxtrace_event->cpu == (__u32) -1 && auxtrace_event->tid != sample->tid) ||
-			auxtrace_event->cpu != sample->cpu)
+	if (auxtrace_event->cpu == (__u32) -1) {
+		if (auxtrace_event->tid != sample->tid)
+			return 1;
+	} else if (auxtrace_event->cpu != sample->cpu)
 		return 1;

 	if (aux_event->flags & PERF_AUX_FLAG_OVERWRITE) {
@@ -2827,6 +2967,15 @@ static int cs_etm__queue_aux_fragment(struct perf_session *session, off_t file_o return 1; }
+static int cs_etm__process_aux_hw_id_cb(struct perf_session *session, union perf_event *event,
u64 offset __maybe_unused, void *data __maybe_unused)
+{
/* look to handle PERF_RECORD_AUX_OUTPUT_HW_ID early to ensure decoders can be set up */
if (event->header.type == PERF_RECORD_AUX_OUTPUT_HW_ID)
return cs_etm__process_aux_output_hw_id(session, event);
return 0;
+}
I couldn't see the relationship between the two peeks and why they couldn't be done together in one pass. I changed it so cs_etm__process_aux_hw_id_cb() is also called on the peek to queue the aux records and it seemed to work. At least just opening the file and glancing.
If there is some dependency though, I don't think two passes is excessive.
I initially tried this and there are issues.
During testing I had a --per-thread run with two buffers. The first buffer contained only a single ID, whose packet appeared before the buffer was processed. The second contained the same ID plus a new ID, whose packet appeared after the first buffer and before the second.
The problem is that, under the current system, the decoders are fixed once data is queued, so a decoder for the second ID was never created, leaving a bunch of data undecoded. It may well be possible to re-examine how and when decoders are created, but there is currently a built-in assumption that all IDs are available before the first buffer is queued, and changing this is well beyond the remit of this patch set.
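The scenario above can be reproduced with a small toy model. This is a sketch under stated assumptions, not the perf implementation: struct ev, collect_ids() and count_decodable() are hypothetical, IDs are limited to 0..63 so that a set of IDs fits in one 64-bit mask, and "decoders fixed at first queue" is modelled by stopping ID collection at the first buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define ID_BIT(id)	(1ULL << (id))	/* toy model: trace IDs 0..63 only */

enum ev_type { EV_HW_ID, EV_AUX_BUFFER };

struct ev {
	enum ev_type type;
	uint64_t id_mask;	/* HW_ID: the announced ID; buffer: IDs it contains */
};

/*
 * Gather the IDs announced by HW_ID events. With stop_at_first_buffer set,
 * collection ends at the first buffer, which models decoders being fixed
 * as soon as data is queued (the single-pass case).
 */
static uint64_t collect_ids(const struct ev *evs, size_t n, int stop_at_first_buffer)
{
	uint64_t known = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (evs[i].type == EV_AUX_BUFFER && stop_at_first_buffer)
			break;
		if (evs[i].type == EV_HW_ID)
			known |= evs[i].id_mask;
	}
	return known;
}

/* Count the buffers whose IDs all have a decoder. */
static size_t count_decodable(const struct ev *evs, size_t n, uint64_t known)
{
	size_t i, ok = 0;

	for (i = 0; i < n; i++)
		if (evs[i].type == EV_AUX_BUFFER && (evs[i].id_mask & ~known) == 0)
			ok++;
	return ok;
}
```

With the event order HW_ID(16), buffer{16}, HW_ID(18), buffer{16, 18}, the single-pass variant can fully decode only the first buffer, while a separate first pass over all HW_ID events makes both buffers decodable.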
static int cs_etm__queue_aux_records_cb(struct perf_session *session, union perf_event *event, u64 offset __maybe_unused, void *data __maybe_unused) { @@ -3109,6 +3258,14 @@ int cs_etm__process_auxtrace_info(union perf_event *event, if (err) goto err_delete_thread;
/* scan for AUX_OUTPUT_HW_ID records */
if (hdr_version >= CS_AUX_HW_ID_VERSION_MIN) {
err = perf_session__peek_events(session, session->header.data_offset,
session->header.data_size,
cs_etm__process_aux_hw_id_cb, NULL);
This no longer works at all with piping because of this line in peek_events:

	if (perf_data__is_pipe(session->data))
		return -1;

Does this not also apply to the cs_etm__queue_aux_records(session) call immediately after this, which also uses perf_session__peek_events()?
So we should change the warning message to an error and exit earlier:
if (!etm->data_queued) pr_warning("CS ETM warning: Coresight decode and TRBE support requires random file access.\n" "Continuing with best effort decoding in piped mode.\n\n");
And then we can also remove all the now dead code and variables related to piping like:
etm->data_queued = etm->queues.populated; ...
if (!etm->data_queued) { ... }
Which means that these were already dead code?
if (err)
goto err_delete_thread;
} err = cs_etm__queue_aux_records(session); if (err) goto err_delete_thread;
Regards
Mike
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK
On 21/07/2022 13:38, Mike Leach wrote:
Hi James
On Wed, 20 Jul 2022 at 17:07, James Clark james.clark@arm.com wrote:
static int cs_etm__queue_aux_records_cb(struct perf_session *session, union perf_event *event, u64 offset __maybe_unused, void *data __maybe_unused) { @@ -3109,6 +3258,14 @@ int cs_etm__process_auxtrace_info(union perf_event *event, if (err) goto err_delete_thread;
/* scan for AUX_OUTPUT_HW_ID records */
if (hdr_version >= CS_AUX_HW_ID_VERSION_MIN) {
err = perf_session__peek_events(session, session->header.data_offset,
session->header.data_size,
cs_etm__process_aux_hw_id_cb, NULL);
This no longer works at all with piping because of this line in peek_events:

	if (perf_data__is_pipe(session->data))
		return -1;

Does this not also apply to the cs_etm__queue_aux_records(session) call immediately after this, which also uses perf_session__peek_events()?
It uses it, but it has a fallback for when it isn't available: the buffers aren't split by aux records and are processed whole, as they were before the aux split change.
The if statement surrounding it first checks whether the index was populated, which only happens in non-piping mode:

	if (index && index->nr > 0)
		return perf_session__peek_events(session, session->header.data_offset,

It doesn't seem possible to have a fallback for this trace ID change, so we can probably drop piping support entirely.
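The difference between the two peeks can be modelled roughly as below. This is only a sketch of the behaviour described above, not the perf code: struct session_info, choose_queue_mode() and hw_id_scan_possible() are hypothetical, and min_hw_id_version stands in for CS_AUX_HW_ID_VERSION_MIN, whose value is not shown in this thread.

```c
#include <stdbool.h>

enum queue_mode {
	QUEUE_SPLIT_BY_AUX,	/* random access: buffers split using AUX records */
	QUEUE_WHOLE_BUFFERS,	/* fallback: buffers processed whole */
};

/* Hypothetical summary of the session state relevant to this discussion. */
struct session_info {
	bool is_pipe;
	unsigned int nr_index_entries;	/* populated only in non-piping mode */
	unsigned int hdr_version;
};

/* Queueing has a fallback, so it works with or without random access. */
static enum queue_mode choose_queue_mode(const struct session_info *s)
{
	if (!s->is_pipe && s->nr_index_entries > 0)
		return QUEUE_SPLIT_BY_AUX;
	return QUEUE_WHOLE_BUFFERS;
}

/*
 * The HW_ID scan has no fallback. It is only attempted for header versions
 * that carry the IDs in HW_ID records, and then it needs random access.
 * Returns 0 if decode can proceed, -1 if it cannot.
 */
static int hw_id_scan_possible(const struct session_info *s,
			       unsigned int min_hw_id_version)
{
	if (s->hdr_version < min_hw_id_version)
		return 0;	/* IDs come from the metadata, no scan needed */
	return s->is_pipe ? -1 : 0;
}
```

Read this way, the sketch reflects the version guard in the quoted hunk: the scan is skipped for older header versions, so the piping question only arises for files that rely on HW_ID records.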
So we should change the warning message to an error and exit earlier:
if (!etm->data_queued) pr_warning("CS ETM warning: Coresight decode and TRBE support requires random file access.\n" "Continuing with best effort decoding in piped mode.\n\n");
And then we can also remove all the now dead code and variables related to piping like:
etm->data_queued = etm->queues.populated; ...
if (!etm->data_queued) { ... }
which means that these were already dead code?
See above
if (err)
goto err_delete_thread;
} err = cs_etm__queue_aux_records(session); if (err) goto err_delete_thread;
Regards
Mike
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK
Use the perf_report_aux_output_id() call to output the CoreSight trace ID and associated CPU as a PERF_RECORD_AUX_OUTPUT_HW_ID record in the perf.data file.
Signed-off-by: Mike Leach mike.leach@linaro.org
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++
 include/linux/coresight-pmu.h                    | 14 ++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index ad3fdc07c60b..531f5d42272b 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -4,6 +4,7 @@
  * Author: Mathieu Poirier mathieu.poirier@linaro.org
  */

+#include <linux/bitfield.h>
 #include <linux/coresight.h>
 #include <linux/coresight-pmu.h>
 #include <linux/cpumask.h>
@@ -437,6 +438,7 @@ static void etm_event_start(struct perf_event *event, int flags)
 	struct perf_output_handle *handle = &ctxt->handle;
 	struct coresight_device *sink, *csdev = per_cpu(csdev_src, cpu);
 	struct list_head *path;
+	u64 hw_id;

 	if (!csdev)
 		goto fail;
@@ -482,6 +484,11 @@ static void etm_event_start(struct perf_event *event, int flags)
 	if (source_ops(csdev)->enable(csdev, event, CS_MODE_PERF))
 		goto fail_disable_path;

+	/* output cpu / trace ID in perf record */
+	hw_id = FIELD_PREP(CS_AUX_HW_ID_VERSION_MASK, CS_AUX_HW_ID_CURR_VERSION) |
+		FIELD_PREP(CS_AUX_HW_ID_TRACE_ID_MASK, coresight_trace_id_get_cpu_id(cpu));
+	perf_report_aux_output_id(event, hw_id);
+
 out:
 	/* Tell the perf core the event is alive */
 	event->hw.state = 0;
@@ -600,6 +607,9 @@ static void etm_event_stop(struct perf_event *event, int mode)

 	/* Disabling the path make its elements available to other sessions */
 	coresight_disable_path(path);
+
+	/* release the trace ID we read on event start */
+	coresight_trace_id_put_cpu_id(cpu);
 }

 static int etm_event_add(struct perf_event *event, int mode)
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
index 9f7ee380266b..5572d0e10822 100644
--- a/include/linux/coresight-pmu.h
+++ b/include/linux/coresight-pmu.h
@@ -7,6 +7,8 @@
 #ifndef _LINUX_CORESIGHT_PMU_H
 #define _LINUX_CORESIGHT_PMU_H

+#include <linux/bits.h>
+
 #define CORESIGHT_ETM_PMU_NAME "cs_etm"

 /*
@@ -38,4 +40,16 @@
 #define ETM4_CFG_BIT_RETSTK	12
 #define ETM4_CFG_BIT_VMID_OPT	15

+/*
+ * Interpretation of the PERF_RECORD_AUX_OUTPUT_HW_ID payload.
+ * Used to associate a CPU with the CoreSight Trace ID.
+ * [63:16] - unused SBZ
+ * [15:08] - Trace ID
+ * [07:00] - Version
+ */
+#define CS_AUX_HW_ID_VERSION_MASK	GENMASK_ULL(7, 0)
+#define CS_AUX_HW_ID_TRACE_ID_MASK	GENMASK_ULL(15, 8)
+
+#define CS_AUX_HW_ID_CURR_VERSION 0
+
 #endif
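For readers who want to try the payload layout outside the kernel, the following is a minimal userspace sketch of the packing done in etm_event_start() and the matching extraction. The masks mirror the values in the patch; field_prep(), field_get(), mask_shift() and hw_id_pack() are hypothetical stand-ins for the kernel's FIELD_PREP()/FIELD_GET(), and __builtin_ctzll() assumes GCC or Clang.

```c
#include <stdint.h>

/* Field masks as defined in the patch: [15:8] Trace ID, [7:0] version. */
#define HW_ID_VERSION_MASK	0x00000000000000ffULL
#define HW_ID_TRACE_ID_MASK	0x000000000000ff00ULL
#define HW_ID_CURR_VERSION	0

/* Bit position of the lowest set bit of a non-zero mask. */
static inline unsigned int mask_shift(uint64_t mask)
{
	return (unsigned int)__builtin_ctzll(mask);
}

/* Userspace stand-in for FIELD_PREP(): place val into the masked field. */
static inline uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val << mask_shift(mask)) & mask;
}

/* Userspace stand-in for FIELD_GET(): extract the masked field. */
static inline uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> mask_shift(mask);
}

/* Build the payload for a given trace ID using the current version. */
static inline uint64_t hw_id_pack(uint8_t trace_id)
{
	return field_prep(HW_ID_VERSION_MASK, HW_ID_CURR_VERSION) |
	       field_prep(HW_ID_TRACE_ID_MASK, trace_id);
}
```

For example, trace ID 0x10 with version 0 packs to 0x1000, and extracting the trace ID field from 0x1000 gives back 0x10.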
On 04/07/2022 09:11, Mike Leach wrote:
 /*
@@ -38,4 +40,16 @@
 #define ETM4_CFG_BIT_RETSTK	12
 #define ETM4_CFG_BIT_VMID_OPT	15

+/*
+ * Interpretation of the PERF_RECORD_AUX_OUTPUT_HW_ID payload.
+ * Used to associate a CPU with the CoreSight Trace ID.
+ * [63:16] - unused SBZ
+ * [15:08] - Trace ID
+ * [07:00] - Version
Could we please re-arrange the fields so that it is easier to pick out the Trace ID when looking at a raw trace dump? This would also accommodate future changes.

e.g.,

	[15:00] - Trace ID	/* For future expansion, if at all */
	[59:16] - RES0
	[63:60] - Trace_ID_Version

I think we *might* (not sure yet) end up adding a "sinkid" when we have sink-specific allocation, so that we can associate the HW_ID of an event with the "AUXTRACE" record (i.e., the trace buffer).

So if we need to do that we could use:

	[15:00] - Trace ID	/* For future expansion, if at all */
	[47:16] - Trace Pool ID (== 0 if global, == sink_id if sink based)
	[59:48] - RES0
	[63:60] - Trace_ID_Version == 1

Or we could adopt the above straight away.

Thoughts?
Suzuki
+ */
+#define CS_AUX_HW_ID_VERSION_MASK	GENMASK_ULL(7, 0)
+#define CS_AUX_HW_ID_TRACE_ID_MASK	GENMASK_ULL(15, 8)
+
+#define CS_AUX_HW_ID_CURR_VERSION 0
+
 #endif
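The alternative layout suggested in the review above (Trace ID in [15:00], Trace Pool ID in [47:16], version in [63:60]) could be encoded along these lines. This is a sketch of the proposal only, not code from any posted patch: the ALT_* masks and alt_hw_id_*() helpers are hypothetical names.

```c
#include <stdint.h>

/* Masks for the proposed layout; bits [59:48] are RES0. */
#define ALT_TRACE_ID_MASK	0x000000000000ffffULL	/* [15:00] */
#define ALT_POOL_ID_MASK	0x0000ffffffff0000ULL	/* [47:16] */
#define ALT_VERSION_MASK	0xf000000000000000ULL	/* [63:60] */

/* Pack the three fields; version is truncated to its 4-bit field. */
static inline uint64_t alt_hw_id_pack(uint16_t trace_id, uint32_t pool_id,
				      uint8_t version)
{
	return (uint64_t)trace_id |
	       ((uint64_t)pool_id << 16) |
	       ((uint64_t)(version & 0xf) << 60);
}

static inline uint16_t alt_hw_id_trace_id(uint64_t hw_id)
{
	return (uint16_t)(hw_id & ALT_TRACE_ID_MASK);
}

static inline uint32_t alt_hw_id_pool_id(uint64_t hw_id)
{
	return (uint32_t)((hw_id & ALT_POOL_ID_MASK) >> 16);
}

static inline uint8_t alt_hw_id_version(uint64_t hw_id)
{
	return (uint8_t)((hw_id & ALT_VERSION_MASK) >> 60);
}
```

With this arrangement, trace ID 0x10 in the global pool with version 1 packs to 0x1000000000000010, so the trace ID appears unshifted in the low digits of a raw dump.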
Hi Suzuki,
On Wed, 20 Jul 2022 at 10:30, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 04/07/2022 09:11, Mike Leach wrote:
Use the perf_report_aux_output_id() call to output the CoreSight trace ID and associated CPU as a PERF_RECORD_AUX_OUTPUT_HW_ID record in the perf.data file.
Signed-off-by: Mike Leach mike.leach@linaro.org
drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++ include/linux/coresight-pmu.h | 14 ++++++++++++++ 2 files changed, 24 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index ad3fdc07c60b..531f5d42272b 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -4,6 +4,7 @@
  * Author: Mathieu Poirier <mathieu.poirier@linaro.org>
  */
 
+#include <linux/bitfield.h>
 #include <linux/coresight.h>
 #include <linux/coresight-pmu.h>
 #include <linux/cpumask.h>
@@ -437,6 +438,7 @@ static void etm_event_start(struct perf_event *event, int flags)
 	struct perf_output_handle *handle = &ctxt->handle;
 	struct coresight_device *sink, *csdev = per_cpu(csdev_src, cpu);
 	struct list_head *path;
+	u64 hw_id;
 
 	if (!csdev)
 		goto fail;
@@ -482,6 +484,11 @@ static void etm_event_start(struct perf_event *event, int flags)
 	if (source_ops(csdev)->enable(csdev, event, CS_MODE_PERF))
 		goto fail_disable_path;
 
+	/* output cpu / trace ID in perf record */
+	hw_id = FIELD_PREP(CS_AUX_HW_ID_VERSION_MASK, CS_AUX_HW_ID_CURR_VERSION) |
+		FIELD_PREP(CS_AUX_HW_ID_TRACE_ID_MASK, coresight_trace_id_get_cpu_id(cpu));
+	perf_report_aux_output_id(event, hw_id);
+
 out:
 	/* Tell the perf core the event is alive */
 	event->hw.state = 0;
@@ -600,6 +607,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
 
 	/* Disabling the path make its elements available to other sessions */
 	coresight_disable_path(path);
+
+	/* release the trace ID we read on event start */
+	coresight_trace_id_put_cpu_id(cpu);
 }
 
 static int etm_event_add(struct perf_event *event, int mode)
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
index 9f7ee380266b..5572d0e10822 100644
--- a/include/linux/coresight-pmu.h
+++ b/include/linux/coresight-pmu.h
@@ -7,6 +7,8 @@
 #ifndef _LINUX_CORESIGHT_PMU_H
 #define _LINUX_CORESIGHT_PMU_H
 
+#include <linux/bits.h>
+
 #define CORESIGHT_ETM_PMU_NAME "cs_etm"
 
 /*
@@ -38,4 +40,16 @@
 #define ETM4_CFG_BIT_RETSTK 12
 #define ETM4_CFG_BIT_VMID_OPT 15
 
+/*
+ * Interpretation of the PERF_RECORD_AUX_OUTPUT_HW_ID payload.
+ * Used to associate a CPU with the CoreSight Trace ID.
+ * [63:16] - unused SBZ
+ * [15:08] - Trace ID
+ * [07:00] - Version
Could we please re-arrange the fields, such that it is easier to comprehend the TraceID looking at the raw trace dump ? Also to accommodate the future changes.
e.g.,
  [15:00] - Trace ID
  /* For future expansion, if at all */
  [59:16] - RES0
  [63:60] - Trace_ID_Version
I think we *might* (not sure yet) end up adding "sinkid" when we have sink specific allocation, so that we can associate the HW_ID of an event to the "AUXTRACE" record (i.e., trace buffer).
If we go to per sink trace ID maps, then I can't see how we could avoid needing some sort of ID in here, unless we can determine some other method of specifying which CPUs traced into which trace buffer.
So if we need to do that we could:
  [15:00] - Trace ID
  /* For future expansion, if at all */
  [47:16] - Trace Pool ID (== 0 if global, == sink_id if sink based)
  [59:48] - RES0
  [63:60] - Trace_ID_Version == 1
Or we could adopt the above straight away.
I wouldn't want to commit to a size for the sink ID yet. And I would leave trace ID at what it is for now (8 bits). Make the fields represent what exists now, then bump the version and update the layout when changes are actually required. I think this packet may be a candidate for delivering other trace related info we may need in future - such as the timestamp source that is being worked on?
Mike
Thoughts ?
Suzuki
+ */
+#define CS_AUX_HW_ID_VERSION_MASK GENMASK_ULL(7, 0)
+#define CS_AUX_HW_ID_TRACE_ID_MASK GENMASK_ULL(15, 8)
+
+#define CS_AUX_HW_ID_CURR_VERSION 0
+
 #endif
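To make the payload layout in the patch concrete ([07:00] version, [15:08] trace ID, [63:16] SBZ), here is a minimal userspace sketch of packing and unpacking the 64-bit value. The macro and function names (HW_ID_*, hw_id_pack() and friends) are illustrative stand-ins chosen for this example, not the kernel's GENMASK_ULL()/FIELD_PREP() helpers, and the sketch only models the layout as posted in this version of the patch.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's GENMASK_ULL() and FIELD_PREP() usage. */
#define HW_ID_VERSION_SHIFT	0
#define HW_ID_VERSION_MASK	(0xffULL << HW_ID_VERSION_SHIFT)	/* bits [7:0] */
#define HW_ID_TRACE_ID_SHIFT	8
#define HW_ID_TRACE_ID_MASK	(0xffULL << HW_ID_TRACE_ID_SHIFT)	/* bits [15:8] */
#define HW_ID_CURR_VERSION	0

/* Build the 64-bit payload; bits [63:16] are left as zero (SBZ). */
static uint64_t hw_id_pack(uint8_t version, uint8_t trace_id)
{
	return (((uint64_t)version << HW_ID_VERSION_SHIFT) & HW_ID_VERSION_MASK) |
	       (((uint64_t)trace_id << HW_ID_TRACE_ID_SHIFT) & HW_ID_TRACE_ID_MASK);
}

/* Extract the version field from a payload. */
static uint8_t hw_id_version(uint64_t hw_id)
{
	return (uint8_t)((hw_id & HW_ID_VERSION_MASK) >> HW_ID_VERSION_SHIFT);
}

/* Extract the trace ID field from a payload. */
static uint8_t hw_id_trace_id(uint64_t hw_id)
{
	return (uint8_t)((hw_id & HW_ID_TRACE_ID_MASK) >> HW_ID_TRACE_ID_SHIFT);
}
```

With this layout, trace ID 0x10 at version 0 packs to 0x1000, so the ID sits in the second byte of a raw dump. That is the readability point raised in the review: a layout with the trace ID in the low bits would let it be read directly.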
Adds in a number of pr_debug macros to allow the debugging and test of the trace ID allocation system.
Signed-off-by: Mike Leach <mike.leach@linaro.org>
---
 .../hwtracing/coresight/coresight-trace-id.c | 33 +++++++++++++++++++
 1 file changed, 33 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-trace-id.c b/drivers/hwtracing/coresight/coresight-trace-id.c
index dac9c89ae00d..841307e0d899 100644
--- a/drivers/hwtracing/coresight/coresight-trace-id.c
+++ b/drivers/hwtracing/coresight/coresight-trace-id.c
@@ -71,6 +71,27 @@ static int coresight_trace_id_find_new_id(struct coresight_trace_id_map *id_map)
 	return id;
 }
 
+/* #define TRACE_ID_DEBUG 1 */
+#ifdef TRACE_ID_DEBUG
+static void coresight_trace_id_dump_table(struct coresight_trace_id_map *id_map,
+					  const char *func_name)
+{
+	/* currently 2 u64s are sufficient to hold all the ids */
+	pr_debug("%s id_map::\n", func_name);
+	pr_debug("Avial= 0x%016lx%016lx\n", id_map->avail_ids[1], id_map->avail_ids[0]);
+	pr_debug("Pend = 0x%016lx%016lx\n", id_map->pend_rel_ids[1], id_map->pend_rel_ids[0]);
+}
+#define DUMP_ID_MAP(map) coresight_trace_id_dump_table(map, __func__)
+#define DUMP_ID_CPU(cpu, id) pr_debug("%s called; cpu=%d, id=%d\n", __func__, cpu, id)
+#define DUMP_ID(id) pr_debug("%s called; id=%d\n", __func__, id)
+#define PERF_SESSION(n) pr_debug("%s perf count %d\n", __func__, n)
+#else
+#define DUMP_ID_MAP(map)
+#define DUMP_ID(id)
+#define DUMP_ID_CPU(cpu, id)
+#define PERF_SESSION(n)
+#endif
+
 /* release all pending IDs for all current maps & clear CPU associations */
 static void coresight_trace_id_release_all_pending(void)
 {
@@ -81,6 +102,7 @@ static void coresight_trace_id_release_all_pending(void)
 		clear_bit(bit, id_map->avail_ids);
 		clear_bit(bit, id_map->pend_rel_ids);
 	}
+	DUMP_ID_MAP(id_map);
 
 	for_each_possible_cpu(cpu) {
 		if (per_cpu(cpu_ids, cpu).pend_rel) {
@@ -126,6 +148,8 @@ static int coresight_trace_id_map_get_cpu_id(int cpu, struct coresight_trace_id_
 
 get_cpu_id_out:
 	spin_unlock_irqrestore(&id_map_lock, flags);
+	DUMP_ID_CPU(cpu, id);
+	DUMP_ID_MAP(id_map);
 	return id;
 }
 
@@ -151,6 +175,8 @@ static void coresight_trace_id_map_put_cpu_id(int cpu, struct coresight_trace_id
 
 put_cpu_id_out:
 	spin_unlock_irqrestore(&id_map_lock, flags);
+	DUMP_ID_CPU(cpu, id);
+	DUMP_ID_MAP(id_map);
 }
 
 static int coresight_trace_id_map_get_system_id(struct coresight_trace_id_map *id_map)
@@ -164,6 +190,8 @@ static int coresight_trace_id_map_get_system_id(struct coresight_trace_id_map *i
 	coresight_trace_id_set_inuse(id, id_map);
 	spin_unlock_irqrestore(&id_map_lock, flags);
 
+	DUMP_ID(id);
+	DUMP_ID_MAP(id_map);
 	return id;
 }
 
@@ -174,6 +202,9 @@ static void coresight_trace_id_map_put_system_id(struct coresight_trace_id_map *
 	spin_lock_irqsave(&id_map_lock, flags);
 	coresight_trace_id_clear_inuse(id, id_map);
 	spin_unlock_irqrestore(&id_map_lock, flags);
+
+	DUMP_ID(id);
+	DUMP_ID_MAP(id_map);
 }
 
 /* API functions */
@@ -207,6 +238,7 @@ void coresight_trace_id_perf_start(void)
 
 	spin_lock_irqsave(&id_map_lock, flags);
 	perf_cs_etm_session_active++;
+	PERF_SESSION(perf_cs_etm_session_active);
 	spin_unlock_irqrestore(&id_map_lock, flags);
 }
 EXPORT_SYMBOL_GPL(coresight_trace_id_perf_start);
@@ -217,6 +249,7 @@ void coresight_trace_id_perf_stop(void)
 
 	spin_lock_irqsave(&id_map_lock, flags);
 	perf_cs_etm_session_active--;
+	PERF_SESSION(perf_cs_etm_session_active);
 	if (!perf_cs_etm_session_active)
 		coresight_trace_id_release_all_pending();
 	spin_unlock_irqrestore(&id_map_lock, flags);
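For readers following what the dumped bitmaps track, the following is a simplified userspace sketch of the allocation scheme the series describes: a bitmap of in-use IDs, a second bitmap of IDs whose release is deferred while a perf session is active, and a release of all pending IDs when the last session stops. All names here are hypothetical, locking and the per-CPU association are omitted, and the reserved ranges (0x00 and 0x70 upwards) are an assumption of this sketch; it is not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRACE_ID_MAX		128	/* 7-bit trace ID space */
#define TRACE_ID_RES_0		0x00	/* assumed reserved */
#define TRACE_ID_RES_TOP	0x70	/* assumed: 0x70 and above reserved */

struct id_map {
	uint64_t used[2];	/* bit set: ID currently allocated */
	uint64_t pend_rel[2];	/* bit set: release deferred until sessions end */
};

static int perf_sessions;	/* number of active perf sessions */

static bool test_id(const uint64_t *map, int id)
{
	return (map[id / 64] >> (id % 64)) & 1;
}

static void set_id(uint64_t *map, int id)
{
	map[id / 64] |= 1ULL << (id % 64);
}

static void clear_id(uint64_t *map, int id)
{
	map[id / 64] &= ~(1ULL << (id % 64));
}

/* Allocate the lowest free ID outside the reserved ranges, or -1 if none. */
static int id_get(struct id_map *m)
{
	for (int id = TRACE_ID_RES_0 + 1; id < TRACE_ID_RES_TOP; id++) {
		if (!test_id(m->used, id)) {
			set_id(m->used, id);
			return id;
		}
	}
	return -1;
}

/* Release an ID; while any perf session is active, only mark it pending. */
static void id_put(struct id_map *m, int id)
{
	if (perf_sessions > 0)
		set_id(m->pend_rel, id);
	else
		clear_id(m->used, id);
}

static void perf_session_start(void)
{
	perf_sessions++;
}

/* When the last session stops, free every ID that was pending release. */
static void perf_session_stop(struct id_map *m)
{
	if (--perf_sessions > 0)
		return;
	for (int id = 0; id < TRACE_ID_MAX; id++) {
		if (test_id(m->pend_rel, id)) {
			clear_id(m->used, id);
			clear_id(m->pend_rel, id);
		}
	}
}
```

Deferring the release is what keeps an ID from being handed to a different source in the middle of a session, which is the property the cover letter relies on for stable CPU IDs during perf runs.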
On 04/07/2022 09:11, Mike Leach wrote:
Adds in a number of pr_debug macros to allow the debugging and test of the trace ID allocation system.
Signed-off-by: Mike Leach mike.leach@linaro.org
 .../hwtracing/coresight/coresight-trace-id.c | 33 +++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-trace-id.c b/drivers/hwtracing/coresight/coresight-trace-id.c
index dac9c89ae00d..841307e0d899 100644
--- a/drivers/hwtracing/coresight/coresight-trace-id.c
+++ b/drivers/hwtracing/coresight/coresight-trace-id.c
@@ -71,6 +71,27 @@ static int coresight_trace_id_find_new_id(struct coresight_trace_id_map *id_map)
 	return id;
 }
 
+/* #define TRACE_ID_DEBUG 1 */
+#ifdef TRACE_ID_DEBUG
+static void coresight_trace_id_dump_table(struct coresight_trace_id_map *id_map,
+					  const char *func_name)
+{
+	/* currently 2 u64s are sufficient to hold all the ids */
+	pr_debug("%s id_map::\n", func_name);
+	pr_debug("Avial= 0x%016lx%016lx\n", id_map->avail_ids[1], id_map->avail_ids[0]);
+	pr_debug("Pend = 0x%016lx%016lx\n", id_map->pend_rel_ids[1], id_map->pend_rel_ids[0]);

minor nit: You may use bitmap_print_to_pagebuf() to print the bitmaps.

+}
+#define DUMP_ID_MAP(map) coresight_trace_id_dump_table(map, __func__)
+#define DUMP_ID_CPU(cpu, id) pr_debug("%s called; cpu=%d, id=%d\n", __func__, cpu, id)
+#define DUMP_ID(id) pr_debug("%s called; id=%d\n", __func__, id)
+#define PERF_SESSION(n) pr_debug("%s perf count %d\n", __func__, n)
+#else
+#define DUMP_ID_MAP(map)
+#define DUMP_ID(id)
+#define DUMP_ID_CPU(cpu, id)
+#define PERF_SESSION(n)
+#endif
+
 /* release all pending IDs for all current maps & clear CPU associations */
 static void coresight_trace_id_release_all_pending(void)
 {
@@ -81,6 +102,7 @@ static void coresight_trace_id_release_all_pending(void)
 		clear_bit(bit, id_map->avail_ids);
 		clear_bit(bit, id_map->pend_rel_ids);
 	}
+	DUMP_ID_MAP(id_map);
 
 	for_each_possible_cpu(cpu) {
 		if (per_cpu(cpu_ids, cpu).pend_rel) {
@@ -126,6 +148,8 @@ static int coresight_trace_id_map_get_cpu_id(int cpu, struct coresight_trace_id_
 
 get_cpu_id_out:
 	spin_unlock_irqrestore(&id_map_lock, flags);
+	DUMP_ID_CPU(cpu, id);
+	DUMP_ID_MAP(id_map);
 	return id;
 }
 
@@ -151,6 +175,8 @@ static void coresight_trace_id_map_put_cpu_id(int cpu, struct coresight_trace_id
 
 put_cpu_id_out:
 	spin_unlock_irqrestore(&id_map_lock, flags);
+	DUMP_ID_CPU(cpu, id);
+	DUMP_ID_MAP(id_map);
 }
 
 static int coresight_trace_id_map_get_system_id(struct coresight_trace_id_map *id_map)
@@ -164,6 +190,8 @@ static int coresight_trace_id_map_get_system_id(struct coresight_trace_id_map *i
 	coresight_trace_id_set_inuse(id, id_map);
 	spin_unlock_irqrestore(&id_map_lock, flags);
 
+	DUMP_ID(id);
+	DUMP_ID_MAP(id_map);
 	return id;
 }
 
@@ -174,6 +202,9 @@ static void coresight_trace_id_map_put_system_id(struct coresight_trace_id_map *
 	spin_lock_irqsave(&id_map_lock, flags);
 	coresight_trace_id_clear_inuse(id, id_map);
 	spin_unlock_irqrestore(&id_map_lock, flags);
+
+	DUMP_ID(id);
+	DUMP_ID_MAP(id_map);
 }
 
 /* API functions */
@@ -207,6 +238,7 @@ void coresight_trace_id_perf_start(void)

	int n;

 	spin_lock_irqsave(&id_map_lock, flags);
 	perf_cs_etm_session_active++;

	n = perf_cs_etm_session_active++;

	spin_unlock_irqrestore(&id_map_lock, flags);

	PERF_SESSION(n);

Not a good idea to print something from within spin_lock.

 }
 EXPORT_SYMBOL_GPL(coresight_trace_id_perf_start);
@@ -217,6 +249,7 @@ void coresight_trace_id_perf_stop(void)
 
 	spin_lock_irqsave(&id_map_lock, flags);
 	perf_cs_etm_session_active--;
+	PERF_SESSION(perf_cs_etm_session_active);

Same as above.

 	if (!perf_cs_etm_session_active)
 		coresight_trace_id_release_all_pending();
 	spin_unlock_irqrestore(&id_map_lock, flags);

Suzuki
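The review point about not printing while holding the spinlock follows a common pattern: take a snapshot of the value inside the critical section, drop the lock, then log the snapshot. The sketch below shows that pattern in userspace, using a pthread mutex in place of the kernel spinlock and fprintf() in place of pr_debug(); the function names are hypothetical. It uses a pre-increment so that the value logged is the updated session count.

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t id_map_lock = PTHREAD_MUTEX_INITIALIZER;
static int session_active;	/* stands in for perf_cs_etm_session_active */

/* Increment the session count; log the new value only after unlocking. */
static int session_count_inc(void)
{
	int n;

	pthread_mutex_lock(&id_map_lock);
	n = ++session_active;		/* snapshot taken under the lock */
	pthread_mutex_unlock(&id_map_lock);

	fprintf(stderr, "%s perf count %d\n", __func__, n);
	return n;
}

/* Decrement the session count; same snapshot-then-log ordering. */
static int session_count_dec(void)
{
	int n;

	pthread_mutex_lock(&id_map_lock);
	n = --session_active;
	pthread_mutex_unlock(&id_map_lock);

	fprintf(stderr, "%s perf count %d\n", __func__, n);
	return n;
}
```

The logged value can be stale by the time it is printed if another thread changes the counter in between, which is usually acceptable for debug output and keeps the critical section short.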
On 04/07/2022 09:11, Mike Leach wrote:
The current method for allocating trace source ID values to sources is to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10). The STM is allocated ID 0x1.
This fixed algorithm is used in both the CoreSight driver code, and by perf when writing the trace metadata in the AUXTRACE_INFO record.
The method needs replacing as currently:-
- It is inefficient in using available IDs.
- Does not scale to larger systems with many cores and the algorithm
has no limits so will generate invalid trace IDs for cpu number > 44.
Additionally requirements to allocate additional system IDs on some systems have been seen.
This patch set introduces an API that allows the allocation of trace IDs in a dynamic manner.
I've tested this with various commands, such as per-thread mode and attaching, and by running the tests, including Carsten's new tests. Apart from the possible backwards compatibility issue and the minor code comments it looks good to me.
Architecturally reserved IDs are never allocated, and the system is limited to allocating only valid IDs.
Each of the current trace sources ETM3.x, ETM4.x and STM is updated to use the new API.
For the ETMx.x devices IDs are allocated on certain events a) When using sysfs, an ID will be allocated on hardware enable, or a read of sysfs TRCTRACEID register and freed when the sysfs reset is written.
b) When using perf, ID is allocated on hardware enable, and freed on hardware disable. IDs are communicated using the AUX_OUTPUT_HW_ID packet. The ID allocator is notified when perf sessions start and stop so CPU based IDs are kept constant throughout any perf session.
Note: This patchset breaks backward compatibility for perf record and perf report.
Because the method for generating the AUXTRACE_INFO meta data has changed, using an older perf record will result in metadata that does not match the trace IDs used in the recorded trace data. This mismatch will cause subsequent decode to fail.
The version of the AUXTRACE_INFO has been updated to reflect the fact that the trace source IDs are no longer present in the metadata. This will mean older versions of perf report cannot decode the file.
Applies to coresight/next [c06475910b52] Tested on DB410c
Changes since v1: (after feedback & discussion with Mathieu & Suzuki).
- API has changed. The global trace ID map is managed internally, so it
is no longer passed in to the API functions.
- perf record does not use sysfs to find the trace IDs. These are now
output as AUX_OUTPUT_HW_ID events. The drivers, perf record, and perf report have been updated accordingly to generate and handle these events.
Mike Leach (13):
  coresight: trace-id: Add API to dynamically assign Trace ID values
  coresight: trace-id: update CoreSight core to use Trace ID API
  coresight: stm: Update STM driver to use Trace ID API
  coresight: etm4x: Update ETM4 driver to use Trace ID API
  coresight: etm3x: Update ETM3 driver to use Trace ID API
  coresight: etmX.X: stm: Remove unused legacy source Trace ID ops
  coresight: perf: traceid: Add perf notifiers for Trace ID
  perf: cs-etm: Move mapping of Trace ID and cpu into helper function
  perf: cs-etm: Update record event to use new Trace ID protocol
  kernel: events: Export perf_report_aux_output_id()
  perf: cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet
  coresight: events: PERF_RECORD_AUX_OUTPUT_HW_ID used for Trace ID
  coresight: trace-id: Add debug & test macros to Trace ID allocation

 drivers/hwtracing/coresight/Makefile | 2 +-
 drivers/hwtracing/coresight/coresight-core.c | 49 +---
 .../hwtracing/coresight/coresight-etm-perf.c | 17 ++
 drivers/hwtracing/coresight/coresight-etm.h | 3 +-
 .../coresight/coresight-etm3x-core.c | 85 +++---
 .../coresight/coresight-etm3x-sysfs.c | 28 +-
 .../coresight/coresight-etm4x-core.c | 65 ++++-
 .../coresight/coresight-etm4x-sysfs.c | 32 ++-
 drivers/hwtracing/coresight/coresight-etm4x.h | 3 +
 drivers/hwtracing/coresight/coresight-stm.c | 49 +---
 .../hwtracing/coresight/coresight-trace-id.c | 263 ++++++++++++++++++
 .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
 include/linux/coresight-pmu.h | 31 ++-
 include/linux/coresight.h | 3 -
 kernel/events/core.c | 1 +
 tools/include/linux/coresight-pmu.h | 31 ++-
 tools/perf/arch/arm/util/cs-etm.c | 21 +-
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 9 +
 tools/perf/util/cs-etm.c | 220 +++++++++++++--
 tools/perf/util/cs-etm.h | 14 +-
 20 files changed, 784 insertions(+), 207 deletions(-)
 create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
Hi James,
Thanks for looking at this.
On Thu, 21 Jul 2022 at 11:27, James Clark james.clark@arm.com wrote:
I've tested this with various commands like with per-thread mode, attaching, running the tests and also Carsten's new tests. Apart from the possible backwards compatibility issue and the minor code comments it looks good to me.
I've looked at the backwards compatibility issue. At present with the current set (K = kernel drivers, P-rec = perf record, P-rep = perf report) ::
K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK)             P-rep-v2 (OK)
K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message)  P-rep-v2 (fail)
K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail)           P-rep-v2 (fail)
K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message)  P-rep-v2 (OK)

So, with a P-rec generating v2 metadata, P-rep v1 will cleanly error out. Where the kernel ID version and the perf record ID version differ, even P-rep v2 will fail, because the IDs in the file differ from those used by the actual drivers. These failures will simply look like no data is present.

There are two possible fixes that improve this:-

A) If the v2 kernel uses a sysfs flag to indicate new ID usage, then when this flag is missing the new perf record can fall back to the old algorithm and put IDs directly into the metadata, as it can assume it is running on a v1 kernel. This also fixes things for P-rep v2, which can look for this and know there will be no incoming ID packets.

B) P-rep v2 can look for the new packets irrespective of the incoming metadata version and, if it sees them, use them to override the IDs from the metadata.

Compatibility matrix then looks like::

K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK)             P-rep-v2 (OK)
K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message)  P-rep-v2 (OK)
K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail)           P-rep-v2 (OK)
K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message)  P-rep-v2 (OK)
There is no solution to using an old version of perf record on a new kernel and getting the old version of perf report to correctly decode the file.
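As a rough illustration of how fixes A and B could combine on the perf report side, the sketch below models the decision a v2 perf report would make about where to take trace IDs from. This is an interpretation of the proposal and not code from the series; the enum, the function name and both parameters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical: where a v2 perf report takes its CPU-to-trace-ID mapping from. */
enum id_source {
	IDS_FROM_METADATA,	/* old-style IDs carried in AUXTRACE_INFO */
	IDS_FROM_HW_ID,		/* AUX_OUTPUT_HW_ID packets found in the file */
	IDS_UNAVAILABLE,	/* nothing usable, decode cannot proceed */
};

static enum id_source report_v2_pick_ids(bool metadata_has_ids,
					 bool saw_hw_id_packets)
{
	/* Fix B: packets override the metadata, whatever its version. */
	if (saw_hw_id_packets)
		return IDS_FROM_HW_ID;

	/* Fix A: a new perf record on a v1 kernel wrote old-style IDs. */
	if (metadata_has_ids)
		return IDS_FROM_METADATA;

	return IDS_UNAVAILABLE;
}
```

Under these assumptions every P-rep-v2 column entry in the revised matrix resolves to a usable ID source, while an old perf report still has no way to learn the dynamically allocated IDs.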
Thoughts?
Mike
On 21/07/2022 14:54, Mike Leach wrote:
There is no solution to using an old version of perf record on a new kernel and getting the old version of perf report to correctly decode the file.
We had a discussion about this last point on the Friday AutoFDO call. Do you think it's possible to keep the old static ID allocations if num_possible_cpus() < Max Trace ID? This is especially important for simpleperf, because Android doesn't even have the more-than-128-CPUs issue, so technically it shouldn't need any changes made to it.
Making the dynamic traceID allocation use the same IDs as before whenever possible should allow both old Perf and simpleperf to open the file as before and ignore the AUX_OUTPUT_HW_ID packets.
James
Hi James
On Fri, 22 Jul 2022 at 13:10, James Clark james.clark@arm.com wrote:
On 21/07/2022 14:54, Mike Leach wrote:
There is no solution to using an old version of perf record on a new kernel and getting the old version of perf report to correctly decode the file.
We had a discussion about this last point on the Friday AutoFDO call.
Sorry I missed that - I was on holiday.
Do you think it's possible to keep the old static ID allocations if num_possible_cpus() < Max Trace ID? This is especially important for simpleperf, because Android doesn't even have the more-than-128-CPUs issue, so technically it shouldn't need any changes made to it.
If Android never runs on high core count hardware, then that could work. The actual CPU limit is in fact 47, after which point the static algorithm fails.
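The figure of 47 can be checked with a few lines of arithmetic. The sketch below applies the static formula (cpu_num * 2 + 0x10) and tests the result against the valid trace ID range, assuming that 0x00 and everything from 0x70 upwards are reserved; the macro and function names are illustrative only.

```c
#include <stdbool.h>

#define TRACE_ID_RES_0		0x00	/* assumed reserved */
#define TRACE_ID_RES_TOP	0x70	/* assumed: 0x70 and above reserved */

/* The legacy static mapping from CPU number to trace ID. */
static int static_trace_id(int cpu)
{
	return cpu * 2 + 0x10;
}

/* An ID is usable only if it lies strictly between the reserved values. */
static bool trace_id_is_valid(int id)
{
	return id > TRACE_ID_RES_0 && id < TRACE_ID_RES_TOP;
}

/* Highest CPU number for which the static mapping still yields a valid ID. */
static int max_cpu_for_static_ids(void)
{
	int cpu = 0;

	while (trace_id_is_valid(static_trace_id(cpu + 1)))
		cpu++;
	return cpu;
}
```

Under that assumption CPU 47 maps to 0x6E, the last usable even ID, and CPU 48 maps to 0x70, which falls in the reserved range. The mapping also uses only every other ID, which is the inefficiency the cover letter mentions.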
The question arises what do the kernel drivers do then?
The old perf record will not realise things are about to go wrong, and will continue to blindly allocate incorrect trace IDs. Realistically the new drivers will then switch to using the previously unused IDs, which will mismatch with the blindly allocated perf IDs, and the old perf decode process will silently fail to decode any data with IDs that do not match.

If we also removed the metadata version update that goes alongside the ID changes, then old perf reports would continue to try to decode newer files - again silently failing once the static algorithm has failed. Legacy ID allocation support must be added as a kernel CONFIG option - so that it is up front and obvious to users what is being selected. And we can output appropriate error messages.

This would be a temporary solution at best as there are upcoming issues that will need attention:-

1) We need to deal with the fact that customers are adding new CS compatible hardware to their systems, some of which has hardcoded trace IDs. These hardware allocations will become reservations in the dynamic allocator, with no guarantee they will not clash with the static algorithm.
2) There may be a point in the future where we need to use per Sink ID allocation.
3) We have an outstanding perf issue with ETE + TRBE which never use trace IDs - at present decode works here because all the ETE capabilities on the current systems are identical. Once that changes, perf will need updating to look at the trace metadata on a CPU number basis, not on a trace ID basis.
4) Future architecture updates will render newer trace un-decodable by old perf versions.
The question here is why would Android build an up to date kernel version with the updated CoreSight drivers, but insist on using an outdated perf / simpleperf version?
Regards
Mike
Making the dynamic traceID allocation use the same IDs as before whenever possible should allow both old Perf and simpleperf to open the file as before and ignore the AUX_OUTPUT_HW_ID packets.
James
Thoughts?
Mike
On 25/07/2022 09:19, Mike Leach wrote:
Hi James
On Fri, 22 Jul 2022 at 13:10, James Clark james.clark@arm.com wrote:
On 21/07/2022 14:54, Mike Leach wrote:
Hi James,
Thanks for looking at this.
On Thu, 21 Jul 2022 at 11:27, James Clark james.clark@arm.com wrote:
On 04/07/2022 09:11, Mike Leach wrote:
The current method for allocating trace source ID values to sources is to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10). The STM is allocated ID 0x1.
This fixed algorithm is used in both the CoreSight driver code, and by perf when writing the trace metadata in the AUXTRACE_INFO record.
The method needs replacing as currently:-
1. It is inefficient in using available IDs.
2. Does not scale to larger systems with many cores and the algorithm has no limits so will generate invalid trace IDs for cpu number > 44.
Additionally requirements to allocate additional system IDs on some systems have been seen.
This patch set introduces an API that allows the allocation of trace IDs in a dynamic manner.
I've tested this with various commands, including per-thread mode and attaching, and by running the tests and also Carsten's new tests. Apart from the possible backwards compatibility issue and the minor code comments it looks good to me.
I've looked at the backwards compatibility issue. At present with the current set (K = kernel drivers, P-rec = perf record, P-rep = perf report)::

  K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK)            P-rep-v2 (OK)
  K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (fail)
  K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail)          P-rep-v2 (fail)
  K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
So, with a P-rec generating v2 metadata, P-rep v1 will cleanly error out. Where the kernel ID version and the perf record ID version differ, even P-rep v2 will fail, due to the IDs being different in the file and the actual drivers. These failures will simply look like no data present.
There are two possible fixes that improve this:-
A) If the v2 kernel uses a sysfs flag to indicate new ID usage, then if this is missing the new perf record can degrade to using the old algorithm to put IDs directly into metadata, as it assumes it is running on a v1 kernel. This then fixes things for the P-rep v2, which can look for this & we know there will be no incoming ID packets.
B) P-rep v2 can look for new packets irrespective of incoming metadata version, and if it sees them, use them to override the metadata IDs.

Compatibility matrix then looks like::

  K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK)            P-rep-v2 (OK)
  K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
  K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail)          P-rep-v2 (OK)
  K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
There is no solution to using an old version of perf record on a new kernel and getting the old version of perf report to correctly decode the file.
We had a discussion about this last point on the Friday AutoFDO call.
Sorry I missed that - I was on holiday.
I think I didn't realise it at the time but I was thinking of two separate requirements relating to this, rather than one. So I will list them here first to avoid confusion:
1. New perfs fall back to the legacy ID mappings if they don't see any HW_IDs.
This is to support the AutoFDO workflow when using new (fixed) perfs on old kernels. This only affects the Perf side changes in this patch, not any of the kernel changes.
2. Wherever possible (absent of any reserved ID clashes or CPUs > 47), the new driver continues to use the old static ID allocation.
This is to support not making any changes to simpleperf (or any other tools if they exist) until there is an actual need to. As you say, this is only a temporary measure. This requirement can also be dropped if we make the simpleperf changes at the same time as these driver updates. But it would buy some time. But we can't fix any tools that we don't know about.
There is no requirement to support old perfs on new kernels as far as I can see.
Do you think it's possible to keep the old static ID allocations if num_possible_cpus() < Max Trace ID? This is especially important for simpleperf, because Android doesn't even have the more-than-128-CPUs issue, so technically it shouldn't need any changes.
If Android never runs high core count hardware, then that could work. The actual CPU limit is in fact 47, after which point the static algorithm fails.
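As a quick check of the limit quoted above, here is a minimal sketch of the legacy static mapping. It assumes the valid trace ID range is 0x01 to 0x6F (0x00 and 0x70 to 0x7F being architecturally reserved); the function and macro names are illustrative only and are not the driver's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: IDs 0x00 and 0x70-0x7F are reserved, so 0x01-0x6F are usable. */
#define TRACE_ID_MIN 0x01
#define TRACE_ID_MAX 0x6F

/* Legacy static mapping quoted in the cover letter: cpu_num * 2 + 0x10. */
int legacy_cpu_trace_id(int cpu)
{
    return cpu * 2 + 0x10;
}

/* True if the ID falls inside the assumed usable range. */
bool trace_id_is_valid(int id)
{
    return id >= TRACE_ID_MIN && id <= TRACE_ID_MAX;
}

/*
 * Highest CPU number in [0, max_cpus) whose legacy ID is still valid,
 * or -1 if none is.
 */
int last_valid_legacy_cpu(int max_cpus)
{
    int cpu, last = -1;

    for (cpu = 0; cpu < max_cpus; cpu++) {
        if (trace_id_is_valid(legacy_cpu_trace_id(cpu)))
            last = cpu;
    }
    return last;
}
```

Under these assumptions CPU 47 maps to 0x6E, the last usable value, while CPU 48 maps to 0x70, which falls in the reserved range.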
The question arises what do the kernel drivers do then?
The old perf record will not realise things are about to go wrong, and will continue to blindly allocate incorrect trace IDs. Realistically the new drivers will then switch to use the previously unused IDs, so they will mismatch with the blindly allocated perf IDs, and the old perf decode process will silently fail to decode any data with IDs that do not match.
Do you mean this situation occurs if there are more than 47 cores? I think it's fine for things to go wrong in this case because it's already broken. Regardless of whether the perf and kernel versions match or don't match.
The user would have to upgrade both parts in that case no matter what we do.
If we also removed the metadata version update that goes alongside the ID changes, then old perf-reports would continue to try to decode newer files - again silently failing once the static algorithm has failed.
That's true, but I don't think we need to drop the metaversion update. There's no requirement for an old perf-report to open new files, so we can still make this change.
Legacy ID allocation support must be added as a kernel CONFIG option - so that it is up front and obvious to users what is being selected. And we can output appropriate error messages.
This would be a temporary solution at best as there are upcoming issues that will need attention:-
1) We need to deal with the fact that customers are adding new CS compatible hardware to their systems, some of which has hardcoded trace IDs. These hardware allocations will become reservations in the dynamic allocator, with no guarantee they will not clash with the static algorithm.
Maybe instead of the temporary solution we can just make the change to simpleperf at the same time. The only reason to do this would be to buy some time or make the transition period smoother.
But does it need to be a CONFIG option if it only happens when CPUs < 47 or if there is a clash? We can still output the AUX_OUTPUT_HW_ID, but use the old ID allocation scheme. So it would appear to be the new scheme for anyone looking for HW_IDs, but is also compatible with old simpleperf until there is a clash.
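To make the proposal concrete, here is a rough sketch of an allocator that hands out the legacy ID when it is valid and free, and only otherwise falls back to another free ID. The map structure, function names and the 0x01 to 0x6F valid range are assumptions for illustration; this is not the API from the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: IDs 0x00 and 0x70-0x7F are reserved, so 0x01-0x6F are usable. */
#define TRACE_ID_MIN   0x01
#define TRACE_ID_MAX   0x6F
#define TRACE_ID_COUNT 0x80

/* Hypothetical ID map: one "in use" flag per possible 7-bit trace ID. */
struct trace_id_map {
    bool used[TRACE_ID_COUNT];
};

void trace_id_map_init(struct trace_id_map *map)
{
    memset(map, 0, sizeof(*map));
}

bool trace_id_in_range(int id)
{
    return id >= TRACE_ID_MIN && id <= TRACE_ID_MAX;
}

/* Reserve a fixed ID, e.g. for hardware with a hardcoded trace ID. */
bool trace_id_reserve(struct trace_id_map *map, int id)
{
    if (!trace_id_in_range(id) || map->used[id])
        return false;
    map->used[id] = true;
    return true;
}

/*
 * Allocate an ID for a CPU: prefer the legacy value (cpu * 2 + 0x10),
 * otherwise take the lowest free valid ID. Returns -1 if none is left.
 */
int trace_id_get_cpu_id(struct trace_id_map *map, int cpu)
{
    int preferred = cpu * 2 + 0x10;
    int id;

    if (trace_id_in_range(preferred) && !map->used[preferred]) {
        map->used[preferred] = true;
        return preferred;
    }
    for (id = TRACE_ID_MIN; id <= TRACE_ID_MAX; id++) {
        if (!map->used[id]) {
            map->used[id] = true;
            return id;
        }
    }
    return -1;
}
```

One caveat with this simple fallback: taking the lowest free ID can consume a value that another source would have expected under the old scheme (0x01 for the STM, or an even ID belonging to a CPU that has not allocated yet), so a real implementation would need a more careful fallback order.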
2) There may be a point in the future where we need to use per Sink ID allocation.
3) We have an outstanding perf issue with ETE + TRBE which never use trace IDs - at present decode works here because all the ETE capabilities on the current systems are identical. Once that changes, perf will need updating to look at the trace metadata on a CPU number basis, not on a trace ID basis.
That's true, I have this one on my list but didn't get to it yet.
4) Future architecture updates will render newer trace un-decodable by old perf versions.
The question here is why would Android build an up to date kernel version with the updated CoreSight drivers, but insist on using an outdated perf / simpleperf version?
I suppose I was thinking that it might be convenient to not have to make any changes to simpleperf because it will always run on low core counts. But with the other issues about clashes, it looks like changing it is unavoidable.
But for the opposite (old kernel, new perf), supporting that should be pretty easy and the reason for using that combo is to get a perf with decode fixes and run it somewhere that the kernel can't be easily updated.
James
Regards
Mike
Making the dynamic traceID allocation use the same IDs as before whenever possible should allow both old Perf and simpleperf to open the file as before and ignore the AUX_OUTPUT_HW_ID packets.
James
Thoughts?
Mike
Hi James,
On Tue, 26 Jul 2022 at 14:53, James Clark james.clark@arm.com wrote:
On 25/07/2022 09:19, Mike Leach wrote:
Hi James
On Fri, 22 Jul 2022 at 13:10, James Clark james.clark@arm.com wrote:
On 21/07/2022 14:54, Mike Leach wrote:
Hi James,
Thanks for looking at this.
On Thu, 21 Jul 2022 at 11:27, James Clark james.clark@arm.com wrote:
On 04/07/2022 09:11, Mike Leach wrote:
The current method for allocating trace source ID values to sources is to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10). The STM is allocated ID 0x1.
This fixed algorithm is used in both the CoreSight driver code, and by perf when writing the trace metadata in the AUXTRACE_INFO record.
The method needs replacing as currently:-
1. It is inefficient in using available IDs.
2. Does not scale to larger systems with many cores and the algorithm has no limits so will generate invalid trace IDs for cpu number > 44.
Additionally requirements to allocate additional system IDs on some systems have been seen.
This patch set introduces an API that allows the allocation of trace IDs in a dynamic manner.
I've tested this with various commands, including per-thread mode and attaching, and by running the tests and also Carsten's new tests. Apart from the possible backwards compatibility issue and the minor code comments it looks good to me.
I've looked at the backwards compatibility issue. At present with the current set (K = kernel drivers, P-rec = perf record, P-rep = perf report)::

  K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK)            P-rep-v2 (OK)
  K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (fail)
  K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail)          P-rep-v2 (fail)
  K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
So, with a P-rec generating v2 metadata, P-rep v1 will cleanly error out. Where the kernel ID version and the perf record ID version differ, even P-rep v2 will fail, due to the IDs being different in the file and the actual drivers. These failures will simply look like no data present.
There are two possible fixes that improve this:-
A) If the v2 kernel uses a sysfs flag to indicate new ID usage, then if this is missing the new perf record can degrade to using the old algorithm to put IDs directly into metadata, as it assumes it is running on a v1 kernel. This then fixes things for the P-rep v2, which can look for this & we know there will be no incoming ID packets.
B) P-rep v2 can look for new packets irrespective of incoming metadata version, and if it sees them, use them to override the metadata IDs.

Compatibility matrix then looks like::

  K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK)            P-rep-v2 (OK)
  K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
  K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail)          P-rep-v2 (OK)
  K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
There is no solution to using an old version of perf record on a new kernel and getting the old version of perf report to correctly decode the file.
We had a discussion about this last point on the Friday AutoFDO call.
Sorry I missed that - I was on holiday.
I think I didn't realise it at the time but I was thinking of two separate requirements relating to this, rather than one. So I will list them here first to avoid confusion:
1. New perfs fall back to the legacy ID mappings if they don't see any HW_IDs.
This is to support the AutoFDO workflow when using new (fixed) perfs on old kernels. This only affects the Perf side changes in this patch, not any of the kernel changes.
Agreed - if the new perf record fills in the trace ID metadata as it did before using the old static algorithm, then the file generated on an old kernel can be correctly interpreted by the new perf report, as the absence of the new HW_ID packets can trigger it to use the metadata instead.
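The decode-side rule described here could be sketched roughly as follows. The structure and function names are hypothetical and do not correspond to the actual perf data structures; the sketch only shows the priority order between HW_ID packets and metadata IDs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU decode state; not the real perf cs-etm structures. */
struct cpu_trace_info {
    int  cpu;
    bool hw_id_seen;   /* an AUX_OUTPUT_HW_ID packet was seen for this CPU */
    int  hw_id;        /* trace ID carried by that packet */
    bool meta_has_id;  /* metadata was written with a static trace ID */
    int  meta_id;      /* trace ID stored in the metadata */
};

/*
 * Pick the trace ID to use when decoding data for one CPU.
 * HW_ID packets take priority; if none were seen (file recorded on an
 * old kernel) fall back to the ID that perf record put in the metadata.
 * Returns -1 when no ID is known.
 */
int resolve_trace_id(const struct cpu_trace_info *info)
{
    if (info->hw_id_seen)
        return info->hw_id;
    if (info->meta_has_id)
        return info->meta_id;
    return -1;
}
```

With this ordering a file recorded on an old kernel (metadata IDs, no packets) and a file recorded on a new kernel (packets present) both resolve to a usable ID in the new perf report.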
2. Wherever possible (absent of any reserved ID clashes or CPUs > 47), the new driver continues to use the old static ID allocation.
This is to support not making any changes to simpleperf (or any other tools if they exist) until there is an actual need to. As you say, this is only a temporary measure. This requirement can also be dropped if we make the simpleperf changes at the same time as these driver updates. But it would buy some time. But we can't fix any tools that we don't know about.
There is no requirement to support old perfs on new kernels as far as I can see.
The _only_ reason to get the ID allocator in the driver to mimic the old allocation numbers is if you _are_ using an old perf to record and then read the data generated on a new kernel. The ID allocator is only visible to the drivers, not perf record. perf record simply makes assumptions about what the ID values are when filling in the file metadata. The old version uses a static calculation on the cpu number, the new version assumes that responsibility has been passed on to the HW_ID packets.
You state below that the version of the metadata should remain updated (@2), so old versions of perf / simpleperf can never read a file generated by new versions of perf. You state above that old versions of perf do not need to be supported on new kernels, so they will never run on a system that uses the new allocation mechanism, and thereby never generate an old version of the file that mismatches the new ID allocation mechanism.
So I am confused about precisely what the requirements are here.
Regards
Mike
Do you think it's possible to keep the old static ID allocations if num_possible_cpus() < Max Trace ID? This is especially important for simpleperf, because Android doesn't even have the more-than-128-CPUs issue, so technically it shouldn't need any changes.
If Android never runs high core count hardware, then that could work. The actual CPU limit is in fact 47, after which point the static algorithm fails.
The question arises what do the kernel drivers do then?
The old perf record will not realise things are about to go wrong, and will continue to blindly allocate incorrect trace IDs. Realistically the new drivers will then switch to use the previously unused IDs, so they will mismatch with the blindly allocated perf IDs, and the old perf decode process will silently fail to decode any data with IDs that do not match.
Do you mean this situation occurs if there are more than 47 cores? I think it's fine for things to go wrong in this case because it's already broken. Regardless of whether the perf and kernel versions match or don't match.
The user would have to upgrade both parts in that case no matter what we do.
If we also removed the metadata version update that goes alongside the ID changes, then old perf-reports would continue to try to decode newer files - again silently failing once the static algorithm has failed.
That's true, but I don't think we need to drop the metaversion update. There's no requirement for an old perf-report to open new files, so we can still make this change.
Legacy ID allocation support must be added as a kernel CONFIG option - so that it is up front and obvious to users what is being selected. And we can output appropriate error messages.
This would be a temporary solution at best as there are upcoming issues that will need attention:-
1) We need to deal with the fact that customers are adding new CS compatible hardware to their systems, some of which has hardcoded trace IDs. These hardware allocations will become reservations in the dynamic allocator, with no guarantee they will not clash with the static algorithm.
Maybe instead of the temporary solution we can just make the change to simpleperf at the same time. The only reason to do this would be to buy some time or make the transition period smoother.
But does it need to be a CONFIG option if it only happens when CPUs < 47 or if there is a clash? We can still output the AUX_OUTPUT_HW_ID, but use the old ID allocation scheme. So it would appear to be the new scheme for anyone looking for HW_IDs, but is also compatible with old simpleperf until there is a clash.
2) There may be a point in the future where we need to use per Sink ID allocation.
3) We have an outstanding perf issue with ETE + TRBE which never use trace IDs - at present decode works here because all the ETE capabilities on the current systems are identical. Once that changes, perf will need updating to look at the trace metadata on a CPU number basis, not on a trace ID basis.
That's true, I have this one on my list but didn't get to it yet.
4) Future architecture updates will render newer trace un-decodable by old perf versions.
The question here is why would Android build an up to date kernel version with the updated CoreSight drivers, but insist on using an outdated perf / simpleperf version?
I suppose I was thinking that it might be convenient to not have to make any changes to simpleperf because it will always run on low core counts. But with the other issues about clashes, it looks like changing it is unavoidable.
But for the opposite (old kernel, new perf), supporting that should be pretty easy and the reason for using that combo is to get a perf with decode fixes and run it somewhere that the kernel can't be easily updated.
James
Regards
Mike
Making the dynamic traceID allocation use the same IDs as before whenever possible should allow both old Perf and simpleperf to open the file as before and ignore the AUX_OUTPUT_HW_ID packets.
James
Thoughts?
Mike