This series enables the future IP trace features Embedded Trace Extension (ETE) and Trace Buffer Extension (TRBE). It depends on the ETM system register instruction support series [0] and on the v8.4 self-hosted tracing support series from Jonathan Zhou [1]. The tree is available here [2] for quick access.
ETE is the PE (CPU) trace unit, implementing the future architecture extensions. ETE overlaps with the ETMv4 architecture, with additions to support the newer architecture features and some restrictions on the supported features compared with ETMv4. ETE support is added by extending the ETMv4 driver to recognise the ETE and to handle the features as exposed by the TRCIDRx registers. ETE only supports system instruction access from the host CPU. The ETE could be integrated with a TRBE (see below), or with the legacy CoreSight trace bus (e.g. ETRs). Thus the ETE follows the same firmware description as the ETMs and requires a node per instance.
Trace Buffer Extension (TRBE) implements a per-CPU trace buffer, which is accessible via the system registers and can be combined with the ETE to provide a 1x1 configuration of source and sink. TRBE is represented here as a CoreSight sink; the primary reason is that the ETE source could also work with the traditional CoreSight sink devices. As TRBE only captures the trace data produced by the ETE, it cannot work alone.
The TRBE representation here has some distinct deviations from a traditional CoreSight sink device. The CoreSight path between ETE and TRBE is not built during boot by looking at the respective DT or ACPI entries. Instead, the TRBE is checked on each available CPU and, when found, gets connected with the ETE source device on the same CPU after altering its outward connections. The ETE-TRBE path lasts only as long as the CPU is online. The ETE-TRBE coupling/decoupling method implemented here is not optimal and will be reworked later on.
Unlike traditional sinks, TRBE can generate an interrupt to signal, among other things, that the buffer has been filled. The interrupt is a PPI and should be communicated from the platform: the DT or ACPI entry representing the TRBE should carry the PPI number for a given platform. During a perf session, the TRBE IRQ handler captures the trace into the perf auxiliary buffer before restarting the TRBE. The system registers used here to configure the ETE and the TRBE are described at the link below.
https://developer.arm.com/docs/ddi0601/g/aarch64-system-registers.
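As a purely illustrative sketch (not part of the patches themselves), the PPI described by the firmware would typically be requested once as a per-CPU interrupt and then enabled on each supported CPU. The function name trbe_setup_irq is hypothetical, and the drvdata/handler fields are only loosely modelled on the driver added later in this series:

/*
 * Illustrative sketch only: wiring up the per-CPU TRBE interrupt listed
 * in the DT/ACPI description. trbe_setup_irq() is a hypothetical helper;
 * the actual probe flow in the driver below may differ.
 */
static int trbe_setup_irq(struct platform_device *pdev,
                          struct trbe_drvdata *drvdata)
{
        int irq, ret;

        /* The PPI number comes from the TRBE DT/ACPI description */
        irq = platform_get_irq(pdev, 0);
        if (irq < 0)
                return irq;

        /* A PPI is requested once and later enabled per CPU via enable_percpu_irq() */
        ret = request_percpu_irq(irq, arm_trbe_irq_handler, DRVNAME,
                                 drvdata->handle);
        if (ret)
                return ret;

        drvdata->irq = irq;
        return 0;
}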
This adds another change: the CoreSight sink device needs to be disabled before capturing the trace data for perf, in order to avoid a race condition with a simultaneous TRBE IRQ being handled. This might cause problems for traditional sink devices which can be operated in both sysfs and perf modes, and needs to be addressed properly. One option would be to move the update_buffer callback into the respective sink devices, e.g. into disable(); see the sketch below.
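For illustration only, one possible shape of that option, where the sink's own disable path folds in the buffer update, might look like the sketch below. The hypothetical_* helpers are placeholders and not functions from this series; only coresight_device, sink_ops() and update_buffer() come from the existing CoreSight interfaces:

/*
 * Illustrative sketch only: fold the AUX buffer update into the sink's
 * disable() so the hardware is already quiesced when the data is collected.
 * The hypothetical_* helpers are placeholders, not part of this series.
 */
static int hypothetical_sink_disable(struct coresight_device *csdev)
{
        struct perf_output_handle *handle = hypothetical_sink_get_handle(csdev);

        /* Stop the hardware first so a TRBE IRQ can no longer race the update */
        hypothetical_sink_stop_hw(csdev);

        if (handle)
                sink_ops(csdev)->update_buffer(csdev, handle,
                                               hypothetical_sink_config(csdev));
        return 0;
}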
This series is primarily looking for some early feedback, both on the proposed design and on its implementation. It is acknowledged that the series might be incomplete and has scope for improvement.
Things to do:

- Improve the ETE-TRBE coupling and decoupling method
- Improve TRBE IRQ handling for all possible corner cases
- Implement sysfs based trace sessions
[0] https://lore.kernel.org/linux-arm-kernel/20201028220945.3826358-1-suzuki.pou... [1] https://lore.kernel.org/linux-arm-kernel/1600396210-54196-1-git-send-email-j... [2] https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/etm/v8.4-self-ho...
Anshuman Khandual (6):
  arm64: Add TRBE definitions
  coresight: sink: Add TRBE driver
  coresight: etm-perf: Truncate the perf record if handle has no space
  coresight: etm-perf: Disable the path before capturing the trace data
  coresight: etm-perf: Connect TRBE sink with ETE source
  dts: bindings: Document device tree binding for Arm TRBE
Suzuki K Poulose (5):
  coresight: etm-perf: Allow an event to use different sinks
  coresight: Do not scan for graph if none is present
  coresight: etm4x: Add support for PE OS lock
  coresight: ete: Add support for sysreg support
  coresight: ete: Detect ETE as one of the supported ETMs
 .../devicetree/bindings/arm/coresight.txt          |   3 +
 Documentation/devicetree/bindings/arm/trbe.txt     |  20 +
 Documentation/trace/coresight/coresight-trbe.rst   |  36 +
 arch/arm64/include/asm/sysreg.h                    |  51 ++
 drivers/hwtracing/coresight/Kconfig                |  11 +
 drivers/hwtracing/coresight/Makefile               |   1 +
 drivers/hwtracing/coresight/coresight-etm-perf.c   |  85 ++-
 drivers/hwtracing/coresight/coresight-etm-perf.h   |   4 +
 drivers/hwtracing/coresight/coresight-etm4x-core.c | 144 +++-
 drivers/hwtracing/coresight/coresight-etm4x.h      |  64 +-
 drivers/hwtracing/coresight/coresight-platform.c   |   9 +-
 drivers/hwtracing/coresight/coresight-trbe.c       | 768 +++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-trbe.h       | 525 ++++++++++++++
 include/linux/coresight.h                          |   2 +
 14 files changed, 1680 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt
 create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
This adds the TRBE-related registers and the corresponding feature macros.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
---
 arch/arm64/include/asm/sysreg.h | 49 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 8bfca08..14cb156 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -330,6 +330,55 @@
 #define SYS_PMMIR_EL1                   sys_reg(3, 0, 9, 14, 6)

+/*
+ * TRBE Registers
+ */
+#define SYS_TRBLIMITR_EL1               sys_reg(3, 0, 9, 11, 0)
+#define SYS_TRBPTR_EL1                  sys_reg(3, 0, 9, 11, 1)
+#define SYS_TRBBASER_EL1                sys_reg(3, 0, 9, 11, 2)
+#define SYS_TRBSR_EL1                   sys_reg(3, 0, 9, 11, 3)
+#define SYS_TRBMAR_EL1                  sys_reg(3, 0, 9, 11, 4)
+#define SYS_TRBTRG_EL1                  sys_reg(3, 0, 9, 11, 6)
+#define SYS_TRBIDR_EL1                  sys_reg(3, 0, 9, 11, 7)
+
+#define TRBLIMITR_LIMIT_MASK            GENMASK(51, 0)
+#define TRBLIMITR_LIMIT_SHIFT           12
+#define TRBLIMITR_NVM                   (1UL << 5)
+#define TRBLIMITR_TRIG_MODE_MASK        GENMASK(1, 0)
+#define TRBLIMITR_TRIG_MODE_SHIFT       2
+#define TRBLIMITR_FILL_MODE_MASK        GENMASK(1, 0)
+#define TRBLIMITR_FILL_MODE_SHIFT       1
+#define TRBLIMITR_ENABLE                (1UL << 0)
+#define TRBPTR_PTR_MASK                 GENMASK(63, 0)
+#define TRBPTR_PTR_SHIFT                0
+#define TRBBASER_BASE_MASK              GENMASK(51, 0)
+#define TRBBASER_BASE_SHIFT             12
+#define TRBSR_EC_MASK                   GENMASK(5, 0)
+#define TRBSR_EC_SHIFT                  26
+#define TRBSR_IRQ                       (1UL << 22)
+#define TRBSR_TRG                       (1UL << 21)
+#define TRBSR_WRAP                      (1UL << 20)
+#define TRBSR_ABORT                     (1UL << 18)
+#define TRBSR_STOP                      (1UL << 17)
+#define TRBSR_MSS_MASK                  GENMASK(15, 0)
+#define TRBSR_MSS_SHIFT                 0
+#define TRBSR_BSC_MASK                  GENMASK(5, 0)
+#define TRBSR_BSC_SHIFT                 0
+#define TRBSR_FSC_MASK                  GENMASK(5, 0)
+#define TRBSR_FSC_SHIFT                 0
+#define TRBMAR_SHARE_MASK               GENMASK(1, 0)
+#define TRBMAR_SHARE_SHIFT              8
+#define TRBMAR_OUTER_MASK               GENMASK(3, 0)
+#define TRBMAR_OUTER_SHIFT              4
+#define TRBMAR_INNER_MASK               GENMASK(3, 0)
+#define TRBMAR_INNER_SHIFT              0
+#define TRBTRG_TRG_MASK                 GENMASK(31, 0)
+#define TRBTRG_TRG_SHIFT                0
+#define TRBIDR_FLAG                     (1UL << 5)
+#define TRBIDR_PROG                     (1UL << 4)
+#define TRBIDR_ALIGN_MASK               GENMASK(3, 0)
+#define TRBIDR_ALIGN_SHIFT              0
+
 #define SYS_MAIR_EL1                    sys_reg(3, 0, 10, 2, 0)
 #define SYS_AMAIR_EL1                   sys_reg(3, 0, 10, 3, 0)
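As a small illustration (not part of the patch), these field definitions could be consumed as below to decode the buffer write pointer alignment advertised by TRBIDR_EL1. The helper name is hypothetical; the TRBE driver later in the series uses its own accessors from coresight-trbe.h:

/*
 * Illustration only: decode TRBIDR_EL1.Align using the definitions above.
 * The helper name is hypothetical; the driver has its own accessors.
 */
static inline u64 example_trbe_align_bytes(void)
{
        u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
        u64 align = (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;

        /* The Align field encodes the write pointer alignment as log2(bytes) */
        return 1ULL << align;
}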
From: Suzuki K Poulose suzuki.poulose@arm.com
When there are multiple sinks on the system, in the absence of a specified sink, it is quite possible that a default sink for an ETM could be different from that of another ETM. However we do not support having multiple sinks for an event yet. This patch allows the event to use the default sinks on the ETMs where they are scheduled as long as the sinks are of the same type.
e.g., if we have a 1x1 topology with per-CPU ETRs, the event can use the per-CPU ETR for the session. However, if the sinks are of different types, e.g. TMC-ETR on one CPU and a custom sink on another, the event will only trace on the first detected sink.
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 50 ++++++++++++++++++------
 1 file changed, 39 insertions(+), 11 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index c2c9b12..ea73cfa 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -204,14 +204,22 @@ static void etm_free_aux(void *data)
 	schedule_work(&event_data->work);
 }

+static bool sinks_match(struct coresight_device *a, struct coresight_device *b)
+{
+	if (!a || !b)
+		return false;
+	return (sink_ops(a) == sink_ops(b));
+}
+
 static void *etm_setup_aux(struct perf_event *event, void **pages,
 			   int nr_pages, bool overwrite)
 {
 	u32 id;
 	int cpu = event->cpu;
 	cpumask_t *mask;
-	struct coresight_device *sink;
+	struct coresight_device *sink = NULL;
 	struct etm_event_data *event_data = NULL;
+	bool sink_forced = false;

 	event_data = alloc_event_data(cpu);
 	if (!event_data)
@@ -222,6 +230,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
 	if (event->attr.config2) {
 		id = (u32)event->attr.config2;
 		sink = coresight_get_sink_by_id(id);
+		sink_forced = true;
 	}

 	mask = &event_data->mask;
@@ -235,7 +244,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
 	 */
 	for_each_cpu(cpu, mask) {
 		struct list_head *path;
-		struct coresight_device *csdev;
+		struct coresight_device *csdev, *new_sink;

 		csdev = per_cpu(csdev_src, cpu);
 		/*
@@ -249,21 +258,35 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
 		}

 		/*
-		 * No sink provided - look for a default sink for one of the
-		 * devices. At present we only support topology where all CPUs
-		 * use the same sink [N:1], so only need to find one sink. The
-		 * coresight_build_path later will remove any CPU that does not
-		 * attach to the sink, or if we have not found a sink.
+		 * No sink provided - look for a default sink for all the devices.
+		 * We only support multiple sinks, only if all the default sinks
+		 * are of the same type, so that the sink buffer can be shared
+		 * as the event moves around. We don't trace on a CPU if it can't
+		 *
 		 */
-		if (!sink)
-			sink = coresight_find_default_sink(csdev);
+		if (!sink_forced) {
+			new_sink = coresight_find_default_sink(csdev);
+			if (!new_sink) {
+				cpumask_clear_cpu(cpu, mask);
+				continue;
+			}
+			/* Skip checks for the first sink */
+			if (!sink) {
+				sink = new_sink;
+			} else if (!sinks_match(new_sink, sink)) {
+				cpumask_clear_cpu(cpu, mask);
+				continue;
+			}
+		} else {
+			new_sink = sink;
+		}

 		/*
 		 * Building a path doesn't enable it, it simply builds a
 		 * list of devices from source to sink that can be
 		 * referenced later when the path is actually needed.
 		 */
-		path = coresight_build_path(csdev, sink);
+		path = coresight_build_path(csdev, new_sink);
 		if (IS_ERR(path)) {
 			cpumask_clear_cpu(cpu, mask);
 			continue;
@@ -284,7 +307,12 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
 	if (!sink_ops(sink)->alloc_buffer || !sink_ops(sink)->free_buffer)
 		goto err;

-	/* Allocate the sink buffer for this session */
+	/*
+	 * Allocate the sink buffer for this session. All the sinks
+	 * where this event can be scheduled are ensured to be of the
+	 * same type. Thus the same sink configuration is used by the
+	 * sinks.
+	 */
 	event_data->snk_config = sink_ops(sink)->alloc_buffer(sink, event, pages,
 							      nr_pages, overwrite);
Hi Linu,
Please could you test this slightly modified version and give us a Tested-by tag if you are happy with the results ?
Suzuki
Hi Suzuki,
On Thu, Nov 12, 2020 at 2:51 PM Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Linu,
Please could you test this slightly modified version and give us a Tested-by tag if you are happy with the results ?
Suzuki
Perf record and report worked fine with this as well, with some formatting-related OpenCSD hacks.

Tested-by: Linu Cherian lcherian@marvell.com
Thanks.
On 11/12/20 10:37 AM, Linu Cherian wrote:
Perf record and report worked fine with this as well, with some formatting-related OpenCSD hacks.

Tested-by: Linu Cherian lcherian@marvell.com
Thanks Linu, much appreciated.
Suzuki
From: Suzuki K Poulose suzuki.poulose@arm.com
If a graph node is not found for a given device node, of_graph_get_next_endpoint() will emit the following error message:
OF: graph: no port node found in /<node_name>
If the given component doesn't have any explicit connections (e.g. ETE), we can simply skip the graph parsing.
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
---
 drivers/hwtracing/coresight/coresight-platform.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-platform.c b/drivers/hwtracing/coresight/coresight-platform.c
index 3629b78..c594f45 100644
--- a/drivers/hwtracing/coresight/coresight-platform.c
+++ b/drivers/hwtracing/coresight/coresight-platform.c
@@ -90,6 +90,12 @@ static void of_coresight_get_ports_legacy(const struct device_node *node,
 	struct of_endpoint endpoint;
 	int in = 0, out = 0;

+	/*
+	 * Avoid warnings in of_graph_get_next_endpoint()
+	 * if the device doesn't have any graph connections
+	 */
+	if (!of_graph_is_present(node))
+		return;
 	do {
 		ep = of_graph_get_next_endpoint(node, ep);
 		if (!ep)
From: Suzuki K Poulose suzuki.poulose@arm.com
ETE may not implement the OS lock and instead could rely on the PE OS Lock for the trace unit access. This is indicated by TRCOSLSR.OSLM == 0b100. Add support for handling the PE OS lock.
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
---
 drivers/hwtracing/coresight/coresight-etm4x-core.c | 50 ++++++++++++++++++----
 drivers/hwtracing/coresight/coresight-etm4x.h      | 15 +++++++
 2 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index fd945c1..0269b4c 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -101,30 +101,59 @@ void etm4x_sysreg_write(struct csdev_access *csa,
 	}
 }

-static void etm4_os_unlock_csa(struct etmv4_drvdata *drvdata, struct csdev_access *csa)
+static void etm_detect_os_lock(struct etmv4_drvdata *drvdata,
+			       struct csdev_access *csa)
 {
-	/* Writing 0 to TRCOSLAR unlocks the trace registers */
-	etm4x_relaxed_write32(csa, 0x0, TRCOSLAR);
-	drvdata->os_unlock = true;
+	u32 oslsr = etm4x_relaxed_read32(csa, TRCOSLSR);
+
+	drvdata->os_lock_model = ETM_OSLSR_OSLM(oslsr);
+}
+
+static void etm_write_os_lock(struct etmv4_drvdata *drvdata,
+			      struct csdev_access *csa, u32 val)
+{
+	val = !!val;
+
+	switch (drvdata->os_lock_model) {
+	case ETM_OSLOCK_PRESENT:
+		etm4x_relaxed_write32(csa, val, TRCOSLAR);
+		break;
+	case ETM_OSLOCK_PE:
+		write_sysreg_s(val, SYS_OSLAR_EL1);
+		break;
+	default:
+		pr_warn_once("CPU%d: Unsupported Trace OSLock model: %x\n",
+			     smp_processor_id(), drvdata->os_lock_model);
+		fallthrough;
+	case ETM_OSLOCK_NI:
+		return;
+	}
 	isb();
 }

+static inline void etm4_os_unlock_csa(struct etmv4_drvdata *drvdata,
+				      struct csdev_access *csa)
+{
+	WARN_ON(drvdata->cpu != smp_processor_id());
+
+	/* Writing 0 to OS Lock unlocks the trace unit registers */
+	etm_write_os_lock(drvdata, csa, 0x0);
+	drvdata->os_unlock = true;
+}
+
 static void etm4_os_unlock(struct etmv4_drvdata *drvdata)
 {
 	if (!WARN_ON(!drvdata->csdev))
 		etm4_os_unlock_csa(drvdata, &drvdata->csdev->access);
-
 }

 static void etm4_os_lock(struct etmv4_drvdata *drvdata)
 {
 	if (WARN_ON(!drvdata->csdev))
 		return;
-
-	/* Writing 0x1 to TRCOSLAR locks the trace registers */
-	etm4x_relaxed_write32(&drvdata->csdev->access, 0x1, TRCOSLAR);
+	/* Writing 0x1 to OS Lock locks the trace registers */
+	etm_write_os_lock(drvdata, &drvdata->csdev->access, 0x1);
 	drvdata->os_unlock = false;
-	isb();
 }

 static void etm4_cs_lock(struct etmv4_drvdata *drvdata,
@@ -794,6 +823,9 @@ static void etm4_init_arch_data(void *info)
 	if (!etm_init_csdev_access(drvdata, csa))
 		return;

+	/* Detect the support for OS Lock before we actually use it */
+	etm_detect_os_lock(drvdata, csa);
+
 	/* Make sure all registers are accessible */
 	etm4_os_unlock_csa(drvdata, csa);
 	etm4_cs_unlock(drvdata, csa);
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index fe71072..4b1bfc2 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -497,6 +497,20 @@
 			 ETM_MODE_EXCL_USER)

 /*
+ * TRCOSLSR.OSLM advertises the OS Lock model.
+ * OSLM[2:0] = TRCOSLSR[4:3,0]
+ *
+ * 0b000 - Trace OS Lock is not implemented.
+ * 0b010 - Trace OS Lock is implemented.
+ * 0b100 - Trace OS Lock is not implemented, unit is controlled by PE OS Lock.
+ */
+#define ETM_OSLOCK_NI		0b000
+#define ETM_OSLOCK_PRESENT	0b010
+#define ETM_OSLOCK_PE		0b100
+
+#define ETM_OSLSR_OSLM(oslsr)	((((oslsr) & GENMASK(4, 3)) >> 2) | (oslsr & 0x1))
+
+/*
  * TRCDEVARCH Bit field definitions
  * Bits[31:21]	- ARCHITECT = Always Arm Ltd.
  *		* Bits[31:28] = 0x4
@@ -879,6 +893,7 @@ struct etmv4_drvdata {
 	u8			s_ex_level;
 	u8			ns_ex_level;
 	u8			q_support;
+	u8			os_lock_model;
 	bool			sticky_enable;
 	bool			boot_enable;
 	bool			os_unlock;
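As a quick sanity check of the OSLM[2:0] = TRCOSLSR[4:3,0] packing (an illustration added here, not part of the patch), the macro can be exercised standalone in userspace:

#include <stdio.h>
#include <stdint.h>

/* Userspace restatement of the kernel macros, for illustration only */
#define GENMASK(h, l)		(((~0u) << (l)) & (~0u >> (31 - (h))))
#define ETM_OSLSR_OSLM(oslsr)	((((oslsr) & GENMASK(4, 3)) >> 2) | ((oslsr) & 0x1))

int main(void)
{
	/* TRCOSLSR with only bit[3] set -> OSLM = 0b010: Trace OS Lock implemented */
	printf("OSLM = 0x%x\n", ETM_OSLSR_OSLM(0x8u));
	/* TRCOSLSR with only bit[4] set -> OSLM = 0b100: controlled by the PE OS Lock */
	printf("OSLM = 0x%x\n", ETM_OSLSR_OSLM(0x10u));
	return 0;
}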
From: Suzuki K Poulose suzuki.poulose@arm.com
This adds sysreg support for ETE.
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
---
 drivers/hwtracing/coresight/coresight-etm4x-core.c | 39 ++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-etm4x.h      | 42 +++++++++++++++++-----
 2 files changed, 72 insertions(+), 9 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 0269b4c..15b6e94 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -101,6 +101,45 @@ void etm4x_sysreg_write(struct csdev_access *csa,
 	}
 }

+u64 ete_sysreg_read(struct csdev_access *csa,
+		    u32 offset,
+		    bool _relaxed,
+		    bool _64bit)
+{
+	u64 res = 0;
+
+	switch (offset) {
+	ETE_READ_CASES(res)
+	default :
+		WARN_ONCE(1, "ete: trying to read unsupported register @%x\n",
+			  offset);
+	}
+
+	if (!_relaxed)
+		__iormb(res);	/* Imitate the !relaxed I/O helpers */
+
+	return res;
+}
+
+void ete_sysreg_write(struct csdev_access *csa,
+		      u64 val,
+		      u32 offset,
+		      bool _relaxed,
+		      bool _64bit)
+{
+	if (!_relaxed)
+		__iowmb();	/* Imitate the !relaxed I/O helpers */
+	if (!_64bit)
+		val &= GENMASK(31, 0);
+
+	switch (offset) {
+	ETE_WRITE_CASES(val)
+	default :
+		WARN_ONCE(1, "ete: trying to write to unsupported register @%x\n",
+			  offset);
+	}
+}
+
 static void etm_detect_os_lock(struct etmv4_drvdata *drvdata,
 			       struct csdev_access *csa)
 {
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index 4b1bfc2..00c0367 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -28,6 +28,7 @@
 #define TRCAUXCTLR			0x018
 #define TRCEVENTCTL0R			0x020
 #define TRCEVENTCTL1R			0x024
+#define TRCRSR				0x028
 #define TRCSTALLCTLR			0x02C
 #define TRCTSCTLR			0x030
 #define TRCSYNCPR			0x034
@@ -48,6 +49,7 @@
 #define TRCSEQRSTEVR			0x118
 #define TRCSEQSTR			0x11C
 #define TRCEXTINSELR			0x120
+#define TRCEXTINSELRn(n)		(0x120 + (n * 4)) /* n = 0-3 */
 #define TRCCNTRLDVRn(n)			(0x140 + (n * 4)) /* n = 0-3 */
 #define TRCCNTCTLRn(n)			(0x150 + (n * 4)) /* n = 0-3 */
 #define TRCCNTVRn(n)			(0x160 + (n * 4)) /* n = 0-3 */
@@ -156,9 +158,22 @@
 #define CASE_WRITE(val, x)					\
 	case (x): { write_etm4x_sysreg_const_offset((val), (x)); break; }

-#define CASE_LIST(op, val)		\
-	CASE_##op((val), TRCPRGCTLR)	\
+#define ETE_ONLY_LIST(op, val)		\
+	CASE_##op((val), TRCRSR)	\
+	CASE_##op((val), TRCEXTINSELRn(1))	\
+	CASE_##op((val), TRCEXTINSELRn(2))	\
+	CASE_##op((val), TRCEXTINSELRn(3))
+
+#define ETM_ONLY_LIST(op, val)		\
 	CASE_##op((val), TRCPROCSELR)	\
+	CASE_##op((val), TRCVDCTLR)	\
+	CASE_##op((val), TRCVDSACCTLR)	\
+	CASE_##op((val), TRCVDARCCTLR)	\
+	CASE_##op((val), TRCITCTRL)	\
+	CASE_##op((val), TRCOSLAR)
+
+#define COMMON_LIST(op, val)		\
+	CASE_##op((val), TRCPRGCTLR)	\
 	CASE_##op((val), TRCSTATR)	\
 	CASE_##op((val), TRCCONFIGR)	\
 	CASE_##op((val), TRCAUXCTLR)	\
@@ -175,9 +190,6 @@
 	CASE_##op((val), TRCVIIECTLR)	\
 	CASE_##op((val), TRCVISSCTLR)	\
 	CASE_##op((val), TRCVIPCSSCTLR)	\
-	CASE_##op((val), TRCVDCTLR)	\
-	CASE_##op((val), TRCVDSACCTLR)	\
-	CASE_##op((val), TRCVDARCCTLR)	\
 	CASE_##op((val), TRCSEQEVRn(0))	\
 	CASE_##op((val), TRCSEQEVRn(1))	\
 	CASE_##op((val), TRCSEQEVRn(2))	\
@@ -272,7 +284,6 @@
 	CASE_##op((val), TRCSSPCICRn(5))	\
 	CASE_##op((val), TRCSSPCICRn(6))	\
 	CASE_##op((val), TRCSSPCICRn(7))	\
-	CASE_##op((val), TRCOSLAR)	\
 	CASE_##op((val), TRCOSLSR)	\
 	CASE_##op((val), TRCPDCR)	\
 	CASE_##op((val), TRCPDSR)	\
@@ -344,7 +355,6 @@
 	CASE_##op((val), TRCCIDCCTLR1)	\
 	CASE_##op((val), TRCVMIDCCTLR0)	\
 	CASE_##op((val), TRCVMIDCCTLR1)	\
-	CASE_##op((val), TRCITCTRL)	\
 	CASE_##op((val), TRCCLAIMSET)	\
 	CASE_##op((val), TRCCLAIMCLR)	\
 	CASE_##op((val), TRCDEVAFF0)	\
@@ -364,8 +374,22 @@
 	CASE_##op((val), TRCPIDR2)	\
 	CASE_##op((val), TRCPIDR3)

-#define ETM4x_READ_CASES(res)	CASE_LIST(READ, (res))
-#define ETM4x_WRITE_CASES(val)	CASE_LIST(WRITE, (val))
+#define ETM4x_READ_CASES(res)		\
+	COMMON_LIST(READ, (res))	\
+	ETM_ONLY_LIST(READ, (res))
+
+#define ETM4x_WRITE_CASES(res)		\
+	COMMON_LIST(WRITE, (res))	\
+	ETM_ONLY_LIST(WRITE, (res))
+
+#define ETE_READ_CASES(res)		\
+	COMMON_LIST(READ, (res))	\
+	ETE_ONLY_LIST(READ, (res))
+
+#define ETE_WRITE_CASES(res)		\
+	COMMON_LIST(WRITE, (res))	\
+	ETE_ONLY_LIST(WRITE, (res))
+
 #define read_etm4x_sysreg_offset(csa, offset, _64bit)		\
 	({							\
From: Suzuki K Poulose suzuki.poulose@arm.com
Add ETE as one of the supported device types in the ETM4x driver. The devices are named following the existing convention, as ete<N>.

ETE mandates that the trace resource status register (TRCRSR) is programmed before tracing is turned on. For the moment, simply write to it indicating TraceActive.
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
---
 .../devicetree/bindings/arm/coresight.txt           |  3 ++
 drivers/hwtracing/coresight/coresight-etm4x-core.c  | 55 +++++++++++++++++-----
 drivers/hwtracing/coresight/coresight-etm4x.h       |  7 +++
 3 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
index bff96a5..784cc1b 100644
--- a/Documentation/devicetree/bindings/arm/coresight.txt
+++ b/Documentation/devicetree/bindings/arm/coresight.txt
@@ -40,6 +40,9 @@ its hardware characteristcs.
 	- Embedded Trace Macrocell with system register access only.
 	  "arm,coresight-etm-sysreg";

+	- Embedded Trace Extensions.
+	  "arm,ete"
+
 	- Coresight programmable Replicator :
 	  "arm,coresight-dynamic-replicator", "arm,primecell";

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 15b6e94..0fea349 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -331,6 +331,13 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata)
 		etm4x_relaxed_write32(csa, trcpdcr | TRCPDCR_PU, TRCPDCR);
 	}

+	/*
+	 * ETE mandates that the TRCRSR is written to before
+	 * enabling it.
+	 */
+	if (drvdata->arch >= ETM_ARCH_ETE)
+		etm4x_relaxed_write32(csa, TRCRSR_TA, TRCRSR);
+
 	/* Enable the trace unit */
 	etm4x_relaxed_write32(csa, 1, TRCPRGCTLR);

@@ -763,13 +770,24 @@ static bool etm_init_sysreg_access(struct etmv4_drvdata *drvdata,
 	 * ETMs implementing sysreg access must implement TRCDEVARCH.
 	 */
 	devarch = read_etm4x_sysreg_const_offset(TRCDEVARCH);
-	if ((devarch & ETM_DEVARCH_ID_MASK) != ETM_DEVARCH_ETMv4x_ARCH)
+	switch (devarch & ETM_DEVARCH_ID_MASK) {
+	case ETM_DEVARCH_ETMv4x_ARCH:
+		*csa = (struct csdev_access) {
+			.io_mem	= false,
+			.read	= etm4x_sysreg_read,
+			.write	= etm4x_sysreg_write,
+		};
+		break;
+	case ETM_DEVARCH_ETE_ARCH:
+		*csa = (struct csdev_access) {
+			.io_mem	= false,
+			.read	= ete_sysreg_read,
+			.write	= ete_sysreg_write,
+		};
+		break;
+	default:
 		return false;
-	*csa = (struct csdev_access) {
-		.io_mem	= false,
-		.read	= etm4x_sysreg_read,
-		.write	= etm4x_sysreg_write,
-	};
+	}

 	drvdata->arch = etm_devarch_to_arch(devarch);
 	return true;
@@ -1698,6 +1716,8 @@ static int etm4_probe(struct device *dev, void __iomem *base)
 	struct etmv4_drvdata *drvdata;
 	struct coresight_desc desc = { 0 };
 	struct etm_init_arg init_arg = { 0 };
+	u8 major, minor;
+	char *type_name;

 	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
 	if (!drvdata)
@@ -1724,10 +1744,6 @@ static int etm4_probe(struct device *dev, void __iomem *base)
 	if (drvdata->cpu < 0)
 		return drvdata->cpu;

-	desc.name = devm_kasprintf(dev, GFP_KERNEL, "etm%d", drvdata->cpu);
-	if (!desc.name)
-		return -ENOMEM;
-
 	init_arg.drvdata = drvdata;
 	init_arg.csa = &desc.access;

@@ -1742,6 +1758,19 @@ static int etm4_probe(struct device *dev, void __iomem *base)
 	if (!desc.access.io_mem ||
 	    fwnode_property_present(dev_fwnode(dev), "qcom,skip-power-up"))
 		drvdata->skip_power_up = true;

+	major = ETM_ARCH_MAJOR_VERSION(drvdata->arch);
+	minor = ETM_ARCH_MINOR_VERSION(drvdata->arch);
+	if (drvdata->arch >= ETM_ARCH_ETE) {
+		type_name = "ete";
+		major -= 4;
+	} else {
+		type_name = "etm";
+	}
+
+	desc.name = devm_kasprintf(dev, GFP_KERNEL,
+				   "%s%d", type_name, drvdata->cpu);
+	if (!desc.name)
+		return -ENOMEM;

 	etm4_init_trace_id(drvdata);
 	etm4_set_default(&drvdata->config);
@@ -1770,9 +1799,8 @@ static int etm4_probe(struct device *dev, void __iomem *base)

 	etmdrvdata[drvdata->cpu] = drvdata;

-	dev_info(&drvdata->csdev->dev, "CPU%d: ETM v%d.%d initialized\n",
-		 drvdata->cpu, ETM_ARCH_MAJOR_VERSION(drvdata->arch),
-		 ETM_ARCH_MINOR_VERSION(drvdata->arch));
+	dev_info(&drvdata->csdev->dev, "CPU%d: %s v%d.%d initialized\n",
+		 drvdata->cpu, type_name, major, minor);

 	if (boot_enable) {
 		coresight_enable(drvdata->csdev);
@@ -1892,6 +1920,7 @@ static struct amba_driver etm4x_amba_driver = {

 static const struct of_device_id etm_sysreg_match[] = {
 	{ .compatible	= "arm,coresight-etm-sysreg" },
+	{ .compatible	= "arm,ete" },
 	{}
 };

diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index 00c0367..05fd0e5 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -127,6 +127,8 @@
 #define TRCCIDR2			0xFF8
 #define TRCCIDR3			0xFFC

+#define TRCRSR_TA			BIT(12)
+
 /*
  * System instructions to access ETM registers.
  * See ETMv4.4 spec ARM IHI0064F section 4.3.6 System instructions
@@ -570,11 +572,14 @@
 	((ETM_DEVARCH_MAKE_ARCHID_ARCH_VER(major)) | ETM_DEVARCH_ARCHID_ARCH_PART(0xA13))

 #define ETM_DEVARCH_ARCHID_ETMv4x	ETM_DEVARCH_MAKE_ARCHID(0x4)
+#define ETM_DEVARCH_ARCHID_ETE		ETM_DEVARCH_MAKE_ARCHID(0x5)

 #define ETM_DEVARCH_ID_MASK						\
 	(ETM_DEVARCH_ARCHITECT_MASK | ETM_DEVARCH_ARCHID_MASK | ETM_DEVARCH_PRESENT)
 #define ETM_DEVARCH_ETMv4x_ARCH						\
 	(ETM_DEVARCH_ARCHITECT_ARM | ETM_DEVARCH_ARCHID_ETMv4x | ETM_DEVARCH_PRESENT)
+#define ETM_DEVARCH_ETE_ARCH						\
+	(ETM_DEVARCH_ARCHITECT_ARM | ETM_DEVARCH_ARCHID_ETE | ETM_DEVARCH_PRESENT)

 #define TRCSTATR_IDLE_BIT		0
 #define TRCSTATR_PMSTABLE_BIT		1
@@ -661,6 +666,8 @@
 #define ETM_ARCH_MINOR_VERSION(arch)	((arch) & 0xfU)

 #define ETM_ARCH_V4	ETM_ARCH_VERSION(4, 0)
+#define ETM_ARCH_ETE	ETM_ARCH_VERSION(5, 0)
+
 /* Interpretation of resource numbers change at ETM v4.3 architecture */
 #define ETM_ARCH_V4_3	ETM_ARCH_VERSION(4, 3)
Hi Anshuman,
On Tue, Nov 10, 2020 at 08:45:04PM +0800, Anshuman Khandual wrote:
+	major = ETM_ARCH_MAJOR_VERSION(drvdata->arch);
+	minor = ETM_ARCH_MINOR_VERSION(drvdata->arch);
+	if (drvdata->arch >= ETM_ARCH_ETE) {
+		type_name = "ete";
+		major -= 4;
+	} else {
+		type_name = "etm";
+	}
When the trace unit supports ETE, could it still be compatible with ETMv4.4? Can we selectively use it as an ETM instead of an ETE?
Thanks, Tingwei
Hi Tingwei,
On 11/14/20 5:36 AM, Tingwei Zhang wrote:
When the trace unit supports ETE, could it still be compatible with ETMv4.4? Can we selectively use it as an ETM instead of an ETE?
No. Even though most of the register sets are compatible, there are additional restrictions and some new rules for the ETE. So, when you treat the ETE as an ETMv4.4, you could be treading into "UNPREDICTABLE" behaviors.
Suzuki
Trace Buffer Extension (TRBE) implements a per-CPU trace buffer, which is accessible via the system registers. The TRBE supports different addressing modes, including CPU virtual addresses, and different buffer modes, including a circular buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1), a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). Access to the trace buffer can be prohibited by a higher exception level (EL3 or EL2), which is indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU private interrupt (PPI) on address translation errors and when the buffer is full. The overall implementation here is inspired by the Arm SPE driver.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
---
 Documentation/trace/coresight/coresight-trbe.rst |  36 ++
 arch/arm64/include/asm/sysreg.h                  |   2 +
 drivers/hwtracing/coresight/Kconfig              |  11 +
 drivers/hwtracing/coresight/Makefile             |   1 +
 drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
 6 files changed, 1341 insertions(+)
 create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst new file mode 100644 index 0000000..4320a8b --- /dev/null +++ b/Documentation/trace/coresight/coresight-trbe.rst @@ -0,0 +1,36 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================== +Trace Buffer Extension (TRBE). +============================== + + :Author: Anshuman Khandual anshuman.khandual@arm.com + :Date: November 2020 + +Hardware Description +-------------------- + +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system +memory, CPU traces generated from a corresponding percpu tracing unit. This +gets plugged in as a coresight sink device because the corresponding trace +genarators (ETE), are plugged in as source device. + +Sysfs files and directories +--------------------------- + +The TRBE devices appear on the existing coresight bus alongside the other +coresight devices:: + + >$ ls /sys/bus/coresight/devices + trbe0 trbe1 trbe2 trbe3 + +The ``trbe<N>`` named TRBEs are associated with a CPU.:: + + >$ ls /sys/bus/coresight/devices/trbe0/ + irq align dbm + +*Key file items are:-* + * ``irq``: TRBE maintenance interrupt number + * ``align``: TRBE write pointer alignment + * ``dbm``: TRBE updates memory with access and dirty flags + diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 14cb156..61136f6 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -97,6 +97,7 @@ #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift)) #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift)) #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift)) +#define TSB_CSYNC __emit_inst(0xd503225f)
#define __SYS_BARRIER_INSN(CRm, op2, Rt) \ __emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f)) @@ -865,6 +866,7 @@ #define ID_AA64MMFR2_CNP_SHIFT 0
/* id_aa64dfr0 */ +#define ID_AA64DFR0_TRBE_SHIFT 44 #define ID_AA64DFR0_TRACE_FILT_SHIFT 40 #define ID_AA64DFR0_DOUBLELOCK_SHIFT 36 #define ID_AA64DFR0_PMSVER_SHIFT 32 diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig index c119824..0f5e101 100644 --- a/drivers/hwtracing/coresight/Kconfig +++ b/drivers/hwtracing/coresight/Kconfig @@ -156,6 +156,17 @@ config CORESIGHT_CTI To compile this driver as a module, choose M here: the module will be called coresight-cti.
+config CORESIGHT_TRBE + bool "Trace Buffer Extension (TRBE) driver" + depends on ARM64 + help + This driver provides support for percpu Trace Buffer Extension (TRBE). + TRBE always needs to be used along with it's corresponding percpu ETE + component. ETE generates trace data which is then captured with TRBE. + Unlike traditional sink devices, TRBE is a CPU feature accessible via + system registers. But it's explicit dependency with trace unit (ETE) + requires it to be plugged in as a coresight sink device. + config CORESIGHT_CTI_INTEGRATION_REGS bool "Access CTI CoreSight Integration Registers" depends on CORESIGHT_CTI diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile index f20e357..d608165 100644 --- a/drivers/hwtracing/coresight/Makefile +++ b/drivers/hwtracing/coresight/Makefile @@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o +obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o coresight-cti-y := coresight-cti-core.o coresight-cti-platform.o \ coresight-cti-sysfs.o diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c new file mode 100644 index 0000000..48a8ec3 --- /dev/null +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -0,0 +1,766 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight + * sink device could then pair with an appropriate per-cpu coresight source + * device (ETE) thus generating required trace data. Trace can be enabled + * via the perf framework. + * + * Copyright (C) 2020 ARM Ltd. + * + * Author: Anshuman Khandual anshuman.khandual@arm.com + */ +#define DRVNAME "arm_trbe" + +#define pr_fmt(fmt) DRVNAME ": " fmt + +#include "coresight-trbe.h" + +#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT)) + +#define ETE_IGNORE_PACKET 0x70 + +static const char trbe_name[] = "trbe"; + +enum trbe_fault_action { + TRBE_FAULT_ACT_WRAP, + TRBE_FAULT_ACT_SPURIOUS, + TRBE_FAULT_ACT_FATAL, +}; + +struct trbe_perf { + unsigned long trbe_base; + unsigned long trbe_limit; + unsigned long trbe_write; + pid_t pid; + int nr_pages; + void **pages; + bool snapshot; + struct trbe_cpudata *cpudata; +}; + +struct trbe_cpudata { + struct coresight_device *csdev; + bool trbe_dbm; + u64 trbe_align; + int cpu; + enum cs_mode mode; + struct trbe_perf *perf; + struct trbe_drvdata *drvdata; +}; + +struct trbe_drvdata { + struct trbe_cpudata __percpu *cpudata; + struct perf_output_handle __percpu *handle; + struct hlist_node hotplug_node; + int irq; + cpumask_t supported_cpus; + enum cpuhp_state trbe_online; + struct platform_device *pdev; + struct clk *atclk; +}; + +static int trbe_alloc_node(struct perf_event *event) +{ + if (event->cpu == -1) + return NUMA_NO_NODE; + return cpu_to_node(event->cpu); +} + +static void trbe_disable_and_drain_local(void) +{ + write_sysreg_s(0, SYS_TRBLIMITR_EL1); + isb(); + dsb(nsh); + asm(TSB_CSYNC); +} + +static void trbe_reset_local(void) +{ + trbe_disable_and_drain_local(); + write_sysreg_s(0, SYS_TRBPTR_EL1); + isb(); + + write_sysreg_s(0, SYS_TRBBASER_EL1); + isb(); + + write_sysreg_s(0, SYS_TRBSR_EL1); + isb(); +} + +static void trbe_pad_buf(struct perf_output_handle *handle, int len) +{ + struct trbe_perf *perf = etm_perf_sink_config(handle); + u64 head = PERF_IDX2OFF(handle->head, perf); + + memset((void *) 
perf->trbe_base + head, ETE_IGNORE_PACKET, len); + if (!perf->snapshot) + perf_aux_output_skip(handle, len); +} + +static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle) +{ + struct trbe_perf *perf = etm_perf_sink_config(handle); + u64 head = PERF_IDX2OFF(handle->head, perf); + u64 limit = perf->nr_pages * PAGE_SIZE; + + if (head < limit >> 1) + limit >>= 1; + + return limit; +} + +static unsigned long trbe_normal_offset(struct perf_output_handle *handle) +{ + struct trbe_perf *perf = etm_perf_sink_config(handle); + struct trbe_cpudata *cpudata = perf->cpudata; + const u64 bufsize = perf->nr_pages * PAGE_SIZE; + u64 limit = bufsize; + u64 head, tail, wakeup; + + head = PERF_IDX2OFF(handle->head, perf); + if (!IS_ALIGNED(head, cpudata->trbe_align)) { + unsigned long delta = roundup(head, cpudata->trbe_align) - head; + + delta = min(delta, handle->size); + trbe_pad_buf(handle, delta); + head = PERF_IDX2OFF(handle->head, perf); + } + + if (!handle->size) { + perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); + return 0; + } + + tail = PERF_IDX2OFF(handle->head + handle->size, perf); + wakeup = PERF_IDX2OFF(handle->wakeup, perf); + + if (head < tail) + limit = round_down(tail, PAGE_SIZE); + + if (handle->wakeup < (handle->head + handle->size) && head <= wakeup) + limit = min(limit, round_up(wakeup, PAGE_SIZE)); + + if (limit > head) + return limit; + + trbe_pad_buf(handle, handle->size); + perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); + return 0; +} + +static unsigned long get_trbe_limit(struct perf_output_handle *handle) +{ + struct trbe_perf *perf = etm_perf_sink_config(handle); + unsigned long offset; + + if (perf->snapshot) + offset = trbe_snapshot_offset(handle); + else + offset = trbe_normal_offset(handle); + return perf->trbe_base + offset; +} + +static void trbe_enable_hw(struct trbe_perf *perf) +{ + WARN_ON(perf->trbe_write < perf->trbe_base); + WARN_ON(perf->trbe_write >= perf->trbe_limit); + set_trbe_disabled(); + clr_trbe_irq(); + clr_trbe_wrap(); + clr_trbe_abort(); + clr_trbe_ec(); + clr_trbe_bsc(); + clr_trbe_fsc(); + set_trbe_virtual_mode(); + set_trbe_fill_mode(TRBE_FILL_STOP); + set_trbe_trig_mode(TRBE_TRIGGER_IGNORE); + isb(); + set_trbe_base_pointer(perf->trbe_base); + set_trbe_limit_pointer(perf->trbe_limit); + set_trbe_write_pointer(perf->trbe_write); + isb(); + dsb(ishst); + flush_tlb_all(); + set_trbe_running(); + set_trbe_enabled(); + asm(TSB_CSYNC); +} + +static void *arm_trbe_alloc_buffer(struct coresight_device *csdev, + struct perf_event *event, void **pages, + int nr_pages, bool snapshot) +{ + struct trbe_perf *perf; + struct page **pglist; + int i; + + if ((nr_pages < 2) || (snapshot && (nr_pages & 1))) + return NULL; + + perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event)); + if (IS_ERR(perf)) + return ERR_PTR(-ENOMEM); + + pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL); + if (IS_ERR(pglist)) { + kfree(perf); + return ERR_PTR(-ENOMEM); + } + + for (i = 0; i < nr_pages; i++) + pglist[i] = virt_to_page(pages[i]); + + perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL); + if (IS_ERR((void *) perf->trbe_base)) { + kfree(pglist); + kfree(perf); + return ERR_PTR(perf->trbe_base); + } + perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE; + perf->trbe_write = perf->trbe_base; + perf->pid = task_pid_nr(event->owner); + perf->snapshot = snapshot; + perf->nr_pages = nr_pages; + perf->pages = pages; + kfree(pglist); + return perf; +} + +void arm_trbe_free_buffer(void *config) +{ + 
struct trbe_perf *perf = config; + + vunmap((void *) perf->trbe_base); + kfree(perf); +} + +static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev, + struct perf_output_handle *handle, + void *config) +{ + struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev); + struct trbe_perf *perf = config; + unsigned long size, offset; + + WARN_ON(perf->cpudata != cpudata); + WARN_ON(cpudata->cpu != smp_processor_id()); + WARN_ON(cpudata->mode != CS_MODE_PERF); + WARN_ON(cpudata->drvdata != drvdata); + + offset = get_trbe_write_pointer() - get_trbe_base_pointer(); + size = offset - PERF_IDX2OFF(handle->head, perf); + if (perf->snapshot) + handle->head += size; + return size; +} + +static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data) +{ + struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev); + struct perf_output_handle *handle = data; + struct trbe_perf *perf = etm_perf_sink_config(handle); + + WARN_ON(cpudata->cpu != smp_processor_id()); + WARN_ON(mode != CS_MODE_PERF); + WARN_ON(cpudata->drvdata != drvdata); + + *this_cpu_ptr(drvdata->handle) = *handle; + cpudata->perf = perf; + cpudata->mode = mode; + perf->cpudata = cpudata; + perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf); + perf->trbe_limit = get_trbe_limit(handle); + if (perf->trbe_limit == perf->trbe_base) { + trbe_disable_and_drain_local(); + return 0; + } + trbe_enable_hw(perf); + return 0; +} + +static int arm_trbe_disable(struct coresight_device *csdev) +{ + struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev); + struct trbe_perf *perf = cpudata->perf; + + WARN_ON(perf->cpudata != cpudata); + WARN_ON(cpudata->cpu != smp_processor_id()); + WARN_ON(cpudata->mode != CS_MODE_PERF); + WARN_ON(cpudata->drvdata != drvdata); + + trbe_disable_and_drain_local(); + perf->cpudata = NULL; + cpudata->perf = NULL; + cpudata->mode = CS_MODE_DISABLED; + return 0; +} + +static void trbe_handle_fatal(struct perf_output_handle *handle) +{ + perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); + perf_aux_output_end(handle, 0); + trbe_disable_and_drain_local(); +} + +static void trbe_handle_spurious(struct perf_output_handle *handle) +{ + struct trbe_perf *perf = etm_perf_sink_config(handle); + + perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf); + perf->trbe_limit = get_trbe_limit(handle); + if (perf->trbe_limit == perf->trbe_base) { + trbe_disable_and_drain_local(); + return; + } + trbe_enable_hw(perf); +} + +static void trbe_handle_overflow(struct perf_output_handle *handle) +{ + struct perf_event *event = handle->event; + struct trbe_perf *perf = etm_perf_sink_config(handle); + unsigned long offset, size; + struct etm_event_data *event_data; + + offset = get_trbe_limit_pointer() - get_trbe_base_pointer(); + size = offset - PERF_IDX2OFF(handle->head, perf); + if (perf->snapshot) + handle->head = offset; + perf_aux_output_end(handle, size); + + event_data = perf_aux_output_begin(handle, event); + if (!event_data) { + event->hw.state |= PERF_HES_STOPPED; + trbe_disable_and_drain_local(); + return; + } + perf->trbe_write = perf->trbe_base; + perf->trbe_limit = get_trbe_limit(handle); + if (perf->trbe_limit == perf->trbe_base) { + trbe_disable_and_drain_local(); + return; + } + *this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle; + 
trbe_enable_hw(perf); +} + +static bool is_perf_trbe(struct perf_output_handle *handle) +{ + struct trbe_perf *perf = etm_perf_sink_config(handle); + struct trbe_cpudata *cpudata = perf->cpudata; + struct trbe_drvdata *drvdata = cpudata->drvdata; + int cpu = smp_processor_id(); + + WARN_ON(perf->trbe_base != get_trbe_base_pointer()); + WARN_ON(perf->trbe_limit != get_trbe_limit_pointer()); + + if (cpudata->mode != CS_MODE_PERF) + return false; + + if (cpudata->cpu != cpu) + return false; + + if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus)) + return false; + + return true; +} + +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle) +{ + enum trbe_ec ec = get_trbe_ec(); + enum trbe_bsc bsc = get_trbe_bsc(); + + WARN_ON(is_trbe_running()); + asm(TSB_CSYNC); + dsb(nsh); + isb(); + + if (is_trbe_trg() || is_trbe_abort()) + return TRBE_FAULT_ACT_FATAL; + + if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT)) + return TRBE_FAULT_ACT_FATAL; + + if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) { + if (get_trbe_write_pointer() == get_trbe_base_pointer()) + return TRBE_FAULT_ACT_WRAP; + } + return TRBE_FAULT_ACT_SPURIOUS; +} + +static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) +{ + struct perf_output_handle *handle = dev; + enum trbe_fault_action act; + + WARN_ON(!is_trbe_irq()); + clr_trbe_irq(); + + if (!perf_get_aux(handle)) + return IRQ_NONE; + + if (!is_perf_trbe(handle)) + return IRQ_NONE; + + irq_work_run(); + + act = trbe_get_fault_act(handle); + switch (act) { + case TRBE_FAULT_ACT_WRAP: + trbe_handle_overflow(handle); + break; + case TRBE_FAULT_ACT_SPURIOUS: + trbe_handle_spurious(handle); + break; + case TRBE_FAULT_ACT_FATAL: + trbe_handle_fatal(handle); + break; + } + return IRQ_HANDLED; +} + +static const struct coresight_ops_sink arm_trbe_sink_ops = { + .enable = arm_trbe_enable, + .disable = arm_trbe_disable, + .alloc_buffer = arm_trbe_alloc_buffer, + .free_buffer = arm_trbe_free_buffer, + .update_buffer = arm_trbe_update_buffer, +}; + +static const struct coresight_ops arm_trbe_cs_ops = { + .sink_ops = &arm_trbe_sink_ops, +}; + +static ssize_t irq_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct trbe_drvdata *drvdata = dev_get_drvdata(dev->parent); + + return sprintf(buf, "%d\n", drvdata->irq); +} +static DEVICE_ATTR_RO(irq); + +static ssize_t align_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct trbe_cpudata *cpudata = dev_get_drvdata(dev); + + return sprintf(buf, "%s\n", trbe_buffer_align_str[ilog2(cpudata->trbe_align)]); +} +static DEVICE_ATTR_RO(align); + +static ssize_t dbm_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct trbe_cpudata *cpudata = dev_get_drvdata(dev); + + return sprintf(buf, "%d\n", cpudata->trbe_dbm); +} +static DEVICE_ATTR_RO(dbm); + +static struct attribute *arm_trbe_attrs[] = { + &dev_attr_align.attr, + &dev_attr_irq.attr, + &dev_attr_dbm.attr, + NULL, +}; + +static const struct attribute_group arm_trbe_group = { + .attrs = arm_trbe_attrs, +}; + +static const struct attribute_group *arm_trbe_groups[] = { + &arm_trbe_group, + NULL, +}; + +static void arm_trbe_probe_coresight_cpu(void *info) +{ + struct trbe_cpudata *cpudata = info; + struct device *dev = &cpudata->drvdata->pdev->dev; + struct coresight_desc desc = { 0 }; + + if (WARN_ON(!cpudata)) + goto cpu_clear; + + if (!is_trbe_available()) { + pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu); + goto cpu_clear; + } + + if 
(!is_trbe_programmable()) { + pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu); + goto cpu_clear; + } + desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id()); + if (IS_ERR(desc.name)) + goto cpu_clear; + + desc.type = CORESIGHT_DEV_TYPE_SINK; + desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM; + desc.ops = &arm_trbe_cs_ops; + desc.pdata = dev_get_platdata(dev); + desc.groups = arm_trbe_groups; + desc.dev = dev; + cpudata->csdev = coresight_register(&desc); + if (IS_ERR(cpudata->csdev)) + goto cpu_clear; + + dev_set_drvdata(&cpudata->csdev->dev, cpudata); + cpudata->trbe_dbm = get_trbe_flag_update(); + cpudata->trbe_align = 1ULL << get_trbe_address_align(); + if (cpudata->trbe_align > SZ_2K) { + pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu); + goto cpu_clear; + } + return; +cpu_clear: + cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus); +} + +static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata) +{ + struct trbe_cpudata *cpudata; + int cpu; + + drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata)); + if (IS_ERR(drvdata->cpudata)) + return PTR_ERR(drvdata->cpudata); + + for_each_cpu(cpu, &drvdata->supported_cpus) { + cpudata = per_cpu_ptr(drvdata->cpudata, cpu); + cpudata->cpu = cpu; + cpudata->drvdata = drvdata; + smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1); + } + return 0; +} + +static void arm_trbe_remove_coresight_cpu(void *info) +{ + struct trbe_drvdata *drvdata = info; + + disable_percpu_irq(drvdata->irq); +} + +static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata) +{ + struct trbe_cpudata *cpudata; + int cpu; + + for_each_cpu(cpu, &drvdata->supported_cpus) { + smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1); + cpudata = per_cpu_ptr(drvdata->cpudata, cpu); + if (cpudata->csdev) { + coresight_unregister(cpudata->csdev); + cpudata->drvdata = NULL; + cpudata->csdev = NULL; + } + } + free_percpu(drvdata->cpudata); + return 0; +} + +static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node) +{ + struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node); + struct trbe_cpudata *cpudata; + + if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) { + cpudata = per_cpu_ptr(drvdata->cpudata, cpu); + if (!cpudata->csdev) { + cpudata->drvdata = drvdata; + smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1); + } + trbe_reset_local(); + enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE); + } + return 0; +} + +static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node) +{ + struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node); + struct trbe_cpudata *cpudata; + + if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) { + cpudata = per_cpu_ptr(drvdata->cpudata, cpu); + if (cpudata->csdev) { + coresight_unregister(cpudata->csdev); + cpudata->drvdata = NULL; + cpudata->csdev = NULL; + } + disable_percpu_irq(drvdata->irq); + trbe_reset_local(); + } + return 0; +} + +static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata) +{ + enum cpuhp_state trbe_online; + + trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME, + arm_trbe_cpu_startup, arm_trbe_cpu_teardown); + if (trbe_online < 0) + return -EINVAL; + + if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node)) + return -EINVAL; + + drvdata->trbe_online = trbe_online; + return 0; +} + +static void arm_trbe_remove_cpuhp(struct trbe_drvdata 
*drvdata) +{ + cpuhp_remove_multi_state(drvdata->trbe_online); +} + +static int arm_trbe_probe_irq(struct platform_device *pdev, + struct trbe_drvdata *drvdata) +{ + drvdata->irq = platform_get_irq(pdev, 0); + if (!drvdata->irq) { + pr_err("IRQ not found for the platform device\n"); + return -ENXIO; + } + + if (!irq_is_percpu(drvdata->irq)) { + pr_err("IRQ is not a PPI\n"); + return -EINVAL; + } + + if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus)) + return -EINVAL; + + drvdata->handle = alloc_percpu(typeof(*drvdata->handle)); + if (!drvdata->handle) + return -ENOMEM; + + if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) { + free_percpu(drvdata->handle); + return -EINVAL; + } + return 0; +} + +static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata) +{ + free_percpu_irq(drvdata->irq, drvdata->handle); + free_percpu(drvdata->handle); +} + +static int arm_trbe_device_probe(struct platform_device *pdev) +{ + struct coresight_platform_data *pdata; + struct trbe_drvdata *drvdata; + struct device *dev = &pdev->dev; + int ret; + + drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL); + if (IS_ERR(drvdata)) + return -ENOMEM; + + pdata = coresight_get_platform_data(dev); + if (IS_ERR(pdata)) { + kfree(drvdata); + return -ENOMEM; + } + + drvdata->atclk = devm_clk_get(dev, "atclk"); + if (!IS_ERR(drvdata->atclk)) { + ret = clk_prepare_enable(drvdata->atclk); + if (ret) + return ret; + } + dev_set_drvdata(dev, drvdata); + dev->platform_data = pdata; + drvdata->pdev = pdev; + ret = arm_trbe_probe_irq(pdev, drvdata); + if (ret) + goto irq_failed; + + ret = arm_trbe_probe_coresight(drvdata); + if (ret) + goto probe_failed; + + ret = arm_trbe_probe_cpuhp(drvdata); + if (ret) + goto cpuhp_failed; + + return 0; +cpuhp_failed: + arm_trbe_remove_coresight(drvdata); +probe_failed: + arm_trbe_remove_irq(drvdata); +irq_failed: + kfree(pdata); + kfree(drvdata); + return ret; +} + +static int arm_trbe_device_remove(struct platform_device *pdev) +{ + struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev); + struct trbe_drvdata *drvdata = platform_get_drvdata(pdev); + + arm_trbe_remove_coresight(drvdata); + arm_trbe_remove_cpuhp(drvdata); + arm_trbe_remove_irq(drvdata); + kfree(pdata); + kfree(drvdata); + return 0; +} + +#ifdef CONFIG_PM +static int arm_trbe_runtime_suspend(struct device *dev) +{ + struct trbe_drvdata *drvdata = dev_get_drvdata(dev); + + if (drvdata && !IS_ERR(drvdata->atclk)) + clk_disable_unprepare(drvdata->atclk); + + return 0; +} + +static int arm_trbe_runtime_resume(struct device *dev) +{ + struct trbe_drvdata *drvdata = dev_get_drvdata(dev); + + if (drvdata && !IS_ERR(drvdata->atclk)) + clk_prepare_enable(drvdata->atclk); + + return 0; +} +#endif + +static const struct dev_pm_ops arm_trbe_dev_pm_ops = { + SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL) +}; + +static const struct of_device_id arm_trbe_of_match[] = { + { .compatible = "arm,arm-trbe", .data = (void *)1 }, + {}, +}; +MODULE_DEVICE_TABLE(of, arm_trbe_of_match); + +static const struct platform_device_id arm_trbe_match[] = { + { "arm,trbe", 0}, + { } +}; +MODULE_DEVICE_TABLE(platform, arm_trbe_match); + +static struct platform_driver arm_trbe_driver = { + .id_table = arm_trbe_match, + .driver = { + .name = DRVNAME, + .of_match_table = of_match_ptr(arm_trbe_of_match), + .pm = &arm_trbe_dev_pm_ops, + .suppress_bind_attrs = true, + }, + .probe = arm_trbe_device_probe, + .remove = arm_trbe_device_remove, +}; 
+builtin_platform_driver(arm_trbe_driver) diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h new file mode 100644 index 0000000..82ffbfc --- /dev/null +++ b/drivers/hwtracing/coresight/coresight-trbe.h @@ -0,0 +1,525 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * This contains all required hardware related helper functions for + * Trace Buffer Extension (TRBE) driver in the coresight framework. + * + * Copyright (C) 2020 ARM Ltd. + * + * Author: Anshuman Khandual anshuman.khandual@arm.com + */ +#include <linux/coresight.h> +#include <linux/device.h> +#include <linux/irq.h> +#include <linux/kernel.h> +#include <linux/of.h> +#include <linux/platform_device.h> +#include <linux/smp.h> + +#include "coresight-etm-perf.h" + +static inline bool is_trbe_available(void) +{ + u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1); + int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT); + + return trbe >= 0b0001; +} + +static inline bool is_ete_available(void) +{ + u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1); + int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT); + + return (tracever != 0b0000); +} + +static inline bool is_trbe_enabled(void) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + return trblimitr & TRBLIMITR_ENABLE; +} + +enum trbe_ec { + TRBE_EC_OTHERS = 0, + TRBE_EC_STAGE1_ABORT = 36, + TRBE_EC_STAGE2_ABORT = 37, +}; + +static const char *const trbe_ec_str[] = { + [TRBE_EC_OTHERS] = "Maintenance exception", + [TRBE_EC_STAGE1_ABORT] = "Stage-1 exception", + [TRBE_EC_STAGE2_ABORT] = "Stage-2 exception", +}; + +static inline enum trbe_ec get_trbe_ec(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK; +} + +static inline void clr_trbe_ec(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT); + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +enum trbe_bsc { + TRBE_BSC_NOT_STOPPED = 0, + TRBE_BSC_FILLED = 1, + TRBE_BSC_TRIGGERED = 2, +}; + +static const char *const trbe_bsc_str[] = { + [TRBE_BSC_NOT_STOPPED] = "TRBE collection not stopped", + [TRBE_BSC_FILLED] = "TRBE filled", + [TRBE_BSC_TRIGGERED] = "TRBE triggered", +}; + +static inline enum trbe_bsc get_trbe_bsc(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK; +} + +static inline void clr_trbe_bsc(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT); + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +enum trbe_fsc { + TRBE_FSC_ASF_LEVEL0 = 0, + TRBE_FSC_ASF_LEVEL1 = 1, + TRBE_FSC_ASF_LEVEL2 = 2, + TRBE_FSC_ASF_LEVEL3 = 3, + TRBE_FSC_TF_LEVEL0 = 4, + TRBE_FSC_TF_LEVEL1 = 5, + TRBE_FSC_TF_LEVEL2 = 6, + TRBE_FSC_TF_LEVEL3 = 7, + TRBE_FSC_AFF_LEVEL0 = 8, + TRBE_FSC_AFF_LEVEL1 = 9, + TRBE_FSC_AFF_LEVEL2 = 10, + TRBE_FSC_AFF_LEVEL3 = 11, + TRBE_FSC_PF_LEVEL0 = 12, + TRBE_FSC_PF_LEVEL1 = 13, + TRBE_FSC_PF_LEVEL2 = 14, + TRBE_FSC_PF_LEVEL3 = 15, + TRBE_FSC_SEA_WRITE = 16, + TRBE_FSC_ASEA_WRITE = 17, + TRBE_FSC_SEA_LEVEL0 = 20, + TRBE_FSC_SEA_LEVEL1 = 21, + TRBE_FSC_SEA_LEVEL2 = 22, + TRBE_FSC_SEA_LEVEL3 = 23, + TRBE_FSC_ALIGN_FAULT = 33, + TRBE_FSC_TLB_FAULT = 48, + TRBE_FSC_ATOMIC_FAULT = 49, +}; + +static const char *const trbe_fsc_str[] = { + [TRBE_FSC_ASF_LEVEL0] = "Address size fault - level 0", + [TRBE_FSC_ASF_LEVEL1] = "Address size fault - level 1", + [TRBE_FSC_ASF_LEVEL2] = "Address size fault - level 2", + 
[TRBE_FSC_ASF_LEVEL3] = "Address size fault - level 3", + [TRBE_FSC_TF_LEVEL0] = "Translation fault - level 0", + [TRBE_FSC_TF_LEVEL1] = "Translation fault - level 1", + [TRBE_FSC_TF_LEVEL2] = "Translation fault - level 2", + [TRBE_FSC_TF_LEVEL3] = "Translation fault - level 3", + [TRBE_FSC_AFF_LEVEL0] = "Access flag fault - level 0", + [TRBE_FSC_AFF_LEVEL1] = "Access flag fault - level 1", + [TRBE_FSC_AFF_LEVEL2] = "Access flag fault - level 2", + [TRBE_FSC_AFF_LEVEL3] = "Access flag fault - level 3", + [TRBE_FSC_PF_LEVEL0] = "Permission fault - level 0", + [TRBE_FSC_PF_LEVEL1] = "Permission fault - level 1", + [TRBE_FSC_PF_LEVEL2] = "Permission fault - level 2", + [TRBE_FSC_PF_LEVEL3] = "Permission fault - level 3", + [TRBE_FSC_SEA_WRITE] = "Synchronous external abort on write", + [TRBE_FSC_ASEA_WRITE] = "Asynchronous external abort on write", + [TRBE_FSC_SEA_LEVEL0] = "Syncrhonous external abort on table walk - level 0", + [TRBE_FSC_SEA_LEVEL1] = "Syncrhonous external abort on table walk - level 1", + [TRBE_FSC_SEA_LEVEL2] = "Syncrhonous external abort on table walk - level 2", + [TRBE_FSC_SEA_LEVEL3] = "Syncrhonous external abort on table walk - level 3", + [TRBE_FSC_ALIGN_FAULT] = "Alignment fault", + [TRBE_FSC_TLB_FAULT] = "TLB conflict fault", + [TRBE_FSC_ATOMIC_FAULT] = "Atmoc fault", +}; + +static inline enum trbe_fsc get_trbe_fsc(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + return (trbsr >> TRBSR_FSC_SHIFT) & TRBSR_FSC_MASK; +} + +static inline void clr_trbe_fsc(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + trbsr &= ~(TRBSR_FSC_MASK << TRBSR_FSC_SHIFT); + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +static inline void set_trbe_irq(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + WARN_ON(is_trbe_enabled()); + trbsr |= TRBSR_IRQ; + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +static inline void clr_trbe_irq(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + trbsr &= ~TRBSR_IRQ; + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +static inline void set_trbe_trg(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + WARN_ON(is_trbe_enabled()); + trbsr |= TRBSR_TRG; + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +static inline void clr_trbe_trg(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + WARN_ON(is_trbe_enabled()); + trbsr &= ~TRBSR_TRG; + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +static inline void set_trbe_wrap(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + WARN_ON(is_trbe_enabled()); + trbsr |= TRBSR_WRAP; + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +static inline void clr_trbe_wrap(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + WARN_ON(is_trbe_enabled()); + trbsr &= ~TRBSR_WRAP; + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +static inline void set_trbe_abort(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + WARN_ON(is_trbe_enabled()); + trbsr |= TRBSR_ABORT; + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +static inline void clr_trbe_abort(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + WARN_ON(is_trbe_enabled()); + trbsr &= ~TRBSR_ABORT; + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +static inline bool is_trbe_irq(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + return trbsr & TRBSR_IRQ; +} + +static inline bool is_trbe_trg(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + return trbsr & TRBSR_TRG; +} + +static inline bool is_trbe_wrap(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + return trbsr & TRBSR_WRAP; +} + +static inline bool is_trbe_abort(void) +{ + u64 trbsr = 
read_sysreg_s(SYS_TRBSR_EL1); + + return trbsr & TRBSR_ABORT; +} + +static inline bool is_trbe_running(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + return !(trbsr & TRBSR_STOP); +} + +static inline void set_trbe_running(void) +{ + u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1); + + trbsr &= ~TRBSR_STOP; + write_sysreg_s(trbsr, SYS_TRBSR_EL1); +} + +enum trbe_address_mode { + TRBE_ADDRESS_VIRTUAL, + TRBE_ADDRESS_PHYSICAL, +}; + +static const char *const trbe_address_mode_str[] = { + [TRBE_ADDRESS_VIRTUAL] = "Address mode - virtual", + [TRBE_ADDRESS_PHYSICAL] = "Address mode - physical", +}; + +static inline bool is_trbe_virtual_mode(void) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + return !(trblimitr & TRBLIMITR_NVM); +} + +static inline bool is_trbe_physical_mode(void) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + return trblimitr & TRBLIMITR_NVM; +} + +static inline void set_trbe_virtual_mode(void) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + trblimitr &= ~TRBLIMITR_NVM; + write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +} + +static inline void set_trbe_physical_mode(void) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + trblimitr |= TRBLIMITR_NVM; + write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +} + +enum trbe_trig_mode { + TRBE_TRIGGER_STOP = 0, + TRBE_TRIGGER_IRQ = 1, + TRBE_TRIGGER_IGNORE = 3, +}; + +static const char *const trbe_trig_mode_str[] = { + [TRBE_TRIGGER_STOP] = "Trigger mode - stop", + [TRBE_TRIGGER_IRQ] = "Trigger mode - irq", + [TRBE_TRIGGER_IGNORE] = "Trigger mode - ignore", +}; + +static inline enum trbe_trig_mode get_trbe_trig_mode(void) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK; +} + +static inline void set_trbe_trig_mode(enum trbe_trig_mode mode) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT); + trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT); + write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +} + +enum trbe_fill_mode { + TRBE_FILL_STOP = 0, + TRBE_FILL_WRAP = 1, + TRBE_FILL_CIRCULAR = 3, +}; + +static const char *const trbe_fill_mode_str[] = { + [TRBE_FILL_STOP] = "Buffer mode - stop", + [TRBE_FILL_WRAP] = "Buffer mode - wrap", + [TRBE_FILL_CIRCULAR] = "Buffer mode - circular", +}; + +static inline enum trbe_fill_mode get_trbe_fill_mode(void) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK; +} + +static inline void set_trbe_fill_mode(enum trbe_fill_mode mode) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT); + trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT); + write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +} + +static inline void set_trbe_disabled(void) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + trblimitr &= ~TRBLIMITR_ENABLE; + write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +} + +static inline void set_trbe_enabled(void) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + trblimitr |= TRBLIMITR_ENABLE; + write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +} + +static inline bool get_trbe_flag_update(void) +{ + u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1); + + return trbidr & TRBIDR_FLAG; +} + +static inline bool is_trbe_programmable(void) +{ + u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1); + + return 
!(trbidr & TRBIDR_PROG); +} + +enum trbe_buffer_align { + TRBE_BUFFER_BYTE, + TRBE_BUFFER_HALF_WORD, + TRBE_BUFFER_WORD, + TRBE_BUFFER_DOUBLE_WORD, + TRBE_BUFFER_16_BYTES, + TRBE_BUFFER_32_BYTES, + TRBE_BUFFER_64_BYTES, + TRBE_BUFFER_128_BYTES, + TRBE_BUFFER_256_BYTES, + TRBE_BUFFER_512_BYTES, + TRBE_BUFFER_1K_BYTES, + TRBE_BUFFER_2K_BYTES, +}; + +static const char *const trbe_buffer_align_str[] = { + [TRBE_BUFFER_BYTE] = "Byte", + [TRBE_BUFFER_HALF_WORD] = "Half word", + [TRBE_BUFFER_WORD] = "Word", + [TRBE_BUFFER_DOUBLE_WORD] = "Double word", + [TRBE_BUFFER_16_BYTES] = "16 bytes", + [TRBE_BUFFER_32_BYTES] = "32 bytes", + [TRBE_BUFFER_64_BYTES] = "64 bytes", + [TRBE_BUFFER_128_BYTES] = "128 bytes", + [TRBE_BUFFER_256_BYTES] = "256 bytes", + [TRBE_BUFFER_512_BYTES] = "512 bytes", + [TRBE_BUFFER_1K_BYTES] = "1K bytes", + [TRBE_BUFFER_2K_BYTES] = "2K bytes", +}; + +static inline enum trbe_buffer_align get_trbe_address_align(void) +{ + u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1); + + return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK; +} + +static inline void assert_trbe_address_mode(unsigned long addr) +{ + bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr); + bool virt_mode = is_trbe_virtual_mode(); + + WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode))); +} + +static inline void assert_trbe_address_align(unsigned long addr) +{ + unsigned long nr_bytes = 1ULL << get_trbe_address_align(); + + WARN_ON(addr & (nr_bytes - 1)); +} + +static inline unsigned long get_trbe_write_pointer(void) +{ + u64 trbptr = read_sysreg_s(SYS_TRBPTR_EL1); + unsigned long addr = (trbptr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK; + + assert_trbe_address_mode(addr); + assert_trbe_address_align(addr); + return addr; +} + +static inline void set_trbe_write_pointer(unsigned long addr) +{ + WARN_ON(is_trbe_enabled()); + assert_trbe_address_mode(addr); + assert_trbe_address_align(addr); + addr = (addr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK; + write_sysreg_s(addr, SYS_TRBPTR_EL1); +} + +static inline unsigned long get_trbe_limit_pointer(void) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + unsigned long limit = (trblimitr >> TRBLIMITR_LIMIT_SHIFT) & TRBLIMITR_LIMIT_MASK; + unsigned long addr = limit << TRBLIMITR_LIMIT_SHIFT; + + WARN_ON(addr & (PAGE_SIZE - 1)); + assert_trbe_address_mode(addr); + assert_trbe_address_align(addr); + return addr; +} + +static inline void set_trbe_limit_pointer(unsigned long addr) +{ + u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); + + WARN_ON(is_trbe_enabled()); + assert_trbe_address_mode(addr); + assert_trbe_address_align(addr); + WARN_ON(addr & ((1UL << TRBLIMITR_LIMIT_SHIFT) - 1)); + WARN_ON(addr & (PAGE_SIZE - 1)); + trblimitr &= ~(TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT); + trblimitr |= (addr & PAGE_MASK); + write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +} + +static inline unsigned long get_trbe_base_pointer(void) +{ + u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1); + unsigned long addr = (trbbaser >> TRBBASER_BASE_SHIFT) & TRBBASER_BASE_MASK; + + addr = addr << TRBBASER_BASE_SHIFT; + WARN_ON(addr & (PAGE_SIZE - 1)); + assert_trbe_address_mode(addr); + assert_trbe_address_align(addr); + return addr; +} + +static inline void set_trbe_base_pointer(unsigned long addr) +{ + WARN_ON(is_trbe_enabled()); + assert_trbe_address_mode(addr); + assert_trbe_address_align(addr); + WARN_ON(addr & ((1UL << TRBBASER_BASE_SHIFT) - 1)); + WARN_ON(addr & (PAGE_SIZE - 1)); + write_sysreg_s(addr, SYS_TRBBASER_EL1); +}
On 11/10/20 12:45 PM, Anshuman Khandual wrote:
Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is accessible via the system registers. The TRBE supports different addressing modes, including CPU virtual address, and buffer modes, including the circular buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1), a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But access to the trace buffer could be prohibited by a higher exception level (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU private interrupt (PPI) on address translation errors and when the buffer is full. The overall implementation here is inspired by the Arm SPE driver.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
 Documentation/trace/coresight/coresight-trbe.rst |  36 ++
 arch/arm64/include/asm/sysreg.h                  |   2 +
 drivers/hwtracing/coresight/Kconfig              |  11 +
 drivers/hwtracing/coresight/Makefile             |   1 +
 drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
 6 files changed, 1341 insertions(+)
 create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst
new file mode 100644
index 0000000..4320a8b
--- /dev/null
+++ b/Documentation/trace/coresight/coresight-trbe.rst
@@ -0,0 +1,36 @@
+.. SPDX-License-Identifier: GPL-2.0
+==============================
+Trace Buffer Extension (TRBE).
+==============================
- :Author: Anshuman Khandual anshuman.khandual@arm.com
- :Date: November 2020
+Hardware Description
+--------------------

+Trace Buffer Extension (TRBE) is a per-CPU hardware block which captures, in
+system memory, the CPU trace generated by a corresponding per-CPU tracing unit.
+It gets plugged in as a coresight sink device because the corresponding trace
+generators (ETE) are plugged in as source devices.
+Sysfs files and directories +---------------------------
+The TRBE devices appear on the existing coresight bus alongside the other
+coresight devices::

	$ ls /sys/bus/coresight/devices
	trbe0  trbe1  trbe2  trbe3

+The ``trbe<N>`` named TRBEs are associated with a CPU::

	$ ls /sys/bus/coresight/devices/trbe0/
	irq  align  dbm
+*Key file items are:-*
- ``irq``: TRBE maintenance interrupt number
- ``align``: TRBE write pointer alignment
- ``dbm``: TRBE updates memory with access and dirty flags
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 14cb156..61136f6 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -97,6 +97,7 @@
 #define SET_PSTATE_UAO(x)	__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_SSBS(x)	__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_TCO(x)	__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
+#define TSB_CSYNC		__emit_inst(0xd503225f)

 #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
@@ -865,6 +866,7 @@
 #define ID_AA64MMFR2_CNP_SHIFT		0

 /* id_aa64dfr0 */
+#define ID_AA64DFR0_TRBE_SHIFT		44
 #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
 #define ID_AA64DFR0_DOUBLELOCK_SHIFT	36
 #define ID_AA64DFR0_PMSVER_SHIFT	32
diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
index c119824..0f5e101 100644
--- a/drivers/hwtracing/coresight/Kconfig
+++ b/drivers/hwtracing/coresight/Kconfig
@@ -156,6 +156,17 @@ config CORESIGHT_CTI
 	  To compile this driver as a module, choose M here: the module will
 	  be called coresight-cti.

+config CORESIGHT_TRBE
- bool "Trace Buffer Extension (TRBE) driver"
- depends on ARM64
- help
This driver provides support for percpu Trace Buffer Extension (TRBE).
TRBE always needs to be used along with its corresponding percpu ETE
component. ETE generates trace data which is then captured with TRBE.
Unlike traditional sink devices, TRBE is a CPU feature accessible via
system registers. But its explicit dependency on the trace unit (ETE)
requires it to be plugged in as a coresight sink device.
 config CORESIGHT_CTI_INTEGRATION_REGS
 	bool "Access CTI CoreSight Integration Registers"
 	depends on CORESIGHT_CTI
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index f20e357..d608165 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
 obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
 obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
 obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o
+obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o
 coresight-cti-y := coresight-cti-core.o coresight-cti-platform.o \
		coresight-cti-sysfs.o
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
new file mode 100644
index 0000000..48a8ec3
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -0,0 +1,766 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
- This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
- sink device, which can then be paired with an appropriate per-cpu coresight
- source device (ETE), thus generating the required trace data. Trace can be
- enabled via the perf framework.
- Copyright (C) 2020 ARM Ltd.
- Author: Anshuman Khandual anshuman.khandual@arm.com
- */
+#define DRVNAME "arm_trbe"
+#define pr_fmt(fmt) DRVNAME ": " fmt
+#include "coresight-trbe.h"
+#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
+#define ETE_IGNORE_PACKET 0x70
Add a comment here, on what this means to the decoder.
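Something like the below would do (wording is only a suggestion, based on the packet being the ETE/ETMv4 "ignore" packet used here for padding):

	/*
	 * ETE_IGNORE_PACKET is a padding packet that the trace decoder
	 * silently skips. Unused space in the AUX buffer is filled with
	 * this value so the decoder does not trip over stale data.
	 */
	#define ETE_IGNORE_PACKET		0x70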
+static const char trbe_name[] = "trbe";
Why not
#define DEVNAME "trbe"
+enum trbe_fault_action {
- TRBE_FAULT_ACT_WRAP,
- TRBE_FAULT_ACT_SPURIOUS,
- TRBE_FAULT_ACT_FATAL,
+};
+struct trbe_perf {
Please rename this to trbe_buf. This will be used for sysfs mode as well.
- unsigned long trbe_base;
- unsigned long trbe_limit;
- unsigned long trbe_write;
- pid_t pid;
Why do we need this ? This seems unused and moreover, there cannot be multiple tracers into TRBE. So, we don't need to share the sink unlike the traditional ones.
- int nr_pages;
- void **pages;
- bool snapshot;
- struct trbe_cpudata *cpudata;
+};
+struct trbe_cpudata {
- struct coresight_device *csdev;
- bool trbe_dbm;
Why do we need this ?
- u64 trbe_align;
- int cpu;
- enum cs_mode mode;
- struct trbe_perf *perf;
- struct trbe_drvdata *drvdata;
+};
+struct trbe_drvdata {
- struct trbe_cpudata __percpu *cpudata;
- struct perf_output_handle __percpu *handle;
Shouldn't this be :
struct perf_output_handle __percpu **handle ?
as we get the handle from etm-perf and it is not controlled by the TRBE ?
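i.e., something like this (untested sketch; atclk dropped per the comment further down):

	struct trbe_drvdata {
		struct trbe_cpudata __percpu *cpudata;
		struct perf_output_handle * __percpu *handle;
		struct hlist_node hotplug_node;
		int irq;
		cpumask_t supported_cpus;
		enum cpuhp_state trbe_online;
		struct platform_device *pdev;
	};

	/* arm_trbe_enable() then stores the pointer rather than a copy */
	*this_cpu_ptr(drvdata->handle) = handle;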
- struct hlist_node hotplug_node;
- int irq;
- cpumask_t supported_cpus;
- enum cpuhp_state trbe_online;
- struct platform_device *pdev;
- struct clk *atclk;
We don't have any clocks for the TRBE instance. Please remove.
+};
+static int trbe_alloc_node(struct perf_event *event) +{
- if (event->cpu == -1)
return NUMA_NO_NODE;
- return cpu_to_node(event->cpu);
+}
+static void trbe_disable_and_drain_local(void) +{
- write_sysreg_s(0, SYS_TRBLIMITR_EL1);
- isb();
- dsb(nsh);
- asm(TSB_CSYNC);
+}
+static void trbe_reset_local(void) +{
- trbe_disable_and_drain_local();
- write_sysreg_s(0, SYS_TRBPTR_EL1);
- isb();
- write_sysreg_s(0, SYS_TRBBASER_EL1);
- isb();
- write_sysreg_s(0, SYS_TRBSR_EL1);
- isb();
+}
+static void trbe_pad_buf(struct perf_output_handle *handle, int len) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- u64 head = PERF_IDX2OFF(handle->head, perf);
- memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len);
- if (!perf->snapshot)
perf_aux_output_skip(handle, len);
+}
+static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- u64 head = PERF_IDX2OFF(handle->head, perf);
- u64 limit = perf->nr_pages * PAGE_SIZE;
So we are using half of the buffer for snapshot mode to avoid a case where the analyzer is unable to decode the trace in case of an overflow.
- if (head < limit >> 1)
limit >>= 1;
Also this needs to be thought out. We may not need this restriction. The trace decoder will be able to walk forward, find a synchronization packet and continue decoding from there. So, we could use the entire buffer for TRBE.
- return limit;
+}
+static unsigned long trbe_normal_offset(struct perf_output_handle *handle) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- struct trbe_cpudata *cpudata = perf->cpudata;
- const u64 bufsize = perf->nr_pages * PAGE_SIZE;
- u64 limit = bufsize;
- u64 head, tail, wakeup;
Commentary please.
- head = PERF_IDX2OFF(handle->head, perf);
- if (!IS_ALIGNED(head, cpudata->trbe_align)) {
unsigned long delta = roundup(head, cpudata->trbe_align) - head;
delta = min(delta, handle->size);
trbe_pad_buf(handle, delta);
head = PERF_IDX2OFF(handle->head, perf);
- }
- if (!handle->size) {
perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
return 0;
- }
- tail = PERF_IDX2OFF(handle->head + handle->size, perf);
- wakeup = PERF_IDX2OFF(handle->wakeup, perf);
- if (head < tail)
comment
limit = round_down(tail, PAGE_SIZE);
- if (handle->wakeup < (handle->head + handle->size) && head <= wakeup)
limit = min(limit, round_up(wakeup, PAGE_SIZE));
comment. Also do we need an alignment to PAGE_SIZE ?
- if (limit > head)
return limit;
- trbe_pad_buf(handle, handle->size);
- perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
- return 0;
+}
+static unsigned long get_trbe_limit(struct perf_output_handle *handle) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- unsigned long offset;
- if (perf->snapshot)
offset = trbe_snapshot_offset(handle);
- else
offset = trbe_normal_offset(handle);
- return perf->trbe_base + offset;
+}
+static void trbe_enable_hw(struct trbe_perf *perf) +{
- WARN_ON(perf->trbe_write < perf->trbe_base);
- WARN_ON(perf->trbe_write >= perf->trbe_limit);
- set_trbe_disabled();
- clr_trbe_irq();
- clr_trbe_wrap();
- clr_trbe_abort();
- clr_trbe_ec();
- clr_trbe_bsc();
- clr_trbe_fsc();
Please merge all of these field updates to single register update unless mandated by the architecture.
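e.g., one read-modify-write of TRBSR_EL1 instead of six (untested sketch, using the TRBSR_* field macros from coresight-trbe.h):

	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);

	/* Clear IRQ, WRAP, ABORT, EC, BSC and FSC in a single update */
	trbsr &= ~(TRBSR_IRQ | TRBSR_WRAP | TRBSR_ABORT |
		   (TRBSR_EC_MASK << TRBSR_EC_SHIFT) |
		   (TRBSR_BSC_MASK << TRBSR_BSC_SHIFT) |
		   (TRBSR_FSC_MASK << TRBSR_FSC_SHIFT));
	write_sysreg_s(trbsr, SYS_TRBSR_EL1);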
- set_trbe_virtual_mode();
- set_trbe_fill_mode(TRBE_FILL_STOP);
- set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);
Same here ^^
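i.e., fold the address mode, fill mode and trigger mode into one TRBLIMITR_EL1 read-modify-write (untested sketch):

	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);

	trblimitr &= ~(TRBLIMITR_NVM |
		       (TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT) |
		       (TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT));
	trblimitr |= (TRBE_FILL_STOP & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT;
	trblimitr |= (TRBE_TRIGGER_IGNORE & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT;
	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);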
- isb();
- set_trbe_base_pointer(perf->trbe_base);
- set_trbe_limit_pointer(perf->trbe_limit);
- set_trbe_write_pointer(perf->trbe_write);
- isb();
- dsb(ishst);
- flush_tlb_all();
Why is this needed ?
- set_trbe_running();
- set_trbe_enabled();
- asm(TSB_CSYNC);
+}
+static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool snapshot)
+{
- struct trbe_perf *perf;
- struct page **pglist;
- int i;
- if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))
We may be able to remove the restriction on snapshot mode, see my comment above.
return NULL;
- perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event));
- if (IS_ERR(perf))
return ERR_PTR(-ENOMEM);
- pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
- if (IS_ERR(pglist)) {
kfree(perf);
return ERR_PTR(-ENOMEM);
- }
- for (i = 0; i < nr_pages; i++)
pglist[i] = virt_to_page(pages[i]);
- perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
- if (IS_ERR((void *) perf->trbe_base)) {
kfree(pglist);
kfree(perf);
return ERR_PTR(perf->trbe_base);
- }
- perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE;
- perf->trbe_write = perf->trbe_base;
- perf->pid = task_pid_nr(event->owner);
- perf->snapshot = snapshot;
- perf->nr_pages = nr_pages;
- perf->pages = pages;
- kfree(pglist);
- return perf;
+}
+void arm_trbe_free_buffer(void *config) +{
- struct trbe_perf *perf = config;
- vunmap((void *) perf->trbe_base);
- kfree(perf);
+}
+static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
struct perf_output_handle *handle,
void *config)
+{
- struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
- struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
- struct trbe_perf *perf = config;
- unsigned long size, offset;
- WARN_ON(perf->cpudata != cpudata);
- WARN_ON(cpudata->cpu != smp_processor_id());
- WARN_ON(cpudata->mode != CS_MODE_PERF);
- WARN_ON(cpudata->drvdata != drvdata);
- offset = get_trbe_write_pointer() - get_trbe_base_pointer();
- size = offset - PERF_IDX2OFF(handle->head, perf);
- if (perf->snapshot)
handle->head += size;
- return size;
+}
+static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data) +{
- struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
- struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
- struct perf_output_handle *handle = data;
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- WARN_ON(cpudata->cpu != smp_processor_id());
- WARN_ON(mode != CS_MODE_PERF);
Why WARN_ON ? Simply return -EINVAL ? Also you need a check to make sure the mode is DISABLED (when you get to sysfs mode).
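i.e. (sketch):

	if (mode != CS_MODE_PERF)
		return -EINVAL;

	/* and once sysfs mode is wired up: */
	if (cpudata->mode != CS_MODE_DISABLED)
		return -EBUSY;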
- WARN_ON(cpudata->drvdata != drvdata);
- *this_cpu_ptr(drvdata->handle) = *handle;
That is wrong. Storing a local copy of a generic perf structure assumes the global structure doesn't change beneath us, which is calling for trouble. Please store the handle ptr instead.
- cpudata->perf = perf;
- cpudata->mode = mode;
- perf->cpudata = cpudata;
- perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
- perf->trbe_limit = get_trbe_limit(handle);
- if (perf->trbe_limit == perf->trbe_base) {
trbe_disable_and_drain_local();
return 0;
- }
- trbe_enable_hw(perf);
- return 0;
+}
+static int arm_trbe_disable(struct coresight_device *csdev) +{
- struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
- struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
- struct trbe_perf *perf = cpudata->perf;
- WARN_ON(perf->cpudata != cpudata);
- WARN_ON(cpudata->cpu != smp_processor_id());
- WARN_ON(cpudata->mode != CS_MODE_PERF);
- WARN_ON(cpudata->drvdata != drvdata);
- trbe_disable_and_drain_local();
- perf->cpudata = NULL;
- cpudata->perf = NULL;
- cpudata->mode = CS_MODE_DISABLED;
- return 0;
+}
+static void trbe_handle_fatal(struct perf_output_handle *handle) +{
- perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
- perf_aux_output_end(handle, 0);
- trbe_disable_and_drain_local();
+}
+static void trbe_handle_spurious(struct perf_output_handle *handle) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
- perf->trbe_limit = get_trbe_limit(handle);
- if (perf->trbe_limit == perf->trbe_base) {
trbe_disable_and_drain_local();
return;
- }
- trbe_enable_hw(perf);
+}
+static void trbe_handle_overflow(struct perf_output_handle *handle) +{
- struct perf_event *event = handle->event;
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- unsigned long offset, size;
- struct etm_event_data *event_data;
- offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
- size = offset - PERF_IDX2OFF(handle->head, perf);
- if (perf->snapshot)
handle->head = offset;
Is this correct ? Or was this supposed to mean : handle->head += offset;
- perf_aux_output_end(handle, size);
- event_data = perf_aux_output_begin(handle, event);
- if (!event_data) {
event->hw.state |= PERF_HES_STOPPED;
trbe_disable_and_drain_local();
return;
- }
- perf->trbe_write = perf->trbe_base;
- perf->trbe_limit = get_trbe_limit(handle);
- if (perf->trbe_limit == perf->trbe_base) {
trbe_disable_and_drain_local();
return;
- }
- *this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle;
- trbe_enable_hw(perf);
+}
+static bool is_perf_trbe(struct perf_output_handle *handle) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- struct trbe_cpudata *cpudata = perf->cpudata;
- struct trbe_drvdata *drvdata = cpudata->drvdata;
Can you trust the cpudata ptr here as we are still verifying if this was legitimate ?
- int cpu = smp_processor_id();
- WARN_ON(perf->trbe_base != get_trbe_base_pointer());
- WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
- if (cpudata->mode != CS_MODE_PERF)
return false;
- if (cpudata->cpu != cpu)
return false;
- if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus))
return false;
- return true;
+}
+static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle) +{
- enum trbe_ec ec = get_trbe_ec();
- enum trbe_bsc bsc = get_trbe_bsc();
- WARN_ON(is_trbe_running());
- asm(TSB_CSYNC);
- dsb(nsh);
- isb();
- if (is_trbe_trg() || is_trbe_abort())
return TRBE_FAULT_ACT_FATAL;
- if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
return TRBE_FAULT_ACT_FATAL;
- if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
if (get_trbe_write_pointer() == get_trbe_base_pointer())
return TRBE_FAULT_ACT_WRAP;
- }
- return TRBE_FAULT_ACT_SPURIOUS;
+}
+static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) +{
- struct perf_output_handle *handle = dev;
- enum trbe_fault_action act;
- WARN_ON(!is_trbe_irq());
- clr_trbe_irq();
- if (!perf_get_aux(handle))
return IRQ_NONE;
- if (!is_perf_trbe(handle))
return IRQ_NONE;
- irq_work_run();
- act = trbe_get_fault_act(handle);
- switch (act) {
- case TRBE_FAULT_ACT_WRAP:
trbe_handle_overflow(handle);
break;
- case TRBE_FAULT_ACT_SPURIOUS:
trbe_handle_spurious(handle);
break;
- case TRBE_FAULT_ACT_FATAL:
trbe_handle_fatal(handle);
break;
- }
- return IRQ_HANDLED;
+}
+static void arm_trbe_probe_coresight_cpu(void *info) +{
- struct trbe_cpudata *cpudata = info;
- struct device *dev = &cpudata->drvdata->pdev->dev;
- struct coresight_desc desc = { 0 };
- if (WARN_ON(!cpudata))
goto cpu_clear;
- if (!is_trbe_available()) {
pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu);
goto cpu_clear;
- }
- if (!is_trbe_programmable()) {
pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu);
goto cpu_clear;
- }
- desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id());
- if (IS_ERR(desc.name))
goto cpu_clear;
- desc.type = CORESIGHT_DEV_TYPE_SINK;
- desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;
Maybe we should add a new subtype to make this higher priority than the normal ETR. Something like :
CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM
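i.e., a new member next to the existing SYSMEM subtype in include/linux/coresight.h, plus the matching assignment here (sketch; exact placement in the enum left open):

	enum coresight_dev_subtype_sink {
		...
		CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM,
		CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM,
	};

	desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM;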
- desc.ops = &arm_trbe_cs_ops;
- desc.pdata = dev_get_platdata(dev);
- desc.groups = arm_trbe_groups;
- desc.dev = dev;
- cpudata->csdev = coresight_register(&desc);
- if (IS_ERR(cpudata->csdev))
goto cpu_clear;
- dev_set_drvdata(&cpudata->csdev->dev, cpudata);
- cpudata->trbe_dbm = get_trbe_flag_update();
- cpudata->trbe_align = 1ULL << get_trbe_address_align();
- if (cpudata->trbe_align > SZ_2K) {
pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu);
goto cpu_clear;
- }
- return;
+cpu_clear:
- cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus);
+}
+static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata) +{
- struct trbe_cpudata *cpudata;
- int cpu;
- drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata));
- if (IS_ERR(drvdata->cpudata))
return PTR_ERR(drvdata->cpudata);
- for_each_cpu(cpu, &drvdata->supported_cpus) {
cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
cpudata->cpu = cpu;
cpudata->drvdata = drvdata;
smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
We could batch it and run it on all CPUs at the same time ? Also it would be better to leave the per_cpu area filled by the CPU itself, to avoid racing.
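i.e., pass drvdata and let each CPU fill in its own per-cpu slot (untested sketch; assumes arm_trbe_probe_coresight_cpu() is reworked to take drvdata and look up its slot via this_cpu_ptr()):

	/* Run the probe on every supported CPU in one go */
	on_each_cpu_mask(&drvdata->supported_cpus,
			 arm_trbe_probe_coresight_cpu, drvdata, 1);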
- }
- return 0;
+}
+static void arm_trbe_remove_coresight_cpu(void *info) +{
- struct trbe_drvdata *drvdata = info;
- disable_percpu_irq(drvdata->irq);
+}
+static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata) +{
- struct trbe_cpudata *cpudata;
- int cpu;
- for_each_cpu(cpu, &drvdata->supported_cpus) {
smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
if (cpudata->csdev) {
coresight_unregister(cpudata->csdev);
cpudata->drvdata = NULL;
cpudata->csdev = NULL;
}
Please leave this part to the CPU itself.
- }
- free_percpu(drvdata->cpudata);
- return 0;
+}
+static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node) +{
- struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
- struct trbe_cpudata *cpudata;
- if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
if (!cpudata->csdev) {
cpudata->drvdata = drvdata;
smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
Why do we need smp_call here ? We are already on the CPU.
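i.e. (sketch):

	if (!cpudata->csdev) {
		cpudata->drvdata = drvdata;
		/* cpuhp startup callbacks already run on the hotplugged CPU */
		arm_trbe_probe_coresight_cpu(cpudata);
	}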
}
trbe_reset_local();
enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
- }
- return 0;
+}
+static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node) +{
- struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
- struct trbe_cpudata *cpudata;
- if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
if (cpudata->csdev) {
coresight_unregister(cpudata->csdev);
cpudata->drvdata = NULL;
cpudata->csdev = NULL;
}
disable_percpu_irq(drvdata->irq);
trbe_reset_local();
- }
- return 0;
+}
+static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata) +{
- enum cpuhp_state trbe_online;
- trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
arm_trbe_cpu_startup, arm_trbe_cpu_teardown);
- if (trbe_online < 0)
return -EINVAL;
- if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node))
return -EINVAL;
- drvdata->trbe_online = trbe_online;
- return 0;
+}
+static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata) +{
- cpuhp_remove_multi_state(drvdata->trbe_online);
+}
+static int arm_trbe_probe_irq(struct platform_device *pdev,
struct trbe_drvdata *drvdata)
+{
- drvdata->irq = platform_get_irq(pdev, 0);
- if (!drvdata->irq) {
pr_err("IRQ not found for the platform device\n");
return -ENXIO;
- }
- if (!irq_is_percpu(drvdata->irq)) {
pr_err("IRQ is not a PPI\n");
return -EINVAL;
- }
- if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus))
return -EINVAL;
- drvdata->handle = alloc_percpu(typeof(*drvdata->handle));
- if (!drvdata->handle)
return -ENOMEM;
- if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) {
free_percpu(drvdata->handle);
return -EINVAL;
- }
- return 0;
+}
+static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata) +{
- free_percpu_irq(drvdata->irq, drvdata->handle);
- free_percpu(drvdata->handle);
+}
+static int arm_trbe_device_probe(struct platform_device *pdev) +{
- struct coresight_platform_data *pdata;
- struct trbe_drvdata *drvdata;
- struct device *dev = &pdev->dev;
- int ret;
- drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
- if (IS_ERR(drvdata))
return -ENOMEM;
- pdata = coresight_get_platform_data(dev);
- if (IS_ERR(pdata)) {
kfree(drvdata);
return -ENOMEM;
- }
- drvdata->atclk = devm_clk_get(dev, "atclk");
- if (!IS_ERR(drvdata->atclk)) {
ret = clk_prepare_enable(drvdata->atclk);
if (ret)
return ret;
- }
Please drop the clocks, we don't have any
- dev_set_drvdata(dev, drvdata);
- dev->platform_data = pdata;
- drvdata->pdev = pdev;
- ret = arm_trbe_probe_irq(pdev, drvdata);
- if (ret)
goto irq_failed;
- ret = arm_trbe_probe_coresight(drvdata);
- if (ret)
goto probe_failed;
- ret = arm_trbe_probe_cpuhp(drvdata);
- if (ret)
goto cpuhp_failed;
- return 0;
+cpuhp_failed:
- arm_trbe_remove_coresight(drvdata);
+probe_failed:
- arm_trbe_remove_irq(drvdata);
+irq_failed:
- kfree(pdata);
- kfree(drvdata);
- return ret;
+}
+static int arm_trbe_device_remove(struct platform_device *pdev) +{
- struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev);
- struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
- arm_trbe_remove_coresight(drvdata);
- arm_trbe_remove_cpuhp(drvdata);
- arm_trbe_remove_irq(drvdata);
- kfree(pdata);
- kfree(drvdata);
- return 0;
+}
+#ifdef CONFIG_PM +static int arm_trbe_runtime_suspend(struct device *dev) +{
- struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
- if (drvdata && !IS_ERR(drvdata->atclk))
clk_disable_unprepare(drvdata->atclk);
Remove. We may need to save/restore the TRBE ptrs, depending on the TRBE.
- return 0;
+}
+static int arm_trbe_runtime_resume(struct device *dev) +{
- struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
- if (drvdata && !IS_ERR(drvdata->atclk))
clk_prepare_enable(drvdata->atclk);
Remove. See above.
- return 0;
+} +#endif
+static const struct dev_pm_ops arm_trbe_dev_pm_ops = {
- SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL)
+};
+static const struct of_device_id arm_trbe_of_match[] = {
- { .compatible = "arm,arm-trbe", .data = (void *)1 },
- {},
+};
I think it is better to call this, we have too many acronyms ;-)
"arm,trace-buffer-extension"
+MODULE_DEVICE_TABLE(of, arm_trbe_of_match);
+static const struct platform_device_id arm_trbe_match[] = {
- { "arm,trbe", 0},
- { }
+}; +MODULE_DEVICE_TABLE(platform, arm_trbe_match);
Please remove. The ACPI part can be added when we get to it.
+static struct platform_driver arm_trbe_driver = {
- .id_table = arm_trbe_match,
- .driver = {
.name = DRVNAME,
.of_match_table = of_match_ptr(arm_trbe_of_match),
.pm = &arm_trbe_dev_pm_ops,
.suppress_bind_attrs = true,
- },
- .probe = arm_trbe_device_probe,
- .remove = arm_trbe_device_remove,
+}; +builtin_platform_driver(arm_trbe_driver)
Please make this modular.
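i.e., tristate in Kconfig plus something like (sketch):

	module_platform_driver(arm_trbe_driver);

	MODULE_AUTHOR("Anshuman Khandual <anshuman.khandual@arm.com>");
	MODULE_DESCRIPTION("Arm Trace Buffer Extension (TRBE) driver");
	MODULE_LICENSE("GPL v2");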
diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h new file mode 100644 index 0000000..82ffbfc --- /dev/null +++ b/drivers/hwtracing/coresight/coresight-trbe.h @@ -0,0 +1,525 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/*
- This contains all required hardware related helper functions for
- Trace Buffer Extension (TRBE) driver in the coresight framework.
- Copyright (C) 2020 ARM Ltd.
- Author: Anshuman Khandual anshuman.khandual@arm.com
- */
+#include <linux/coresight.h> +#include <linux/device.h> +#include <linux/irq.h> +#include <linux/kernel.h> +#include <linux/of.h> +#include <linux/platform_device.h> +#include <linux/smp.h>
+#include "coresight-etm-perf.h"
+static inline bool is_trbe_available(void) +{
- u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
- int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT);
- return trbe >= 0b0001;
+}
+static inline bool is_ete_available(void) +{
- u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
- int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT);
- return (tracever != 0b0000);
Why is this needed ?
+}
+static inline bool is_trbe_enabled(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- return trblimitr & TRBLIMITR_ENABLE;
+}
+enum trbe_ec {
- TRBE_EC_OTHERS = 0,
- TRBE_EC_STAGE1_ABORT = 36,
- TRBE_EC_STAGE2_ABORT = 37,
+};
+static const char *const trbe_ec_str[] = {
- [TRBE_EC_OTHERS] = "Maintenance exception",
- [TRBE_EC_STAGE1_ABORT] = "Stage-1 exception",
- [TRBE_EC_STAGE2_ABORT] = "Stage-2 exception",
+};
Please remove the definitions that are not used by the driver.
+static inline enum trbe_ec get_trbe_ec(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
+}
+static inline void clr_trbe_ec(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+enum trbe_bsc {
- TRBE_BSC_NOT_STOPPED = 0,
- TRBE_BSC_FILLED = 1,
- TRBE_BSC_TRIGGERED = 2,
+};
+static const char *const trbe_bsc_str[] = {
- [TRBE_BSC_NOT_STOPPED] = "TRBE collection not stopped",
- [TRBE_BSC_FILLED] = "TRBE filled",
- [TRBE_BSC_TRIGGERED] = "TRBE triggered",
+};
+static inline enum trbe_bsc get_trbe_bsc(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
+}
+static inline void clr_trbe_bsc(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+enum trbe_fsc {
- TRBE_FSC_ASF_LEVEL0 = 0,
- TRBE_FSC_ASF_LEVEL1 = 1,
- TRBE_FSC_ASF_LEVEL2 = 2,
- TRBE_FSC_ASF_LEVEL3 = 3,
- TRBE_FSC_TF_LEVEL0 = 4,
- TRBE_FSC_TF_LEVEL1 = 5,
- TRBE_FSC_TF_LEVEL2 = 6,
- TRBE_FSC_TF_LEVEL3 = 7,
- TRBE_FSC_AFF_LEVEL0 = 8,
- TRBE_FSC_AFF_LEVEL1 = 9,
- TRBE_FSC_AFF_LEVEL2 = 10,
- TRBE_FSC_AFF_LEVEL3 = 11,
- TRBE_FSC_PF_LEVEL0 = 12,
- TRBE_FSC_PF_LEVEL1 = 13,
- TRBE_FSC_PF_LEVEL2 = 14,
- TRBE_FSC_PF_LEVEL3 = 15,
- TRBE_FSC_SEA_WRITE = 16,
- TRBE_FSC_ASEA_WRITE = 17,
- TRBE_FSC_SEA_LEVEL0 = 20,
- TRBE_FSC_SEA_LEVEL1 = 21,
- TRBE_FSC_SEA_LEVEL2 = 22,
- TRBE_FSC_SEA_LEVEL3 = 23,
- TRBE_FSC_ALIGN_FAULT = 33,
- TRBE_FSC_TLB_FAULT = 48,
- TRBE_FSC_ATOMIC_FAULT = 49,
+};
Please remove ^^^
+static const char *const trbe_fsc_str[] = {
- [TRBE_FSC_ASF_LEVEL0] = "Address size fault - level 0",
- [TRBE_FSC_ASF_LEVEL1] = "Address size fault - level 1",
- [TRBE_FSC_ASF_LEVEL2] = "Address size fault - level 2",
- [TRBE_FSC_ASF_LEVEL3] = "Address size fault - level 3",
- [TRBE_FSC_TF_LEVEL0] = "Translation fault - level 0",
- [TRBE_FSC_TF_LEVEL1] = "Translation fault - level 1",
- [TRBE_FSC_TF_LEVEL2] = "Translation fault - level 2",
- [TRBE_FSC_TF_LEVEL3] = "Translation fault - level 3",
- [TRBE_FSC_AFF_LEVEL0] = "Access flag fault - level 0",
- [TRBE_FSC_AFF_LEVEL1] = "Access flag fault - level 1",
- [TRBE_FSC_AFF_LEVEL2] = "Access flag fault - level 2",
- [TRBE_FSC_AFF_LEVEL3] = "Access flag fault - level 3",
- [TRBE_FSC_PF_LEVEL0] = "Permission fault - level 0",
- [TRBE_FSC_PF_LEVEL1] = "Permission fault - level 1",
- [TRBE_FSC_PF_LEVEL2] = "Permission fault - level 2",
- [TRBE_FSC_PF_LEVEL3] = "Permission fault - level 3",
- [TRBE_FSC_SEA_WRITE] = "Synchronous external abort on write",
- [TRBE_FSC_ASEA_WRITE] = "Asynchronous external abort on write",
- [TRBE_FSC_SEA_LEVEL0] = "Synchronous external abort on table walk - level 0",
- [TRBE_FSC_SEA_LEVEL1] = "Synchronous external abort on table walk - level 1",
- [TRBE_FSC_SEA_LEVEL2] = "Synchronous external abort on table walk - level 2",
- [TRBE_FSC_SEA_LEVEL3] = "Synchronous external abort on table walk - level 3",
- [TRBE_FSC_ALIGN_FAULT] = "Alignment fault",
- [TRBE_FSC_TLB_FAULT] = "TLB conflict fault",
- [TRBE_FSC_ATOMIC_FAULT] = "Atomic fault",
+};
Please remove ^^^
+enum trbe_address_mode {
- TRBE_ADDRESS_VIRTUAL,
- TRBE_ADDRESS_PHYSICAL,
+};
#define please.
+static const char *const trbe_address_mode_str[] = {
- [TRBE_ADDRESS_VIRTUAL] = "Address mode - virtual",
- [TRBE_ADDRESS_PHYSICAL] = "Address mode - physical",
+};
Do we need this ? We always use virtual.
+static inline bool is_trbe_virtual_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- return !(trblimitr & TRBLIMITR_NVM);
+}
Remove
+static inline bool is_trbe_physical_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- return trblimitr & TRBLIMITR_NVM;
+}
Remove
+static inline void set_trbe_virtual_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr &= ~TRBLIMITR_NVM;
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+static inline void set_trbe_physical_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr |= TRBLIMITR_NVM;
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
Remove
+enum trbe_trig_mode {
- TRBE_TRIGGER_STOP = 0,
- TRBE_TRIGGER_IRQ = 1,
- TRBE_TRIGGER_IGNORE = 3,
+};
+static const char *const trbe_trig_mode_str[] = {
- [TRBE_TRIGGER_STOP] = "Trigger mode - stop",
- [TRBE_TRIGGER_IRQ] = "Trigger mode - irq",
- [TRBE_TRIGGER_IGNORE] = "Trigger mode - ignore",
+};
+static inline enum trbe_trig_mode get_trbe_trig_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK;
+}
+static inline void set_trbe_trig_mode(enum trbe_trig_mode mode) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
- trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT);
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+enum trbe_fill_mode {
- TRBE_FILL_STOP = 0,
- TRBE_FILL_WRAP = 1,
- TRBE_FILL_CIRCULAR = 3,
+};
Please use #define
+static const char *const trbe_fill_mode_str[] = {
- [TRBE_FILL_STOP] = "Buffer mode - stop",
- [TRBE_FILL_WRAP] = "Buffer mode - wrap",
- [TRBE_FILL_CIRCULAR] = "Buffer mode - circular",
+};
+static inline enum trbe_fill_mode get_trbe_fill_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK;
+}
+static inline void set_trbe_fill_mode(enum trbe_fill_mode mode) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
- trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT);
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+static inline void set_trbe_disabled(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr &= ~TRBLIMITR_ENABLE;
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+static inline void set_trbe_enabled(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr |= TRBLIMITR_ENABLE;
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+static inline bool get_trbe_flag_update(void) +{
- u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
- return trbidr & TRBIDR_FLAG;
+}
+static inline bool is_trbe_programmable(void) +{
- u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
- return !(trbidr & TRBIDR_PROG);
+} +# +enum trbe_buffer_align {
- TRBE_BUFFER_BYTE,
- TRBE_BUFFER_HALF_WORD,
- TRBE_BUFFER_WORD,
- TRBE_BUFFER_DOUBLE_WORD,
- TRBE_BUFFER_16_BYTES,
- TRBE_BUFFER_32_BYTES,
- TRBE_BUFFER_64_BYTES,
- TRBE_BUFFER_128_BYTES,
- TRBE_BUFFER_256_BYTES,
- TRBE_BUFFER_512_BYTES,
- TRBE_BUFFER_1K_BYTES,
- TRBE_BUFFER_2K_BYTES,
+};
Remove ^^
+static const char *const trbe_buffer_align_str[] = {
- [TRBE_BUFFER_BYTE] = "Byte",
- [TRBE_BUFFER_HALF_WORD] = "Half word",
- [TRBE_BUFFER_WORD] = "Word",
- [TRBE_BUFFER_DOUBLE_WORD] = "Double word",
- [TRBE_BUFFER_16_BYTES] = "16 bytes",
- [TRBE_BUFFER_32_BYTES] = "32 bytes",
- [TRBE_BUFFER_64_BYTES] = "64 bytes",
- [TRBE_BUFFER_128_BYTES] = "128 bytes",
- [TRBE_BUFFER_256_BYTES] = "256 bytes",
- [TRBE_BUFFER_512_BYTES] = "512 bytes",
- [TRBE_BUFFER_1K_BYTES] = "1K bytes",
- [TRBE_BUFFER_2K_BYTES] = "2K bytes",
+};
We don't need any of this. We could simply "<<" and get the size.
+static inline enum trbe_buffer_align get_trbe_address_align(void) +{
- u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
- return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
+}
+static inline void assert_trbe_address_mode(unsigned long addr) +{
- bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr);
- bool virt_mode = is_trbe_virtual_mode();
- WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode)));
+}
I am not sure if this is really helpful. You have to trust the kernel vmalloc().
+static inline void assert_trbe_address_align(unsigned long addr) +{
- unsigned long nr_bytes = 1ULL << get_trbe_address_align();
- WARN_ON(addr & (nr_bytes - 1));
+}
+static inline unsigned long get_trbe_write_pointer(void) +{
- u64 trbptr = read_sysreg_s(SYS_TRBPTR_EL1);
- unsigned long addr = (trbptr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- return addr;
+}
+static inline void set_trbe_write_pointer(unsigned long addr) +{
- WARN_ON(is_trbe_enabled());
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- addr = (addr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
- write_sysreg_s(addr, SYS_TRBPTR_EL1);
+}
+static inline unsigned long get_trbe_limit_pointer(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- unsigned long limit = (trblimitr >> TRBLIMITR_LIMIT_SHIFT) & TRBLIMITR_LIMIT_MASK;
- unsigned long addr = limit << TRBLIMITR_LIMIT_SHIFT;
- WARN_ON(addr & (PAGE_SIZE - 1));
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- return addr;
+}
+static inline void set_trbe_limit_pointer(unsigned long addr) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- WARN_ON(is_trbe_enabled());
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- WARN_ON(addr & ((1UL << TRBLIMITR_LIMIT_SHIFT) - 1));
- WARN_ON(addr & (PAGE_SIZE - 1));
- trblimitr &= ~(TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT);
- trblimitr |= (addr & PAGE_MASK);
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+static inline unsigned long get_trbe_base_pointer(void) +{
- u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1);
- unsigned long addr = (trbbaser >> TRBBASER_BASE_SHIFT) & TRBBASER_BASE_MASK;
- addr = addr << TRBBASER_BASE_SHIFT;
- WARN_ON(addr & (PAGE_SIZE - 1));
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- return addr;
+}
+static inline void set_trbe_base_pointer(unsigned long addr) +{
- WARN_ON(is_trbe_enabled());
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- WARN_ON(addr & ((1UL << TRBBASER_BASE_SHIFT) - 1));
- WARN_ON(addr & (PAGE_SIZE - 1));
- write_sysreg_s(addr, SYS_TRBBASER_EL1);
+}
Suzuki
On 11/12/20 3:43 PM, Suzuki K Poulose wrote:
On 11/10/20 12:45 PM, Anshuman Khandual wrote:
Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is accessible via the system registers. The TRBE supports different addressing modes including CPU virtual address and buffer modes including the circular buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1), a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But access to the trace buffer could be prohibited by a higher exception level (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU private interrupt (PPI) on address translation errors and when the buffer is full. The overall implementation here is inspired by the Arm SPE driver.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
 Documentation/trace/coresight/coresight-trbe.rst |  36 ++
 arch/arm64/include/asm/sysreg.h                  |   2 +
 drivers/hwtracing/coresight/Kconfig              |  11 +
 drivers/hwtracing/coresight/Makefile             |   1 +
 drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
 6 files changed, 1341 insertions(+)
 create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst
new file mode 100644
index 0000000..4320a8b
--- /dev/null
+++ b/Documentation/trace/coresight/coresight-trbe.rst
@@ -0,0 +1,36 @@
+.. SPDX-License-Identifier: GPL-2.0
+==============================
+Trace Buffer Extension (TRBE).
+==============================
+   :Author:   Anshuman Khandual anshuman.khandual@arm.com
+   :Date:     November 2020
+Hardware Description
+--------------------
+Trace Buffer Extension (TRBE) is per-CPU hardware which captures, in system
+memory, CPU traces generated by a corresponding per-CPU tracing unit. This
+gets plugged in as a coresight sink device because the corresponding trace
+generators (ETE) are plugged in as source devices.
+Sysfs files and directories
+---------------------------
+The TRBE devices appear on the existing coresight bus alongside the other
+coresight devices::
+   >$ ls /sys/bus/coresight/devices
+   trbe0  trbe1  trbe2  trbe3
+The ``trbe<N>`` named TRBEs are associated with a CPU.::
+   >$ ls /sys/bus/coresight/devices/trbe0/
+   irq  align  dbm
+*Key file items are:-*
+  * ``irq``: TRBE maintenance interrupt number
+  * ``align``: TRBE write pointer alignment
+  * ``dbm``: TRBE updates memory with access and dirty flags
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 14cb156..61136f6 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -97,6 +97,7 @@
 #define SET_PSTATE_UAO(x)		__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_SSBS(x)		__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_TCO(x)		__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
+#define TSB_CSYNC			__emit_inst(0xd503225f)
 
 #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
 	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
@@ -865,6 +866,7 @@
 #define ID_AA64MMFR2_CNP_SHIFT		0
 
 /* id_aa64dfr0 */
+#define ID_AA64DFR0_TRBE_SHIFT		44
 #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
 #define ID_AA64DFR0_DOUBLELOCK_SHIFT	36
 #define ID_AA64DFR0_PMSVER_SHIFT	32
diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
index c119824..0f5e101 100644
--- a/drivers/hwtracing/coresight/Kconfig
+++ b/drivers/hwtracing/coresight/Kconfig
@@ -156,6 +156,17 @@ config CORESIGHT_CTI
 	  To compile this driver as a module, choose M here: the
 	  module will be called coresight-cti.
 
+config CORESIGHT_TRBE
+	bool "Trace Buffer Extension (TRBE) driver"
+	depends on ARM64
+	help
+	  This driver provides support for percpu Trace Buffer Extension (TRBE).
+	  TRBE always needs to be used along with its corresponding percpu ETE
+	  component. ETE generates trace data which is then captured with TRBE.
+	  Unlike traditional sink devices, TRBE is a CPU feature accessible via
+	  system registers. But its explicit dependency on the trace unit (ETE)
+	  requires it to be plugged in as a coresight sink device.
 config CORESIGHT_CTI_INTEGRATION_REGS
 	bool "Access CTI CoreSight Integration Registers"
 	depends on CORESIGHT_CTI
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index f20e357..d608165 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
 obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
 obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
 obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o
+obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o
 coresight-cti-y := coresight-cti-core.o coresight-cti-platform.o \
 		   coresight-cti-sysfs.o
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
new file mode 100644
index 0000000..48a8ec3
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -0,0 +1,766 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
- This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
- sink device, which could then pair with an appropriate per-cpu coresight source
- device (ETE) thus generating required trace data. Trace can be enabled
- via the perf framework.
- Copyright (C) 2020 ARM Ltd.
- Author: Anshuman Khandual anshuman.khandual@arm.com
- */
+#define DRVNAME "arm_trbe"
+#define pr_fmt(fmt) DRVNAME ": " fmt
+#include "coresight-trbe.h"
+#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
+#define ETE_IGNORE_PACKET 0x70
Add a comment here, on what this means to the decoder.
Sure, will add.
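Something along these lines, perhaps (wording is only a suggestion, not from the patch):

	/*
	 * 0x70 is the ETE/ETMv4 "ignore" packet: a single byte of padding
	 * that trace decoders silently skip. Unused space in the perf aux
	 * buffer is filled with these so the decoder can walk over it.
	 */
	#define ETE_IGNORE_PACKET		0x70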
+static const char trbe_name[] = "trbe";
Why not
#define DEVNAMEÂ Â Â "trbe"
That can be replaced, but we already define DRVNAME, which is used for naming the TRBE interrupt that shows up in /proc/interrupts, and that one is "arm_trbe" instead. Should /sys/bus/coresight/devices/ list TRBE devices as "arm_trbeN" ? If so, DRVNAME can be used without any problem. Or should DRVNAME be changed to just "trbe" instead ? Either way, it makes sense to have the same name for the TRBE devices and the interrupt.
+enum trbe_fault_action { +Â Â Â TRBE_FAULT_ACT_WRAP, +Â Â Â TRBE_FAULT_ACT_SPURIOUS, +Â Â Â TRBE_FAULT_ACT_FATAL, +};
+struct trbe_perf {
Please rename this to trbe_buf. This will be used for sysfs mode as well.
Sure, will do.
+Â Â Â unsigned long trbe_base; +Â Â Â unsigned long trbe_limit; +Â Â Â unsigned long trbe_write; +Â Â Â pid_t pid;
Why do we need this ? This seems unused and moreover, there cannot be multiple tracers into TRBE. So, we don't need to share the sink unlike the traditional ones.
Sure, will drop.
+Â Â Â int nr_pages; +Â Â Â void **pages; +Â Â Â bool snapshot; +Â Â Â struct trbe_cpudata *cpudata; +};
+struct trbe_cpudata { +   struct coresight_device   *csdev; +   bool trbe_dbm;
Why do we need this ?
This is an internal implementation characteristic which should be presented to user space via sysfs for better understanding and probably for debug purposes. The current proposal does not support the scenario where TRBE DBM is off, which we need to incorporate later on. Hence let's just leave this as is for now.
+Â Â Â u64 trbe_align; +Â Â Â int cpu; +Â Â Â enum cs_mode mode; +Â Â Â struct trbe_perf *perf; +Â Â Â struct trbe_drvdata *drvdata; +};
+struct trbe_drvdata { +Â Â Â struct trbe_cpudata __percpu *cpudata; +Â Â Â struct perf_output_handle __percpu *handle;
Shouldn't this be :
struct perf_output_handle __percpu **handle ?
as we get a handle from the etm-perf and is not controlled by the TRBE ?
Sure, will change this.
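For reference, a rough (untested) sketch of that change could be:

	struct trbe_drvdata {
		struct trbe_cpudata __percpu *cpudata;
		struct perf_output_handle * __percpu *handle;	/* per-cpu pointer, not a copy */
		/* ... rest unchanged ... */
	};

	drvdata->handle = alloc_percpu(struct perf_output_handle *);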
+Â Â Â struct hlist_node hotplug_node; +Â Â Â int irq; +Â Â Â cpumask_t supported_cpus; +Â Â Â enum cpuhp_state trbe_online; +Â Â Â struct platform_device *pdev; +Â Â Â struct clk *atclk;
We don't have any clocks for the TRBE instance. Please remove.
Sure, will drop.
+};
+static int trbe_alloc_node(struct perf_event *event) +{ +Â Â Â if (event->cpu == -1) +Â Â Â Â Â Â Â return NUMA_NO_NODE; +Â Â Â return cpu_to_node(event->cpu); +}
+static void trbe_disable_and_drain_local(void) +{ +Â Â Â write_sysreg_s(0, SYS_TRBLIMITR_EL1); +Â Â Â isb(); +Â Â Â dsb(nsh); +Â Â Â asm(TSB_CSYNC); +}
+static void trbe_reset_local(void) +{ +Â Â Â trbe_disable_and_drain_local(); +Â Â Â write_sysreg_s(0, SYS_TRBPTR_EL1); +Â Â Â isb();
+Â Â Â write_sysreg_s(0, SYS_TRBBASER_EL1); +Â Â Â isb();
+Â Â Â write_sysreg_s(0, SYS_TRBSR_EL1); +Â Â Â isb(); +}
+static void trbe_pad_buf(struct perf_output_handle *handle, int len) +{ +Â Â Â struct trbe_perf *perf = etm_perf_sink_config(handle); +Â Â Â u64 head = PERF_IDX2OFF(handle->head, perf);
+Â Â Â memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len); +Â Â Â if (!perf->snapshot) +Â Â Â Â Â Â Â perf_aux_output_skip(handle, len); +}
+static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle) +{ +Â Â Â struct trbe_perf *perf = etm_perf_sink_config(handle); +Â Â Â u64 head = PERF_IDX2OFF(handle->head, perf); +Â Â Â u64 limit = perf->nr_pages * PAGE_SIZE;
So we are using half of the buffer for snapshot mode to avoid a case where the analyzer is unable to decode the trace in case of an overflow.
Right.
+Â Â Â if (head < limit >> 1) +Â Â Â Â Â Â Â limit >>= 1;
Also this needs to be thought out. We may not need this restriction. The trace decoder will be able to walk forward and then find a synchronization packet and then continue the tracing from there. So, we could use the entire buffer for TRBE.
Okay. Maybe we could just go with half the TRBE buffer for now and use the entire buffer later on, once this is better understood ?
+Â Â Â return limit; +}
+static unsigned long trbe_normal_offset(struct perf_output_handle *handle) +{ +Â Â Â struct trbe_perf *perf = etm_perf_sink_config(handle); +Â Â Â struct trbe_cpudata *cpudata = perf->cpudata; +Â Â Â const u64 bufsize = perf->nr_pages * PAGE_SIZE; +Â Â Â u64 limit = bufsize; +Â Â Â u64 head, tail, wakeup;
Commentary please.
Sure, will add some.
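For instance, something like this at the top of trbe_normal_offset() (my wording, to be refined):

	/*
	 * The TRBE write pointer has to be aligned to the value advertised
	 * in TRBIDR_EL1.Align. If the current aux buffer head is not
	 * suitably aligned, pad up to the next aligned offset with ETE
	 * ignore packets before working out the limit.
	 */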
+Â Â Â head = PERF_IDX2OFF(handle->head, perf); +Â Â Â if (!IS_ALIGNED(head, cpudata->trbe_align)) { +Â Â Â Â Â Â Â unsigned long delta = roundup(head, cpudata->trbe_align) - head;
+Â Â Â Â Â Â Â delta = min(delta, handle->size); +Â Â Â Â Â Â Â trbe_pad_buf(handle, delta); +Â Â Â Â Â Â Â head = PERF_IDX2OFF(handle->head, perf); +Â Â Â }
+Â Â Â if (!handle->size) { +Â Â Â Â Â Â Â perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); +Â Â Â Â Â Â Â return 0; +Â Â Â }
+Â Â Â tail = PERF_IDX2OFF(handle->head + handle->size, perf); +Â Â Â wakeup = PERF_IDX2OFF(handle->wakeup, perf);
+Â Â Â if (head < tail)
comment
+Â Â Â Â Â Â Â limit = round_down(tail, PAGE_SIZE);
+Â Â Â if (handle->wakeup < (handle->head + handle->size) && head <= wakeup) +Â Â Â Â Â Â Â limit = min(limit, round_up(wakeup, PAGE_SIZE));
comment. Also do we need an alignement to PAGE_SIZE ?
The limit always has to be PAGE_SIZE aligned because it is eventually going to become the TRBE limit pointer, after getting added to the TRBE base pointer. Will add some more comments here as well.
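A possible wording for that comment (an assumption on my part, to be refined):

	/*
	 * TRBLIMITR_EL1.LIMIT holds only the upper bits of the limit
	 * address, i.e. the programmed limit pointer is always page
	 * aligned. Hence keep the computed offset PAGE_SIZE aligned here,
	 * as it is later added to the TRBE base pointer to form the limit.
	 */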
+Â Â Â if (limit > head) +Â Â Â Â Â Â Â return limit;
+Â Â Â trbe_pad_buf(handle, handle->size); +Â Â Â perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); +Â Â Â return 0; +}
+static unsigned long get_trbe_limit(struct perf_output_handle *handle) +{ +Â Â Â struct trbe_perf *perf = etm_perf_sink_config(handle); +Â Â Â unsigned long offset;
+Â Â Â if (perf->snapshot) +Â Â Â Â Â Â Â offset = trbe_snapshot_offset(handle); +Â Â Â else +Â Â Â Â Â Â Â offset = trbe_normal_offset(handle); +Â Â Â return perf->trbe_base + offset; +}
+static void trbe_enable_hw(struct trbe_perf *perf) +{ +Â Â Â WARN_ON(perf->trbe_write < perf->trbe_base); +Â Â Â WARN_ON(perf->trbe_write >= perf->trbe_limit); +Â Â Â set_trbe_disabled(); +Â Â Â clr_trbe_irq(); +Â Â Â clr_trbe_wrap(); +Â Â Â clr_trbe_abort(); +Â Â Â clr_trbe_ec(); +Â Â Â clr_trbe_bsc(); +Â Â Â clr_trbe_fsc();
Please merge all of these field updates to single register update unless mandated by the architecture.
Sure, will do.
+Â Â Â set_trbe_virtual_mode(); +Â Â Â set_trbe_fill_mode(TRBE_FILL_STOP); +Â Â Â set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);
Same here ^^
Sure, will do.
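Roughly along these lines, assuming the existing TRBLIMITR_*/TRBSR_* definitions (untested sketch):

	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);

	/* a single write clears IRQ, WRAP, ABORT, EC, BSC and FSC together */
	write_sysreg_s(0, SYS_TRBSR_EL1);

	/*
	 * A single read-modify-write disables the TRBE and selects virtual
	 * mode, the fill mode and the trigger mode.
	 */
	trblimitr &= ~(TRBLIMITR_ENABLE | TRBLIMITR_NVM |
		       (TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT) |
		       (TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT));
	trblimitr |= (TRBE_FILL_STOP & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT;
	trblimitr |= (TRBE_TRIGGER_IGNORE & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT;
	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);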
+Â Â Â isb(); +Â Â Â set_trbe_base_pointer(perf->trbe_base); +Â Â Â set_trbe_limit_pointer(perf->trbe_limit); +Â Â Â set_trbe_write_pointer(perf->trbe_write); +Â Â Â isb(); +Â Â Â dsb(ishst); +Â Â Â flush_tlb_all();
Why is this needed ?
Will drop flush_tlb_all().
+Â Â Â set_trbe_running(); +Â Â Â set_trbe_enabled(); +Â Â Â asm(TSB_CSYNC); +}
+static void *arm_trbe_alloc_buffer(struct coresight_device *csdev, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â struct perf_event *event, void **pages, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â int nr_pages, bool snapshot) +{ +Â Â Â struct trbe_perf *perf; +Â Â Â struct page **pglist; +Â Â Â int i;
+Â Â Â if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))
We may be able to remove the restriction on snapshot mode, see my comment above.
Sure, will drop when the entire buffer is used for the snapshot mode.
+Â Â Â Â Â Â Â return NULL;
+Â Â Â perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event)); +Â Â Â if (IS_ERR(perf)) +Â Â Â Â Â Â Â return ERR_PTR(-ENOMEM);
+Â Â Â pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL); +Â Â Â if (IS_ERR(pglist)) { +Â Â Â Â Â Â Â kfree(perf); +Â Â Â Â Â Â Â return ERR_PTR(-ENOMEM); +Â Â Â }
+Â Â Â for (i = 0; i < nr_pages; i++) +Â Â Â Â Â Â Â pglist[i] = virt_to_page(pages[i]);
+Â Â Â perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL); +Â Â Â if (IS_ERR((void *) perf->trbe_base)) { +Â Â Â Â Â Â Â kfree(pglist); +Â Â Â Â Â Â Â kfree(perf); +Â Â Â Â Â Â Â return ERR_PTR(perf->trbe_base); +Â Â Â } +Â Â Â perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE; +Â Â Â perf->trbe_write = perf->trbe_base; +Â Â Â perf->pid = task_pid_nr(event->owner); +Â Â Â perf->snapshot = snapshot; +Â Â Â perf->nr_pages = nr_pages; +Â Â Â perf->pages = pages; +Â Â Â kfree(pglist); +Â Â Â return perf; +}
+void arm_trbe_free_buffer(void *config) +{ +Â Â Â struct trbe_perf *perf = config;
+Â Â Â vunmap((void *) perf->trbe_base); +Â Â Â kfree(perf); +}
+static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â struct perf_output_handle *handle, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â void *config) +{ +Â Â Â struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); +Â Â Â struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev); +Â Â Â struct trbe_perf *perf = config; +Â Â Â unsigned long size, offset;
+Â Â Â WARN_ON(perf->cpudata != cpudata); +Â Â Â WARN_ON(cpudata->cpu != smp_processor_id()); +Â Â Â WARN_ON(cpudata->mode != CS_MODE_PERF); +Â Â Â WARN_ON(cpudata->drvdata != drvdata);
+Â Â Â offset = get_trbe_write_pointer() - get_trbe_base_pointer(); +Â Â Â size = offset - PERF_IDX2OFF(handle->head, perf); +Â Â Â if (perf->snapshot) +Â Â Â Â Â Â Â handle->head += size; +Â Â Â return size; +}
+static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data) +{ +Â Â Â struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); +Â Â Â struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev); +Â Â Â struct perf_output_handle *handle = data; +Â Â Â struct trbe_perf *perf = etm_perf_sink_config(handle);
+Â Â Â WARN_ON(cpudata->cpu != smp_processor_id()); +Â Â Â WARN_ON(mode != CS_MODE_PERF);
Why WARN_ON ? Simply return -EINVAL ? Also you need a check to make sure the mode is DISABLED (when you get to sysfs mode).
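Something like this, perhaps (sketch; the exact error codes are my guess):

	if (mode != CS_MODE_PERF)
		return -EINVAL;

	/* reject a new session while this TRBE is not idle */
	if (cpudata->mode != CS_MODE_DISABLED)
		return -EBUSY;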
+Â Â Â WARN_ON(cpudata->drvdata != drvdata);
+Â Â Â *this_cpu_ptr(drvdata->handle) = *handle;
That is wrong. Storing a local copy of a generic perf structure is asking for trouble, as it assumes the global structure doesn't change beneath us. Please store the handle pointer instead.
Sure, will change.
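i.e. something like this, paired with the per-cpu pointer change suggested earlier (sketch):

	/* in arm_trbe_enable() */
	*this_cpu_ptr(drvdata->handle) = handle;

	/* and in the IRQ handler, where @dev is the per-cpu slot */
	struct perf_output_handle **handle_ptr = dev;
	struct perf_output_handle *handle = *handle_ptr;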
+Â Â Â cpudata->perf = perf; +Â Â Â cpudata->mode = mode; +Â Â Â perf->cpudata = cpudata; +Â Â Â perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf); +Â Â Â perf->trbe_limit = get_trbe_limit(handle); +Â Â Â if (perf->trbe_limit == perf->trbe_base) { +Â Â Â Â Â Â Â trbe_disable_and_drain_local(); +Â Â Â Â Â Â Â return 0; +Â Â Â } +Â Â Â trbe_enable_hw(perf); +Â Â Â return 0; +}
+static int arm_trbe_disable(struct coresight_device *csdev) +{ +Â Â Â struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); +Â Â Â struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev); +Â Â Â struct trbe_perf *perf = cpudata->perf;
+Â Â Â WARN_ON(perf->cpudata != cpudata); +Â Â Â WARN_ON(cpudata->cpu != smp_processor_id()); +Â Â Â WARN_ON(cpudata->mode != CS_MODE_PERF); +Â Â Â WARN_ON(cpudata->drvdata != drvdata);
+Â Â Â trbe_disable_and_drain_local(); +Â Â Â perf->cpudata = NULL; +Â Â Â cpudata->perf = NULL; +Â Â Â cpudata->mode = CS_MODE_DISABLED; +Â Â Â return 0; +}
+static void trbe_handle_fatal(struct perf_output_handle *handle) +{ +Â Â Â perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); +Â Â Â perf_aux_output_end(handle, 0); +Â Â Â trbe_disable_and_drain_local(); +}
+static void trbe_handle_spurious(struct perf_output_handle *handle) +{ +Â Â Â struct trbe_perf *perf = etm_perf_sink_config(handle);
+Â Â Â perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf); +Â Â Â perf->trbe_limit = get_trbe_limit(handle); +Â Â Â if (perf->trbe_limit == perf->trbe_base) { +Â Â Â Â Â Â Â trbe_disable_and_drain_local(); +Â Â Â Â Â Â Â return; +Â Â Â } +Â Â Â trbe_enable_hw(perf); +}
+static void trbe_handle_overflow(struct perf_output_handle *handle) +{ +Â Â Â struct perf_event *event = handle->event; +Â Â Â struct trbe_perf *perf = etm_perf_sink_config(handle); +Â Â Â unsigned long offset, size; +Â Â Â struct etm_event_data *event_data;
+Â Â Â offset = get_trbe_limit_pointer() - get_trbe_base_pointer(); +Â Â Â size = offset - PERF_IDX2OFF(handle->head, perf); +Â Â Â if (perf->snapshot) +Â Â Â Â Â Â Â handle->head = offset;
Is this correct ? Or was this supposed to mean : Â Â Â Â Â Â Â handle->head += offset;
Hmm, not too sure about this but the SPE driver does the same in arm_spe_perf_aux_output_end().
+Â Â Â perf_aux_output_end(handle, size);
+Â Â Â event_data = perf_aux_output_begin(handle, event); +Â Â Â if (!event_data) { +Â Â Â Â Â Â Â event->hw.state |= PERF_HES_STOPPED; +Â Â Â Â Â Â Â trbe_disable_and_drain_local(); +Â Â Â Â Â Â Â return; +Â Â Â } +Â Â Â perf->trbe_write = perf->trbe_base; +Â Â Â perf->trbe_limit = get_trbe_limit(handle); +Â Â Â if (perf->trbe_limit == perf->trbe_base) { +Â Â Â Â Â Â Â trbe_disable_and_drain_local(); +Â Â Â Â Â Â Â return; +Â Â Â } +Â Â Â *this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle; +Â Â Â trbe_enable_hw(perf); +}
+static bool is_perf_trbe(struct perf_output_handle *handle) +{ +Â Â Â struct trbe_perf *perf = etm_perf_sink_config(handle); +Â Â Â struct trbe_cpudata *cpudata = perf->cpudata; +Â Â Â struct trbe_drvdata *drvdata = cpudata->drvdata;
Can you trust the cpudata ptr here as we are still verifying if this was legitimate ?
It verifies the legitimacy of the interrupt as being generated from an active perf session on the cpu with some simple sanity checks. But all data structure linkage should be intact. The perf handle originates from the drvdata percpu structure which should have a trbe_perf and everything flows from there.
+Â Â Â int cpu = smp_processor_id();
+Â Â Â WARN_ON(perf->trbe_base != get_trbe_base_pointer()); +Â Â Â WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
+Â Â Â if (cpudata->mode != CS_MODE_PERF) +Â Â Â Â Â Â Â return false;
+Â Â Â if (cpudata->cpu != cpu) +Â Â Â Â Â Â Â return false;
+Â Â Â if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus)) +Â Â Â Â Â Â Â return false;
+Â Â Â return true; +}
+static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle) +{ +Â Â Â enum trbe_ec ec = get_trbe_ec(); +Â Â Â enum trbe_bsc bsc = get_trbe_bsc();
+Â Â Â WARN_ON(is_trbe_running()); +Â Â Â asm(TSB_CSYNC); +Â Â Â dsb(nsh); +Â Â Â isb();
+Â Â Â if (is_trbe_trg() || is_trbe_abort()) +Â Â Â Â Â Â Â return TRBE_FAULT_ACT_FATAL;
+Â Â Â if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT)) +Â Â Â Â Â Â Â return TRBE_FAULT_ACT_FATAL;
+Â Â Â if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) { +Â Â Â Â Â Â Â if (get_trbe_write_pointer() == get_trbe_base_pointer()) +Â Â Â Â Â Â Â Â Â Â Â return TRBE_FAULT_ACT_WRAP; +Â Â Â } +Â Â Â return TRBE_FAULT_ACT_SPURIOUS; +}
+static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) +{ +Â Â Â struct perf_output_handle *handle = dev; +Â Â Â enum trbe_fault_action act;
+Â Â Â WARN_ON(!is_trbe_irq()); +Â Â Â clr_trbe_irq();
+Â Â Â if (!perf_get_aux(handle)) +Â Â Â Â Â Â Â return IRQ_NONE;
+Â Â Â if (!is_perf_trbe(handle)) +Â Â Â Â Â Â Â return IRQ_NONE;
+Â Â Â irq_work_run();
+Â Â Â act = trbe_get_fault_act(handle); +Â Â Â switch (act) { +Â Â Â case TRBE_FAULT_ACT_WRAP: +Â Â Â Â Â Â Â trbe_handle_overflow(handle); +Â Â Â Â Â Â Â break; +Â Â Â case TRBE_FAULT_ACT_SPURIOUS: +Â Â Â Â Â Â Â trbe_handle_spurious(handle); +Â Â Â Â Â Â Â break; +Â Â Â case TRBE_FAULT_ACT_FATAL: +Â Â Â Â Â Â Â trbe_handle_fatal(handle); +Â Â Â Â Â Â Â break; +Â Â Â } +Â Â Â return IRQ_HANDLED; +}
+static void arm_trbe_probe_coresight_cpu(void *info) +{ +Â Â Â struct trbe_cpudata *cpudata = info; +Â Â Â struct device *dev = &cpudata->drvdata->pdev->dev; +Â Â Â struct coresight_desc desc = { 0 };
+Â Â Â if (WARN_ON(!cpudata)) +Â Â Â Â Â Â Â goto cpu_clear;
+Â Â Â if (!is_trbe_available()) { +Â Â Â Â Â Â Â pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu); +Â Â Â Â Â Â Â goto cpu_clear; +Â Â Â }
+Â Â Â if (!is_trbe_programmable()) { +Â Â Â Â Â Â Â pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu); +Â Â Â Â Â Â Â goto cpu_clear; +Â Â Â } +Â Â Â desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id()); +Â Â Â if (IS_ERR(desc.name)) +Â Â Â Â Â Â Â goto cpu_clear;
+Â Â Â desc.type = CORESIGHT_DEV_TYPE_SINK; +Â Â Â desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;
May be should add a new subtype to make this higher priority than the normal ETR. Something like :
CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM
Sure, will do.
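i.e. a new value added to enum coresight_dev_subtype_sink in include/linux/coresight.h and used here, with the name suggested above:

	desc.type = CORESIGHT_DEV_TYPE_SINK;
	desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM;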
+Â Â Â desc.ops = &arm_trbe_cs_ops; +Â Â Â desc.pdata = dev_get_platdata(dev); +Â Â Â desc.groups = arm_trbe_groups; +Â Â Â desc.dev = dev; +Â Â Â cpudata->csdev = coresight_register(&desc); +Â Â Â if (IS_ERR(cpudata->csdev)) +Â Â Â Â Â Â Â goto cpu_clear;
+Â Â Â dev_set_drvdata(&cpudata->csdev->dev, cpudata); +Â Â Â cpudata->trbe_dbm = get_trbe_flag_update(); +Â Â Â cpudata->trbe_align = 1ULL << get_trbe_address_align(); +Â Â Â if (cpudata->trbe_align > SZ_2K) { +Â Â Â Â Â Â Â pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu); +Â Â Â Â Â Â Â goto cpu_clear; +Â Â Â } +Â Â Â return; +cpu_clear: +Â Â Â cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus); +}
+static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata) +{ +Â Â Â struct trbe_cpudata *cpudata; +Â Â Â int cpu;
+Â Â Â drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata)); +Â Â Â if (IS_ERR(drvdata->cpudata)) +Â Â Â Â Â Â Â return PTR_ERR(drvdata->cpudata);
+Â Â Â for_each_cpu(cpu, &drvdata->supported_cpus) { +Â Â Â Â Â Â Â cpudata = per_cpu_ptr(drvdata->cpudata, cpu); +Â Â Â Â Â Â Â cpudata->cpu = cpu; +Â Â Â Â Â Â Â cpudata->drvdata = drvdata; +Â Â Â Â Â Â Â smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
We could batch it and run it on all CPUs at the same time ? Also it would be better to leave the per_cpu area filled by the CPU itself, to avoid racing.
Sure, will re-organize the entire CPU probing/removal and also the CPU online/offline path. Planning to use smp_call_function_many() instead for a simultaneous init.
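A rough shape for that (untested; note that on_each_cpu_mask() also runs on the local CPU if it is in the mask, unlike smp_call_function_many()):

	static void arm_trbe_probe_cpu(void *info)
	{
		struct trbe_drvdata *drvdata = info;
		struct trbe_cpudata *cpudata = this_cpu_ptr(drvdata->cpudata);

		/* each CPU fills in its own per-cpu slot, so there is no racing */
		cpudata->cpu = smp_processor_id();
		cpudata->drvdata = drvdata;
		arm_trbe_probe_coresight_cpu(cpudata);
	}

	/* one cross call for all supported CPUs instead of a per-cpu loop */
	on_each_cpu_mask(&drvdata->supported_cpus, arm_trbe_probe_cpu, drvdata, 1);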
+Â Â Â } +Â Â Â return 0; +}
+static void arm_trbe_remove_coresight_cpu(void *info) +{ +Â Â Â struct trbe_drvdata *drvdata = info;
+Â Â Â disable_percpu_irq(drvdata->irq); +}
+static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata) +{ +Â Â Â struct trbe_cpudata *cpudata; +Â Â Â int cpu;
+Â Â Â for_each_cpu(cpu, &drvdata->supported_cpus) { +Â Â Â Â Â Â Â smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1); +Â Â Â Â Â Â Â cpudata = per_cpu_ptr(drvdata->cpudata, cpu); +Â Â Â Â Â Â Â if (cpudata->csdev) { +Â Â Â Â Â Â Â Â Â Â Â coresight_unregister(cpudata->csdev); +Â Â Â Â Â Â Â Â Â Â Â cpudata->drvdata = NULL; +Â Â Â Â Â Â Â Â Â Â Â cpudata->csdev = NULL; +Â Â Â Â Â Â Â }
Please leave this to the CPU to do the part.
Sure, will do.
+Â Â Â } +Â Â Â free_percpu(drvdata->cpudata); +Â Â Â return 0; +}
+static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node) +{ +Â Â Â struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node); +Â Â Â struct trbe_cpudata *cpudata;
+Â Â Â if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) { +Â Â Â Â Â Â Â cpudata = per_cpu_ptr(drvdata->cpudata, cpu); +Â Â Â Â Â Â Â if (!cpudata->csdev) { +Â Â Â Â Â Â Â Â Â Â Â cpudata->drvdata = drvdata; +Â Â Â Â Â Â Â Â Â Â Â smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
Why do we need smp_call here ? We are already on the CPU.
We don't need it, will drop.
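i.e. the startup callback can just call it directly (sketch):

	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
		if (!cpudata->csdev) {
			cpudata->drvdata = drvdata;
			arm_trbe_probe_coresight_cpu(cpudata);	/* already running on @cpu */
		}
		trbe_reset_local();
		enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
	}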
+Â Â Â Â Â Â Â } +Â Â Â Â Â Â Â trbe_reset_local(); +Â Â Â Â Â Â Â enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE); +Â Â Â } +Â Â Â return 0; +}
+static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node) +{ +Â Â Â struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node); +Â Â Â struct trbe_cpudata *cpudata;
+Â Â Â if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) { +Â Â Â Â Â Â Â cpudata = per_cpu_ptr(drvdata->cpudata, cpu); +Â Â Â Â Â Â Â if (cpudata->csdev) { +Â Â Â Â Â Â Â Â Â Â Â coresight_unregister(cpudata->csdev); +Â Â Â Â Â Â Â Â Â Â Â cpudata->drvdata = NULL; +Â Â Â Â Â Â Â Â Â Â Â cpudata->csdev = NULL; +Â Â Â Â Â Â Â } +Â Â Â Â Â Â Â disable_percpu_irq(drvdata->irq); +Â Â Â Â Â Â Â trbe_reset_local(); +Â Â Â } +Â Â Â return 0; +}
+static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata) +{ +Â Â Â enum cpuhp_state trbe_online;
+Â Â Â trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â arm_trbe_cpu_startup, arm_trbe_cpu_teardown); +Â Â Â if (trbe_online < 0) +Â Â Â Â Â Â Â return -EINVAL;
+Â Â Â if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node)) +Â Â Â Â Â Â Â return -EINVAL;
+Â Â Â drvdata->trbe_online = trbe_online; +Â Â Â return 0; +}
+static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata) +{ +Â Â Â cpuhp_remove_multi_state(drvdata->trbe_online); +}
+static int arm_trbe_probe_irq(struct platform_device *pdev, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â struct trbe_drvdata *drvdata) +{ +Â Â Â drvdata->irq = platform_get_irq(pdev, 0); +Â Â Â if (!drvdata->irq) { +Â Â Â Â Â Â Â pr_err("IRQ not found for the platform device\n"); +Â Â Â Â Â Â Â return -ENXIO; +Â Â Â }
+Â Â Â if (!irq_is_percpu(drvdata->irq)) { +Â Â Â Â Â Â Â pr_err("IRQ is not a PPI\n"); +Â Â Â Â Â Â Â return -EINVAL; +Â Â Â }
+Â Â Â if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus)) +Â Â Â Â Â Â Â return -EINVAL;
+Â Â Â drvdata->handle = alloc_percpu(typeof(*drvdata->handle)); +Â Â Â if (!drvdata->handle) +Â Â Â Â Â Â Â return -ENOMEM;
+Â Â Â if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) { +Â Â Â Â Â Â Â free_percpu(drvdata->handle); +Â Â Â Â Â Â Â return -EINVAL; +Â Â Â } +Â Â Â return 0; +}
+static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata) +{ +Â Â Â free_percpu_irq(drvdata->irq, drvdata->handle); +Â Â Â free_percpu(drvdata->handle); +}
+static int arm_trbe_device_probe(struct platform_device *pdev) +{ +Â Â Â struct coresight_platform_data *pdata; +Â Â Â struct trbe_drvdata *drvdata; +Â Â Â struct device *dev = &pdev->dev; +Â Â Â int ret;
+Â Â Â drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL); +Â Â Â if (IS_ERR(drvdata)) +Â Â Â Â Â Â Â return -ENOMEM;
+Â Â Â pdata = coresight_get_platform_data(dev); +Â Â Â if (IS_ERR(pdata)) { +Â Â Â Â Â Â Â kfree(drvdata); +Â Â Â Â Â Â Â return -ENOMEM; +Â Â Â }
+Â Â Â drvdata->atclk = devm_clk_get(dev, "atclk"); +Â Â Â if (!IS_ERR(drvdata->atclk)) { +Â Â Â Â Â Â Â ret = clk_prepare_enable(drvdata->atclk); +Â Â Â Â Â Â Â if (ret) +Â Â Â Â Â Â Â Â Â Â Â return ret; +Â Â Â }
Please drop the clocks, we don't have any
Right, will drop the clock and also the power management support along with it.
+Â Â Â dev_set_drvdata(dev, drvdata); +Â Â Â dev->platform_data = pdata; +Â Â Â drvdata->pdev = pdev; +Â Â Â ret = arm_trbe_probe_irq(pdev, drvdata); +Â Â Â if (ret) +Â Â Â Â Â Â Â goto irq_failed;
+Â Â Â ret = arm_trbe_probe_coresight(drvdata); +Â Â Â if (ret) +Â Â Â Â Â Â Â goto probe_failed;
+Â Â Â ret = arm_trbe_probe_cpuhp(drvdata); +Â Â Â if (ret) +Â Â Â Â Â Â Â goto cpuhp_failed;
+Â Â Â return 0; +cpuhp_failed: +Â Â Â arm_trbe_remove_coresight(drvdata); +probe_failed: +Â Â Â arm_trbe_remove_irq(drvdata); +irq_failed: +Â Â Â kfree(pdata); +Â Â Â kfree(drvdata); +Â Â Â return ret; +}
+static int arm_trbe_device_remove(struct platform_device *pdev) +{ +Â Â Â struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev); +Â Â Â struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
+Â Â Â arm_trbe_remove_coresight(drvdata); +Â Â Â arm_trbe_remove_cpuhp(drvdata); +Â Â Â arm_trbe_remove_irq(drvdata); +Â Â Â kfree(pdata); +Â Â Â kfree(drvdata); +Â Â Â return 0; +}
+#ifdef CONFIG_PM +static int arm_trbe_runtime_suspend(struct device *dev) +{ +Â Â Â struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
+Â Â Â if (drvdata && !IS_ERR(drvdata->atclk)) +Â Â Â Â Â Â Â clk_disable_unprepare(drvdata->atclk);
Remove. We may need to save/restore the TRBE ptrs, depending on the TRBE.
Will drop it for now. Could revisit this later after the base functionality is up and running.
+Â Â Â return 0; +}
+static int arm_trbe_runtime_resume(struct device *dev) +{ +Â Â Â struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
+Â Â Â if (drvdata && !IS_ERR(drvdata->atclk)) +Â Â Â Â Â Â Â clk_prepare_enable(drvdata->atclk);
Remove. See above.
+Â Â Â return 0; +} +#endif
+static const struct dev_pm_ops arm_trbe_dev_pm_ops = { +Â Â Â SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL) +};
+static const struct of_device_id arm_trbe_of_match[] = { +   { .compatible = "arm,arm-trbe",   .data = (void *)1 }, +   {}, +};
I think it is better to spell this one out, we have too many acronyms ;-)
"arm,trace-buffer-extension"
Sure, will change.
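So the match table would become something like:

	static const struct of_device_id arm_trbe_of_match[] = {
		{ .compatible = "arm,trace-buffer-extension" },
		{},
	};
	MODULE_DEVICE_TABLE(of, arm_trbe_of_match);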
+MODULE_DEVICE_TABLE(of, arm_trbe_of_match);
+static const struct platform_device_id arm_trbe_match[] = { +Â Â Â { "arm,trbe", 0}, +Â Â Â { } +}; +MODULE_DEVICE_TABLE(platform, arm_trbe_match);
Please remove. The ACPI part can be added when we get to it.
Sure, will drop for now.
+static struct platform_driver arm_trbe_driver = { +   .id_table = arm_trbe_match, +   .driver   = { +       .name = DRVNAME, +       .of_match_table = of_match_ptr(arm_trbe_of_match), +       .pm = &arm_trbe_dev_pm_ops, +       .suppress_bind_attrs = true, +   }, +   .probe   = arm_trbe_device_probe, +   .remove   = arm_trbe_device_remove, +}; +builtin_platform_driver(arm_trbe_driver)
Please make this modular.
Will do.
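A minimal sketch of the modular variant (the Kconfig entry would presumably become tristate as well):

	static struct platform_driver arm_trbe_driver = {
		.driver	= {
			.name = DRVNAME,
			.of_match_table = of_match_ptr(arm_trbe_of_match),
			.suppress_bind_attrs = true,
		},
		.probe	= arm_trbe_device_probe,
		.remove	= arm_trbe_device_remove,
	};
	module_platform_driver(arm_trbe_driver);

	MODULE_AUTHOR("Anshuman Khandual <anshuman.khandual@arm.com>");
	MODULE_DESCRIPTION("Arm Trace Buffer Extension (TRBE) driver");
	MODULE_LICENSE("GPL v2");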
diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h new file mode 100644 index 0000000..82ffbfc --- /dev/null +++ b/drivers/hwtracing/coresight/coresight-trbe.h @@ -0,0 +1,525 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/*
- This contains all required hardware related helper functions for
- Trace Buffer Extension (TRBE) driver in the coresight framework.
- Copyright (C) 2020 ARM Ltd.
- Author: Anshuman Khandual anshuman.khandual@arm.com
- */
+#include <linux/coresight.h> +#include <linux/device.h> +#include <linux/irq.h> +#include <linux/kernel.h> +#include <linux/of.h> +#include <linux/platform_device.h> +#include <linux/smp.h>
+#include "coresight-etm-perf.h"
+static inline bool is_trbe_available(void) +{ +Â Â Â u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1); +Â Â Â int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT);
+Â Â Â return trbe >= 0b0001; +}
+static inline bool is_ete_available(void) +{ +Â Â Â u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1); +Â Â Â int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT);
+Â Â Â return (tracever != 0b0000);
Why is this needed ?
Sure, will drop.
+}
+static inline bool is_trbe_enabled(void) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â return trblimitr & TRBLIMITR_ENABLE; +}
+enum trbe_ec { +Â Â Â TRBE_EC_OTHERSÂ Â Â Â Â Â Â = 0, +Â Â Â TRBE_EC_STAGE1_ABORTÂ Â Â = 36, +Â Â Â TRBE_EC_STAGE2_ABORTÂ Â Â = 37, +};
+static const char *const trbe_ec_str[] = { +Â Â Â [TRBE_EC_OTHERS]Â Â Â = "Maintenance exception", +Â Â Â [TRBE_EC_STAGE1_ABORT]Â Â Â = "Stage-1 exception", +Â Â Â [TRBE_EC_STAGE2_ABORT]Â Â Â = "Stage-2 exception", +};
Please remove the defintions that are not used by the driver.
Sure, will do.
+static inline enum trbe_ec get_trbe_ec(void) +{ +Â Â Â u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+Â Â Â return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK; +}
+static inline void clr_trbe_ec(void) +{ +Â Â Â u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+Â Â Â trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT); +Â Â Â write_sysreg_s(trbsr, SYS_TRBSR_EL1); +}
+enum trbe_bsc { +Â Â Â TRBE_BSC_NOT_STOPPEDÂ Â Â = 0, +Â Â Â TRBE_BSC_FILLEDÂ Â Â Â Â Â Â = 1, +Â Â Â TRBE_BSC_TRIGGEREDÂ Â Â = 2, +};
+static const char *const trbe_bsc_str[] = { +Â Â Â [TRBE_BSC_NOT_STOPPED]Â Â Â = "TRBE collection not stopped", +Â Â Â [TRBE_BSC_FILLED]Â Â Â = "TRBE filled", +Â Â Â [TRBE_BSC_TRIGGERED]Â Â Â = "TRBE triggered", +};
+static inline enum trbe_bsc get_trbe_bsc(void) +{ +Â Â Â u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+Â Â Â return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK; +}
+static inline void clr_trbe_bsc(void) +{ +Â Â Â u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+Â Â Â trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT); +Â Â Â write_sysreg_s(trbsr, SYS_TRBSR_EL1); +}
+enum trbe_fsc { +Â Â Â TRBE_FSC_ASF_LEVEL0Â Â Â = 0, +Â Â Â TRBE_FSC_ASF_LEVEL1Â Â Â = 1, +Â Â Â TRBE_FSC_ASF_LEVEL2Â Â Â = 2, +Â Â Â TRBE_FSC_ASF_LEVEL3Â Â Â = 3, +Â Â Â TRBE_FSC_TF_LEVEL0Â Â Â = 4, +Â Â Â TRBE_FSC_TF_LEVEL1Â Â Â = 5, +Â Â Â TRBE_FSC_TF_LEVEL2Â Â Â = 6, +Â Â Â TRBE_FSC_TF_LEVEL3Â Â Â = 7, +Â Â Â TRBE_FSC_AFF_LEVEL0Â Â Â = 8, +Â Â Â TRBE_FSC_AFF_LEVEL1Â Â Â = 9, +Â Â Â TRBE_FSC_AFF_LEVEL2Â Â Â = 10, +Â Â Â TRBE_FSC_AFF_LEVEL3Â Â Â = 11, +Â Â Â TRBE_FSC_PF_LEVEL0Â Â Â = 12, +Â Â Â TRBE_FSC_PF_LEVEL1Â Â Â = 13, +Â Â Â TRBE_FSC_PF_LEVEL2Â Â Â = 14, +Â Â Â TRBE_FSC_PF_LEVEL3Â Â Â = 15, +Â Â Â TRBE_FSC_SEA_WRITEÂ Â Â = 16, +Â Â Â TRBE_FSC_ASEA_WRITEÂ Â Â = 17, +Â Â Â TRBE_FSC_SEA_LEVEL0Â Â Â = 20, +Â Â Â TRBE_FSC_SEA_LEVEL1Â Â Â = 21, +Â Â Â TRBE_FSC_SEA_LEVEL2Â Â Â = 22, +Â Â Â TRBE_FSC_SEA_LEVEL3Â Â Â = 23, +Â Â Â TRBE_FSC_ALIGN_FAULTÂ Â Â = 33, +Â Â Â TRBE_FSC_TLB_FAULTÂ Â Â = 48, +Â Â Â TRBE_FSC_ATOMIC_FAULTÂ Â Â = 49, +};
Please remove ^^^
Sure, will do.
+static const char *const trbe_fsc_str[] = { +Â Â Â [TRBE_FSC_ASF_LEVEL0]Â Â Â = "Address size fault - level 0", +Â Â Â [TRBE_FSC_ASF_LEVEL1]Â Â Â = "Address size fault - level 1", +Â Â Â [TRBE_FSC_ASF_LEVEL2]Â Â Â = "Address size fault - level 2", +Â Â Â [TRBE_FSC_ASF_LEVEL3]Â Â Â = "Address size fault - level 3", +Â Â Â [TRBE_FSC_TF_LEVEL0]Â Â Â = "Translation fault - level 0", +Â Â Â [TRBE_FSC_TF_LEVEL1]Â Â Â = "Translation fault - level 1", +Â Â Â [TRBE_FSC_TF_LEVEL2]Â Â Â = "Translation fault - level 2", +Â Â Â [TRBE_FSC_TF_LEVEL3]Â Â Â = "Translation fault - level 3", +Â Â Â [TRBE_FSC_AFF_LEVEL0]Â Â Â = "Access flag fault - level 0", +Â Â Â [TRBE_FSC_AFF_LEVEL1]Â Â Â = "Access flag fault - level 1", +Â Â Â [TRBE_FSC_AFF_LEVEL2]Â Â Â = "Access flag fault - level 2", +Â Â Â [TRBE_FSC_AFF_LEVEL3]Â Â Â = "Access flag fault - level 3", +Â Â Â [TRBE_FSC_PF_LEVEL0]Â Â Â = "Permission fault - level 0", +Â Â Â [TRBE_FSC_PF_LEVEL1]Â Â Â = "Permission fault - level 1", +Â Â Â [TRBE_FSC_PF_LEVEL2]Â Â Â = "Permission fault - level 2", +Â Â Â [TRBE_FSC_PF_LEVEL3]Â Â Â = "Permission fault - level 3", +Â Â Â [TRBE_FSC_SEA_WRITE]Â Â Â = "Synchronous external abort on write", +Â Â Â [TRBE_FSC_ASEA_WRITE]Â Â Â = "Asynchronous external abort on write", +Â Â Â [TRBE_FSC_SEA_LEVEL0]Â Â Â = "Syncrhonous external abort on table walk - level 0", +Â Â Â [TRBE_FSC_SEA_LEVEL1]Â Â Â = "Syncrhonous external abort on table walk - level 1", +Â Â Â [TRBE_FSC_SEA_LEVEL2]Â Â Â = "Syncrhonous external abort on table walk - level 2", +Â Â Â [TRBE_FSC_SEA_LEVEL3]Â Â Â = "Syncrhonous external abort on table walk - level 3", +Â Â Â [TRBE_FSC_ALIGN_FAULT]Â Â Â = "Alignment fault", +Â Â Â [TRBE_FSC_TLB_FAULT]Â Â Â = "TLB conflict fault", +Â Â Â [TRBE_FSC_ATOMIC_FAULT]Â Â Â = "Atmoc fault", +};
Please remove ^^^
Sure, will do.
+enum trbe_address_mode { +Â Â Â TRBE_ADDRESS_VIRTUAL, +Â Â Â TRBE_ADDRESS_PHYSICAL, +};
#define please.
+static const char *const trbe_address_mode_str[] = { +Â Â Â [TRBE_ADDRESS_VIRTUAL]Â Â Â = "Address mode - virtual", +Â Â Â [TRBE_ADDRESS_PHYSICAL]Â Â Â = "Address mode - physical", +};
Do we need this ? We always use virtual.
+static inline bool is_trbe_virtual_mode(void) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â return !(trblimitr & TRBLIMITR_NVM); +}
Remove
Sure, will do.
+static inline bool is_trbe_physical_mode(void) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â return trblimitr & TRBLIMITR_NVM; +}
Remove
Sure, will do.
+static inline void set_trbe_virtual_mode(void) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â trblimitr &= ~TRBLIMITR_NVM; +Â Â Â write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +}
+static inline void set_trbe_physical_mode(void) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â trblimitr |= TRBLIMITR_NVM; +Â Â Â write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +}
Remove
Sure, will do.
+enum trbe_trig_mode { +Â Â Â TRBE_TRIGGER_STOPÂ Â Â = 0, +Â Â Â TRBE_TRIGGER_IRQÂ Â Â = 1, +Â Â Â TRBE_TRIGGER_IGNOREÂ Â Â = 3, +};
+static const char *const trbe_trig_mode_str[] = { +Â Â Â [TRBE_TRIGGER_STOP]Â Â Â = "Trigger mode - stop", +Â Â Â [TRBE_TRIGGER_IRQ]Â Â Â = "Trigger mode - irq", +Â Â Â [TRBE_TRIGGER_IGNORE]Â Â Â = "Trigger mode - ignore", +};
+static inline enum trbe_trig_mode get_trbe_trig_mode(void) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK; +}
+static inline void set_trbe_trig_mode(enum trbe_trig_mode mode) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT); +Â Â Â trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT); +Â Â Â write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +}
+enum trbe_fill_mode { +Â Â Â TRBE_FILL_STOPÂ Â Â Â Â Â Â = 0, +Â Â Â TRBE_FILL_WRAPÂ Â Â Â Â Â Â = 1, +Â Â Â TRBE_FILL_CIRCULARÂ Â Â = 3, +};
Please use #define
These are predefined, architecturally constrained values, which effectively makes them a set. An enumeration seems to be a better representation.
+static const char *const trbe_fill_mode_str[] = { +Â Â Â [TRBE_FILL_STOP]Â Â Â = "Buffer mode - stop", +Â Â Â [TRBE_FILL_WRAP]Â Â Â = "Buffer mode - wrap", +Â Â Â [TRBE_FILL_CIRCULAR]Â Â Â = "Buffer mode - circular", +};
+static inline enum trbe_fill_mode get_trbe_fill_mode(void) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK; +}
+static inline void set_trbe_fill_mode(enum trbe_fill_mode mode) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT); +Â Â Â trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT); +Â Â Â write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +}
+static inline void set_trbe_disabled(void) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â trblimitr &= ~TRBLIMITR_ENABLE; +Â Â Â write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +}
+static inline void set_trbe_enabled(void) +{ +Â Â Â u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+Â Â Â trblimitr |= TRBLIMITR_ENABLE; +Â Â Â write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1); +}
+static inline bool get_trbe_flag_update(void) +{ +Â Â Â u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
+Â Â Â return trbidr & TRBIDR_FLAG; +}
+static inline bool is_trbe_programmable(void) +{ +Â Â Â u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
+Â Â Â return !(trbidr & TRBIDR_PROG); +} +# +enum trbe_buffer_align { +Â Â Â TRBE_BUFFER_BYTE, +Â Â Â TRBE_BUFFER_HALF_WORD, +Â Â Â TRBE_BUFFER_WORD, +Â Â Â TRBE_BUFFER_DOUBLE_WORD, +Â Â Â TRBE_BUFFER_16_BYTES, +Â Â Â TRBE_BUFFER_32_BYTES, +Â Â Â TRBE_BUFFER_64_BYTES, +Â Â Â TRBE_BUFFER_128_BYTES, +Â Â Â TRBE_BUFFER_256_BYTES, +Â Â Â TRBE_BUFFER_512_BYTES, +Â Â Â TRBE_BUFFER_1K_BYTES, +Â Â Â TRBE_BUFFER_2K_BYTES, +};
Remove ^^
Sure, will do.
+static const char *const trbe_buffer_align_str[] = { +Â Â Â [TRBE_BUFFER_BYTE]Â Â Â Â Â Â Â = "Byte", +Â Â Â [TRBE_BUFFER_HALF_WORD]Â Â Â Â Â Â Â = "Half word", +Â Â Â [TRBE_BUFFER_WORD]Â Â Â Â Â Â Â = "Word", +Â Â Â [TRBE_BUFFER_DOUBLE_WORD]Â Â Â = "Double word", +Â Â Â [TRBE_BUFFER_16_BYTES]Â Â Â Â Â Â Â = "16 bytes", +Â Â Â [TRBE_BUFFER_32_BYTES]Â Â Â Â Â Â Â = "32 bytes", +Â Â Â [TRBE_BUFFER_64_BYTES]Â Â Â Â Â Â Â = "64 bytes", +Â Â Â [TRBE_BUFFER_128_BYTES]Â Â Â Â Â Â Â = "128 bytes", +Â Â Â [TRBE_BUFFER_256_BYTES]Â Â Â Â Â Â Â = "256 bytes", +Â Â Â [TRBE_BUFFER_512_BYTES]Â Â Â Â Â Â Â = "512 bytes", +Â Â Â [TRBE_BUFFER_1K_BYTES]Â Â Â Â Â Â Â = "1K bytes", +Â Â Â [TRBE_BUFFER_2K_BYTES]Â Â Â Â Â Â Â = "2K bytes", +};
We don't need any of this. We could simply "<<" and get the size.
Dropping all these, we will just export the hex value in the sysfs not a string from here.
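For the sysfs side, a sketch of exporting the raw value (assuming the attribute keeps the name "align"):

	static ssize_t align_show(struct device *dev, struct device_attribute *attr, char *buf)
	{
		struct trbe_cpudata *cpudata = dev_get_drvdata(dev);

		return sprintf(buf, "%llx\n", cpudata->trbe_align);
	}
	static DEVICE_ATTR_RO(align);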
+static inline enum trbe_buffer_align get_trbe_address_align(void) +{ +Â Â Â u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
+Â Â Â return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK; +}
+static inline void assert_trbe_address_mode(unsigned long addr) +{ +Â Â Â bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr); +Â Â Â bool virt_mode = is_trbe_virtual_mode();
+Â Â Â WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode))); +}
I am not sure if this is really helpful. You have to trust the kernel vmalloc().
Okay, dropping both address asserts, i.e. mode and alignment.
Hi Anshuman,
On Tue, Nov 10, 2020 at 08:45:05PM +0800, Anshuman Khandual wrote:
Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is accessible via the system registers. The TRBE supports different addressing modes including CPU virtual address and buffer modes including the circular buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1), a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But access to the trace buffer could be prohibited by a higher exception level (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU private interrupt (PPI) on address translation errors and when the buffer is full. The overall implementation here is inspired by the Arm SPE driver.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
 Documentation/trace/coresight/coresight-trbe.rst |  36 ++
 arch/arm64/include/asm/sysreg.h                  |   2 +
 drivers/hwtracing/coresight/Kconfig              |  11 +
 drivers/hwtracing/coresight/Makefile             |   1 +
 drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
 6 files changed, 1341 insertions(+)
 create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst new file mode 100644 index 0000000..4320a8b --- /dev/null +++ b/Documentation/trace/coresight/coresight-trbe.rst @@ -0,0 +1,36 @@ +.. SPDX-License-Identifier: GPL-2.0
+============================== +Trace Buffer Extension (TRBE). +==============================
- :Author: Anshuman Khandual anshuman.khandual@arm.com
- :Date: November 2020
+Hardware Description +--------------------
+Trace Buffer Extension (TRBE) is per-CPU hardware which captures, in system
+memory, CPU traces generated by a corresponding per-CPU tracing unit. This
+gets plugged in as a coresight sink device because the corresponding trace
+generators (ETE) are plugged in as source devices.
+Sysfs files and directories +---------------------------
+The TRBE devices appear on the existing coresight bus alongside the other +coresight devices::
$ ls /sys/bus/coresight/devices
trbe0  trbe1  trbe2  trbe3
+The ``trbe<N>`` named TRBEs are associated with a CPU.::
$ ls /sys/bus/coresight/devices/trbe0/
irq  align  dbm
+*Key file items are:-*
- ``irq``: TRBE maintenance interrupt number
- ``align``: TRBE write pointer alignment
- ``dbm``: TRBE updates memory with access and dirty flags
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 14cb156..61136f6 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -97,6 +97,7 @@ #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift)) #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift)) #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift)) +#define TSB_CSYNC __emit_inst(0xd503225f)
#define __SYS_BARRIER_INSN(CRm, op2, Rt) \ __emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f)) @@ -865,6 +866,7 @@ #define ID_AA64MMFR2_CNP_SHIFT 0
/* id_aa64dfr0 */ +#define ID_AA64DFR0_TRBE_SHIFT 44 #define ID_AA64DFR0_TRACE_FILT_SHIFT 40 #define ID_AA64DFR0_DOUBLELOCK_SHIFT 36 #define ID_AA64DFR0_PMSVER_SHIFT 32 diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig index c119824..0f5e101 100644 --- a/drivers/hwtracing/coresight/Kconfig +++ b/drivers/hwtracing/coresight/Kconfig @@ -156,6 +156,17 @@ config CORESIGHT_CTI To compile this driver as a module, choose M here: the module will be called coresight-cti.
+config CORESIGHT_TRBE
- bool "Trace Buffer Extension (TRBE) driver"
Can you consider supporting TRBE as a loadable module, since all coresight drivers support being loadable modules now?
Thanks Tingwei
- depends on ARM64
- help
This driver provides support for percpu Trace Buffer Extension (TRBE).
TRBE always needs to be used along with its corresponding percpu ETE
component. ETE generates trace data which is then captured with TRBE.
Unlike traditional sink devices, TRBE is a CPU feature accessible via
system registers. But its explicit dependency on the trace unit (ETE)
requires it to be plugged in as a coresight sink device.
config CORESIGHT_CTI_INTEGRATION_REGS bool "Access CTI CoreSight Integration Registers" depends on CORESIGHT_CTI diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile index f20e357..d608165 100644 --- a/drivers/hwtracing/coresight/Makefile +++ b/drivers/hwtracing/coresight/Makefile @@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o +obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o coresight-cti-y := coresight-cti-core.o coresight-cti-platform.o \ coresight-cti-sysfs.o diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c new file mode 100644 index 0000000..48a8ec3 --- /dev/null +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -0,0 +1,766 @@ +// SPDX-License-Identifier: GPL-2.0 +/*
- This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
- sink device, which could then pair with an appropriate per-cpu coresight source
- device (ETE) thus generating required trace data. Trace can be enabled
- via the perf framework.
- Copyright (C) 2020 ARM Ltd.
- Author: Anshuman Khandual anshuman.khandual@arm.com
- */
+#define DRVNAME "arm_trbe"
+#define pr_fmt(fmt) DRVNAME ": " fmt
+#include "coresight-trbe.h"
+#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
+#define ETE_IGNORE_PACKET 0x70
+static const char trbe_name[] = "trbe";
+enum trbe_fault_action {
- TRBE_FAULT_ACT_WRAP,
- TRBE_FAULT_ACT_SPURIOUS,
- TRBE_FAULT_ACT_FATAL,
+};
+struct trbe_perf {
- unsigned long trbe_base;
- unsigned long trbe_limit;
- unsigned long trbe_write;
- pid_t pid;
- int nr_pages;
- void **pages;
- bool snapshot;
- struct trbe_cpudata *cpudata;
+};
+struct trbe_cpudata {
- struct coresight_device *csdev;
- bool trbe_dbm;
- u64 trbe_align;
- int cpu;
- enum cs_mode mode;
- struct trbe_perf *perf;
- struct trbe_drvdata *drvdata;
+};
+struct trbe_drvdata {
- struct trbe_cpudata __percpu *cpudata;
- struct perf_output_handle __percpu *handle;
- struct hlist_node hotplug_node;
- int irq;
- cpumask_t supported_cpus;
- enum cpuhp_state trbe_online;
- struct platform_device *pdev;
- struct clk *atclk;
+};
+static int trbe_alloc_node(struct perf_event *event) +{
- if (event->cpu == -1)
return NUMA_NO_NODE;
- return cpu_to_node(event->cpu);
+}
+static void trbe_disable_and_drain_local(void) +{
- write_sysreg_s(0, SYS_TRBLIMITR_EL1);
- isb();
- dsb(nsh);
- asm(TSB_CSYNC);
+}
+static void trbe_reset_local(void) +{
- trbe_disable_and_drain_local();
- write_sysreg_s(0, SYS_TRBPTR_EL1);
- isb();
- write_sysreg_s(0, SYS_TRBBASER_EL1);
- isb();
- write_sysreg_s(0, SYS_TRBSR_EL1);
- isb();
+}
+static void trbe_pad_buf(struct perf_output_handle *handle, int len) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- u64 head = PERF_IDX2OFF(handle->head, perf);
- memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len);
- if (!perf->snapshot)
perf_aux_output_skip(handle, len);
+}
+static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- u64 head = PERF_IDX2OFF(handle->head, perf);
- u64 limit = perf->nr_pages * PAGE_SIZE;
- if (head < limit >> 1)
limit >>= 1;
- return limit;
+}
+static unsigned long trbe_normal_offset(struct perf_output_handle *handle) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- struct trbe_cpudata *cpudata = perf->cpudata;
- const u64 bufsize = perf->nr_pages * PAGE_SIZE;
- u64 limit = bufsize;
- u64 head, tail, wakeup;
- head = PERF_IDX2OFF(handle->head, perf);
- if (!IS_ALIGNED(head, cpudata->trbe_align)) {
unsigned long delta = roundup(head, cpudata->trbe_align) - head;
delta = min(delta, handle->size);
trbe_pad_buf(handle, delta);
head = PERF_IDX2OFF(handle->head, perf);
- }
- if (!handle->size) {
perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
return 0;
- }
- tail = PERF_IDX2OFF(handle->head + handle->size, perf);
- wakeup = PERF_IDX2OFF(handle->wakeup, perf);
- if (head < tail)
limit = round_down(tail, PAGE_SIZE);
- if (handle->wakeup < (handle->head + handle->size) && head <= wakeup)
limit = min(limit, round_up(wakeup, PAGE_SIZE));
- if (limit > head)
return limit;
- trbe_pad_buf(handle, handle->size);
- perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
- return 0;
+}
+static unsigned long get_trbe_limit(struct perf_output_handle *handle) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- unsigned long offset;
- if (perf->snapshot)
offset = trbe_snapshot_offset(handle);
- else
offset = trbe_normal_offset(handle);
- return perf->trbe_base + offset;
+}
+static void trbe_enable_hw(struct trbe_perf *perf) +{
- WARN_ON(perf->trbe_write < perf->trbe_base);
- WARN_ON(perf->trbe_write >= perf->trbe_limit);
- set_trbe_disabled();
- clr_trbe_irq();
- clr_trbe_wrap();
- clr_trbe_abort();
- clr_trbe_ec();
- clr_trbe_bsc();
- clr_trbe_fsc();
- set_trbe_virtual_mode();
- set_trbe_fill_mode(TRBE_FILL_STOP);
- set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);
- isb();
- set_trbe_base_pointer(perf->trbe_base);
- set_trbe_limit_pointer(perf->trbe_limit);
- set_trbe_write_pointer(perf->trbe_write);
- isb();
- dsb(ishst);
- flush_tlb_all();
- set_trbe_running();
- set_trbe_enabled();
- asm(TSB_CSYNC);
+}
+static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool snapshot)
+{
- struct trbe_perf *perf;
- struct page **pglist;
- int i;
- if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))
return NULL;
- perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event));
- if (IS_ERR(perf))
return ERR_PTR(-ENOMEM);
- pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
- if (IS_ERR(pglist)) {
kfree(perf);
return ERR_PTR(-ENOMEM);
- }
- for (i = 0; i < nr_pages; i++)
pglist[i] = virt_to_page(pages[i]);
- perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP,
PAGE_KERNEL);
- if (IS_ERR((void *) perf->trbe_base)) {
kfree(pglist);
kfree(perf);
return ERR_PTR(perf->trbe_base);
- }
- perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE;
- perf->trbe_write = perf->trbe_base;
- perf->pid = task_pid_nr(event->owner);
- perf->snapshot = snapshot;
- perf->nr_pages = nr_pages;
- perf->pages = pages;
- kfree(pglist);
- return perf;
+}
+void arm_trbe_free_buffer(void *config) +{
- struct trbe_perf *perf = config;
- vunmap((void *) perf->trbe_base);
- kfree(perf);
+}
+static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
struct perf_output_handle *handle,
void *config)
+{
- struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
- struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
- struct trbe_perf *perf = config;
- unsigned long size, offset;
- WARN_ON(perf->cpudata != cpudata);
- WARN_ON(cpudata->cpu != smp_processor_id());
- WARN_ON(cpudata->mode != CS_MODE_PERF);
- WARN_ON(cpudata->drvdata != drvdata);
- offset = get_trbe_write_pointer() - get_trbe_base_pointer();
- size = offset - PERF_IDX2OFF(handle->head, perf);
- if (perf->snapshot)
handle->head += size;
- return size;
+}
+static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data) +{
- struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
- struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
- struct perf_output_handle *handle = data;
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- WARN_ON(cpudata->cpu != smp_processor_id());
- WARN_ON(mode != CS_MODE_PERF);
- WARN_ON(cpudata->drvdata != drvdata);
- *this_cpu_ptr(drvdata->handle) = *handle;
- cpudata->perf = perf;
- cpudata->mode = mode;
- perf->cpudata = cpudata;
- perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
- perf->trbe_limit = get_trbe_limit(handle);
- if (perf->trbe_limit == perf->trbe_base) {
trbe_disable_and_drain_local();
return 0;
- }
- trbe_enable_hw(perf);
- return 0;
+}
+static int arm_trbe_disable(struct coresight_device *csdev) +{
- struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
- struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
- struct trbe_perf *perf = cpudata->perf;
- WARN_ON(perf->cpudata != cpudata);
- WARN_ON(cpudata->cpu != smp_processor_id());
- WARN_ON(cpudata->mode != CS_MODE_PERF);
- WARN_ON(cpudata->drvdata != drvdata);
- trbe_disable_and_drain_local();
- perf->cpudata = NULL;
- cpudata->perf = NULL;
- cpudata->mode = CS_MODE_DISABLED;
- return 0;
+}
+static void trbe_handle_fatal(struct perf_output_handle *handle) +{
- perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
- perf_aux_output_end(handle, 0);
- trbe_disable_and_drain_local();
+}
+static void trbe_handle_spurious(struct perf_output_handle *handle) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
- perf->trbe_limit = get_trbe_limit(handle);
- if (perf->trbe_limit == perf->trbe_base) {
trbe_disable_and_drain_local();
return;
- }
- trbe_enable_hw(perf);
+}
+static void trbe_handle_overflow(struct perf_output_handle *handle) +{
- struct perf_event *event = handle->event;
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- unsigned long offset, size;
- struct etm_event_data *event_data;
- offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
- size = offset - PERF_IDX2OFF(handle->head, perf);
- if (perf->snapshot)
handle->head = offset;
- perf_aux_output_end(handle, size);
- event_data = perf_aux_output_begin(handle, event);
- if (!event_data) {
event->hw.state |= PERF_HES_STOPPED;
trbe_disable_and_drain_local();
return;
- }
- perf->trbe_write = perf->trbe_base;
- perf->trbe_limit = get_trbe_limit(handle);
- if (perf->trbe_limit == perf->trbe_base) {
trbe_disable_and_drain_local();
return;
- }
- *this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle;
- trbe_enable_hw(perf);
+}
+static bool is_perf_trbe(struct perf_output_handle *handle) +{
- struct trbe_perf *perf = etm_perf_sink_config(handle);
- struct trbe_cpudata *cpudata = perf->cpudata;
- struct trbe_drvdata *drvdata = cpudata->drvdata;
- int cpu = smp_processor_id();
- WARN_ON(perf->trbe_base != get_trbe_base_pointer());
- WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
- if (cpudata->mode != CS_MODE_PERF)
return false;
- if (cpudata->cpu != cpu)
return false;
- if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus))
return false;
- return true;
+}
+static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle) +{
- enum trbe_ec ec = get_trbe_ec();
- enum trbe_bsc bsc = get_trbe_bsc();
- WARN_ON(is_trbe_running());
- asm(TSB_CSYNC);
- dsb(nsh);
- isb();
- if (is_trbe_trg() || is_trbe_abort())
return TRBE_FAULT_ACT_FATAL;
- if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
return TRBE_FAULT_ACT_FATAL;
- if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED))
{
if (get_trbe_write_pointer() == get_trbe_base_pointer())
return TRBE_FAULT_ACT_WRAP;
- }
- return TRBE_FAULT_ACT_SPURIOUS;
+}
+static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) +{
- struct perf_output_handle *handle = dev;
- enum trbe_fault_action act;
- WARN_ON(!is_trbe_irq());
- clr_trbe_irq();
- if (!perf_get_aux(handle))
return IRQ_NONE;
- if (!is_perf_trbe(handle))
return IRQ_NONE;
- irq_work_run();
- act = trbe_get_fault_act(handle);
- switch (act) {
- case TRBE_FAULT_ACT_WRAP:
trbe_handle_overflow(handle);
break;
- case TRBE_FAULT_ACT_SPURIOUS:
trbe_handle_spurious(handle);
break;
- case TRBE_FAULT_ACT_FATAL:
trbe_handle_fatal(handle);
break;
- }
- return IRQ_HANDLED;
+}
+static const struct coresight_ops_sink arm_trbe_sink_ops = {
- .enable = arm_trbe_enable,
- .disable = arm_trbe_disable,
- .alloc_buffer = arm_trbe_alloc_buffer,
- .free_buffer = arm_trbe_free_buffer,
- .update_buffer = arm_trbe_update_buffer,
+};
+static const struct coresight_ops arm_trbe_cs_ops = {
- .sink_ops = &arm_trbe_sink_ops,
+};
+static ssize_t irq_show(struct device *dev, struct device_attribute *attr, char *buf) +{
- struct trbe_drvdata *drvdata = dev_get_drvdata(dev->parent);
- return sprintf(buf, "%d\n", drvdata->irq);
+} +static DEVICE_ATTR_RO(irq);
+static ssize_t align_show(struct device *dev, struct device_attribute *attr, char *buf) +{
- struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
- return sprintf(buf, "%s\n",
trbe_buffer_align_str[ilog2(cpudata->trbe_align)]); +} +static DEVICE_ATTR_RO(align);
+static ssize_t dbm_show(struct device *dev, struct device_attribute *attr, char *buf) +{
- struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
- return sprintf(buf, "%d\n", cpudata->trbe_dbm);
+} +static DEVICE_ATTR_RO(dbm);
+static struct attribute *arm_trbe_attrs[] = {
- &dev_attr_align.attr,
- &dev_attr_irq.attr,
- &dev_attr_dbm.attr,
- NULL,
+};
+static const struct attribute_group arm_trbe_group = {
- .attrs = arm_trbe_attrs,
+};
+static const struct attribute_group *arm_trbe_groups[] = {
- &arm_trbe_group,
- NULL,
+};
+static void arm_trbe_probe_coresight_cpu(void *info) +{
- struct trbe_cpudata *cpudata = info;
- struct device *dev = &cpudata->drvdata->pdev->dev;
- struct coresight_desc desc = { 0 };
- if (WARN_ON(!cpudata))
goto cpu_clear;
- if (!is_trbe_available()) {
pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu);
goto cpu_clear;
- }
- if (!is_trbe_programmable()) {
pr_err("TRBE is owned in higher exception level on cpu %d\n",
cpudata->cpu);
goto cpu_clear;
- }
- desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name,
smp_processor_id());
- if (IS_ERR(desc.name))
goto cpu_clear;
- desc.type = CORESIGHT_DEV_TYPE_SINK;
- desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;
- desc.ops = &arm_trbe_cs_ops;
- desc.pdata = dev_get_platdata(dev);
- desc.groups = arm_trbe_groups;
- desc.dev = dev;
- cpudata->csdev = coresight_register(&desc);
- if (IS_ERR(cpudata->csdev))
goto cpu_clear;
- dev_set_drvdata(&cpudata->csdev->dev, cpudata);
- cpudata->trbe_dbm = get_trbe_flag_update();
- cpudata->trbe_align = 1ULL << get_trbe_address_align();
- if (cpudata->trbe_align > SZ_2K) {
pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu);
goto cpu_clear;
- }
- return;
+cpu_clear:
- cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus);
+}
+static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata) +{
- struct trbe_cpudata *cpudata;
- int cpu;
- drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata));
- if (IS_ERR(drvdata->cpudata))
return PTR_ERR(drvdata->cpudata);
- for_each_cpu(cpu, &drvdata->supported_cpus) {
cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
cpudata->cpu = cpu;
cpudata->drvdata = drvdata;
smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
- }
- return 0;
+}
+static void arm_trbe_remove_coresight_cpu(void *info) +{
- struct trbe_drvdata *drvdata = info;
- disable_percpu_irq(drvdata->irq);
+}
+static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata) +{
- struct trbe_cpudata *cpudata;
- int cpu;
- for_each_cpu(cpu, &drvdata->supported_cpus) {
smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
if (cpudata->csdev) {
coresight_unregister(cpudata->csdev);
cpudata->drvdata = NULL;
cpudata->csdev = NULL;
}
- }
- free_percpu(drvdata->cpudata);
- return 0;
+}
+static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node) +{
- struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata,
hotplug_node);
- struct trbe_cpudata *cpudata;
- if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
if (!cpudata->csdev) {
cpudata->drvdata = drvdata;
smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
}
trbe_reset_local();
enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
- }
- return 0;
+}
+static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node) +{
- struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata,
hotplug_node);
- struct trbe_cpudata *cpudata;
- if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
if (cpudata->csdev) {
coresight_unregister(cpudata->csdev);
cpudata->drvdata = NULL;
cpudata->csdev = NULL;
}
disable_percpu_irq(drvdata->irq);
trbe_reset_local();
- }
- return 0;
+}
+static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata) +{
- enum cpuhp_state trbe_online;
- trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
arm_trbe_cpu_startup, arm_trbe_cpu_teardown);
- if (trbe_online < 0)
return -EINVAL;
- if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node))
return -EINVAL;
- drvdata->trbe_online = trbe_online;
- return 0;
+}
+static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata) +{
- cpuhp_remove_multi_state(drvdata->trbe_online);
+}
+static int arm_trbe_probe_irq(struct platform_device *pdev,
struct trbe_drvdata *drvdata)
+{
- drvdata->irq = platform_get_irq(pdev, 0);
- if (!drvdata->irq) {
pr_err("IRQ not found for the platform device\n");
return -ENXIO;
- }
- if (!irq_is_percpu(drvdata->irq)) {
pr_err("IRQ is not a PPI\n");
return -EINVAL;
- }
- if (irq_get_percpu_devid_partition(drvdata->irq,
&drvdata->supported_cpus))
return -EINVAL;
- drvdata->handle = alloc_percpu(typeof(*drvdata->handle));
- if (!drvdata->handle)
return -ENOMEM;
- if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME,
drvdata->handle)) {
free_percpu(drvdata->handle);
return -EINVAL;
- }
- return 0;
+}
+static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata) +{
- free_percpu_irq(drvdata->irq, drvdata->handle);
- free_percpu(drvdata->handle);
+}
+static int arm_trbe_device_probe(struct platform_device *pdev) +{
- struct coresight_platform_data *pdata;
- struct trbe_drvdata *drvdata;
- struct device *dev = &pdev->dev;
- int ret;
- drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
- if (IS_ERR(drvdata))
return -ENOMEM;
- pdata = coresight_get_platform_data(dev);
- if (IS_ERR(pdata)) {
kfree(drvdata);
return -ENOMEM;
- }
- drvdata->atclk = devm_clk_get(dev, "atclk");
- if (!IS_ERR(drvdata->atclk)) {
ret = clk_prepare_enable(drvdata->atclk);
if (ret)
return ret;
- }
- dev_set_drvdata(dev, drvdata);
- dev->platform_data = pdata;
- drvdata->pdev = pdev;
- ret = arm_trbe_probe_irq(pdev, drvdata);
- if (ret)
goto irq_failed;
- ret = arm_trbe_probe_coresight(drvdata);
- if (ret)
goto probe_failed;
- ret = arm_trbe_probe_cpuhp(drvdata);
- if (ret)
goto cpuhp_failed;
- return 0;
+cpuhp_failed:
- arm_trbe_remove_coresight(drvdata);
+probe_failed:
- arm_trbe_remove_irq(drvdata);
+irq_failed:
- kfree(pdata);
- kfree(drvdata);
- return ret;
+}
+static int arm_trbe_device_remove(struct platform_device *pdev) +{
- struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev);
- struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
- arm_trbe_remove_coresight(drvdata);
- arm_trbe_remove_cpuhp(drvdata);
- arm_trbe_remove_irq(drvdata);
- kfree(pdata);
- kfree(drvdata);
- return 0;
+}
+#ifdef CONFIG_PM +static int arm_trbe_runtime_suspend(struct device *dev) +{
- struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
- if (drvdata && !IS_ERR(drvdata->atclk))
clk_disable_unprepare(drvdata->atclk);
- return 0;
+}
+static int arm_trbe_runtime_resume(struct device *dev) +{
- struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
- if (drvdata && !IS_ERR(drvdata->atclk))
clk_prepare_enable(drvdata->atclk);
- return 0;
+} +#endif
+static const struct dev_pm_ops arm_trbe_dev_pm_ops = {
- SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume,
NULL) +};
+static const struct of_device_id arm_trbe_of_match[] = {
- { .compatible = "arm,arm-trbe", .data = (void *)1 },
- {},
+}; +MODULE_DEVICE_TABLE(of, arm_trbe_of_match);
+static const struct platform_device_id arm_trbe_match[] = {
- { "arm,trbe", 0},
- { }
+}; +MODULE_DEVICE_TABLE(platform, arm_trbe_match);
+static struct platform_driver arm_trbe_driver = {
- .id_table = arm_trbe_match,
- .driver = {
.name = DRVNAME,
.of_match_table = of_match_ptr(arm_trbe_of_match),
.pm = &arm_trbe_dev_pm_ops,
.suppress_bind_attrs = true,
- },
- .probe = arm_trbe_device_probe,
- .remove = arm_trbe_device_remove,
+}; +builtin_platform_driver(arm_trbe_driver) diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h new file mode 100644 index 0000000..82ffbfc --- /dev/null +++ b/drivers/hwtracing/coresight/coresight-trbe.h @@ -0,0 +1,525 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/*
- This contains all the required hardware-related helper functions for the
- Trace Buffer Extension (TRBE) driver in the coresight framework.
- Copyright (C) 2020 ARM Ltd.
- Author: Anshuman Khandual anshuman.khandual@arm.com
- */
+#include <linux/coresight.h> +#include <linux/device.h> +#include <linux/irq.h> +#include <linux/kernel.h> +#include <linux/of.h> +#include <linux/platform_device.h> +#include <linux/smp.h>
+#include "coresight-etm-perf.h"
+static inline bool is_trbe_available(void) +{
- u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
- int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0,
ID_AA64DFR0_TRBE_SHIFT);
- return trbe >= 0b0001;
+}
+static inline bool is_ete_available(void) +{
- u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
- int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0,
ID_AA64DFR0_TRACEVER_SHIFT);
- return (tracever != 0b0000);
+}
+static inline bool is_trbe_enabled(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- return trblimitr & TRBLIMITR_ENABLE;
+}
+enum trbe_ec {
- TRBE_EC_OTHERS = 0,
- TRBE_EC_STAGE1_ABORT = 36,
- TRBE_EC_STAGE2_ABORT = 37,
+};
+static const char *const trbe_ec_str[] = {
- [TRBE_EC_OTHERS] = "Maintenance exception",
- [TRBE_EC_STAGE1_ABORT] = "Stage-1 exception",
- [TRBE_EC_STAGE2_ABORT] = "Stage-2 exception",
+};
+static inline enum trbe_ec get_trbe_ec(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
+}
+static inline void clr_trbe_ec(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+enum trbe_bsc {
- TRBE_BSC_NOT_STOPPED = 0,
- TRBE_BSC_FILLED = 1,
- TRBE_BSC_TRIGGERED = 2,
+};
+static const char *const trbe_bsc_str[] = {
- [TRBE_BSC_NOT_STOPPED] = "TRBE collection not stopped",
- [TRBE_BSC_FILLED] = "TRBE filled",
- [TRBE_BSC_TRIGGERED] = "TRBE triggered",
+};
+static inline enum trbe_bsc get_trbe_bsc(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
+}
+static inline void clr_trbe_bsc(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+enum trbe_fsc {
- TRBE_FSC_ASF_LEVEL0 = 0,
- TRBE_FSC_ASF_LEVEL1 = 1,
- TRBE_FSC_ASF_LEVEL2 = 2,
- TRBE_FSC_ASF_LEVEL3 = 3,
- TRBE_FSC_TF_LEVEL0 = 4,
- TRBE_FSC_TF_LEVEL1 = 5,
- TRBE_FSC_TF_LEVEL2 = 6,
- TRBE_FSC_TF_LEVEL3 = 7,
- TRBE_FSC_AFF_LEVEL0 = 8,
- TRBE_FSC_AFF_LEVEL1 = 9,
- TRBE_FSC_AFF_LEVEL2 = 10,
- TRBE_FSC_AFF_LEVEL3 = 11,
- TRBE_FSC_PF_LEVEL0 = 12,
- TRBE_FSC_PF_LEVEL1 = 13,
- TRBE_FSC_PF_LEVEL2 = 14,
- TRBE_FSC_PF_LEVEL3 = 15,
- TRBE_FSC_SEA_WRITE = 16,
- TRBE_FSC_ASEA_WRITE = 17,
- TRBE_FSC_SEA_LEVEL0 = 20,
- TRBE_FSC_SEA_LEVEL1 = 21,
- TRBE_FSC_SEA_LEVEL2 = 22,
- TRBE_FSC_SEA_LEVEL3 = 23,
- TRBE_FSC_ALIGN_FAULT = 33,
- TRBE_FSC_TLB_FAULT = 48,
- TRBE_FSC_ATOMIC_FAULT = 49,
+};
+static const char *const trbe_fsc_str[] = {
- [TRBE_FSC_ASF_LEVEL0] = "Address size fault - level 0",
- [TRBE_FSC_ASF_LEVEL1] = "Address size fault - level 1",
- [TRBE_FSC_ASF_LEVEL2] = "Address size fault - level 2",
- [TRBE_FSC_ASF_LEVEL3] = "Address size fault - level 3",
- [TRBE_FSC_TF_LEVEL0] = "Translation fault - level 0",
- [TRBE_FSC_TF_LEVEL1] = "Translation fault - level 1",
- [TRBE_FSC_TF_LEVEL2] = "Translation fault - level 2",
- [TRBE_FSC_TF_LEVEL3] = "Translation fault - level 3",
- [TRBE_FSC_AFF_LEVEL0] = "Access flag fault - level 0",
- [TRBE_FSC_AFF_LEVEL1] = "Access flag fault - level 1",
- [TRBE_FSC_AFF_LEVEL2] = "Access flag fault - level 2",
- [TRBE_FSC_AFF_LEVEL3] = "Access flag fault - level 3",
- [TRBE_FSC_PF_LEVEL0] = "Permission fault - level 0",
- [TRBE_FSC_PF_LEVEL1] = "Permission fault - level 1",
- [TRBE_FSC_PF_LEVEL2] = "Permission fault - level 2",
- [TRBE_FSC_PF_LEVEL3] = "Permission fault - level 3",
- [TRBE_FSC_SEA_WRITE] = "Synchronous external abort on write",
- [TRBE_FSC_ASEA_WRITE] = "Asynchronous external abort on write",
- [TRBE_FSC_SEA_LEVEL0] = "Syncrhonous external abort on table walk - level
0",
- [TRBE_FSC_SEA_LEVEL1] = "Syncrhonous external abort on table walk - level
1",
- [TRBE_FSC_SEA_LEVEL2] = "Syncrhonous external abort on table walk - level
2",
- [TRBE_FSC_SEA_LEVEL3] = "Syncrhonous external abort on table walk - level
3",
- [TRBE_FSC_ALIGN_FAULT] = "Alignment fault",
- [TRBE_FSC_TLB_FAULT] = "TLB conflict fault",
- [TRBE_FSC_ATOMIC_FAULT] = "Atmoc fault",
+};
+static inline enum trbe_fsc get_trbe_fsc(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- return (trbsr >> TRBSR_FSC_SHIFT) & TRBSR_FSC_MASK;
+}
+static inline void clr_trbe_fsc(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- trbsr &= ~(TRBSR_FSC_MASK << TRBSR_FSC_SHIFT);
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+static inline void set_trbe_irq(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- WARN_ON(is_trbe_enabled());
- trbsr |= TRBSR_IRQ;
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+static inline void clr_trbe_irq(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- trbsr &= ~TRBSR_IRQ;
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+static inline void set_trbe_trg(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- WARN_ON(is_trbe_enabled());
- trbsr |= TRBSR_TRG;
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+static inline void clr_trbe_trg(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- WARN_ON(is_trbe_enabled());
- trbsr &= ~TRBSR_TRG;
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+static inline void set_trbe_wrap(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- WARN_ON(is_trbe_enabled());
- trbsr |= TRBSR_WRAP;
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+static inline void clr_trbe_wrap(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- WARN_ON(is_trbe_enabled());
- trbsr &= ~TRBSR_WRAP;
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+static inline void set_trbe_abort(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- WARN_ON(is_trbe_enabled());
- trbsr |= TRBSR_ABORT;
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+static inline void clr_trbe_abort(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- WARN_ON(is_trbe_enabled());
- trbsr &= ~TRBSR_ABORT;
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+static inline bool is_trbe_irq(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- return trbsr & TRBSR_IRQ;
+}
+static inline bool is_trbe_trg(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- return trbsr & TRBSR_TRG;
+}
+static inline bool is_trbe_wrap(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- return trbsr & TRBSR_WRAP;
+}
+static inline bool is_trbe_abort(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- return trbsr & TRBSR_ABORT;
+}
+static inline bool is_trbe_running(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- return !(trbsr & TRBSR_STOP);
+}
+static inline void set_trbe_running(void) +{
- u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
- trbsr &= ~TRBSR_STOP;
- write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+enum trbe_address_mode {
- TRBE_ADDRESS_VIRTUAL,
- TRBE_ADDRESS_PHYSICAL,
+};
+static const char *const trbe_address_mode_str[] = {
- [TRBE_ADDRESS_VIRTUAL] = "Address mode - virtual",
- [TRBE_ADDRESS_PHYSICAL] = "Address mode - physical",
+};
+static inline bool is_trbe_virtual_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- return !(trblimitr & TRBLIMITR_NVM);
+}
+static inline bool is_trbe_physical_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- return trblimitr & TRBLIMITR_NVM;
+}
+static inline void set_trbe_virtual_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr &= ~TRBLIMITR_NVM;
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+static inline void set_trbe_physical_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr |= TRBLIMITR_NVM;
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+enum trbe_trig_mode {
- TRBE_TRIGGER_STOP = 0,
- TRBE_TRIGGER_IRQ = 1,
- TRBE_TRIGGER_IGNORE = 3,
+};
+static const char *const trbe_trig_mode_str[] = {
- [TRBE_TRIGGER_STOP] = "Trigger mode - stop",
- [TRBE_TRIGGER_IRQ] = "Trigger mode - irq",
- [TRBE_TRIGGER_IGNORE] = "Trigger mode - ignore",
+};
+static inline enum trbe_trig_mode get_trbe_trig_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) &
TRBLIMITR_TRIG_MODE_MASK; +}
+static inline void set_trbe_trig_mode(enum trbe_trig_mode mode) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
- trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) <<
TRBLIMITR_TRIG_MODE_SHIFT);
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+enum trbe_fill_mode {
- TRBE_FILL_STOP = 0,
- TRBE_FILL_WRAP = 1,
- TRBE_FILL_CIRCULAR = 3,
+};
+static const char *const trbe_fill_mode_str[] = {
- [TRBE_FILL_STOP] = "Buffer mode - stop",
- [TRBE_FILL_WRAP] = "Buffer mode - wrap",
- [TRBE_FILL_CIRCULAR] = "Buffer mode - circular",
+};
+static inline enum trbe_fill_mode get_trbe_fill_mode(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) &
TRBLIMITR_FILL_MODE_MASK; +}
+static inline void set_trbe_fill_mode(enum trbe_fill_mode mode) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
- trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) <<
TRBLIMITR_FILL_MODE_SHIFT);
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+static inline void set_trbe_disabled(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr &= ~TRBLIMITR_ENABLE;
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+static inline void set_trbe_enabled(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- trblimitr |= TRBLIMITR_ENABLE;
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+static inline bool get_trbe_flag_update(void) +{
- u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
- return trbidr & TRBIDR_FLAG;
+}
+static inline bool is_trbe_programmable(void) +{
- u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
- return !(trbidr & TRBIDR_PROG);
+}
+enum trbe_buffer_align {
- TRBE_BUFFER_BYTE,
- TRBE_BUFFER_HALF_WORD,
- TRBE_BUFFER_WORD,
- TRBE_BUFFER_DOUBLE_WORD,
- TRBE_BUFFER_16_BYTES,
- TRBE_BUFFER_32_BYTES,
- TRBE_BUFFER_64_BYTES,
- TRBE_BUFFER_128_BYTES,
- TRBE_BUFFER_256_BYTES,
- TRBE_BUFFER_512_BYTES,
- TRBE_BUFFER_1K_BYTES,
- TRBE_BUFFER_2K_BYTES,
+};
+static const char *const trbe_buffer_align_str[] = {
- [TRBE_BUFFER_BYTE] = "Byte",
- [TRBE_BUFFER_HALF_WORD] = "Half word",
- [TRBE_BUFFER_WORD] = "Word",
- [TRBE_BUFFER_DOUBLE_WORD] = "Double word",
- [TRBE_BUFFER_16_BYTES] = "16 bytes",
- [TRBE_BUFFER_32_BYTES] = "32 bytes",
- [TRBE_BUFFER_64_BYTES] = "64 bytes",
- [TRBE_BUFFER_128_BYTES] = "128 bytes",
- [TRBE_BUFFER_256_BYTES] = "256 bytes",
- [TRBE_BUFFER_512_BYTES] = "512 bytes",
- [TRBE_BUFFER_1K_BYTES] = "1K bytes",
- [TRBE_BUFFER_2K_BYTES] = "2K bytes",
+};
+static inline enum trbe_buffer_align get_trbe_address_align(void) +{
- u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
- return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
+}
+static inline void assert_trbe_address_mode(unsigned long addr) +{
- bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr);
- bool virt_mode = is_trbe_virtual_mode();
- WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode)));
+}
+static inline void assert_trbe_address_align(unsigned long addr) +{
- unsigned long nr_bytes = 1ULL << get_trbe_address_align();
- WARN_ON(addr & (nr_bytes - 1));
+}
+static inline unsigned long get_trbe_write_pointer(void) +{
- u64 trbptr = read_sysreg_s(SYS_TRBPTR_EL1);
- unsigned long addr = (trbptr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- return addr;
+}
+static inline void set_trbe_write_pointer(unsigned long addr) +{
- WARN_ON(is_trbe_enabled());
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- addr = (addr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
- write_sysreg_s(addr, SYS_TRBPTR_EL1);
+}
+static inline unsigned long get_trbe_limit_pointer(void) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- unsigned long limit = (trblimitr >> TRBLIMITR_LIMIT_SHIFT) &
TRBLIMITR_LIMIT_MASK;
- unsigned long addr = limit << TRBLIMITR_LIMIT_SHIFT;
- WARN_ON(addr & (PAGE_SIZE - 1));
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- return addr;
+}
+static inline void set_trbe_limit_pointer(unsigned long addr) +{
- u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- WARN_ON(is_trbe_enabled());
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- WARN_ON(addr & ((1UL << TRBLIMITR_LIMIT_SHIFT) - 1));
- WARN_ON(addr & (PAGE_SIZE - 1));
- trblimitr &= ~(TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT);
- trblimitr |= (addr & PAGE_MASK);
- write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+static inline unsigned long get_trbe_base_pointer(void) +{
- u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1);
- unsigned long addr = (trbbaser >> TRBBASER_BASE_SHIFT) &
TRBBASER_BASE_MASK;
- addr = addr << TRBBASER_BASE_SHIFT;
- WARN_ON(addr & (PAGE_SIZE - 1));
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- return addr;
+}
+static inline void set_trbe_base_pointer(unsigned long addr) +{
- WARN_ON(is_trbe_enabled());
- assert_trbe_address_mode(addr);
- assert_trbe_address_align(addr);
- WARN_ON(addr & ((1UL << TRBBASER_BASE_SHIFT) - 1));
- WARN_ON(addr & (PAGE_SIZE - 1));
- write_sysreg_s(addr, SYS_TRBBASER_EL1);
+}
2.7.4
On 11/14/20 11:08 AM, Tingwei Zhang wrote:
Hi Anshuman,
On Tue, Nov 10, 2020 at 08:45:05PM +0800, Anshuman Khandual wrote:
Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is accessible via the system registers. The TRBE supports different addressing modes, including CPU virtual address, and buffer modes, including the circular buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1), a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But access to the trace buffer could be prohibited by a higher exception level (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU private interrupt (PPI) on address translation errors and when the buffer is full. The overall implementation here is inspired by the Arm SPE driver.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
Documentation/trace/coresight/coresight-trbe.rst | 36 ++ arch/arm64/include/asm/sysreg.h | 2 + drivers/hwtracing/coresight/Kconfig | 11 + drivers/hwtracing/coresight/Makefile | 1 + drivers/hwtracing/coresight/coresight-trbe.c | 766 +++++++++++++++++++++++ drivers/hwtracing/coresight/coresight-trbe.h | 525 ++++++++++++++++ 6 files changed, 1341 insertions(+) create mode 100644 Documentation/trace/coresight/coresight-trbe.rst create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst new file mode 100644 index 0000000..4320a8b --- /dev/null +++ b/Documentation/trace/coresight/coresight-trbe.rst @@ -0,0 +1,36 @@ +.. SPDX-License-Identifier: GPL-2.0
+============================== +Trace Buffer Extension (TRBE). +==============================
- :Author: Anshuman Khandual anshuman.khandual@arm.com
- :Date: November 2020
+Hardware Description +--------------------
+Trace Buffer Extension (TRBE) is a per-cpu hardware unit which captures, in +system memory, the CPU traces generated by a corresponding per-cpu tracing unit. +This gets plugged in as a coresight sink device because the corresponding trace +generators (ETE) are plugged in as source devices.
+Sysfs files and directories +---------------------------
+The TRBE devices appear on the existing coresight bus alongside the other +coresight devices::
+  $ ls /sys/bus/coresight/devices
+  trbe0  trbe1  trbe2  trbe3
+The ``trbe<N>`` named TRBEs are each associated with a CPU::
+  $ ls /sys/bus/coresight/devices/trbe0/
+  irq  align  dbm
+*Key file items are:-*
- ``irq``: TRBE maintenance interrupt number
- ``align``: TRBE write pointer alignment
- ``dbm``: TRBE updates memory with access and dirty flags
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 14cb156..61136f6 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -97,6 +97,7 @@ #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift)) #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift)) #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift)) +#define TSB_CSYNC __emit_inst(0xd503225f)
#define __SYS_BARRIER_INSN(CRm, op2, Rt) \ __emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f)) @@ -865,6 +866,7 @@ #define ID_AA64MMFR2_CNP_SHIFT 0
/* id_aa64dfr0 */ +#define ID_AA64DFR0_TRBE_SHIFT 44 #define ID_AA64DFR0_TRACE_FILT_SHIFT 40 #define ID_AA64DFR0_DOUBLELOCK_SHIFT 36 #define ID_AA64DFR0_PMSVER_SHIFT 32 diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig index c119824..0f5e101 100644 --- a/drivers/hwtracing/coresight/Kconfig +++ b/drivers/hwtracing/coresight/Kconfig @@ -156,6 +156,17 @@ config CORESIGHT_CTI To compile this driver as a module, choose M here: the module will be called coresight-cti.
+config CORESIGHT_TRBE
- bool "Trace Buffer Extension (TRBE) driver"
Can you consider supporting TRBE as a loadable module, since all coresight drivers can be built as modules now?
Reworking the TRBE driver and making it a loadable module is part of the plan.
- Anshuman
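[Editorial note] For reference, a minimal sketch of what a modular build could look like, assuming CONFIG_CORESIGHT_TRBE is switched from bool to tristate and the builtin_platform_driver() registration in the patch above is replaced. The MODULE_* metadata below is illustrative only, not part of the posted series.

/*
 * Hypothetical sketch only: build coresight-trbe as a module.
 * arm_trbe_driver is the platform_driver defined in the patch above.
 */
#include <linux/module.h>
#include <linux/platform_device.h>

module_platform_driver(arm_trbe_driver);

MODULE_AUTHOR("Anshuman Khandual <anshuman.khandual@arm.com>");
MODULE_DESCRIPTION("Arm Trace Buffer Extension (TRBE) driver");
MODULE_LICENSE("GPL v2");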
While starting off the etm event, just abort and truncate the perf record if the perf handle has no space left. This avoids configuring both the source and sink devices in case the data cannot be consumed in perf.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com --- drivers/hwtracing/coresight/coresight-etm-perf.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index ea73cfa..534e205 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -347,6 +347,9 @@ static void etm_event_start(struct perf_event *event, int flags) if (!event_data) goto fail;
+ if (!handle->size) + goto fail_end_stop; + /* * Check if this ETM is allowed to trace, as decided * at etm_setup_aux(). This could be due to an unreachable
The perf handle structure needs to be shared with the TRBE IRQ handler for capturing trace data and restarting the handle. There is a possibility of an undefined-reference crash when the etm event is being stopped while a TRBE IRQ is also being processed. This happens due to the release of the perf handle via perf_aux_output_end(). This change stops the sinks via the link before releasing the handle, which ensures that a simultaneous TRBE IRQ cannot happen.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com --- This might cause problem with traditional sink devices which can be operated in both sysfs and perf mode. This needs to be addressed correctly. One option would be to move the update_buffer callback into the respective sink devices. e.g, disable().
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 534e205..1a37991 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
size = sink_ops(sink)->update_buffer(sink, handle, event_data->snk_config); + coresight_disable_path(path); perf_aux_output_end(handle, size); + return; }
/* Disabling the path make its elements available to other sessions */
On 11/10/20 12:45 PM, Anshuman Khandual wrote:
perf handle structure needs to be shared with the TRBE IRQ handler for capturing trace data and restarting the handle. There is a probability of an undefined reference based crash when etm event is being stopped while a TRBE IRQ also getting processed. This happens due the release of perf handle via perf_aux_output_end(). This stops the sinks via the link before releasing the handle, which will ensure that a simultaneous TRBE IRQ could not happen.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
This might cause problem with traditional sink devices which can be operated in both sysfs and perf mode. This needs to be addressed correctly. One option would be to move the update_buffer callback into the respective sink devices. e.g, disable().
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 534e205..1a37991 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode) size = sink_ops(sink)->update_buffer(sink, handle, event_data->snk_config);
+ coresight_disable_path(path);
perf_aux_output_end(handle, size);
+ return;
}
As you mentioned, this is not ideal where another session could be triggered on the sink from a different ETM (not for per-CPU sink) in a different mode before you collect the buffer. I believe the best option is to leave the update_buffer() to disable_hw. This would need to pass on the "handle" to the disable_path.
That way the races can be handled inside the sinks. Also, this aligns the perf mode of the sinks with that of the sysfs mode.
Suzuki
On 11/12/20 2:57 PM, Suzuki K Poulose wrote:
On 11/10/20 12:45 PM, Anshuman Khandual wrote:
perf handle structure needs to be shared with the TRBE IRQ handler for capturing trace data and restarting the handle. There is a probability of an undefined reference based crash when etm event is being stopped while a TRBE IRQ also getting processed. This happens due the release of perf handle via perf_aux_output_end(). This stops the sinks via the link before releasing the handle, which will ensure that a simultaneous TRBE IRQ could not happen.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
This might cause problem with traditional sink devices which can be operated in both sysfs and perf mode. This needs to be addressed correctly. One option would be to move the update_buffer callback into the respective sink devices. e.g, disable().
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 534e205..1a37991 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode) size = sink_ops(sink)->update_buffer(sink, handle, event_data->snk_config); + coresight_disable_path(path); perf_aux_output_end(handle, size); + return; }
As you mentioned, this is not ideal where another session could be triggered on the sink from a different ETM (not for per-CPU sink) in a different mode before you collect the buffer. I believe the best option is to leave the update_buffer() to disable_hw. This would need to pass on the "handle" to the disable_path.
Passing 'handle' into coresight_ops_sink->disable() would enable pushing updated trace data into the perf aux buffer. But do you propose to drop the update_buffer() callback completely, or just move it into the disable() callback (along with the PERF_EF_UPDATE mode check) for all individual sinks for now? Maybe it can be dropped completely later on.
That way the races can be handled inside the sinks. Also, this aligns the perf mode of the sinks with that of the sysfs mode.
Did not get that, could you please elaborate ?
On 11/23/20 6:08 AM, Anshuman Khandual wrote:
On 11/12/20 2:57 PM, Suzuki K Poulose wrote:
On 11/10/20 12:45 PM, Anshuman Khandual wrote:
perf handle structure needs to be shared with the TRBE IRQ handler for capturing trace data and restarting the handle. There is a probability of an undefined reference based crash when etm event is being stopped while a TRBE IRQ also getting processed. This happens due the release of perf handle via perf_aux_output_end(). This stops the sinks via the link before releasing the handle, which will ensure that a simultaneous TRBE IRQ could not happen.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
This might cause problem with traditional sink devices which can be operated in both sysfs and perf mode. This needs to be addressed correctly. One option would be to move the update_buffer callback into the respective sink devices. e.g, disable().
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 534e205..1a37991 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode) size = sink_ops(sink)->update_buffer(sink, handle, event_data->snk_config); + coresight_disable_path(path); perf_aux_output_end(handle, size); + return; }
As you mentioned, this is not ideal where another session could be triggered on the sink from a different ETM (not for per-CPU sink) in a different mode before you collect the buffer. I believe the best option is to leave the update_buffer() to disable_hw. This would need to pass on the "handle" to the disable_path.
Passing 'handle' into coresight_ops_sink->disable() would enable pushing updated trace data into perf aux buffer. But do you propose to drop the update_buffer() call back completely or just move it into disable() call back (along with PERF_EF_UPDATE mode check) for all individual sinks for now. May be, later it can be dropped off completely.
Yes, once we update the buffer from within the sink_ops->disable(), we don't need the update buffer anymore. It is pointless to have a function that is provided to the external user.
That way the races can be handled inside the sinks. Also, this aligns the perf mode of the sinks with that of the sysfs mode.
Did not get that, could you please elaborate ?
In sysfs mode, we already do something similar to "update buffer" for all the sinks (e.g., see tmc_etr_sync_sysfs_buf()), i.e., update the buffer before the sink is disabled. That is the same as what is proposed above.
Suzuki
On 11/10/20 12:45 PM, Anshuman Khandual wrote:
perf handle structure needs to be shared with the TRBE IRQ handler for capturing trace data and restarting the handle. There is a probability of an undefined reference based crash when etm event is being stopped while a TRBE IRQ also getting processed. This happens due the release of perf handle via perf_aux_output_end(). This stops the sinks via the link before releasing the handle, which will ensure that a simultaneous TRBE IRQ could not happen.
Or in other words :
We now have :
update_buffer()
perf_aux_output_end(handle)
... disable_path()
This is problematic due to various reasons :
1) The semantics of update_buffer() is not clear. i.e, whether it should leave the "sink" "stopped" or "disabled" or "active"
2) This breaks the recommended trace collection sequence of "flush" and "stop" from source to the sink for trace collection. i.e, we stop the source now. But don't flush the components from source to sink, rather we stop and flush from the sink. And we flush and stop the path after we have collected the trace data at sink, which is pointless.
3) For a sink with IRQ handler, if we don't stop the sink with update_buffer(), we could have a situation :
update_buffer()
perf_aux_output_end(handle) # handle is invalid now
-----------------> IRQ -> irq_handler() perf_aux_output_end(handle) # Wrong !
disable_path()
The sysfs mode is fine, as we defer the trace collection to disable_path().
The proposed patch is still racy, as we could still hit the problem.
So, to avoid all of these situations, I think we should defer the update_buffer() to sink_ops->disable(), when we have flushed and stopped all the components upstream, and avoid any races with the IRQ handler.
i.e,
source_ops->stop(csdev);
disable_path(handle); // similar to the enable_path
sink_ops->disable(csdev, handle) { /* flush & stop */
/* collect trace */ perf_aux_output_end(handle, size); }
Kind regards Suzuki
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
This might cause problem with traditional sink devices which can be operated in both sysfs and perf mode. This needs to be addressed correctly. One option would be to move the update_buffer callback into the respective sink devices. e.g, disable().
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 534e205..1a37991 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode) size = sink_ops(sink)->update_buffer(sink, handle, event_data->snk_config);
+ coresight_disable_path(path);
perf_aux_output_end(handle, size);
+ return;
}
/* Disabling the path make its elements available to other sessions */
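[Editorial note] To make the sequence proposed above concrete, here is a rough editorial sketch. The extra "handle" argument to the path-disable routine and all of the *_sketch helpers are hypothetical; this is not the current CoreSight API, only an illustration of the proposal.

static void etm_event_stop_sketch(struct perf_event *event,
				  struct perf_output_handle *handle,
				  struct list_head *path)
{
	/* 1. Stop the source first, so no new trace enters the path. */
	etm_stop_source_sketch(event);

	/*
	 * 2. Flush and stop every component from source to sink; the
	 *    sink is handled last and is handed the perf handle.
	 */
	coresight_disable_path_sketch(path, handle);
}

/* 3. Inside the sink driver (e.g. a per-CPU, TRBE-style sink): */
static int sink_disable_sketch(struct coresight_device *csdev,
			       struct perf_output_handle *handle)
{
	unsigned long size;

	/* Flush and stop the sink hardware; its IRQ can no longer race. */
	size = sink_flush_and_stop_sketch(csdev, handle);

	/* Collect the trace and close the handle exactly once. */
	perf_aux_output_end(handle, size);
	return 0;
}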
On Fri, Nov 27, 2020 at 10:32:28AM +0000, Suzuki K Poulose wrote:
On 11/10/20 12:45 PM, Anshuman Khandual wrote:
perf handle structure needs to be shared with the TRBE IRQ handler for capturing trace data and restarting the handle. There is a probability of an undefined reference based crash when etm event is being stopped while a TRBE IRQ also getting processed. This happens due the release of perf handle via perf_aux_output_end(). This stops the sinks via the link before releasing the handle, which will ensure that a simultaneous TRBE IRQ could not happen.
Or in other words :
We now have :
update_buffer()
perf_aux_output_end(handle)
... disable_path()
This is problematic due to various reasons :
- The semantics of update_buffer() is not clear. i.e, whether it should leave the "sink" "stopped" or "disabled" or "active"
I'm a little confused by the above as the modes that apply here are CS_MODE_DISABLED and CS_MODE_PERF, so I'll go with those. Let me know if you meant something else.
So far ->update_buffer() doesn't touch drvdata->mode and as such it is still set to CS_MODE_PERF when the update has completed.
- This breaks the recommended trace collection sequence of "flush" and "stop" from source to the sink for trace collection. i.e, we stop the source now. But don't flush the components from source to sink, rather we stop and flush from the sink. And we flush and stop the path after we have collected the trace data at sink, which is pointless.
The above assessment is correct. Fixing it, though, has far-reaching ramifications that go far beyond the scope of this patch.
For a sink with IRQ handler, if we don't stop the sink with update_buffer(), we could have a situation :
update_buffer()
perf_aux_outpuf_end(handle) # handle is invalid now
-----------------> IRQ -> irq_handler() perf_aux_output_end(handle) # Wrong !
disable_path()
That's the picture of the issue I had in my head when looking at the code - I'm glad we came to the same conclusion.
The sysfs mode is fine, as we defer the trace collection to disable_path().
The proposed patch is still racy, as we could still hit the problem.
So, to avoid all of these situations, I think we should defer the the update_buffer() to sink_ops->disable(), when we have flushed and stopped the all the components upstream and avoid any races with the IRQ handler.
i.e,
source_ops->stop(csdev);
disable_path(handle); // similar to the enable_path
sink_ops->disable(csdev, handle) { /* flush & stop */
/* collect trace */ perf_aux_output_end(handle, size); }
That is one solution. The advantage here is that it takes care of the flusing problem you described above. On the flip side it is moving a lot of code around, something that is better to do in another set.
Another solution is to disable the TRBE IRQ in ->update_buffer(). The ETR does the same kind of thing with tmc_flush_and_stop(). I don't know how feasible that is, but it would be a simple solution for this set. Properly flushing the pipeline could be done later. I'm fine with either approach.
Thanks, Mathieu
Kind regards Suzuki
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
This might cause problem with traditional sink devices which can be operated in both sysfs and perf mode. This needs to be addressed correctly. One option would be to move the update_buffer callback into the respective sink devices. e.g, disable().
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 534e205..1a37991 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode) size = sink_ops(sink)->update_buffer(sink, handle, event_data->snk_config);
+ coresight_disable_path(path);
perf_aux_output_end(handle, size);
+ return;
}
/* Disabling the path make its elements available to other sessions */
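[Editorial note] For reference, a rough sketch of that simpler alternative, reusing the helpers introduced by the TRBE patch itself (trbe_disable_and_drain_local(), the base/write pointer accessors and PERF_IDX2OFF). This is illustrative only and not part of the posted code.

static unsigned long arm_trbe_update_buffer_sketch(struct coresight_device *csdev,
						   struct perf_output_handle *handle,
						   void *config)
{
	struct trbe_perf *perf = config;
	unsigned long offset, size;

	/* Flush and stop the TRBE first, so its IRQ cannot fire afterwards. */
	trbe_disable_and_drain_local();

	/* The write pointer is now stable; compute the captured size. */
	offset = get_trbe_write_pointer() - get_trbe_base_pointer();
	size = offset - PERF_IDX2OFF(handle->head, perf);
	if (perf->snapshot)
		handle->head += size;

	/* The caller still closes the handle; the sink stays disabled. */
	return size;
}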
On 12/11/20 8:31 PM, Mathieu Poirier wrote:
On Fri, Nov 27, 2020 at 10:32:28AM +0000, Suzuki K Poulose wrote:
On 11/10/20 12:45 PM, Anshuman Khandual wrote:
perf handle structure needs to be shared with the TRBE IRQ handler for capturing trace data and restarting the handle. There is a probability of an undefined reference based crash when etm event is being stopped while a TRBE IRQ also getting processed. This happens due the release of perf handle via perf_aux_output_end(). This stops the sinks via the link before releasing the handle, which will ensure that a simultaneous TRBE IRQ could not happen.
Or in other words :
We now have :
update_buffer()
perf_aux_output_end(handle)
... disable_path()
This is problematic due to various reasons :
- The semantics of update_buffer() is not clear. i.e, whether it should leave the "sink" "stopped" or "disabled" or "active"
I'm a little confused by the above as the modes that apply here are CS_MODE_DISABLED and CS_MODE_PERF, so I'll go with those. Let me know if you meant something else.
Sorry, I think it is a bit confusing.
stopped => Sink is in the stopped HW state, but the software mode is not changed (i.e., could be PERF or SYSFS)
disabled => Sink is in stopped hw state, the software mode is DISABLED
active => Sink is active and flushing trace, with respective mode (PERF vs SYSFS).
So far ->update_buffer() doesn't touch drvdata->mode and as such it is still set to CS_MODE_PERF when the update has completed.
- This breaks the recommended trace collection sequence of "flush" and "stop" from source to the sink for trace collection. i.e, we stop the source now. But don't flush the components from source to sink, rather we stop and flush from the sink. And we flush and stop the path after we have collected the trace data at sink, which is pointless.
The above assesment is correct. Fixing it though has far reaching ramifications that go far beyond the scope of this patch.
For a sink with IRQ handler, if we don't stop the sink with update_buffer(), we could have a situation :
update_buffer()
perf_aux_outpuf_end(handle) # handle is invalid now
-----------------> IRQ -> irq_handler() perf_aux_output_end(handle) # Wrong !
disable_path()
That's the picture of the issue I had in my head when looking at the code - I'm glad we came to the same conclusion.
The sysfs mode is fine, as we defer the trace collection to disable_path().
The proposed patch is still racy, as we could still hit the problem.
So, to avoid all of these situations, I think we should defer the the update_buffer() to sink_ops->disable(), when we have flushed and stopped the all the components upstream and avoid any races with the IRQ handler.
i.e,
source_ops->stop(csdev);
disable_path(handle); // similar to the enable_path
sink_ops->disable(csdev, handle) { /* flush & stop */
/* collect trace */ perf_aux_output_end(handle, size); }
That is one solution. The advantage here is that it takes care of the flusing problem you described above. On the flip side it is moving a lot of code around, something that is better to do in another set.
Another solution is to disable the TRBE IRQ in ->udpate_buffer(). The ETR does the same kind of thing with tmc_flush_and_stop(). I don't know how feasible that is but it would be a simple solution for this set. Properly flushing the pipeline could be done later. I'm fine with either approach.
Agreed. I think this is reasonable for this set, i.e., leave the hardware disabled. We could do the proper solution above as a separate series, to keep the changes incremental.
Kind regards Suzuki
Unlike traditional sink devices, individual TRBE instances are not detected via DT or ACPI nodes. Instead, TRBE instances are detected during the CPU online process. Hence, a path connecting the ETE and TRBE on a given CPU would not have been established until then. This adds two coresight helpers that modify the outward connections from a source device in order to establish and terminate a path to a given sink device. This method might not be optimal and would be reworked later.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com --- drivers/hwtracing/coresight/coresight-etm-perf.c | 30 ++++++++++++++++++++++++ drivers/hwtracing/coresight/coresight-etm-perf.h | 4 ++++ drivers/hwtracing/coresight/coresight-platform.c | 3 ++- drivers/hwtracing/coresight/coresight-trbe.c | 2 ++ include/linux/coresight.h | 2 ++ 5 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 1a37991..b4ab1d4 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -664,3 +664,33 @@ void __exit etm_perf_exit(void)
 {
 	perf_pmu_unregister(&etm_pmu);
 }
+
+#ifdef CONFIG_CORESIGHT_TRBE
+void coresight_trbe_connect_ete(struct coresight_device *csdev_trbe, int cpu)
+{
+	struct coresight_device *csdev_ete = per_cpu(csdev_src, cpu);
+
+	if (!csdev_ete) {
+		pr_err("Corresponding ETE device not present on cpu %d\n", cpu);
+		return;
+	}
+	csdev_ete->def_sink = csdev_trbe;
+	csdev_ete->pdata->nr_outport++;
+	if (!csdev_ete->pdata->conns)
+		coresight_alloc_conns(&csdev_ete->dev, csdev_ete->pdata);
+	csdev_ete->pdata->conns[csdev_ete->pdata->nr_outport - 1].child_dev = csdev_trbe;
+}
+
+void coresight_trbe_remove_ete(struct coresight_device *csdev_trbe, int cpu)
+{
+	struct coresight_device *csdev_ete = per_cpu(csdev_src, cpu);
+
+	if (!csdev_ete) {
+		pr_err("Corresponding ETE device not present on cpu %d\n", cpu);
+		return;
+	}
+	csdev_ete->pdata->conns[csdev_ete->pdata->nr_outport - 1].child_dev = NULL;
+	csdev_ete->def_sink = NULL;
+	csdev_ete->pdata->nr_outport--;
+}
+#endif
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.h b/drivers/hwtracing/coresight/coresight-etm-perf.h
index 3e4f2ad..20386cf 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.h
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.h
@@ -85,4 +85,8 @@ static inline void *etm_perf_sink_config(struct perf_output_handle *handle)
 int __init etm_perf_init(void);
 void __exit etm_perf_exit(void);
+#ifdef CONFIG_CORESIGHT_TRBE
+void coresight_trbe_connect_ete(struct coresight_device *csdev, int cpu);
+void coresight_trbe_remove_ete(struct coresight_device *csdev, int cpu);
+#endif
 #endif
diff --git a/drivers/hwtracing/coresight/coresight-platform.c b/drivers/hwtracing/coresight/coresight-platform.c
index c594f45..8fa7406 100644
--- a/drivers/hwtracing/coresight/coresight-platform.c
+++ b/drivers/hwtracing/coresight/coresight-platform.c
@@ -23,7 +23,7 @@
  * coresight_alloc_conns: Allocate connections record for each output
  * port from the device.
  */
-static int coresight_alloc_conns(struct device *dev,
+int coresight_alloc_conns(struct device *dev,
 				 struct coresight_platform_data *pdata)
 {
 	if (pdata->nr_outport) {
@@ -35,6 +35,7 @@ static int coresight_alloc_conns(struct device *dev,
 	return 0;
 }
+EXPORT_SYMBOL_GPL(coresight_alloc_conns);

 static struct device *
 coresight_find_device_by_fwnode(struct fwnode_handle *fwnode)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 48a8ec3..afd1a1c 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -507,6 +507,7 @@ static void arm_trbe_probe_coresight_cpu(void *info)
 	if (IS_ERR(cpudata->csdev))
 		goto cpu_clear;

+	coresight_trbe_connect_ete(cpudata->csdev, cpudata->cpu);
 	dev_set_drvdata(&cpudata->csdev->dev, cpudata);
 	cpudata->trbe_dbm = get_trbe_flag_update();
 	cpudata->trbe_align = 1ULL << get_trbe_address_align();
@@ -586,6 +587,7 @@ static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
 		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		coresight_trbe_remove_ete(cpudata->csdev, cpu);
 		if (cpudata->csdev) {
 			coresight_unregister(cpudata->csdev);
 			cpudata->drvdata = NULL;
diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index c2d0a2a..c657813 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -496,6 +496,8 @@ void coresight_relaxed_write64(struct coresight_device *csdev,
 			       u64 val, u32 offset);
 void coresight_write64(struct coresight_device *csdev, u64 val, u32 offset);

+int coresight_alloc_conns(struct device *dev,
+			  struct coresight_platform_data *pdata);
 #else
 static inline struct coresight_device *
Hi Anshuman, On 11/10/20 12:45 PM, Anshuman Khandual wrote:
Unlike traditional sink devices, individual TRBE instances are not detected via DT or ACPI nodes. Instead TRBE instances are detected during the CPU online process. Hence a path connecting ETE and TRBE on a given CPU would not have been established until then. This adds two coresight helpers that help modify outward connections from a source device to establish and terminate a path to a given sink device. This method might not be optimal and may be reworked later.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
Instead of this, could we come up with something like a percpu_sink concept? That way, the TRBE driver could register the percpu_sink for the corresponding CPU and we don't have to worry about the order in which the ETE will be probed on a hotplugged CPU. (i.e. if the TRBE is probed before the ETE, the following approach would fail to register the sink).
And the default sink can be initialized when the ETE instance first starts looking for it.
Suzuki
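As a rough illustration of the percpu_sink idea being floated here, a sketch along the following lines might work; all of these symbols are made up for the example and nothing like them exists upstream yet.

#include <linux/coresight.h>
#include <linux/percpu.h>

/* One registered sink per CPU; the TRBE driver would fill this in from its probe path. */
static DEFINE_PER_CPU(struct coresight_device *, csdev_percpu_sink);

void coresight_register_percpu_sink(int cpu, struct coresight_device *sink)
{
	per_cpu(csdev_percpu_sink, cpu) = sink;
}

void coresight_unregister_percpu_sink(int cpu)
{
	per_cpu(csdev_percpu_sink, cpu) = NULL;
}

/* The ETE side would then resolve its default sink lazily, when it first needs one. */
struct coresight_device *coresight_percpu_default_sink(int cpu)
{
	return per_cpu(csdev_percpu_sink, cpu);
}

With a registry like this the TRBE/ETE probe order stops mattering, since the lookup only happens once a session actually starts.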
On 11/12/20 3:01 PM, Suzuki K Poulose wrote:
Instead of this, could we come up with something like a percpu_sink concept? That way, the TRBE driver could register the percpu_sink for the corresponding CPU and we don't have to worry about the order in which the ETE will be probed on a hotplugged CPU. (i.e. if the TRBE is probed before the ETE, the following approach would fail to register the sink).
Right, it won't work.
We already have a per CPU csdev sink. The current mechanism expects all ETEs to have been established, and the TRBEs just get plugged in during their init while probing each individual CPU. During CPU hotplug in or out, a TRBE-ETE link gets either created or destroyed. But it assumes that an ETE is always present for the TRBE to get plugged into or torn down from. The csdev for the TRBE sink also gets released during the CPU hot remove path.
Are you suggesting that there should be a static percpu csdev array defined for all potential TRBEs, so that the ETE-TRBE links can be permanently established, given that the ETEs are permanent and never really go away with a CPU hot remove event (my assumption)? TRBE csdevs would then just get enabled or disabled, without really being destroyed during CPU hotplug, so that the corresponding TRBE-ETE connection remains in place.
And the default sink can be initialized when the ETE instance first starts looking for it.
IIUC def_sink is the sink which will be selected by default for a source device while creating a path, in case there is no clear preference from the user. The ETE's default sink should be fixed (TRBE) to be on the easy side, and hence assigning that during the connection expansion procedure does make sense. But then it can be more complex, where the 'default' sink for an ETE is scenario specific and may not always be its TRBE.
Expanding the connections fits a scenario where the ETE is present with all its other traditional sinks, and the TRBE is the one which comes in or goes out with the CPU.
If the ETE also comes in and goes out with individual CPU hotplug, which is ideally preferred, we would also need to:
1. Co-ordinate with TRBE bring up and connection creation to avoid races
2. Rediscover the traditional sinks which were attached to the ETE before - go back and rescan the DT/ACPI entries for sinks with which a path can be established, etc.
Basically there are three choices here:

1. ETE is permanent; the TRBE and the ETE-TRBE path get created or destroyed with hotplug (current proposal)
2. ETE, TRBE and the ETE-TRBE path are all permanent; the ETE and TRBE get enabled or disabled with hotplug
3. ETE, TRBE and the ETE-TRBE path all get created, enabled and destroyed with hotplug, in sync
- Anshuman
On Tue, Nov 10, 2020 at 06:15:08PM +0530, Anshuman Khandual wrote:
Unlike traditional sink devices, individual TRBE instances are not detected via DT or ACPI nodes. Instead TRBE instances are detected during the CPU online process. Hence a path connecting ETE and TRBE on a given CPU would not have been established until then. This adds two coresight helpers that help modify outward connections from a source device to establish and terminate a path to a given sink device. This method might not be optimal and may be reworked later.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
 drivers/hwtracing/coresight/coresight-etm-perf.c | 30 ++++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-etm-perf.h |  4 ++++
 drivers/hwtracing/coresight/coresight-platform.c |  3 ++-
 drivers/hwtracing/coresight/coresight-trbe.c     |  2 ++
 include/linux/coresight.h                        |  2 ++
 5 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 1a37991..b4ab1d4 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -664,3 +664,33 @@ void __exit etm_perf_exit(void)
 {
 	perf_pmu_unregister(&etm_pmu);
 }
+#ifdef CONFIG_CORESIGHT_TRBE
+void coresight_trbe_connect_ete(struct coresight_device *csdev_trbe, int cpu)
+{
+	struct coresight_device *csdev_ete = per_cpu(csdev_src, cpu);
As Suzuki pointed out, that won't work if the TRBE gets probed before the ETMv4/ETE. I also agree with Suzuki that this situation would be better handled with a per-CPU csdev_trbe declared in the coresight-core.c file. That way both sysfs and perf have access to it.
+	if (!csdev_ete) {
+		pr_err("Corresponding ETE device not present on cpu %d\n", cpu);
+		return;
+	}
+	csdev_ete->def_sink = csdev_trbe;
That should be done in function coresight_find_default_sink(). If per_cpu(csdev_trbe, cpu) exists then that's what we pick. If not then move along with coresight_find_sink().
+	csdev_ete->pdata->nr_outport++;
+	if (!csdev_ete->pdata->conns)
+		coresight_alloc_conns(&csdev_ete->dev, csdev_ete->pdata);
+	csdev_ete->pdata->conns[csdev_ete->pdata->nr_outport - 1].child_dev = csdev_trbe;
I don't think we have to go through all that dance since the TRBE is directly connected to the ETE. With the above about coresight_find_default_sink() in mind, all we need to do is fix coresight_build_path() to check if the sink parameter is the same as csdev->def_sink. If so then just add the sink to the path - no need to follow ports as we do for other classic components.
Thanks, Mathieu
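A rough sketch of the lookup Mathieu describes might look like the following. The per-CPU csdev_trbe variable is the one he mentions; the use of source_ops()->cpu_id() and the fallback call are assumptions for illustration, not the actual core implementation.

static DEFINE_PER_CPU(struct coresight_device *, csdev_trbe);

struct coresight_device *
coresight_find_default_sink(struct coresight_device *csdev)
{
	struct coresight_device *sink;
	int cpu = source_ops(csdev)->cpu_id(csdev);	/* CPU this source is affine to */

	/* Prefer the per-CPU TRBE sink when one has been registered for this CPU. */
	sink = per_cpu(csdev_trbe, cpu);
	if (sink)
		return sink;

	/* Otherwise fall back to the usual search through the trace bus. */
	return coresight_find_sink(csdev);
}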
This patch documents the device tree binding in use for Arm TRBE.
Signed-off-by: Anshuman Khandual anshuman.khandual@arm.com
---
 Documentation/devicetree/bindings/arm/trbe.txt | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt
diff --git a/Documentation/devicetree/bindings/arm/trbe.txt b/Documentation/devicetree/bindings/arm/trbe.txt
new file mode 100644
index 0000000..4bb5b09
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/trbe.txt
@@ -0,0 +1,20 @@
+* Trace Buffer Extension (TRBE)
+
+Trace Buffer Extension (TRBE) is used for collecting trace data generated
+from a corresponding trace unit (ETE) using an in memory trace buffer.
+
+** TRBE Required properties:
+
+- compatible : should be one of:
+	       "arm,arm-trbe"
+
+- interrupts : Exactly 1 PPI must be listed. For heterogeneous systems where
+	       TRBE is only supported on a subset of the CPUs, please consult
+	       the arm,gic-v3 binding for details on describing a PPI partition.
+
+** Example:
+
+trbe {
+	compatible = "arm,arm-trbe";
+	interrupts = <GIC_PPI 15 IRQ_TYPE_LEVEL_HIGH>;
+};
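For illustration only, here is a sketch of how a driver might claim the PPI described by this binding. The request flow and names below are assumptions (a stub handler is shown in place of the real one), not the actual TRBE driver code.

#include <linux/interrupt.h>
#include <linux/percpu.h>
#include <linux/perf_event.h>
#include <linux/platform_device.h>

static DEFINE_PER_CPU(struct perf_output_handle *, trbe_handle);

static irqreturn_t trbe_irq_handler_stub(int irq, void *dev_id)
{
	/* The real handler would collect trace into the AUX buffer and restart the TRBE. */
	return IRQ_HANDLED;
}

static int trbe_request_irq_sketch(struct platform_device *pdev)
{
	int irq = platform_get_irq(pdev, 0);

	if (irq < 0)
		return irq;

	/* A PPI is per-CPU, so the per-CPU request/enable variants apply here. */
	return request_percpu_irq(irq, trbe_irq_handler_stub, "arm-trbe", &trbe_handle);
}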
Hi Anshuman,
This is to confirm that I have received your work and it is now on my list of patchsets to review. However, doing so likely won't happen for a couple of weeks because of patchsets already in the queue. I will touch base with you again if there are further delays.
Thanks, Mathieu
Hi Anshuman,
Only perf mode is supported in TRBE in the current patch set. Will you consider supporting sysfs mode as well in following patch sets?
Thanks, Tingwei
Hi Anshuman,
I've not looked in detail at this set yet, but having skimmed through it I do have an initial question about the handling of wrapped data buffers.
With the ETR/ETB we found an issue with the way perf concatenated data captured from the hardware buffer into a single contiguous data block. The issue occurs when a wrapped buffer appears after another buffer in the data file. In a typical session perf would stop trace and copy the hardware buffer multiple times into the auxtrace buffer.
e.g.
For ETR/ETB we have a fixed length hardware data buffer - and no way of detecting buffer wraps using interrupts as the tracing is in progress.
If the buffer is not full at the point that perf transfers it then the data will look like this:-

1) <async><synced trace data>

easy to decode, we can see the async at the start of the data - which would be the async issued at the start of trace.
If the buffer wraps we see this:-
2) <unsynced trace data><async><synced trace data>
Again no real issue, the decoder will skip to the async and trace from there - we lose the unsynced data.
Now the problem occurs when multiple transfers of data occur. We can see the following appearing as contiguous trace in the auxtrace buffer:-
3) <async><synced trace data><unsynced trace data><async><synced trace data>
Now the decoder cannot spot the point that the synced data from the first capture ends, and the unsynced data from the second capture begins. This means it will continue to decode into the unsynced data - which will result in incorrect trace / outright errors. To get round this for ETR/ETB the driver will insert barrier packets into the datafile if a wrap event is detected.
4) <async><synced trace data><barrier><unsynced trace data><async><synced trace data>
This <barrier> has the effect of resetting the decoder into the unsynced state so that the invalid trace is not decoded. This is a workaround we have to do to handle the limitations of the ETR / ETB trace hardware.
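A purely illustrative sketch of that ETR-style workaround is below. The barrier encoding and the helper name are placeholders, not the values or functions the drivers actually use.

#include <linux/string.h>
#include <linux/types.h>

/* Placeholder encoding; the real barrier packet is defined by the CoreSight core. */
static const u8 barrier_pkt_sketch[16];

/*
 * Copy one captured hardware buffer into the AUX buffer, prefixing it with a
 * barrier packet when the hardware reported a wrap, so the decoder resyncs.
 */
static u64 copy_capture(u8 *aux_buf, u64 offset, const u8 *hw_buf, u64 bytes, bool wrapped)
{
	if (wrapped) {
		memcpy(aux_buf + offset, barrier_pkt_sketch, sizeof(barrier_pkt_sketch));
		offset += sizeof(barrier_pkt_sketch);
	}
	memcpy(aux_buf + offset, hw_buf, bytes);
	return offset + bytes;
}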
For TRBE we do have interrupts, so it should be possible to prevent the buffer wrapping in most cases - but I did see in the code that there are handlers for the TRBE buffer wrap management event. Are there other factors in play that will prevent data pattern 3) from appearing in the auxtrace buffer?
Regards
Mike
Hello Mike,
On 11/16/20 8:30 PM, Mike Leach wrote:
Hi Anshuman,
I've not looked in detail at this set yet, but having skimmed through it I do have an initial question about the handling of wrapped data buffers.
With the ETR/ETB we found an issue with the way perf concatenated data captured from the hardware buffer into a single contiguous data block. The issue occurs when a wrapped buffer appears after another buffer in the data file. In a typical session perf would stop trace and copy the hardware buffer multiple times into the auxtrace buffer.
The hardware buffer and perf aux trace buffer are the same for TRBE and hence there is no actual copy involved. Trace data gets pushed into the user space via perf_aux_output_end() either via etm_event_stop() or via the IRQ handler i.e arm_trbe_irq_handler(). Data transfer to user space happens via updates to perf aux buffer indices i.e head, tail, wake up. But logically, they will appear as a stream of records to the user space while parsing perf.data file.
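As a rough outline of that IRQ-driven hand-off (the trbe_* helpers are hypothetical placeholders; only perf_aux_output_begin()/perf_aux_output_end() are the real perf interfaces):

#include <linux/interrupt.h>
#include <linux/perf_event.h>

static void trbe_drain_and_disable(void);
static unsigned long trbe_bytes_collected(struct perf_output_handle *handle);
static void trbe_enable(struct perf_output_handle *handle);

static irqreturn_t trbe_irq_collect_sketch(int irq, void *dev_id)
{
	struct perf_output_handle *handle = dev_id;
	struct perf_event *event = handle->event;
	unsigned long bytes;

	trbe_drain_and_disable();		/* stop collection and flush the trace unit */
	bytes = trbe_bytes_collected(handle);	/* derived from the TRBE write pointer */
	perf_aux_output_end(handle, bytes);	/* publish the head/wakeup updates to perf */

	/* Grab fresh space in the AUX buffer and restart the TRBE if we got any. */
	if (perf_aux_output_begin(handle, event))
		trbe_enable(handle);

	return IRQ_HANDLED;
}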
e.g.
For ETR/ETB we have a fixed length hardware data buffer - and no way of detecting buffer wraps using interrupts as the tracing is in progress.
TRBE has an interrupt. Hence there will be an opportunity to insert any additional packets if required to demarcate pre and post IRQ trace data streams.
If the buffer is not full at the point that perf transfers it then the data will look like this:-
1) <async><synced trace data>
easy to decode, we can see the async at the start of the data - which would be the async issued at the start of trace.
Just curious - what makes the tracer generate the <async> trace packet? Is there an explicit instruction, or is that how the tracer starts when enabled?
If the buffer wraps we see this:-
2) <unsynced trace data><async><synced trace data>
Again no real issue, the decoder will skip to the async and trace from there - we lose the unsynced data.
Could you please elaborate more on the difference between sync and async trace data ?
Now the problem occurs when multiple transfers of data occur. We can see the following appearing as contiguous trace in the auxtrace buffer:-
3) <async><synced trace data><unsynced trace data><async><synced trace data>
So there is a wrap around event between <synced trace data> and <unsynced trace data>? Are there any other situations where this might happen?
Now the decoder cannot spot the point that the synced data from the first capture ends, and the unsynced data from the second capture begins.
Got it.
This means it will continue to decode into the unsynced data - which will result in incorrect trace / outright errors. To get round this for ETR/ETB the driver will insert barrier packets into the datafile if a wrap event is detected.
But you mentioned there are no IRQs on ETR/ETB. So how is the wrap event even detected?
4) <async><synced trace data><barrier><unsynced trace data><async><synced trace data>
This <barrier> has the effect of resetting the decoder into the unsynced state so that the invalid trace is not decoded. This is a workaround we have to do to handle the limitations of the ETR / ETB trace hardware.
Got it.
For TRBE we do have interrupts, so it should be possible to prevent the buffer wrapping in most cases - but I did see in the code that there are handlers for the TRBE buffer wrap management event. Are there other factors in play that will prevent data pattern 3) from appearing in the auxtrace buffer ?
On TRBE, the buffer wrapping cannot happen without generating an IRQ. I would assume that ETE will then start again with an <async> data packet first when the handler returns. Otherwise we might also have to insert a similar barrier packet for the user space tool to reset. As trace data should not get lost during a wrap event, ETE should complete the packet after the handler returns, hence the aux buffer should still have a logically contiguous stream of <synced trace data> to decode. I am not sure right now, but will look into this.
- Anshuman
Hi Anshuman,
On Mon, 23 Nov 2020 at 03:40, Anshuman Khandual anshuman.khandual@arm.com wrote:
Hello Mike,
On 11/16/20 8:30 PM, Mike Leach wrote:
Hi Anshuman,
I've not looked in detail at this set yet, but having skimmed through it I do have an initial question about the handling of wrapped data buffers.
With the ETR/ETB we found an issue with the way perf concatenated data captured from the hardware buffer into a single contiguous data block. The issue occurs when a wrapped buffer appears after another buffer in the data file. In a typical session perf would stop trace and copy the hardware buffer multiple times into the auxtrace buffer.
The hardware buffer and perf aux trace buffer are the same for TRBE and hence there is no actual copy involved. Trace data gets pushed into the user space via perf_aux_output_end() either via etm_event_stop() or via the IRQ handler i.e arm_trbe_irq_handler(). Data transfer to user space happens via updates to perf aux buffer indices i.e head, tail, wake up. But logically, they will appear as a stream of records to the user space while parsing perf.data file.
Understood - I suspected this would use direct write to the aux trace buffer, but the principle is the same. TRBE determines the location of data in the buffer so even without a copy, it is possible to get multiple TRBE "buffers" in the auxbuffer as the TRBE is stopped and restarted. The later copy to userspace is independent of this.
e.g.
For ETR/ETB we have a fixed length hardware data buffer - and no way of detecting buffer wraps using interrupts as the tracing is in progress.
TRBE has an interrupt. Hence there will be an opportunity to insert any additional packets if required to demarcate pre and post IRQ trace data streams.
If the buffer is not full at the point that perf transfers it then the data will look like this:-
1) <async><synced trace data>
easy to decode, we can see the async at the start of the data - which would be the async issued at the start of trace.
Just curious - what makes the tracer generate the <async> trace packet? Is there an explicit instruction, or is that how the tracer starts when enabled?
ETM / ETE will generate an async at the start of trace, and then periodically afterwards.
If the buffer wraps we see this:-
2) <unsynced trace data><async><synced trace data>
Again no real issue, the decoder will skip to the async and trace from there - we lose the unsynced data.
Could you please elaborate more on the difference between sync and async trace data ?
The decoder will start reading trace from the start of the buffer. Unsynced trace is trace data that appears before the first async packet. We cannot decode this as we do not know where the packet boundaries are. Synced trace is any data after the first async packet - the async enables us to determine where the packet boundaries are so we can now determine the packets and decode the trace.
For an unwrapped buffer, we always see the first async that the ETE generated when the trace generation was started. In a wrapped buffer we search till we find an async generated as part of the periodic async packets.
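To illustrate what that search amounts to on the decode side, here is a small sketch. The "run of zero bytes terminated by 0x80" shape and the required run length are assumptions for the example; the ETE/ETMv4 packet descriptions (or OpenCSD) define the exact A-sync encoding.

#include <stddef.h>
#include <stdint.h>

/* Return the offset at which decoding can start, i.e. just after the first A-sync. */
static size_t skip_to_first_async(const uint8_t *buf, size_t len, size_t min_zero_run)
{
	size_t i, run = 0;

	for (i = 0; i < len; i++) {
		if (buf[i] == 0x00) {
			run++;
		} else if (buf[i] == 0x80 && run >= min_zero_run) {
			return i + 1;	/* first decodable byte follows the A-sync */
		} else {
			run = 0;
		}
	}
	return len;			/* no sync found - nothing in this block is decodable */
}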
Now the problem occurs when multiple transfers of data occur. We can see the following appearing as contiguous trace in the auxtrace buffer:-
3) <async><synced trace data><unsynced trace data><async><synced trace data>
So there is a wrap around event between <synced trace data> and <unsynced trace data>? Are there any other situations where this might happen?
Not that I am aware of.
Now the decoder cannot spot the point that the synced data from the first capture ends, and the unsynced data from the second capture begins.
Got it.
This means it will continue to decode into the unsynced data - which will result in incorrect trace / outright errors. To get round this for ETR/ETB the driver will insert barrier packets into the datafile if a wrap event is detected.
But you mentioned there are no IRQs on ETR/ETB. So how is the wrap event even detected?
A bit in the status register tells us the buffer is full - i.e. the write pointer has wrapped around to the location it started at. We cannot tell how far, or if multiple wraps have occurred, just that the event has occurred.
4) <async><synced trace data><barrier><unsynced trace data><async><synced trace data>
This <barrier> has the effect of resetting the decoder into the unsynced state so that the invalid trace is not decoded. This is a workaround we have to do to handle the limitations of the ETR / ETB trace hardware.
Got it.
For TRBE we do have interrupts, so it should be possible to prevent the buffer wrapping in most cases - but I did see in the code that there are handlers for the TRBE buffer wrap management event. Are there other factors in play that will prevent data pattern 3) from appearing in the auxtrace buffer ?
On TRBE, the buffer wrapping cannot happen without generating an IRQ. I would assume that ETE will then start again with an <async> data packet first when the handler returns.
This would only occur if the ETE was stopped and flushed prior to the wrap event. Does this happen? I am assuming that the sink is independent from the ETE, as ETMs are from ETRs.
Otherwise we might also have to insert a similar barrier packet for the user space tool to reset. As trace data should not get lost during a wrap event,
My understanding is that if a wrap has even occurred, then data is already lost.
ETE should complete the packet after the handler returns, hence the aux buffer should still have a logically contiguous stream of <synced trace data> to decode. I am not sure right now, but will look into this.
So you are relying on backpressure to stop the ETE emitting packets? This could result in trace being lost due to overflow if the IRQ is not handled sufficiently quickly.
Regards
Mike
- Anshuman
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK
Hello Tingwei,
On 11/14/20 10:47 AM, Tingwei Zhang wrote:
Only perf mode is supported in TRBE in the current patch set. Will you consider supporting sysfs mode as well in following patch sets?
Yes, either in subsequent versions or later on, after first getting the perf based functionality enabled. Nonetheless, sysfs is also on the todo list as mentioned in the cover letter.
- Anshuman