On Mon, 03 Nov 2025 15:06:20 +0800, Jie Gan wrote:
> Enable the CTCU device for the QCS8300 platform. Add a fallback mechanism in
> the binding to utilize the compatible of the SA8775p platform, because the
> CTCU for QCS8300 shares the same configurations as the SA8775p platform.
>
> Changes in V4:
> 1. dtsi file has been renamed from qcs8300.dtsi -> monaco.dtsi
> Link to V3 - https://lore.kernel.org/all/20251013-enable-ctcu-for-qcs8300-v3-0-611e6e0d3…
>
> [...]
Applied, thanks!
[1/2] dt-bindings: arm: add CTCU device for monaco
https://git.kernel.org/coresight/c/51cd1fb70e08
Best regards,
--
Suzuki K Poulose <suzuki.poulose(a)arm.com>
Hi,
On Fri, Dec 19, 2025 at 10:39:49AM +0800, Ma Ke wrote:
[...]
> From the discussion, I note two possible fix directions:
>
> 1. Release the initial reference in etm_setup_aux() (current v2 patch)
> 2. Modify the behavior of coresight_get_sink_by_id() itself so it
> doesn't increase the reference count.
Option 2 is the right way to go.
> To ensure the correctness of the v3 patch, I'd like to confirm which
> patch is preferred. If option 2 is the consensus, I'm happy to modify
> the implementation of coresight_get_sink_by_id() as suggested.
It would be good to use a separate patch to fix the
coresight_find_device_by_fwnode() issue mentioned by James:
diff --git a/drivers/hwtracing/coresight/coresight-platform.c b/drivers/hwtracing/coresight/coresight-platform.c
index 0db64c5f4995..2b34f818ba88 100644
--- a/drivers/hwtracing/coresight/coresight-platform.c
+++ b/drivers/hwtracing/coresight/coresight-platform.c
@@ -107,14 +107,16 @@ coresight_find_device_by_fwnode(struct fwnode_handle *fwnode)
 	 * platform bus.
 	 */
 	dev = bus_find_device_by_fwnode(&platform_bus_type, fwnode);
-	if (dev)
-		return dev;
 
 	/*
 	 * We have a configurable component - circle through the AMBA bus
 	 * looking for the device that matches the endpoint node.
 	 */
-	return bus_find_device_by_fwnode(&amba_bustype, fwnode);
+	if (!dev)
+		dev = bus_find_device_by_fwnode(&amba_bustype, fwnode);
+
+	put_device(dev);
+	return dev;
 }
 
 /*
@@ -274,7 +276,6 @@ static int of_coresight_parse_endpoint(struct device *dev,
 	of_node_put(rparent);
 	of_node_put(rep);
-	put_device(rdev);
 
 	return ret;
 }
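
And for option 2 itself, a rough sketch of what I have in mind (untested,
and assuming the current bus_find_device() based lookup stays as is):

struct coresight_device *coresight_get_sink_by_id(u32 id)
{
	struct device *dev;

	dev = bus_find_device(&coresight_bustype, NULL, &id,
			      coresight_sink_by_id);
	if (!dev)
		return NULL;

	/*
	 * Drop the reference taken by bus_find_device() so that callers
	 * like etm_setup_aux() no longer need to release it themselves.
	 */
	put_device(dev);
	return to_coresight_device(dev);
}
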
Thanks for working on this.
On 19/12/2025 09:08, Jie Gan wrote:
>
>
> On 11/3/2025 3:06 PM, Jie Gan wrote:
>> The CTCU device for monaco shares the same configurations as SA8775p. Add
>> a fallback to enable the CTCU for monaco to utilize the compatible of the
>> SA8775p.
>>
>
> Gentle reminder.
I was under the assumption that this was going via the msm tree? Sorry, I
misunderstood. I can pull this in for v6.20.
Suzuki
>
>> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
>> Acked-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> Reviewed-by: Bjorn Andersson <andersson(a)kernel.org>
>> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
>> ---
>> Documentation/devicetree/bindings/arm/qcom,coresight-ctcu.yaml | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/qcom,coresight-ctcu.yaml b/Documentation/devicetree/bindings/arm/qcom,coresight-ctcu.yaml
>> index c969c16c21ef..460f38ddbd73 100644
>> --- a/Documentation/devicetree/bindings/arm/qcom,coresight-ctcu.yaml
>> +++ b/Documentation/devicetree/bindings/arm/qcom,coresight-ctcu.yaml
>> @@ -26,8 +26,13 @@ description: |
>>  properties:
>>    compatible:
>> -    enum:
>> -      - qcom,sa8775p-ctcu
>> +    oneOf:
>> +      - items:
>> +          - enum:
>> +              - qcom,qcs8300-ctcu
>> +          - const: qcom,sa8775p-ctcu
>> +      - enum:
>> +          - qcom,sa8775p-ctcu
>>
>>    reg:
>>      maxItems: 1
>>
>
This patch series adds support for CoreSight components local to CPU clusters,
including funnel, replicator, and TMC, which reside within CPU cluster power
domains. These components require special handling due to power domain
constraints.
Unlike system-level CoreSight devices, these components share the CPU cluster's
power domain. When the cluster enters low-power mode (LPM), their registers
become inaccessible. Notably, `pm_runtime_get` alone cannot bring the cluster
out of LPM, making standard register access unreliable.
To address this, the series introduces:
- Identifying cluster-bound devices via a new `qcom,cpu-bound-components`
device tree property.
- Implementing deferred probing: if associated CPUs are offline during
probe, initialization is deferred until a CPU hotplug notifier detects
the CPU coming online.
- Utilizing `smp_call_function_single()` to ensure register accesses
(initialization, enablement, sysfs reads) are always executed on a
powered CPU within the target cluster (see the sketch after this list).
- Extending the CoreSight link `enable` callback to pass the `cs_mode`.
This allows drivers to distinguish between SysFS and Perf modes and
apply mode-specific logic.
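
For reference, the cross-call pattern from the third bullet looks roughly
like the below. This is a simplified sketch: the function names are
illustrative, and only the supported_cpus field (mentioned in the
changelog) is taken from the series itself.

static void __funnel_enable_on_cpu(void *data)
{
	struct funnel_drvdata *drvdata = data;

	/* Runs on a CPU inside the cluster, so registers are accessible */
	__funnel_enable_hw(drvdata);
}

static int funnel_enable_cluster(struct funnel_drvdata *drvdata)
{
	int cpu;

	cpus_read_lock();
	/* Pick any online CPU that belongs to the component's cluster */
	cpu = cpumask_any_and(&drvdata->supported_cpus, cpu_online_mask);
	if (cpu >= nr_cpu_ids) {
		cpus_read_unlock();
		return -ENODEV;
	}
	smp_call_function_single(cpu, __funnel_enable_on_cpu, drvdata, true);
	cpus_read_unlock();
	return 0;
}
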
Jie Gan (1):
arm64: dts: qcom: hamoa: Add CoreSight nodes for APSS debug block
Yuanfang Zhang (11):
dt-bindings: arm: coresight: Add 'qcom,cpu-bound-components' property
coresight: Pass trace mode to link enable callback
coresight-funnel: Support CPU cluster funnel initialization
coresight-funnel: Defer probe when associated CPUs are offline
coresight-replicator: Support CPU cluster replicator initialization
coresight-replicator: Defer probe when associated CPUs are offline
coresight-replicator: Update management interface for CPU-bound devices
coresight-tmc: Support probe and initialization for CPU cluster TMCs
coresight-tmc-etf: Refactor enable function for CPU cluster ETF support
coresight-tmc: Update management interface for CPU-bound TMCs
coresight-tmc: Defer probe when associated CPUs are offline
Verification:
This series has been verified on sm8750.
Test steps for deferred probe:
1. Limit the system to at most 6 online CPU cores during boot.
2. echo 1 >/sys/bus/cpu/devices/cpu6/online.
3. Check whether ETM6 and ETM7 have been probed.
Test steps for sysfs mode:
echo 1 >/sys/bus/coresight/devices/tmc_etf0/enable_sink
echo 1 >/sys/bus/coresight/devices/etm0/enable_source
echo 1 >/sys/bus/coresight/devices/etm6/enable_source
echo 0 >/sys/bus/coresight/devices/etm0/enable_source
echo 0 >/sys/bus/coresight/devices/etm6/enable_source
echo 0 >/sys/bus/coresight/devices/tmc_etf0/enable_sink
echo 1 >/sys/bus/coresight/devices/tmc_etf1/enable_sink
echo 1 >/sys/bus/coresight/devices/etm0/enable_source
cat /dev/tmc_etf1 >/tmp/etf1.bin
echo 0 >/sys/bus/coresight/devices/etm0/enable_source
echo 0 >/sys/bus/coresight/devices/tmc_etf1/enable_sink
echo 1 >/sys/bus/coresight/devices/tmc_etf2/enable_sink
echo 1 >/sys/bus/coresight/devices/etm6/enable_source
cat /dev/tmc_etf2 >/tmp/etf2.bin
echo 0 >/sys/bus/coresight/devices/etm6/enable_source
echo 0 >/sys/bus/coresight/devices/tmc_etf2/enable_sink
Test steps for sysfs nodes:
cat /sys/bus/coresight/devices/tmc_etf*/mgmt/*
cat /sys/bus/coresight/devices/funnel*/funnel_ctrl
cat /sys/bus/coresight/devices/replicator*/mgmt/*
Test steps for perf mode:
perf record -a -e cs_etm//k -- sleep 5
Signed-off-by: Yuanfang Zhang <yuanfang.zhang(a)oss.qualcomm.com>
---
Changes in v2:
- Use the qcom,cpu-bound-components device tree property to identify devices
bound to a cluster.
- Refactor commit message.
- Introduce a supported_cpus field in the drvdata structure to record the CPUs
that belong to the cluster where the local component resides.
- Link to v1: https://lore.kernel.org/r/20251027-cpu_cluster_component_pm-v1-0-31355ac588…
---
Jie Gan (1):
arm64: dts: qcom: hamoa: Add CoreSight nodes for APSS debug block
Yuanfang Zhang (11):
dt-bindings: arm: coresight: Add 'qcom,cpu-bound-components' property
coresight-funnel: Support CPU cluster funnel initialization
coresight-funnel: Defer probe when associated CPUs are offline
coresight-replicator: Support CPU cluster replicator initialization
coresight-replicator: Defer probe when associated CPUs are offline
coresight-replicator: Update management interface for CPU-bound devices
coresight-tmc: Support probe and initialization for CPU cluster TMCs
coresight-tmc-etf: Refactor enable function for CPU cluster ETF support
coresight-tmc: Update management interface for CPU-bound TMCs
coresight-tmc: Defer probe when associated CPUs are offline
coresight: Pass trace mode to link enable callback
.../bindings/arm/arm,coresight-dynamic-funnel.yaml | 5 +
.../arm/arm,coresight-dynamic-replicator.yaml | 5 +
.../devicetree/bindings/arm/arm,coresight-tmc.yaml | 5 +
arch/arm64/boot/dts/qcom/hamoa.dtsi | 926 +++++++++++++++++++++
arch/arm64/boot/dts/qcom/purwa.dtsi | 12 +
drivers/hwtracing/coresight/coresight-core.c | 7 +-
drivers/hwtracing/coresight/coresight-funnel.c | 258 +++++-
drivers/hwtracing/coresight/coresight-replicator.c | 341 +++++++-
drivers/hwtracing/coresight/coresight-tmc-core.c | 387 +++++++--
drivers/hwtracing/coresight/coresight-tmc-etf.c | 106 ++-
drivers/hwtracing/coresight/coresight-tmc.h | 10 +
drivers/hwtracing/coresight/coresight-tnoc.c | 3 +-
drivers/hwtracing/coresight/coresight-tpda.c | 3 +-
include/linux/coresight.h | 3 +-
14 files changed, 1902 insertions(+), 169 deletions(-)
---
base-commit: 008d3547aae5bc86fac3eda317489169c3fda112
change-id: 20251016-cpu_cluster_component_pm-ce518f510433
Best regards,
--
Yuanfang Zhang <yuanfang.zhang(a)oss.qualcomm.com>
On 19/12/2025 10:21, Sudeep Holla wrote:
> On Fri, Dec 19, 2025 at 10:13:14AM +0800, yuanfang zhang wrote:
>>
>>
>> On 12/18/2025 7:33 PM, Sudeep Holla wrote:
>>> On Thu, Dec 18, 2025 at 12:09:40AM -0800, Yuanfang Zhang wrote:
>>>> This patch series adds support for CoreSight components local to CPU clusters,
>>>> including funnel, replicator, and TMC, which reside within CPU cluster power
>>>> domains. These components require special handling due to power domain
>>>> constraints.
>>>>
>>>
>>> Could you clarify why PSCI-based power domains associated with clusters in
>>> domain-idle-states cannot address these requirements, given that PSCI CPU-idle
>>> OSI mode was originally intended to support them? My understanding of this
>>> patch series is that OSI mode is unable to do so, which, if accurate, appears
>>> to be a flaw that should be corrected.
>>
>> It is due to the particular characteristics of the CPU cluster power
>> domain. Runtime PM for CPU devices works a little differently; it is
>> mostly used to manage the hierarchical CPU topology (PSCI OSI mode),
>> talking to the genpd framework to manage last-CPU handling in the cluster.
>
> That is indeed the intended design. Could you clarify which specific
> characteristics differentiate it here?
>
>> It doesn't actually send an IPI to wake up the CPU device (there is no
>> .power_on/.power_off callback implemented that would be invoked from the
>> .runtime_resume callback). This behavior is aligned with the upstream kernel.
>>
>
> I am quite lost here. Why is it necessary to wake up the CPU? If I understand
> correctly, all of this complexity is meant to ensure that the cluster power
> domain is enabled before any of the funnel registers are accessed. Is that
> correct?
>
> If so, and if the cluster domains are already defined as the power domains for
> these funnel devices, then they should be requested to power on automatically
> before any register access occurs. Is that not the case?
>
> What am I missing in this reasoning?
Exactly, this is what I am asking too. But then you get the "pre-formatted
standard response" without our questions being answered.
Suzuki
On 19/12/2025 10:04, Jie Gan wrote:
> From: Tao Zhang <tao.zhang(a)oss.qualcomm.com>
>
> The TPDA_SYNC counter tracks the number of bytes transferred from the
> aggregator. When this count reaches the value programmed in the
> TPDA_SYNCR register, an ASYNC request is triggered, allowing userspace
> tools to accurately parse each valid packet.
>
> Signed-off-by: Tao Zhang <tao.zhang(a)oss.qualcomm.com>
> Reviewed-by: James Clark <james.clark(a)linaro.org>
> Co-developed-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
> ---
> drivers/hwtracing/coresight/coresight-tpda.c | 7 +++++++
> drivers/hwtracing/coresight/coresight-tpda.h | 5 +++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/drivers/hwtracing/coresight/coresight-tpda.c b/drivers/hwtracing/coresight/coresight-tpda.c
> index d25a8bcfb3d4..d378ff8ad77d 100644
> --- a/drivers/hwtracing/coresight/coresight-tpda.c
> +++ b/drivers/hwtracing/coresight/coresight-tpda.c
> @@ -163,6 +163,13 @@ static void tpda_enable_pre_port(struct tpda_drvdata *drvdata)
>  	 */
>  	if (drvdata->trig_flag_ts)
>  		writel_relaxed(0x0, drvdata->base + TPDA_FPID_CR);
> +
> +	val = readl_relaxed(drvdata->base + TPDA_SYNCR);
> +	/* Reset the mode ctrl */
> +	val &= ~TPDA_SYNCR_MODE_CTRL;
> +	/* Program the counter value for TPDA_SYNCR */
> +	val |= TPDA_SYNCR_COUNTER_MASK;
Do we plan to change this value via sysfs? If not, what's the point of
clearing the field? Why not simply set it (as it is all 1s anyway)?
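i.e., something like the below (untested, and assuming nothing else ever
sets the MODE_CTRL bit):

	val = readl_relaxed(drvdata->base + TPDA_SYNCR);
	/* The counter field is all 1s, so a plain OR is sufficient */
	val |= TPDA_SYNCR_COUNTER_MASK;
	writel_relaxed(val, drvdata->base + TPDA_SYNCR);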
> +	writel_relaxed(val, drvdata->base + TPDA_SYNCR);
>  }
>
> static int tpda_enable_port(struct tpda_drvdata *drvdata, int port)
> diff --git a/drivers/hwtracing/coresight/coresight-tpda.h b/drivers/hwtracing/coresight/coresight-tpda.h
> index 8a075cfbc3cc..97e2729c15c9 100644
> --- a/drivers/hwtracing/coresight/coresight-tpda.h
> +++ b/drivers/hwtracing/coresight/coresight-tpda.h
> @@ -9,6 +9,7 @@
>  #define TPDA_CR			(0x000)
>  #define TPDA_Pn_CR(n)		(0x004 + (n * 4))
>  #define TPDA_FPID_CR		(0x084)
> +#define TPDA_SYNCR		(0x08C)
> 
>  /* Cross trigger Global (all ports) flush request bit */
>  #define TPDA_CR_FLREQ		BIT(0)
> @@ -38,6 +39,10 @@
>  #define TPDA_Pn_CR_CMBSIZE	GENMASK(7, 6)
>  /* Aggregator port DSB data set element size bit */
>  #define TPDA_Pn_CR_DSBSIZE	BIT(8)
Newline to separate the definitions of different registers, please.
> +/* TPDA_SYNCR mode control bit */
> +#define TPDA_SYNCR_MODE_CTRL	BIT(12)
> +/* TPDA_SYNCR counter mask */
> +#define TPDA_SYNCR_COUNTER_MASK	GENMASK(11, 0)
>
>  #define TPDA_MAX_INPORTS	32
>
Suzuki
>
On 19/12/2025 02:05, Jie Gan wrote:
>
>
> On 12/19/2025 7:19 AM, Suzuki K Poulose wrote:
>> On 18/12/2025 10:17, Krzysztof Kozlowski wrote:
>>> On 12/12/2025 02:12, Jie Gan wrote:
>>>>
>>>>
>>>> On 12/11/2025 9:37 PM, Rob Herring wrote:
>>>>> On Thu, Dec 11, 2025 at 02:10:44PM +0800, Jie Gan wrote:
>>>>>> Add an interrupt property to CTCU device. The interrupt will be
>>>>>> triggered
>>>>>> when the data size in the ETR buffer exceeds the threshold of the
>>>>>> BYTECNTRVAL register. Programming a threshold in the BYTECNTRVAL
>>>>>> register
>>>>>> of CTCU device will enable the interrupt.
>>>>>>
>>>>>> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
>>>>>> Reviewed-by: Mike Leach <mike.leach(a)linaro.org>
>>>>>> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
>>>>>> ---
>>>>>> .../devicetree/bindings/arm/qcom,coresight-ctcu.yaml | 17 +++++++++++++++++
>>>>>> 1 file changed, 17 insertions(+)
>>>>>>
>>>>>> diff --git a/Documentation/devicetree/bindings/arm/qcom,coresight-ctcu.yaml b/Documentation/devicetree/bindings/arm/qcom,coresight-ctcu.yaml
>>>>>> index c969c16c21ef..90f88cc6cd3e 100644
>>>>>> --- a/Documentation/devicetree/bindings/arm/qcom,coresight-ctcu.yaml
>>>>>> +++ b/Documentation/devicetree/bindings/arm/qcom,coresight-ctcu.yaml
>>>>>> @@ -39,6 +39,16 @@ properties:
>>>>>>      items:
>>>>>>        - const: apb
>>>>>>
>>>>>> +  interrupts:
>>>>>> +    items:
>>>>>> +      - description: Byte cntr interrupt for the first etr device
>>>>>> +      - description: Byte cntr interrupt for the second etr device
>>
>> This is really vague. How do you define first vs second ? Probe order ?
>> No way. This must be the "port" number to which the ETR is connected
>> to the CTCU. IIUC, there is a config area for each ETR (e.g., trace id
>> filter) connected to the CTCU. I was under the assumption that they
>> are identified as "ports" (input ports). I don't really understand how
>> this interrupt mapping works now. Please explain it clearly.
>>
>
> Sorry for the misunderstanding.
>
> Each ETR device should have its own interrupt line and an IRQ register
> within the CTCU device, as defined by the specification. In existing
> projects, the maximum supported number of ETR devices is 2.
>
> Each interrupt is directly mapped to a specific ETR device, for example:
> tmc@1000 → interrupt line 0
> tmc@1001 → interrupt line 1
>
> The suggestion to identify devices by ‘ports’ is much clearer than my
> previous explanation, as it explicitly shows which device is connected
> to which port.
Thanks for confirming.
>
>>>>>> +
>>>>>> +  interrupt-names:
>>>>>> +    items:
>>>>>> +      - const: etrirq0
>>>>>> +      - const: etrirq1
>>>>>
>>>>> Names are kind of pointless when it is just foo<index>.
>>>>
>>>> Hi Rob,
>>>>
>>>> I was naming them as etr0/etr1. Are these names acceptable?
>>>
>>> Obviously irq is redundant, but how does etr0 solve the problem of
>>> calling it foo0?
>>>
>>> I don't think you really read Rob's comment.
>>>
>>>> The interrupts are assigned exclusively to a specific ETR device.
>>>>
>>>> But Suzuki is concerned that this might cause confusion because the ETR
>>>> device is named randomly in the driver. Suzuki suggested using ‘port-0’
>>>> and ‘port-1’ and would also like to hear your feedback on these names.
>>>
>>> There is no confusion here. Writing bindings luckily clarifies what
>>> the indices in the array mean.
>>
>> The point is there are "n" interrupts. The question is, could there be more
>> devices (ETRs) connected to the CTCU than "n"?
>>
>> e.g., let's say the CTCU can control up to 4 ETRs, and on a particular
>> system the mapping is:
>>
>> TMC-ETR0 -> CTCU-Port0
>>
>> TMC-ETR1 -> CTCU-Port2
>> TMC-ETR2 -> CTCU-Port3
>>
>> Now, how many interrupts are described in the DT? And how do we map which
>> interrupt corresponds to CTCU-Portn? (Finding the TMC-ETRx back from the
>> port is possible, with the topology.)
>>
>
> Got your point and it's much clearer.
>
>> This is what I raised in the previous version. Again, happy to hear
>> if there is a standard way to describe the interrupts.
>>
>> Suzuki
>>
>>
>>>
>>>>
>>>> Usually, the probe sequence follows the order of the addresses. In our
>>>> specification, ‘ETR0’ is always probed before ‘ETR1’ because its
>>>> address
>>>> is lower.
>>>
>>> How is this even relevant? You are answering something completely
>>> different, so I don't think you really tried to understand the review.
>>>
>
> My previous explanation was definitely unclear. As Suzuki suggested,
> mapping the interrupt to the port number (to identify the relevant
> device based on topology) makes sense and provides a much easier way to
> understand the relationship between the interrupt and the ETR device.
>
> So with the suggestion, here is the new description of the interrupts:
>
> interrupts:
>   items:
>     - description: Interrupt for the ETR device connected to in-port0.
>     - description: Interrupt for the ETR device connected to in-port1.
>
> interrupt-names:
>   items:
>     - const: port0
>     - const: port1
Which brings us back to the question I posted in the previous version.
Do we really need a "name", or are there other ways to define a sparse
list of interrupts?
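
To be clear, names do make a sparse set trivial to handle on the driver
side, e.g. with a hypothetical per-port lookup like:

	static int ctcu_get_port_irq(struct platform_device *pdev, int port)
	{
		char name[8];

		snprintf(name, sizeof(name), "port%d", port);
		/* Quietly returns -ENXIO when the port has no interrupt */
		return platform_get_irq_byname_optional(pdev, name);
	}

The open question is whether there is a more standard, name-less way to
describe the same sparse mapping.
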
Suzuki
>
> Thanks,
> Jie
>
>>>
>>>
>>> Best regards,
>>> Krzysztof
>>
>
On 12/18/2025 7:33 PM, Sudeep Holla wrote:
> On Thu, Dec 18, 2025 at 12:09:40AM -0800, Yuanfang Zhang wrote:
>> This patch series adds support for CoreSight components local to CPU clusters,
>> including funnel, replicator, and TMC, which reside within CPU cluster power
>> domains. These components require special handling due to power domain
>> constraints.
>>
>
> Could you clarify why PSCI-based power domains associated with clusters in
> domain-idle-states cannot address these requirements, given that PSCI CPU-idle
> OSI mode was originally intended to support them? My understanding of this
> patch series is that OSI mode is unable to do so, which, if accurate, appears
> to be a flaw that should be corrected.
It is due to the particular characteristics of the CPU cluster power domain.
Runtime PM for CPU devices works a little differently; it is mostly used to
manage the hierarchical CPU topology (PSCI OSI mode), talking to the genpd
framework to manage last-CPU handling in the cluster.
It doesn't actually send an IPI to wake up the CPU device (there is no
.power_on/.power_off callback implemented that would be invoked from the
.runtime_resume callback). This behavior is aligned with the upstream kernel.
>
>> Unlike system-level CoreSight devices, these components share the CPU cluster's
>> power domain. When the cluster enters low-power mode (LPM), their registers
>> become inaccessible. Notably, `pm_runtime_get` alone cannot bring the cluster
>> out of LPM, making standard register access unreliable.
>>
>
> Are these devices the only ones on the system that are uniquely bound to
> cluster-level power domains? If not, what additional devices share this
> dependency so that we can understand how they are managed in comparison?
>
Yes, devices like ETM and TRBE also share this power domain and access constraint.
Their drivers naturally handle enablement/disablement on the specific CPU they
belong to (e.g., via hotplug callbacks or existing smp_call_function paths).
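
For reference, those drivers follow the usual CPU-hotplug pattern, roughly
like the below (a simplified sketch, not the exact upstream code):

static int coresight_example_cpu_online(unsigned int cpu)
{
	/*
	 * Runs on the CPU coming online, where the cluster power domain
	 * is guaranteed to be up, so register access is safe.
	 */
	return 0;
}

/* at probe/init time: */
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "coresight/example:online",
			coresight_example_cpu_online, NULL);
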
>> To address this, the series introduces:
>> - Identifying cluster-bound devices via a new `qcom,cpu-bound-components`
>> device tree property.
>
> Really, no please.
>
Our objective is to determine which CoreSight components are physically located
within the CPU cluster power domain.
Would it be acceptable to derive this relationship from the existing
power-domains binding? For example, if a funnel or replicator node is linked to
a power-domains entry that specifies a cpumask, the driver could recognize this
shared dependency and automatically apply the appropriate cluster-aware
behavior.
thanks,
yuanfang.