FEAT_TRF is a Coresight feature that allows trace capture to be
completely filtered at different exception levels, unlike the existing
TRCVICTLR controls which may still emit target addresses of branches,
even if the following trace is filtered.
Without FEAT_TRF, it was possible to start a trace session on a host and
also collect trace from the guest as TRCVICTLR was never programmed to
exclude guests (and it could still emit target addresses even if it
was).
With FEAT_TRF, the current behavior of trace in guests exists depends on
whether nVHE or VHE are being used. Both of the examples below are from
the host's point of view, as Coresight isn't accessible from guests.
This patchset is only relevant to when FEAT_TRF exists, otherwise there
is no change.
Current behavior:
nVHE/pKVM:
Because the host and the guest are both using TRFCR_EL1, trace will be
generated in guests depending on the same filter rules the host is
using. For example if the host is tracing userspace only, then guest
userspace trace will also be collected.
(This is further limited by whether TRBE is used because an issue
with TRBE means that it's completely disabled in nVHE guests, but it's
possible to have other tracing components.)
VHE:
With VHE, the host filters will be in TRFCR_EL2, but the filters in
TRFCR_EL1 will be active when the guest is running. Because we don't
write to TRFCR_EL1, guest trace will be completely disabled.
New behavior:
The guest filtering rules from the Perf session are now honored for both
nVHE and VHE modes. This is done by either writing to TRFCR_EL12 at the
start of the Perf session and doing nothing else further, or caching the
guest value and writing it at guest switch for nVHE. In pKVM, trace is
now be disabled for both protected and unprotected guests.
There is also an optimization where the Coresight drivers pass their
enabled state to KVM. This means in the common case KVM doesn't have to
touch any sysregs when the feature isn't in use.
Applies to 6.13-rc6
---
Changes since V9 [9]:
* Rebase onto 6.13-rc6 and pickup commit 38131c02a53f ("KVM: arm64:
Track presence of SPE/TRBE in kvm_host_data instead of vCPU") to get
host_data_set_flag(). Also requires two intertwined parent commits.
Changes since V8 [8]:
* Rename guest_trfcr_el1 -> trfcr_while_in_guest
* Rename GUEST_FILTER -> EL1_TRACING_CONFIGURED
* Rename kvm_set_trfcr() -> kvm_tracing_set_el1_configuration()
* #include ordering
* Reorder Coresight driver to remove need for preempt_disable()
to avoid the warning
* Force EL1_TRACING_CONFIGURED on for pKVM which drops an additional
special case but still disables trace
* Change set/clear trfcr to a single function that disables swapping
if it has the same value as the host
* Make the drain condition a bit clearer with __trace_needs_drain()
instead of host trfcr != 0 (Or checking individual E*TRE bits)
* Drain is only really required on switch to guest so move it there
* Only for pKVM, restore the original behavior for draining whenever
trbe is enabled. This prevents hypothetical case where a host has
the filters disabled but hasn't drained yet which we had by only
looking at host trfcr != 0
Changes since V7 [6]:
* Drop SPE changes
* Change the interface to be based on intent, i.e kvm_enable_trbe()
rather than passing the raw register value
* Drop change to re-use vcpu_flags mechanism in favour of [7]
* Simplify by using the same switch function to and from guest
Changes since V6 [5]:
* Implement a better "do nothing" case where both the SPE and Coresight
drivers give the enabled state to KVM, allowing some register
reads to be dropped.
* Move the state and feature flags out of the vCPU into the per-CPU
host_debug_state.
* Simplify the switch logic by adding a new flag HOST_STATE_SWAP_TRFCR
and only storing a single TRFCR value.
* Rename vcpu flag macros to a more generic kvm_flag...
Changes since V5 [4]:
* Sort new sysreg entries by encoding
* Add a comment about sorting arch/arm64/tools/sysreg
* Warn on preemptible() before calling smp_processor_id()
* Pickup tags
* Change TRFCR_EL2 from SysregFields to Sysreg because it was only
used once
Changes since V4 [3]:
* Remove all V3 changes that made it work in pKVM and just disable
trace there instead
* Restore PMU host/hyp state sharing back to how it was
(kvm_pmu_update_vcpu_events())
* Simplify some of the duplication in the comments and function docs
* Add a WARN_ON_ONCE() if kvm_etm_set_guest_trfcr() is called when
the trace filtering feature doesn't exist.
* Split sysreg change into a tools update followed by the new register
addition
Changes since V3:
* Create a new shared area to store the host state instead of copying
it before each VCPU run
* Drop commit that moved SPE and trace registers from host_debug_state
into the kvm sysregs array because the guest values were never used
* Document kvm_etm_set_guest_trfcr()
* Guard kvm_etm_set_guest_trfcr() with a feature check
* Drop Mark B and Suzuki's review tags on the sysreg patch because it
turned out that broke the Perf build and needed some unconventional
changes to fix it (as in: to update the tools copy of the headers in
the same commit as the kernel changes)
Changes since V2:
* Add a new iflag to signify presence of FEAT_TRF and keep the
existing TRBE iflag. This fixes the issue where TRBLIMITR_EL1 was
being accessed even if TRBE didn't exist
* Reword a commit message
Changes since V1:
* Squashed all the arm64/tools/sysreg changes into the first commit
* Add a new commit to move SPE and TRBE regs into the kvm sysreg array
* Add a comment above the TRFCR global that it's per host CPU rather
than vcpu
Changes since nVHE RFC [1]:
* Re-write just in terms of the register value to be written for the
host and the guest. This removes some logic from the hyp code and
a value of kvm_vcpu_arch:trfcr_el1 = 0 no longer means "don't
restore".
* Remove all the conditional compilation and new files.
* Change the kvm_etm_update_vcpu_events macro to a function.
* Re-use DEBUG_STATE_SAVE_TRFCR so iflags don't need to be expanded
anymore.
* Expand the cover letter.
Changes since VHE v3 [2]:
* Use the same interface as nVHE mode so TRFCR_EL12 is now written by
kvm.
[1]: https://lore.kernel.org/kvmarm/20230804101317.460697-1-james.clark@arm.com/
[2]: https://lore.kernel.org/kvmarm/20230905102117.2011094-1-james.clark@arm.com/
[3]: https://lore.kernel.org/linux-arm-kernel/20240104162714.1062610-1-james.cla…
[4]: https://lore.kernel.org/all/20240220100924.2761706-1-james.clark@arm.com/
[5]: https://lore.kernel.org/linux-arm-kernel/20240226113044.228403-1-james.clar…
[6]: https://lore.kernel.org/kvmarm/20241112103717.589952-1-james.clark@linaro.o…
[7]: https://lore.kernel.org/kvmarm/20241115224924.2132364-4-oliver.upton@linux.…
[8]: https://lore.kernel.org/linux-arm-kernel/20241127100130.1162639-1-james.cla…
[9]: https://lore.kernel.org/linux-arm-kernel/20250106142446.628923-1-james.clar…
James Clark (7):
arm64/sysreg: Add a comment that the sysreg file should be sorted
tools: arm64: Update sysreg.h header files
arm64/sysreg/tools: Move TRFCR definitions to sysreg
coresight: trbe: Remove redundant disable call
KVM: arm64: coresight: Give TRBE enabled state to KVM
KVM: arm64: Support trace filtering for guests
coresight: Pass guest TRFCR value to KVM
Oliver Upton (3):
KVM: arm64: Drop MDSCR_EL1_DEBUG_MASK
KVM: arm64: Get rid of __kvm_get_mdcr_el2() and related warts
KVM: arm64: Track presence of SPE/TRBE in kvm_host_data instead of
vCPU
arch/arm64/include/asm/kvm_asm.h | 5 +-
arch/arm64/include/asm/kvm_host.h | 37 +-
arch/arm64/include/asm/sysreg.h | 12 -
arch/arm64/kvm/arm.c | 5 +-
arch/arm64/kvm/debug.c | 94 ++--
arch/arm64/kvm/hyp/nvhe/debug-sr.c | 72 +--
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 -
arch/arm64/kvm/hyp/vhe/debug-sr.c | 5 -
arch/arm64/tools/sysreg | 38 ++
.../coresight/coresight-etm4x-core.c | 49 ++-
drivers/hwtracing/coresight/coresight-etm4x.h | 2 +-
drivers/hwtracing/coresight/coresight-priv.h | 3 +
.../coresight/coresight-self-hosted-trace.h | 9 -
drivers/hwtracing/coresight/coresight-trbe.c | 15 +-
tools/arch/arm64/include/asm/sysreg.h | 410 +++++++++++++++++-
tools/include/linux/kasan-tags.h | 15 +
16 files changed, 626 insertions(+), 151 deletions(-)
create mode 100644 tools/include/linux/kasan-tags.h
--
2.34.1
Change since V5:
1. Fix the &CPUn vs &cpun issue in device tree file.
Change since V4:
1. Use ^ete(-[0-9]+)?$ for the pattern of node name -- comments from Krzysztof Kozlowski <krzk(a)kernel.org>
2. Update commit message --- comments from Rob Herring <robh(a)kernel.org>
Change since V3:
1. Use ^ete-[0-9]+$ for the pattern of node name -- comments from Rob Herring
Change since V2:
1. Change the name in binding as 'ete'.
Change since V1:
1. Remove the pattern match of ETE node name.
2. Update the tmc-etr node name in DT.
Mao Jinlong (2):
dt-bindings: arm: coresight: Update the pattern of ete node name
arm64: dts: qcom: sm8450: Add coresight nodes
.../arm/arm,embedded-trace-extension.yaml | 6 +-
arch/arm64/boot/dts/qcom/sm8450.dtsi | 726 ++++++++++++++++++
2 files changed, 729 insertions(+), 3 deletions(-)
--
2.17.1
FEAT_TRF is a Coresight feature that allows trace capture to be
completely filtered at different exception levels, unlike the existing
TRCVICTLR controls which may still emit target addresses of branches,
even if the following trace is filtered.
Without FEAT_TRF, it was possible to start a trace session on a host and
also collect trace from the guest as TRCVICTLR was never programmed to
exclude guests (and it could still emit target addresses even if it
was).
With FEAT_TRF, the current behavior of trace in guests exists depends on
whether nVHE or VHE are being used. Both of the examples below are from
the host's point of view, as Coresight isn't accessible from guests.
This patchset is only relevant to when FEAT_TRF exists, otherwise there
is no change.
Current behavior:
nVHE/pKVM:
Because the host and the guest are both using TRFCR_EL1, trace will be
generated in guests depending on the same filter rules the host is
using. For example if the host is tracing userspace only, then guest
userspace trace will also be collected.
(This is further limited by whether TRBE is used because an issue
with TRBE means that it's completely disabled in nVHE guests, but it's
possible to have other tracing components.)
VHE:
With VHE, the host filters will be in TRFCR_EL2, but the filters in
TRFCR_EL1 will be active when the guest is running. Because we don't
write to TRFCR_EL1, guest trace will be completely disabled.
New behavior:
The guest filtering rules from the Perf session are now honored for both
nVHE and VHE modes. This is done by either writing to TRFCR_EL12 at the
start of the Perf session and doing nothing else further, or caching the
guest value and writing it at guest switch for nVHE. In pKVM, trace is
now be disabled for both protected and unprotected guests.
There is also an optimization where the Coresight drivers pass their
enabled state to KVM. This means in the common case KVM doesn't have to
touch any sysregs when the feature isn't in use.
Applies to kvmarm/next (00163be8bb59).
---
Changes since V8 [8]:
* Rename guest_trfcr_el1 -> trfcr_while_in_guest
* Rename GUEST_FILTER -> EL1_TRACING_CONFIGURED
* Rename kvm_set_trfcr() -> kvm_tracing_set_el1_configuration()
* #include ordering
* Reorder Coresight driver to remove need for preempt_disable()
to avoid the warning
* Force EL1_TRACING_CONFIGURED on for pKVM which drops an additional
special case but still disables trace
* Change set/clear trfcr to a single function that disables swapping
if it has the same value as the host
* Make the drain condition a bit clearer with __trace_needs_drain()
instead of host trfcr != 0 (Or checking individual E*TRE bits)
* Drain is only really required on switch to guest so move it there
* Only for pKVM, restore the original behavior for draining whenever
trbe is enabled. This prevents hypothetical case where a host has
the filters disabled but hasn't drained yet which we had by only
looking at host trfcr != 0
Changes since V7 [6]:
* Drop SPE changes
* Change the interface to be based on intent, i.e kvm_enable_trbe()
rather than passing the raw register value
* Drop change to re-use vcpu_flags mechanism in favour of [7]
* Simplify by using the same switch function to and from guest
Changes since V6 [5]:
* Implement a better "do nothing" case where both the SPE and Coresight
drivers give the enabled state to KVM, allowing some register
reads to be dropped.
* Move the state and feature flags out of the vCPU into the per-CPU
host_debug_state.
* Simplify the switch logic by adding a new flag HOST_STATE_SWAP_TRFCR
and only storing a single TRFCR value.
* Rename vcpu flag macros to a more generic kvm_flag...
Changes since V5 [4]:
* Sort new sysreg entries by encoding
* Add a comment about sorting arch/arm64/tools/sysreg
* Warn on preemptible() before calling smp_processor_id()
* Pickup tags
* Change TRFCR_EL2 from SysregFields to Sysreg because it was only
used once
Changes since V4 [3]:
* Remove all V3 changes that made it work in pKVM and just disable
trace there instead
* Restore PMU host/hyp state sharing back to how it was
(kvm_pmu_update_vcpu_events())
* Simplify some of the duplication in the comments and function docs
* Add a WARN_ON_ONCE() if kvm_etm_set_guest_trfcr() is called when
the trace filtering feature doesn't exist.
* Split sysreg change into a tools update followed by the new register
addition
Changes since V3:
* Create a new shared area to store the host state instead of copying
it before each VCPU run
* Drop commit that moved SPE and trace registers from host_debug_state
into the kvm sysregs array because the guest values were never used
* Document kvm_etm_set_guest_trfcr()
* Guard kvm_etm_set_guest_trfcr() with a feature check
* Drop Mark B and Suzuki's review tags on the sysreg patch because it
turned out that broke the Perf build and needed some unconventional
changes to fix it (as in: to update the tools copy of the headers in
the same commit as the kernel changes)
Changes since V2:
* Add a new iflag to signify presence of FEAT_TRF and keep the
existing TRBE iflag. This fixes the issue where TRBLIMITR_EL1 was
being accessed even if TRBE didn't exist
* Reword a commit message
Changes since V1:
* Squashed all the arm64/tools/sysreg changes into the first commit
* Add a new commit to move SPE and TRBE regs into the kvm sysreg array
* Add a comment above the TRFCR global that it's per host CPU rather
than vcpu
Changes since nVHE RFC [1]:
* Re-write just in terms of the register value to be written for the
host and the guest. This removes some logic from the hyp code and
a value of kvm_vcpu_arch:trfcr_el1 = 0 no longer means "don't
restore".
* Remove all the conditional compilation and new files.
* Change the kvm_etm_update_vcpu_events macro to a function.
* Re-use DEBUG_STATE_SAVE_TRFCR so iflags don't need to be expanded
anymore.
* Expand the cover letter.
Changes since VHE v3 [2]:
* Use the same interface as nVHE mode so TRFCR_EL12 is now written by
kvm.
[1]: https://lore.kernel.org/kvmarm/20230804101317.460697-1-james.clark@arm.com/
[2]: https://lore.kernel.org/kvmarm/20230905102117.2011094-1-james.clark@arm.com/
[3]: https://lore.kernel.org/linux-arm-kernel/20240104162714.1062610-1-james.cla…
[4]: https://lore.kernel.org/all/20240220100924.2761706-1-james.clark@arm.com/
[5]: https://lore.kernel.org/linux-arm-kernel/20240226113044.228403-1-james.clar…
[6]: https://lore.kernel.org/kvmarm/20241112103717.589952-1-james.clark@linaro.o…
[7]: https://lore.kernel.org/kvmarm/20241115224924.2132364-4-oliver.upton@linux.…
[8]: https://lore.kernel.org/linux-arm-kernel/20241127100130.1162639-1-james.cla…
James Clark (7):
arm64/sysreg: Add a comment that the sysreg file should be sorted
tools: arm64: Update sysreg.h header files
arm64/sysreg/tools: Move TRFCR definitions to sysreg
coresight: trbe: Remove redundant disable call
KVM: arm64: coresight: Give TRBE enabled state to KVM
KVM: arm64: Support trace filtering for guests
coresight: Pass guest TRFCR value to KVM
arch/arm64/include/asm/kvm_host.h | 11 +
arch/arm64/include/asm/sysreg.h | 12 -
arch/arm64/kvm/debug.c | 50 ++-
arch/arm64/kvm/hyp/nvhe/debug-sr.c | 63 +--
arch/arm64/tools/sysreg | 38 ++
.../coresight/coresight-etm4x-core.c | 49 ++-
drivers/hwtracing/coresight/coresight-etm4x.h | 2 +-
drivers/hwtracing/coresight/coresight-priv.h | 3 +
.../coresight/coresight-self-hosted-trace.h | 9 -
drivers/hwtracing/coresight/coresight-trbe.c | 15 +-
tools/arch/arm64/include/asm/sysreg.h | 410 +++++++++++++++++-
tools/include/linux/kasan-tags.h | 15 +
12 files changed, 599 insertions(+), 78 deletions(-)
create mode 100644 tools/include/linux/kasan-tags.h
--
2.34.1
Introduction of TPDM MCMB(Multi-lane Continuous Multi Bit) subunit
MCMB (Multi-lane CMB) is a special form of CMB dataset type. MCMB
subunit has the same number and usage of registers as CMB subunit.
Just like the CMB subunit, the MCMB subunit must be configured prior
to enablement. This series adds support for TPDM to configure the
MCMB subunit.
Once this series patches are applied properly, the new tpdm nodes for
should be observed at the tpdm path /sys/bus/coresight/devices/tpdm*
which supports MCMB subunit. All sysfs files of CMB subunit TPDM are
included in MCMB subunit TPDM. On this basis, MCMB subunit TPDM will
have new sysfs files to select and enable the lane.
Changes in V3:
1. Update the date in ABI file.
2. Remove the unrelated change.
3. Correct typo.
4. Move the CMB_CR related definitions together.
Changes in V2:
1. Use tdpm_data->cmb instead of (tpdm_has_cmb_dataset(tpdm_data) ||
tpdm_has_mcmb_dataset(tpdm_data)) for cmb dataset support.
2. Embed mcmb_dataset struct into cmb struct.
3. Update the date and version in sysfs-bus-coresight-devices-tpdm
Link: https://patchwork.kernel.org/project/linux-arm-msm/patch/20241105123940.396…
Mao Jinlong (1):
coresight-tpdm: Add MCMB dataset support
Tao Zhang (2):
coresight-tpdm: Add support to select lane
coresight-tpdm: Add support to enable the lane for MCMB TPDM
.../testing/sysfs-bus-coresight-devices-tpdm | 15 +++
drivers/hwtracing/coresight/coresight-tpda.c | 7 +-
drivers/hwtracing/coresight/coresight-tpdm.c | 120 +++++++++++++++++-
drivers/hwtracing/coresight/coresight-tpdm.h | 33 +++--
4 files changed, 155 insertions(+), 20 deletions(-)
--
2.17.1
On 24/12/2024 10:13 am, Yeoreum Yun wrote:
> Hi James.
>>> diff --git a/drivers/hwtracing/coresight/coresight-syscfg.c b/drivers/hwtracing/coresight/coresight-syscfg.c
>>> index a70c1454b410..dfa7dcbaf25d 100644
>>> --- a/drivers/hwtracing/coresight/coresight-syscfg.c
>>> +++ b/drivers/hwtracing/coresight/coresight-syscfg.c
>>> @@ -953,7 +953,8 @@ int cscfg_config_sysfs_activate(struct cscfg_config_desc *config_desc, bool acti
>>> cscfg_mgr->sysfs_active_config = cfg_hash;
>>> } else {
>>> /* disable if matching current value */
>>> - if (cscfg_mgr->sysfs_active_config == cfg_hash) {
>>> + if (cscfg_mgr->sysfs_active_config == cfg_hash &&
>>> + !atomic_read(&cscfg_mgr->sys_enable_cnt)) {
>>> _cscfg_deactivate_config(cfg_hash);
>>
>> So is sys_enable_cnt a global value? If a fix is needed doesn't it need to
>> be a per-config refcount?
>>
>> Say you have two active configs, sys_enable_cnt is now 2, how do you disable
>> one without it always skipping here when the other config is enabled?
>
> Sorry to miss this one!.
> Because when one configuration is enabled,
> cscfg_mgr->sysfs_active_config becomes !NULL, so it wouldn't happen
> there is no two activate configurations. so sys_enable_cnt wouldn't be
> 2.
>
>
>
Maybe "sys_enabled" is a better name then. Count implies that it can be
more than one. And the doc could be updated to say it's only ever 0 or 1.
But what about my other point about enabled always being a subset of
active? Can we not change "sys_active_cnt" to a more generic "refcount",
then both activation and enabling steps increment that same refcount,
because they are both technically users of the config. Then you can
solve the problem without adding another separate counter. I think
that's potentially easier to understand.
Although the easiest is just locking every function with the mutex (or a
spinlock if it also needs to be used from Perf). Obviously all these
atomics are harder to get right than that, and this isn't performance
sensitive in any way.
On 18/12/2024 8:48 am, Yeoreum Yun wrote:
> While enable active config via cscfg_csdev_enable_active_config(),
> active config could be deactivated via configfs' sysfs interface.
> This could make UAF issue in below scenario:
>
> CPU0 CPU1
> (perf or sysfs enable) load module
> cscfg_load_config_sets()
> activate config. // sysfs
> (sys_active_cnt == 1)
> ...
> cscfg_csdev_enable_active_config()
> lock(csdev->cscfg_csdev_lock)
> // here load config activate by CPU1
> unlock(csdev->cscfg_csdev_lock)
>
> deactivate config // sysfs
> (sys_activec_cnt == 0)
Assuming the left side does Perf, are there some steps missing? To get
to enable_active_config() you first need to pass through etm_setup_aux()
-> cscfg_activate_config(). That would also increment sys_active_cnt
which would leave it at 2 if there were two concurrent sessions. Then it
would end up as 1 here after deactivate, rather than 0.
It's not explicitly mentioned in the sequence but I'm assuming the left
and right are the same config, but I suppose it could be an issue with
different configs too.
> cscfg_unload_config_sets()
> unload module
On the left cscfg_activate_config() also bumps the module refcount, so
unload wouldn't cause a UAF here as far as I can see.
>
> // access to config_desc which freed
> // while unloading module.
> cfs_csdev_enable_config
>
> To address this, introduce sys_enable_cnt in cscfg_mgr to prevent
> deactivate while there is enabled configuration.
>
> Signed-off-by: Yeoreum Yun <yeoreum.yun(a)arm.com>
> ---
> .../hwtracing/coresight/coresight-etm4x-core.c | 3 +++
> drivers/hwtracing/coresight/coresight-syscfg.c | 18 ++++++++++++++++--
> drivers/hwtracing/coresight/coresight-syscfg.h | 2 ++
> 3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index 86893115df17..6218ef40acbc 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -986,6 +986,9 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
> smp_call_function_single(drvdata->cpu, etm4_disable_hw, drvdata, 1);
>
> raw_spin_unlock(&drvdata->spinlock);
> +
> + cscfg_csdev_disable_active_config(csdev);
> +
> cpus_read_unlock();
>
> /*
> diff --git a/drivers/hwtracing/coresight/coresight-syscfg.c b/drivers/hwtracing/coresight/coresight-syscfg.c
> index a70c1454b410..dfa7dcbaf25d 100644
> --- a/drivers/hwtracing/coresight/coresight-syscfg.c
> +++ b/drivers/hwtracing/coresight/coresight-syscfg.c
> @@ -953,7 +953,8 @@ int cscfg_config_sysfs_activate(struct cscfg_config_desc *config_desc, bool acti
> cscfg_mgr->sysfs_active_config = cfg_hash;
> } else {
> /* disable if matching current value */
> - if (cscfg_mgr->sysfs_active_config == cfg_hash) {
> + if (cscfg_mgr->sysfs_active_config == cfg_hash &&
> + !atomic_read(&cscfg_mgr->sys_enable_cnt)) {
> _cscfg_deactivate_config(cfg_hash);
So is sys_enable_cnt a global value? If a fix is needed doesn't it need
to be a per-config refcount?
Say you have two active configs, sys_enable_cnt is now 2, how do you
disable one without it always skipping here when the other config is
enabled?
> cscfg_mgr->sysfs_active_config = 0;
> } else
> @@ -1055,6 +1056,12 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> if (!atomic_read(&cscfg_mgr->sys_active_cnt))
> return 0;
>
> + /*
> + * increment sys_enable_cnt first to prevent deactivate the config
> + * while enable active config.
> + */
> + atomic_inc(&cscfg_mgr->sys_enable_cnt);
> +
> /*
> * Look for matching configuration - set the active configuration
> * context if found.
> @@ -1098,6 +1105,10 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> raw_spin_unlock_irqrestore(&csdev->cscfg_csdev_lock, flags);
> }
> }
> +
> + if (!config_csdev_active || err)
> + atomic_dec(&cscfg_mgr->sys_enable_cnt);
> +
> return err;
> }
> EXPORT_SYMBOL_GPL(cscfg_csdev_enable_active_config);
> @@ -1129,8 +1140,10 @@ void cscfg_csdev_disable_active_config(struct coresight_device *csdev)
> if (config_csdev) {
> if (!config_csdev->enabled)
> config_csdev = NULL;
> - else
> + else {
> config_csdev->enabled = false;
> + atomic_dec(&cscfg_mgr->sys_enable_cnt);
> + }
> }
> csdev->active_cscfg_ctxt = NULL;
> raw_spin_unlock_irqrestore(&csdev->cscfg_csdev_lock, flags);
> @@ -1179,6 +1192,7 @@ static int cscfg_create_device(void)
> INIT_LIST_HEAD(&cscfg_mgr->config_desc_list);
> INIT_LIST_HEAD(&cscfg_mgr->load_order_list);
> atomic_set(&cscfg_mgr->sys_active_cnt, 0);
> + atomic_set(&cscfg_mgr->sys_enable_cnt, 0);
> cscfg_mgr->load_state = CSCFG_NONE;
>
> /* setup the device */
> diff --git a/drivers/hwtracing/coresight/coresight-syscfg.h b/drivers/hwtracing/coresight/coresight-syscfg.h
> index 66e2db890d82..2fc397919985 100644
> --- a/drivers/hwtracing/coresight/coresight-syscfg.h
> +++ b/drivers/hwtracing/coresight/coresight-syscfg.h
> @@ -38,6 +38,7 @@ enum cscfg_load_ops {
> * @config_desc_list: List of system configuration descriptors to load into registered devices.
> * @load_order_list: Ordered list of owners for dynamically loaded configurations.
> * @sys_active_cnt: Total number of active config descriptor references.
> + * @sys_enable_cnt: Total number of enable of active config descriptor references.
When these are next to each other it makes me wonder why active_cnt
isn't enough to prevent unloading? Enabled is always a subset of active,
so as long as you gate unloads or modifications on the existing active
count it seems fine?
> * @cfgfs_subsys: configfs subsystem used to manage configurations.
> * @sysfs_active_config:Active config hash used if CoreSight controlled from sysfs.
> * @sysfs_active_preset:Active preset index used if CoreSight controlled from sysfs.
> @@ -50,6 +51,7 @@ struct cscfg_manager {
> struct list_head config_desc_list;
> struct list_head load_order_list;
> atomic_t sys_active_cnt;
> + atomic_t sys_enable_cnt;
> struct configfs_subsystem cfgfs_subsys;
> u32 sysfs_active_config;
> int sysfs_active_preset;
On 21/12/2024 11:54 am, Marc Zyngier wrote:
> On Fri, 20 Dec 2024 17:32:17 +0000,
> James Clark <james.clark(a)linaro.org> wrote:
>>
>>
>>
>> On 20/12/2024 5:05 pm, Marc Zyngier wrote:
>>> On Wed, 27 Nov 2024 10:01:23 +0000,
>>> James Clark <james.clark(a)linaro.org> wrote:
>>>>
>>>> Currently in nVHE, KVM has to check if TRBE is enabled on every guest
>>>> switch even if it was never used. Because it's a debug feature and is
>>>> more likely to not be used than used, give KVM the TRBE buffer status to
>>>> allow a much simpler and faster do-nothing path in the hyp.
>>>>
>>>> This is always called with preemption disabled except for probe/hotplug
>>>> which gets wrapped with preempt_disable().
>>>>
>>>> Protected mode disables trace regardless of TRBE (because
>>>> guest_trfcr_el1 is always 0), which was not previously done. HAS_TRBE
>>>> becomes redundant, but HAS_TRF is now required for this.
>>>>
>>>> Signed-off-by: James Clark <james.clark(a)linaro.org>
>>>> ---
>>>> arch/arm64/include/asm/kvm_host.h | 10 +++-
>>>> arch/arm64/kvm/debug.c | 25 ++++++++--
>>>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 51 +++++++++++---------
>>>> drivers/hwtracing/coresight/coresight-trbe.c | 5 ++
>>>> 4 files changed, 65 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index 7e3478386351..ba251caa593b 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -611,7 +611,8 @@ struct cpu_sve_state {
>>>> */
>>>> struct kvm_host_data {
>>>> #define KVM_HOST_DATA_FLAG_HAS_SPE 0
>>>> -#define KVM_HOST_DATA_FLAG_HAS_TRBE 1
>>>> +#define KVM_HOST_DATA_FLAG_HAS_TRF 1
>>>> +#define KVM_HOST_DATA_FLAG_TRBE_ENABLED 2
>>>> unsigned long flags;
>>>> struct kvm_cpu_context host_ctxt;
>>>> @@ -657,6 +658,9 @@ struct kvm_host_data {
>>>> u64 mdcr_el2;
>>>> } host_debug_state;
>>>> + /* Guest trace filter value */
>>>> + u64 guest_trfcr_el1;
>>>
>>> Guest value? Or host state while running the guest? If the former,
>>> then this has nothing to do here. If the latter, this should be
>>> spelled out (trfcr_in_guest?), and the comment amended.
>>>
>>>> +
>>>> /* Number of programmable event counters (PMCR_EL0.N) for this CPU */
>>>> unsigned int nr_event_counters;
>>>> };
>>>> @@ -1381,6 +1385,8 @@ static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
>>>> void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr);
>>>> void kvm_clr_pmu_events(u64 clr);
>>>> bool kvm_set_pmuserenr(u64 val);
>>>> +void kvm_enable_trbe(void);
>>>> +void kvm_disable_trbe(void);
>>>> #else
>>>> static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
>>>> static inline void kvm_clr_pmu_events(u64 clr) {}
>>>> @@ -1388,6 +1394,8 @@ static inline bool kvm_set_pmuserenr(u64 val)
>>>> {
>>>> return false;
>>>> }
>>>> +static inline void kvm_enable_trbe(void) {}
>>>> +static inline void kvm_disable_trbe(void) {}
>>>> #endif
>>>> void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu);
>>>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>>>> index dd9e139dfd13..0c340ae7b5d1 100644
>>>> --- a/arch/arm64/kvm/debug.c
>>>> +++ b/arch/arm64/kvm/debug.c
>>>> @@ -314,7 +314,26 @@ void kvm_init_host_debug_data(void)
>>>> !(read_sysreg_s(SYS_PMBIDR_EL1) & PMBIDR_EL1_P))
>>>> host_data_set_flag(HAS_SPE);
>>>> - if (cpuid_feature_extract_unsigned_field(dfr0,
>>>> ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
>>>> - !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
>>>> - host_data_set_flag(HAS_TRBE);
>>>> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceFilt_SHIFT))
>>>> + host_data_set_flag(HAS_TRF);
>>>> }
>>>> +
>>>> +void kvm_enable_trbe(void)
>>>> +{
>>>> + if (has_vhe() || is_protected_kvm_enabled() ||
>>>> + WARN_ON_ONCE(preemptible()))
>>>> + return;
>>>> +
>>>> + host_data_set_flag(TRBE_ENABLED);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(kvm_enable_trbe);
>>>> +
>>>> +void kvm_disable_trbe(void)
>>>> +{
>>>> + if (has_vhe() || is_protected_kvm_enabled() ||
>>>> + WARN_ON_ONCE(preemptible()))
>>>> + return;
>>>> +
>>>> + host_data_clear_flag(TRBE_ENABLED);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(kvm_disable_trbe);
>>>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>> index 858bb38e273f..9479bee41801 100644
>>>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>> @@ -51,32 +51,39 @@ static void __debug_restore_spe(u64 pmscr_el1)
>>>> write_sysreg_el1(pmscr_el1, SYS_PMSCR);
>>>> }
>>>> -static void __debug_save_trace(u64 *trfcr_el1)
>>>> +static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>>>> {
>>>> - *trfcr_el1 = 0;
>>>> + *saved_trfcr = read_sysreg_el1(SYS_TRFCR);
>>>> + write_sysreg_el1(new_trfcr, SYS_TRFCR);
>>>> - /* Check if the TRBE is enabled */
>>>> - if (!(read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E))
>>>> + /* No need to drain if going to an enabled state or from disabled state */
>>>> + if (new_trfcr || !*saved_trfcr)
>>>
>>> What if TRFCR_EL1.TS is set to something non-zero? I'd rather you
>>> check for the E*TRE bits instead of assuming things.
>>>
>>
>> Yeah it's probably better that way. TS is actually always set when any
>> tracing session starts and then never cleared, so doing it the simpler
>> way made it always flush even after tracing finished, which probably
>> wasn't great.
>
> Quite. Can you please *test* these things?
>
> [...]
>
Sorry to confuse things I wasn't 100% accurate here, yes it's tested and
working. It works because of the split set/clear_trfcr() API. The
Coresight driver specifically calls clear at the end of the session
rather than a set of 0. That signals this function not to be called so
there's no excessive swapping.
Secondly, the buffer flushing case is triggered by TRBE_ENABLED, which
forces TRFCR to 0, so "if (new_trfcr)" is an OK way to gate the flush.
>>>> @@ -253,8 +256,10 @@ static void trbe_drain_and_disable_local(struct trbe_cpudata *cpudata)
>>>> static void trbe_reset_local(struct trbe_cpudata *cpudata)
>>>> {
>>>> + preempt_disable();
>>>> trbe_drain_and_disable_local(cpudata);
>>>> write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>>>> + preempt_enable();
>>>
>>> This looks terribly wrong. If you need to disable preemption here, why
>>> doesn't the critical section cover all register accesses? Surely you
>>> don't want to nuke another CPU's context?
>>>
>>> But looking at the calling sites, this makes even less sense. The two
>>> callers of this thing mess with *per-CPU* interrupts. Dealing with
>>> per-CPU interrupts in preemptible context is a big no-no (hint: they
>>> start with a call to smp_processor_id()).
>>>
>>> So what is this supposed to ensure?
>>>
>>> M.
>>>
>>
>> These ones are only intended to silence the
>> WARN_ON_ONCE(preemptible()) in kvm_enable_trbe() when this is called
>> from boot/hotplug (arm_trbe_enable_cpu()). Preemption isn't disabled,
>> but a guest can't run at that point either.
>>
>> The "real" calls to kvm_enable_trbe() _are_ called from an atomic
>> context. I think there was a previous review comment about when it was
>> safe to call the KVM parts of this change, which is why I added the
>> warning making sure it was always called with preemption disabled. But
>> actually I could remove the warning and these preempt_disables() and
>> replace them with a comment.
>
> You should keep the WARN_ON(), and either *never* end-up calling this
> stuff during a CPUHP event, or handle the fact that preemption isn't
> initialised yet. For example by checking whether the current CPU is
> online.
>
> But this sort of random spreading of preemption disabling is not an
> acceptable outcome.
>
> M.
>
I'll look into this again. This was my initial attempt but couldn't find
any easily accessible state that allowed to to be done this way. Maybe I
missed something, but the obvious cpu_online() etc were already true at
this point.
Thanks
James