Tor, Mike and all,
Here is something I'd like your opinion on...
Before programming the ETMv3/PTM, ETMCR:10 needs to be set to one, and when enabling the tracer the bit needs to be cleared. Each time, the status of ETMSR:1 needs to be polled before moving on, which is quite costly. Is there an official time limit for this operation to complete?
The same question applies for ETB's FFCR:6 and FFSR:1.
At this time the driver waits for 100 usec before complaining. From your experience, is this too short? Might it need more time?
Thanks, Mathieu
Hi,
Here's some advice from our CoreSight architect John Horley. tl;dr: it should be done within 10us. If it isn't done by then, something's stuck somewhere.
I would suggest printing a kernel message reporting the status of all sinks and sources (and funnel ports). I wonder if the problem here could be the "deadlocked flush scenario" that John describes. When disabling an ETM you should wait for it to drain and then disable the funnel port it's connected to.
Personally, I've always found that these status bits are set very quickly: generally it is write ETMCR, read ETMSR, maybe read ETMSR again, and that's it. Reading these registers over the debug APB takes ~100ns, so just polling the status register two or three times should give it enough time to complete.
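As a rough illustration of that polling pattern, here is a minimal sketch in C. It is not the driver's code: `poll_status_bit`, `read_reg_fn`, `fake_etmsr` and `POLL_BUDGET_READS` are hypothetical names, and the register is simulated so the snippet can run anywhere. The budget of 100 reads simply assumes the ~100ns per read and the 10us limit mentioned in this thread.

```c
#include <stdint.h>

/* Assumption: ~100 ns per debug APB read and a 10 us limit, as in the thread. */
#define POLL_BUDGET_READS (10000 / 100)

/* Hypothetical accessor: returns the current value of a status register. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Read the status register until (value & mask) == want.
 * Returns the number of reads used (>= 1), or -1 if max_polls reads
 * went by without the bits reaching the wanted state.
 */
static int poll_status_bit(read_reg_fn read_reg, void *ctx,
                           uint32_t mask, uint32_t want, int max_polls)
{
    for (int i = 1; i <= max_polls; i++) {
        if ((read_reg(ctx) & mask) == want)
            return i;
    }
    return -1;
}

/* Simulated status register: bit 1 reads as set from the Nth read onwards. */
struct fake_etmsr {
    int reads;        /* reads performed so far */
    int ready_after;  /* read number at which bit 1 becomes set */
};

static uint32_t fake_etmsr_read(void *ctx)
{
    struct fake_etmsr *sr = ctx;

    sr->reads++;
    return (sr->reads >= sr->ready_after) ? (1u << 1) : 0u;
}
```

With `ready_after = 3` the helper returns after three reads. A register that never changes exhausts the budget and returns -1, which is the point at which a driver would report that something is stuck.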
" Under correctly configured and non-buggy implementations, I would expect the ETM scenario to be extremely short. The ETB scenario may be longer, depending on the system.
For the ETM case, let's say the biggest ETM FIFO is 128 bytes. If we are pessimistically draining at 1 bit per cycle (e.g. that's the bandwidth we've provided in the system), then that only takes ~1k cycles. At 1GHz, that's 1us. So 10us is plenty of headroom.
For the ETB case, it has to flush all data from the whole system, so this is more dependent on the number of sources and the amount of buffering in the system (visible and invisible buffers!). For a quad-core cluster connected almost directly to an ETB, this would be 4x128 = 512 bytes; round that to about 500. 500 bytes into a 32-bit ETB at 50% core clock should only take 250 core clock cycles, so much less than 1us @ 1GHz core clock. Again, a 10us timeout leaves a comfortable margin before concluding there's a problem.
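The two estimates above can be reproduced with a few lines of C. This is only a sketch of the arithmetic under the figures given in the thread (128-byte FIFO, 1 bit per cycle, 32-bit ETB at half the core clock, 1GHz core); the function names are made up for illustration.

```c
#include <stdint.h>

/*
 * Core-clock cycles needed to drain `bytes` through a path that is
 * `width_bits` wide and accepts one transfer every
 * `core_cycles_per_transfer` core cycles.
 */
static uint64_t drain_core_cycles(uint64_t bytes, uint32_t width_bits,
                                  uint32_t core_cycles_per_transfer)
{
    uint64_t bits = bytes * 8;
    uint64_t transfers = (bits + width_bits - 1) / width_bits; /* round up */

    return transfers * core_cycles_per_transfer;
}

/* Convert core cycles to nanoseconds for a core clock given in MHz. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t core_mhz)
{
    return (cycles * 1000) / core_mhz;
}
```

For the ETM case, `drain_core_cycles(128, 1, 1)` gives 1024 cycles, about 1us at 1000MHz. For the ETB case, `drain_core_cycles(500, 32, 2)` gives 250 cycles, about 250ns. Both sit well under the 10us limit.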
So, on to the problems that cause the timeout...
- Check you've disabled any TPIUs in the system, because they can artificially limit bandwidth. This may involve disabling a Replicator on the path to the TPIU, or simply stopping the TPIU (it resets to an enabled state!).
- Check for a deadlocked flush scenario in the system. E.g. if you've initiated a flush from an ETB, if that hasn't completed this can cause issues at funnels because they wait for all flushing data before taking post-flush data, so one port not flushing properly affects all ports.
- Check you've not enabled any unnecessary ports on any funnels. Some systems get badly designed and may not behave as expected if you enable an unused port, leading to the flush scenario above.
Downstream funnel ports being disabled shouldn't be a problem, because they just discard data when disabled.
The general rule for sequencing enables/disables is:
- When enabling, enable from sinks to sources.
- When disabling, disable from sources to sinks."
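To make the ordering rule concrete, here is a small sketch in C. It assumes a trace path stored as an array listed source first and sink last. `struct trace_component`, `path_enable`, `path_disable` and the recording callback are hypothetical and only model the order of operations, not real register programming.

```c
#include <stddef.h>

/* Hypothetical component on a trace path; the array is source first, sink last. */
struct trace_component {
    const char *name;
    int enabled;
};

/* Callback invoked each time a component changes state. */
typedef void (*visit_fn)(struct trace_component *c, void *ctx);

/* Enable walking from the sink back to the source. */
static void path_enable(struct trace_component *path, size_t n,
                        visit_fn on_change, void *ctx)
{
    for (size_t i = n; i-- > 0; ) {
        path[i].enabled = 1;
        if (on_change)
            on_change(&path[i], ctx);
    }
}

/* Disable walking from the source forward to the sink. */
static void path_disable(struct trace_component *path, size_t n,
                         visit_fn on_change, void *ctx)
{
    for (size_t i = 0; i < n; i++) {
        /* A real driver would wait here for the component to drain. */
        path[i].enabled = 0;
        if (on_change)
            on_change(&path[i], ctx);
    }
}

/* Records the order in which components were touched (illustration only). */
struct order_log {
    const char *names[8];
    size_t count;
};

static void record_order(struct trace_component *c, void *ctx)
{
    struct order_log *log = ctx;

    if (log->count < 8)
        log->names[log->count++] = c->name;
}
```

For a path `{ETM, funnel, ETB}`, `path_enable` touches ETB, funnel, ETM in that order and `path_disable` touches ETM, funnel, ETB. That matches the advice earlier in this mail to let the ETM drain before disabling the funnel port it is connected to.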
Al
-----Original Message----- From: CoreSight [mailto:coresight-bounces@lists.linaro.org] On Behalf Of Mathieu Poirier Sent: 30 September 2015 18:07 To: Tor Jeremiassen; Mike Leach Cc: coresight@lists.linaro.org Subject: Waiting for status bits
_______________________________________________
CoreSight mailing list
CoreSight@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/coresight
-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
Hi Mathieu,
Al has covered most of the points I was about to make re clocks and flushing, but I did have one additional thought.
I am not sure if you are currently using these strategies but you may wish to consider programming ETMs to stop tracing automatically if out of an address range or by context ID match. This could mean that you do not need to disable the ETM, saving some time.
e.g. if tracing against a particular process that has an associated context ID, then setting context ID matching could remove the need to halt the ETM, assuming that the process is completely switched out and the active context ID registers changed at the time perf wants to collect the data. The ETM should have stopped tracing as the Context ID is changed.
Also, if controlling tracing in this way, then you can then consider if there is still sufficient space in the ETB - and avoid unnecessary halts / uploads.
These are obviously very usage-dependent scenarios and may not be appropriate in all use cases. They also add complications that you may not want to have in the first iterations of support.
Regards
Mike
---------------------------------------------------------------- Mike Leach +44 (0)1254 893911 (Direct) Principal Engineer +44 (0)1254 893900 (Main) Arm Blackburn Design Centre +44 (0)1254 893901 (Fax) Belthorn House Walker Rd mailto:mike.leach@arm.com Guide Blackburn BB1 2QE ----------------------------------------------------------------
-----Original Message----- From: Al Grant Sent: 01 October 2015 10:45 To: Mathieu Poirier; Tor Jeremiassen; Mike Leach Cc: coresight@lists.linaro.org Subject: RE: Waiting for status bits