On 19 December 2016 at 07:45, Yehuda Yitschak yehuday@marvell.com wrote:
Hi Mathieu
After some more debugging I was able to resolve the trace issue I had on Linux-4.9-rc1.
If you remember, I only got trace for CPU2 out of 4 CPUs, which was really strange.
It turns out the issue comes from a quirk in our buses.
Our internal fabric is not able to write 64-bit data to registers, only to memory.
So the address comparators in the ETM got corrupted values and there wasn’t any address match for most CPUs.
For some cryptic reason only CPU2 got a somewhat reasonable comparator value (still not the intended one, but a working one) and so it could generate trace.
Now I am able to generate proper trace consistently.
Very good.
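The thread does not say how the comparator problem was resolved. As a rough illustration only, one common way to cope with a fabric that cannot carry 64-bit register accesses is to split each 64-bit value into two 32-bit writes. The sketch below models that idea in Python; `Regs32`, `write64_lo_hi` and the offset used are hypothetical names chosen for the example, not taken from the ETM driver.

```python
MASK32 = 0xFFFFFFFF


def split_u64(value):
    """Split a 64-bit value into (low, high) 32-bit words."""
    if not 0 <= value <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("value does not fit in 64 bits")
    return value & MASK32, value >> 32


class Regs32:
    """Hypothetical register block that only accepts 32-bit writes."""

    def __init__(self):
        self.words = {}

    def write32(self, offset, value):
        if not 0 <= value <= MASK32:
            raise ValueError("only 32-bit writes are supported")
        self.words[offset] = value

    def write64_lo_hi(self, offset, value):
        # Low word first, then high word at the next 32-bit slot.
        low, high = split_u64(value)
        self.write32(offset, low)
        self.write32(offset + 4, high)

    def read64(self, offset):
        # Reassemble the 64-bit value from its two halves.
        return self.words.get(offset, 0) | (self.words.get(offset + 4, 0) << 32)


regs = Regs32()
COMPARATOR_OFFSET = 0x400  # illustrative offset only
regs.write64_lo_hi(COMPARATOR_OFFSET, 0xFFFF000008081234)
print(hex(regs.read64(COMPARATOR_OFFSET)))
```

Whether low-then-high or high-then-low ordering is appropriate depends on the hardware; the ordering here is an assumption.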
I was wondering how I can add latency or timing information to the trace.
I noticed the cs_etm event can accept the options “cycacc” and “timestamp”.
How can I view this information later?
Should I use perf script -f?
That information, when configured on the cmd line, will end up in the perf.data file. From there it will be decoded and rendered by the openCSD library.
Mike, can you comment on the format of the information that will be found in the packet? Perhaps you have an example somewhere of traces generated by the "cycacc" and "timestamp" option?
You will definitely need to create your own scripts as nothing we have done so far uses those configuration parameters.
Thanks, Mathieu
Thanks a lot
Yehuda Yitschak
Marvell Semiconductor Ltd.
Hi,
On 21 December 2016 at 02:40, Mathieu Poirier mathieu.poirier@linaro.org wrote:
Mike, can you comment on the format of the information that will be found in the packet? Perhaps you have an example somewhere of traces generated by the "cycacc" and "timestamp" option?
In principle the cycle counts are generated according to the cycle_count threshold - which means you will not ordinarily get a cycle count per instruction on ETMv4. This is to avoid flooding the trace data with lots of cycle count packets. The threshold is programmable but I am not sure what value is used by the etmv4 driver.
Timestamps are generated periodically at useful events such as trace synchronisation points, exception entry and return etc.
I tested the cycacc and timestamp options on Juno this morning and found that cycle accurate tracing did not seem to be enabled at all and timestamps appeared to all be 0x0 (from the packet printing in perf report --dump). These are probably two separate issues that require further investigation. My initial check on the perf cs-etm-decoder.c file showed that an incorrect value of TRCCONFIGR appeared to be passed to the decoder - matching the correct configuration for ETMv3 with CC and TS enabled rather than ETMv4.
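To make the ETMv3/ETMv4 mismatch described above concrete, here is a small sketch of translating an ETMv3-style option word into ETMv4 TRCCONFIGR bits. The bit positions are my reading of the architecture documents (cycle-accurate bit 12 and timestamp bit 28 in the ETMv3 control register, CCI bit 4 and TS bit 11 in TRCCONFIGR) and should be checked against the specifications. The function name is made up for this example and is not the perf or driver code.

```python
# ETMv3 control register (ETMCR) option bits, assumed from the ETMv3 spec.
ETM3_CYCACC_BIT = 12
ETM3_TIMESTAMP_BIT = 28

# ETMv4 TRCCONFIGR option bits, assumed from the ETMv4 spec.
ETM4_CCI_BIT = 4
ETM4_TS_BIT = 11


def etm4_config_from_etm3_options(options):
    """Translate ETMv3-style option bits into an ETMv4 TRCCONFIGR value.

    Only the cycle-accurate and timestamp options are handled here;
    every other bit is ignored in this sketch.
    """
    trcconfigr = 0
    if options & (1 << ETM3_CYCACC_BIT):
        trcconfigr |= 1 << ETM4_CCI_BIT
    if options & (1 << ETM3_TIMESTAMP_BIT):
        trcconfigr |= 1 << ETM4_TS_BIT
    return trcconfigr


both = (1 << ETM3_CYCACC_BIT) | (1 << ETM3_TIMESTAMP_BIT)
print(hex(both), "->", hex(etm4_config_from_etm3_options(both)))
```

Passing the ETMv3-style word straight to an ETMv4 decoder, without a translation of this kind, would describe a different configuration from the one intended.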
Finally, again reading the code in cs-etm-decoder.c, even if we get correct timestamp and cycle count packets, the processing appears to be ignoring them - focussing on instruction ranges and exceptions to enable the trace disassembly. So I cannot see how these packets are used or transmitted to the output of perf script / report.
I am not familiar with this area of the code so I could be mis-reading it - the relevant code could be elsewhere?
Regards
Mike
Hi Mike, Mathieu
Do you have any plans to fix the timestamp and cycacc support?
Thanks
Yehuda
From: Mike Leach [mailto:mike.leach@linaro.org] Sent: Wednesday, December 21, 2016 16:42 To: Mathieu Poirier Cc: Yehuda Yitschak; coresight@lists.linaro.org Subject: [EXT] Re: Trace issue on Linux-4.9-rc1
Hi Yehuda,
I've looked into the issues with activating cycacc using perf - there is an ETMv4 driver patch I am preparing which will be posted shortly to the openCSD lists. The patch corrects the bit used to activate cycacc and sets a default threshold value of 256 cycles.
Note that ETMv4 will emit cycle count values only at traced waypoints and when the cumulative count exceeds the threshold value. This means cycle counts apply to groups of instructions rather than individual instructions.
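The threshold behaviour described above can be pictured with a simplified model. This is only a sketch of the rule as stated in this message (emit at a waypoint once the accumulated count exceeds the threshold), not the exact hardware behaviour; the function name and the input format are invented for the example.

```python
def cycle_count_packets(cycles_per_waypoint, threshold=256):
    """Return (waypoint_index, count) pairs for a simplified ETMv4 model.

    cycles_per_waypoint[i] is the number of cycles elapsed between
    waypoint i-1 and waypoint i. A count is emitted at a waypoint only
    when the accumulated cycles exceed the threshold, after which the
    accumulator restarts from zero.
    """
    packets = []
    pending = 0
    for index, cycles in enumerate(cycles_per_waypoint):
        pending += cycles
        if pending > threshold:
            packets.append((index, pending))
            pending = 0
    return packets


# Six waypoints, but only two cycle count packets are produced.
print(cycle_count_packets([100, 100, 100, 50, 300, 10]))
```

Each emitted count covers all instructions since the previous packet, which is why a count cannot be attributed to a single instruction.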
The issue noted with timestamps I saw during my tests on Juno was a platform issue relating to the CoreSight timestamp clock being inactive. Assuming this is not the case on your platform then timestamps should be captured normally.
There is an additional decoder update to fix an issue with count values on certain CC packet types.
We will need to discuss what can be done in relation to processing the CC values for perf report/script when the team returns after the new year.
Regards
Mike
On 26 December 2016 at 06:42, Yehuda Yitschak yehuday@marvell.com wrote:
Hi Mike, Mathieu
Do you have any plans to fix the timestamp and cycacc support ?
Thanks
Yehuda
On 28 December 2016 at 08:26, Mike Leach mike.leach@linaro.org wrote:
Hi Yehuda,
I've looked into the issues with activating cycacc using perf - there is an ETMv4 driver patch I am preparing which will be posted shortly to the openCSD lists. The patch corrects the bit used to activate cycacc and sets a default threshold value of 256 cycles.
I have double checked the configuration bits in TRCCONFIGR and I concur with Mike - setting of the cycacc bit is completely wrong. In fact the first 6 bits are wrong (and there is more). The only thing I can guess is those bits come from a very early (and obsolete) specification or the original implementation this driver was written for strays from the standard.
We will fix this - thanks for the patience.
Mathieu
On 25 December 2016 at 23:42, Yehuda Yitschak yehuday@marvell.com wrote:
Hi Mike, Mathieu
Do you have any plans to fix the timestamp and cycacc support ?
Hi Yehuda,
I have pushed Mike's patch to branch perf-opencsd-4.10-rc2 on github. With that cycle accurate tracing will be configured properly. On the timestamp front the FW team at ARM has produced a new FW image that will be available shortly - details are being worked out.
For now on my side I see entries like these when dumping the content of the perf.data file (perf report --dump):
138: I_CCNT_F1 : Cycle Count format 1.; Count=0x153
141: I_CCNT_F2 : Cycle Count format 2.; Count=0x10d
164: I_TIMESTAMP : Timestamp.; Updated val = 0x112f7cc01edc; CC=0x11
193: I_CCNT_F3 : Cycle Count format 3.; Count=0x100
684: I_TIMESTAMP : Timestamp.; Updated val = 0x112f7cc16dd1; CC=0xa1
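Until these packets are surfaced through perf script, one option is to post-process the dump text directly. The sketch below assumes the lines look exactly like the excerpt above (index, packet name, description, then `name=value` fields); real `perf report --dump` output may carry extra prefixes, so the regular expressions are an assumption and may need adjusting.

```python
import re

# "<index>: <I_PACKET_NAME> : <description and fields>"
PACKET_RE = re.compile(r"^\s*(?P<index>\d+):\s*(?P<type>I_\w+)\s*:\s*(?P<rest>.*)$")
# Fields of interest, e.g. "Count=0x153", "Updated val = 0x...", "CC=0x11"
FIELD_RE = re.compile(r"(Count|Updated val|CC)\s*=\s*(0x[0-9a-fA-F]+)")

SAMPLE = """\
138: I_CCNT_F1 : Cycle Count format 1.; Count=0x153
141: I_CCNT_F2 : Cycle Count format 2.; Count=0x10d
164: I_TIMESTAMP : Timestamp.; Updated val = 0x112f7cc01edc; CC=0x11
193: I_CCNT_F3 : Cycle Count format 3.; Count=0x100
684: I_TIMESTAMP : Timestamp.; Updated val = 0x112f7cc16dd1; CC=0xa1
"""


def parse_dump_line(line):
    """Parse one dump line into a dict, or return None if it does not match."""
    match = PACKET_RE.match(line)
    if match is None:
        return None
    fields = {name: int(value, 16) for name, value in FIELD_RE.findall(match.group("rest"))}
    return {"index": int(match.group("index")), "type": match.group("type"), "fields": fields}


def parse_dump(text):
    """Parse every recognisable packet line in a block of dump text."""
    parsed = (parse_dump_line(line) for line in text.splitlines())
    return [packet for packet in parsed if packet is not None]


packets = parse_dump(SAMPLE)
timestamps = [p["fields"]["Updated val"] for p in packets if p["type"] == "I_TIMESTAMP"]
# Timestamp ticks elapsed between the two timestamp packets in the excerpt.
print(timestamps[-1] - timestamps[0])
```

Converting the tick difference to time requires the timestamp generator frequency of the platform, which is not given in this thread.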
That being said it doesn't change the fact that TS and CC packets aren't synthesised, i.e. not conveyed to the perf tools for rendering through the conventional user interfaces. At first glance code in [1] will have to be extended to take the new packet types into account. From there the main synthesise function [2] will need to plug the TS and CC information in perf_samples.
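As a way of picturing what "plugging TS and CC information into samples" could mean, here is a toy model. It is not the code in cs-etm.c and does not reflect how perf structures its samples; the packet kinds (`"range"`, `"timestamp"`, `"cycle_count"`) and the `Sample` type are hypothetical. The choice to attach pending cycles to the next instruction range is also an assumption; a real implementation would need to follow the trace protocol's rules for which instructions a count covers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    start: int                 # first address of the instruction range
    end: int                   # end address of the instruction range
    timestamp: Optional[int]   # most recent timestamp seen, if any
    cycles: int                # cycles accumulated since the previous sample


def synthesize(packets):
    """Turn a stream of (kind, payload) packets into annotated samples."""
    samples = []
    last_timestamp = None
    pending_cycles = 0
    for kind, payload in packets:
        if kind == "timestamp":
            last_timestamp = payload
        elif kind == "cycle_count":
            pending_cycles += payload
        elif kind == "range":
            start, end = payload
            samples.append(Sample(start, end, last_timestamp, pending_cycles))
            pending_cycles = 0
    return samples


stream = [
    ("range", (0x1000, 0x1010)),
    ("timestamp", 0x112f7cc01edc),
    ("cycle_count", 0x153),
    ("cycle_count", 0x10d),
    ("range", (0x2000, 0x2020)),
]
for sample in synthesize(stream):
    print(sample)
```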
I am not the author of the code in cs-etm.c and as such not familiar with exactly how the above has to be done. On the flip side it is likely that I will have to upstream it in the next little while, so that situation will change.
Thanks, Mathieu
[1]. tools/perf/util/cs-etm.c:971
[2]. tools/perf/util/cs-etm.c:724
Hi Mathieu, Mike
Sorry for the really late response. Your e-mails were somehow filtered out of my inbox to my Coresight folder.
Thanks for taking the time to look into these issues. Let me know once you have something working for CC and TS, and I will do my best to test it on my platform.
Best Regards
Yehuda
-----Original Message----- From: Mathieu Poirier [mailto:mathieu.poirier@linaro.org] Sent: Friday, January 06, 2017 19:48 To: Yehuda Yitschak Cc: Mike Leach; coresight@lists.linaro.org Subject: Re: [EXT] Re: Trace issue on Linux-4.9-rc1