This patch series addresses issues in synthesizing instruction samples. In particular, when the instruction sample period is small, the current logic cannot synthesize multiple instruction samples within a single instruction range packet.
Patch 0001 swaps packets for instruction samples, so that the option '--itrace=iNNN' works properly.
Patch 0002 avoids resetting the last branches for every instruction sample; if the last branches are reset each time a sample is generated, later samples in the same range packet can no longer use them.
Patch 0003 fixes the handling of different instruction periods, especially small sample periods.
Patch 0004 optimizes copying of the last branches: they are copied only once when several instruction samples share the same last branches.
Patch 0005 is a minor fix for comparing an unsigned variable to zero.
This patch set has been rebased on the latest perf/core branch and verified on a Juno board with the commands below:
  # perf script --itrace=i2
  # perf script --itrace=i2il16
  # perf inject --itrace=i2il16 -i perf.data -o perf.data.new
  # perf inject --itrace=i100il16 -i perf.data -o perf.data.new
Changes from v4:
* Added Mike's review tag for patch 03;
* Added Mathieu's review tags for all patches.
Changes from v3:
* Refactored patch 0001 with the new function cs_etm__packet_swap() (Mike);
* Refined the instruction sample generation flow into a single while loop, fully adopting Mike's suggestions (Mike);
* Added Mike's review tags for patches 01/02/04/05.
Changes from v2:
* Added patch 0001, which fixes swapping packets for instruction samples;
* Refined minor commit logs and comments;
* Rebased on the latest perf/core branch.
Changes from v1:
* Rebased the patch set on the perf/core branch with latest commit 9fec3cd5fa4a ("perf map: Check if the map still has some refcounts on exit").
Leo Yan (5):
  perf cs-etm: Swap packets for instruction samples
  perf cs-etm: Continuously record last branch
  perf cs-etm: Correct synthesizing instruction samples
  perf cs-etm: Optimize copying last branches
  perf cs-etm: Fix unsigned variable comparison to zero

 tools/perf/util/cs-etm.c | 157 +++++++++++++++++++++++++++------------
 1 file changed, 111 insertions(+), 46 deletions(-)
When the option '--itrace=iNNN' is used with Arm CoreSight trace data, the perf tool fails to inject instruction samples. The root cause is that packets are swapped only for branch samples and last branches, not for instruction samples, so newly arriving packets are not handled properly when only instruction samples are synthesized.
To fix this issue, this patch refactors the code into a new function, cs_etm__packet_swap(), which swaps the packets, and adds the condition for instruction samples.
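A minimal standalone sketch of the idea behind cs_etm__packet_swap(), assuming simplified stand-ins for the real structures: the struct and function names below (sketch_packet, sketch_opts, sketch_queue, sketch_packet_swap) are hypothetical and model only what the swap needs. It shows the double-buffered PACKET/PREV_PACKET pointers being exchanged whenever any consumer needs the previous packet, now including instruction samples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the perf structures (not the real types). */
struct sketch_packet {
	unsigned long start_addr;
};

struct sketch_opts {
	bool sample_branches;     /* cf. etm->sample_branches */
	bool last_branch;         /* cf. etm->synth_opts.last_branch */
	bool sample_instructions; /* cf. etm->sample_instructions */
};

struct sketch_queue {
	struct sketch_packet *packet;      /* packet being processed */
	struct sketch_packet *prev_packet; /* previously processed packet */
};

/*
 * Swap PACKET with PREV_PACKET so that the current packet becomes the
 * previous one for the next incoming packet. Without the
 * sample_instructions term, an instructions-only run would never
 * rotate the two buffers.
 */
static void sketch_packet_swap(const struct sketch_opts *opts,
			       struct sketch_queue *q)
{
	struct sketch_packet *tmp;

	assert(q->packet != NULL && q->prev_packet != NULL);

	if (opts->sample_branches || opts->last_branch ||
	    opts->sample_instructions) {
		tmp = q->packet;
		q->packet = q->prev_packet;
		q->prev_packet = tmp;
	}
}
```

With only instruction sampling enabled (the '--itrace=iNNN' case), the helper now rotates the buffers; with every option cleared it leaves them untouched.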
Signed-off-by: Leo Yan leo.yan@linaro.org
Reviewed-by: Mike Leach mike.leach@linaro.org
Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org
---
 tools/perf/util/cs-etm.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 5471045ebf5c..84f30c2de185 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -363,6 +363,23 @@ struct cs_etm_packet_queue
 	return NULL;
 }

+static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
+				struct cs_etm_traceid_queue *tidq)
+{
+	struct cs_etm_packet *tmp;
+
+	if (etm->sample_branches || etm->synth_opts.last_branch ||
+	    etm->sample_instructions) {
+		/*
+		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+		 * the next incoming packet.
+		 */
+		tmp = tidq->packet;
+		tidq->packet = tidq->prev_packet;
+		tidq->prev_packet = tmp;
+	}
+}
+
 static void cs_etm__packet_dump(const char *pkt_string)
 {
 	const char *color = PERF_COLOR_BLUE;
@@ -1340,7 +1357,6 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 			  struct cs_etm_traceid_queue *tidq)
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
-	struct cs_etm_packet *tmp;
 	int ret;
 	u8 trace_chan_id = tidq->trace_chan_id;
 	u64 instrs_executed = tidq->packet->instr_count;
@@ -1404,15 +1420,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		}
 	}

-	if (etm->sample_branches || etm->synth_opts.last_branch) {
-		/*
-		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
-		 * the next incoming packet.
-		 */
-		tmp = tidq->packet;
-		tidq->packet = tidq->prev_packet;
-		tidq->prev_packet = tmp;
-	}
+	cs_etm__packet_swap(etm, tidq);

 	return 0;
 }
@@ -1441,7 +1449,6 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 {
 	int err = 0;
 	struct cs_etm_auxtrace *etm = etmq->etm;
-	struct cs_etm_packet *tmp;

 	/* Handle start tracing packet */
 	if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
@@ -1476,15 +1483,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 	}

 swap_packet:
-	if (etm->sample_branches || etm->synth_opts.last_branch) {
-		/*
-		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
-		 * the next incoming packet.
-		 */
-		tmp = tidq->packet;
-		tidq->packet = tidq->prev_packet;
-		tidq->prev_packet = tmp;
-	}
+	cs_etm__packet_swap(etm, tidq);

 	return err;
 }
The following commit has been merged into the perf/core branch of tip:
Commit-ID:     d01751563caf0dec7be36f81de77cc0197b77e59
Gitweb:        https://git.kernel.org/tip/d01751563caf0dec7be36f81de77cc0197b77e59
Author:        Leo Yan leo.yan@linaro.org
AuthorDate:    Wed, 19 Feb 2020 10:18:07 +08:00
Committer:     Arnaldo Carvalho de Melo acme@redhat.com
CommitterDate: Wed, 11 Mar 2020 10:48:44 -03:00
perf cs-etm: Swap packets for instruction samples
Signed-off-by: Leo Yan leo.yan@linaro.org
Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org
Reviewed-by: Mike Leach mike.leach@linaro.org
Cc: Alexander Shishkin alexander.shishkin@linux.intel.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Robert Walker robert.walker@arm.com
Cc: Suzuki Poulouse suzuki.poulose@arm.com
Cc: coresight ml coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200219021811.20067-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
---
 tools/perf/util/cs-etm.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b3b3fe3..294b09c 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -363,6 +363,23 @@ struct cs_etm_packet_queue
 	return NULL;
 }

+static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
+				struct cs_etm_traceid_queue *tidq)
+{
+	struct cs_etm_packet *tmp;
+
+	if (etm->sample_branches || etm->synth_opts.last_branch ||
+	    etm->sample_instructions) {
+		/*
+		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+		 * the next incoming packet.
+		 */
+		tmp = tidq->packet;
+		tidq->packet = tidq->prev_packet;
+		tidq->prev_packet = tmp;
+	}
+}
+
 static void cs_etm__packet_dump(const char *pkt_string)
 {
 	const char *color = PERF_COLOR_BLUE;
@@ -1342,7 +1359,6 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 			  struct cs_etm_traceid_queue *tidq)
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
-	struct cs_etm_packet *tmp;
 	int ret;
 	u8 trace_chan_id = tidq->trace_chan_id;
 	u64 instrs_executed = tidq->packet->instr_count;
@@ -1406,15 +1422,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		}
 	}

-	if (etm->sample_branches || etm->synth_opts.last_branch) {
-		/*
-		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
-		 * the next incoming packet.
-		 */
-		tmp = tidq->packet;
-		tidq->packet = tidq->prev_packet;
-		tidq->prev_packet = tmp;
-	}
+	cs_etm__packet_swap(etm, tidq);

 	return 0;
 }
@@ -1443,7 +1451,6 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 {
 	int err = 0;
 	struct cs_etm_auxtrace *etm = etmq->etm;
-	struct cs_etm_packet *tmp;

 	/* Handle start tracing packet */
 	if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
@@ -1478,15 +1485,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 	}

 swap_packet:
-	if (etm->sample_branches || etm->synth_opts.last_branch) {
-		/*
-		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
-		 * the next incoming packet.
-		 */
-		tmp = tidq->packet;
-		tidq->packet = tidq->prev_packet;
-		tidq->prev_packet = tmp;
-	}
+	cs_etm__packet_swap(etm, tidq);

 	return err;
 }
Every time an instruction sample is synthesized, the last branch record is reset. This is fine if the instruction period is large enough: for example, with the option '--itrace=i100000', the last branch array is reset for every sample covering 100000 instructions, and enough packets arrive to refill the last branch array before the next instruction sample is generated.
On the other hand, if the period is very small, far fewer packets arrive between two consecutive instruction samples, so the frequent resets leave the last branch array almost empty for each new instruction sample.
To make the last branches work properly for any instruction period, this patch no longer resets the last branches for every instruction sample and instead resets them only when the trace data is flushed. The last branches are therefore reset in only two cases: when tracing starts and when the trace is discontinuous; in all other cases, last branches keep being recorded across consecutive instruction samples.
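The effect of the reset policy can be seen in a toy model. The sketch below assumes a fixed-capacity circular buffer of branch records; the names and the 16-entry capacity are hypothetical, not perf's actual types. Branches are pushed as range packets arrive, and the model compares how many entries the last sample sees when the buffer is reset after every sample versus only on flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_LBR_SZ 16 /* hypothetical capacity, cf. the 'l16' option */

struct sketch_lbr {
	unsigned long from[SKETCH_LBR_SZ];
	unsigned long to[SKETCH_LBR_SZ];
	size_t pos; /* slot of the most recent entry */
	size_t nr;  /* valid entries, saturates at the capacity */
};

static void sketch_lbr_reset(struct sketch_lbr *lbr)
{
	lbr->pos = 0;
	lbr->nr = 0;
}

/* Record a taken branch, overwriting the oldest entry when full. */
static void sketch_lbr_push(struct sketch_lbr *lbr,
			    unsigned long from, unsigned long to)
{
	lbr->pos = (lbr->pos + 1) % SKETCH_LBR_SZ;
	lbr->from[lbr->pos] = from;
	lbr->to[lbr->pos] = to;
	if (lbr->nr < SKETCH_LBR_SZ)
		lbr->nr++;
}

/*
 * Simulate 'nr_samples' instruction samples with 'branches_per_sample'
 * branches recorded between consecutive samples. Returns the number of
 * entries visible to the final sample.
 */
static size_t sketch_entries_at_last_sample(size_t nr_samples,
					    size_t branches_per_sample,
					    bool reset_per_sample)
{
	struct sketch_lbr lbr;
	size_t seen = 0;

	assert(nr_samples > 0);
	sketch_lbr_reset(&lbr);
	for (size_t s = 0; s < nr_samples; s++) {
		for (size_t b = 0; b < branches_per_sample; b++)
			sketch_lbr_push(&lbr, 0x1000 + b, 0x2000 + b);
		seen = lbr.nr; /* what this sample's branch stack holds */
		if (reset_per_sample)
			sketch_lbr_reset(&lbr); /* pre-patch behaviour */
	}
	return seen;
}
```

With one branch between consecutive samples, resetting after every sample leaves the final sample with a single entry, whereas keeping the buffer lets it accumulate ten over ten samples; with twenty branches per sample both policies fill all 16 entries, which is why large periods hid the problem.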
Signed-off-by: Leo Yan leo.yan@linaro.org
Reviewed-by: Mike Leach mike.leach@linaro.org
Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org
---
 tools/perf/util/cs-etm.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 84f30c2de185..b2f31390126a 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1170,9 +1170,6 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 			"CS ETM Trace: failed to deliver instruction event, error %d\n",
 			ret);

-	if (etm->synth_opts.last_branch)
-		cs_etm__reset_last_branch_rb(tidq);
-
 	return ret;
 }

@@ -1485,6 +1482,10 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 swap_packet:
 	cs_etm__packet_swap(etm, tidq);

+	/* Reset last branches after flush the trace */
+	if (etm->synth_opts.last_branch)
+		cs_etm__reset_last_branch_rb(tidq);
+
 	return err;
 }
The following commit has been merged into the perf/core branch of tip:
Commit-ID:     f1410028c762893daf353765112cf6797e4442fa
Gitweb:        https://git.kernel.org/tip/f1410028c762893daf353765112cf6797e4442fa
Author:        Leo Yan leo.yan@linaro.org
AuthorDate:    Wed, 19 Feb 2020 10:18:08 +08:00
Committer:     Arnaldo Carvalho de Melo acme@redhat.com
CommitterDate: Wed, 11 Mar 2020 10:48:44 -03:00
perf cs-etm: Continuously record last branch
Signed-off-by: Leo Yan leo.yan@linaro.org
Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org
Reviewed-by: Mike Leach mike.leach@linaro.org
Cc: Alexander Shishkin alexander.shishkin@linux.intel.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Robert Walker robert.walker@arm.com
Cc: Suzuki Poulouse suzuki.poulose@arm.com
Cc: coresight ml coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200219021811.20067-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
---
 tools/perf/util/cs-etm.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 294b09c..2c4156c 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1170,9 +1170,6 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 			"CS ETM Trace: failed to deliver instruction event, error %d\n",
 			ret);

-	if (etm->synth_opts.last_branch)
-		cs_etm__reset_last_branch_rb(tidq);
-
 	return ret;
 }

@@ -1487,6 +1484,10 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 swap_packet:
 	cs_etm__packet_swap(etm, tidq);

+	/* Reset last branches after flush the trace */
+	if (etm->synth_opts.last_branch)
+		cs_etm__reset_last_branch_rb(tidq);
+
 	return err;
 }
When 'etm->instructions_sample_period' is less than 'tidq->period_instructions', the logic in cs_etm__sample() does not handle the case properly.
Consider the following flow as an example:
- With the itrace option '--itrace=i4', cs_etm__sample() starts with these initial values:
    tidq->period_instructions = 0
    etm->instructions_sample_period = 4
- When the first packet arrives:
  packet->instr_count = 10, i.e. 10 instructions are executed in this packet, so period_instructions is updated as follows:
    tidq->period_instructions = 0 + 10 = 10
    instrs_over = 10 - 4 = 6
    offset = 10 - 6 - 1 = 3
    tidq->period_instructions = instrs_over = 6
- When the second packet arrives:
  packet->instr_count = 10; in the second pass, assume the packet again contains 10 instructions:
    tidq->period_instructions = 6 + 10 = 16
    instrs_over = 16 - 4 = 12
    offset = 10 - 12 - 1 = -3  -> negative value
    tidq->period_instructions = instrs_over = 12
After these two packets are handled, there are two issues:
The first issue is that cs_etm__instr_addr() returns the address of the instruction at 'offset' within the current trace packet, so the offset is always expected to be non-negative. In practice, however, cs_etm__sample() can compute a negative offset (-3 when handling the second packet), which, when passed to cs_etm__instr_addr() as a u64, becomes a huge positive integer.
The second issue is that only 2 samples are synthesized for a sample period of 4. Each packet contains 10 instructions, so the two packets contain 20 instructions in total, which should produce 5 samples (4 x 5 = 20). This happens because cs_etm__sample() calls cs_etm__synth_instruction_sample() only once per range packet.
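The arithmetic above can be reproduced with a small model of the pre-patch logic. This is a simplified transcription for illustration, not the perf code itself: it keeps only the period bookkeeping and the offset computation, assumes (as the new code's comment about "entry conditions" suggests) that the block runs only once period_instructions has reached the sample period, and records the offset instead of synthesizing an event. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical model of the pre-patch cs_etm__sample() bookkeeping. */
struct old_model {
	u64 period_instructions; /* mirrors tidq->period_instructions */
	u64 sample_period;       /* mirrors etm->instructions_sample_period */
	size_t nr_samples;       /* samples synthesized so far */
	u64 last_offset;         /* offset used for the most recent sample */
};

static void old_model_packet(struct old_model *m, u64 instr_count)
{
	assert(m->sample_period > 0);
	m->period_instructions += instr_count;

	/* At most one sample per packet, as in the old code. */
	if (m->period_instructions >= m->sample_period) {
		u64 instrs_over = m->period_instructions - m->sample_period;
		/* Wraps around when instrs_over >= instr_count. */
		u64 offset = instr_count - instrs_over - 1;

		m->last_offset = offset;
		m->nr_samples++;
		m->period_instructions = instrs_over;
	}
}
```

Feeding two 10-instruction packets with a period of 4 yields offset 3, then 2^64 - 3 (the wrapped -3), only two samples in total, and a carried-over count of 12 that already exceeds the period.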
This patch fixes the logic in cs_etm__sample(); the basic idea for handling an incoming packet is:
- To synthesize the first instruction sample, combine the instructions left over from the previous packet with the head of the new packet; then generate consecutive samples, one per sample period;
- At the tail of the new packet, any remaining instructions are carried over to the next sample.
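The same model, updated to follow the loop this patch adds to cs_etm__sample(): the starting offset is the part of the period not yet covered by the previous packet's leftover instructions, and a sample is emitted every period until less than one period remains. As before, this is a simplified sketch with hypothetical names; it records the 'offset - 1' index that would be passed to cs_etm__instr_addr() instead of synthesizing events, and the MAX_SAMPLES bound is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

#define MAX_SAMPLES 64 /* arbitrary bound for this sketch */

struct new_model {
	u64 period_instructions; /* leftover carried between packets */
	u64 sample_period;
	size_t nr_samples;
	u64 index[MAX_SAMPLES];  /* sampled instruction index in its packet */
};

static void new_model_packet(struct new_model *m, u64 instr_count)
{
	/* Instructions left over from the previous packet (< period). */
	u64 instrs_prev = m->period_instructions;
	u64 offset;

	assert(m->sample_period > 0 && instrs_prev < m->sample_period);
	m->period_instructions += instr_count;

	/* Instructions of this packet needed to complete the first period. */
	offset = m->sample_period - instrs_prev;

	while (m->period_instructions >= m->sample_period) {
		assert(m->nr_samples < MAX_SAMPLES);
		/* -1: the sample points at the instruction just executed */
		m->index[m->nr_samples++] = offset - 1;
		offset += m->sample_period;
		m->period_instructions -= m->sample_period;
	}
}
```

The two 10-instruction packets now produce five samples: indices 3 and 7 in the first packet (leaving 2 instructions over), then 1, 5 and 9 in the second, i.e. the 4th, 8th, 12th, 16th and 20th instructions overall, with nothing left to carry.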
Suggested-by: Mike Leach mike.leach@linaro.org
Signed-off-by: Leo Yan leo.yan@linaro.org
Reviewed-by: Mike Leach mike.leach@linaro.org
Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org
---
 tools/perf/util/cs-etm.c | 87 ++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 17 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b2f31390126a..4b7d6c36ce3c 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1356,9 +1356,12 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	int ret;
 	u8 trace_chan_id = tidq->trace_chan_id;
-	u64 instrs_executed = tidq->packet->instr_count;
+	u64 instrs_prev;

-	tidq->period_instructions += instrs_executed;
+	/* Get instructions remainder from previous packet */
+	instrs_prev = tidq->period_instructions;
+
+	tidq->period_instructions += tidq->packet->instr_count;

 	/*
 	 * Record a branch when the last instruction in
@@ -1376,26 +1379,76 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		 * TODO: allow period to be defined in cycles and clock time
 		 */

-		/* Get number of instructions executed after the sample point */
-		u64 instrs_over = tidq->period_instructions -
-			etm->instructions_sample_period;
+		/*
+		 * Below diagram demonstrates the instruction samples
+		 * generation flows:
+		 *
+		 *    Instrs     Instrs       Instrs       Instrs
+		 *   Sample(n)  Sample(n+1)  Sample(n+2)  Sample(n+3)
+		 *    |            |            |            |
+		 *    V            V            V            V
+		 *   --------------------------------------------------
+		 *            ^                                  ^
+		 *            |                                  |
+		 *         Period                             Period
+		 *    instructions(Pi)                   instructions(Pi')
+		 *
+		 *            |                                  |
+		 *            \---------------- -----------------/
+		 *                             V
+		 *                 tidq->packet->instr_count
+		 *
+		 * Instrs Sample(n...) are the synthesised samples occurring
+		 * every etm->instructions_sample_period instructions - as
+		 * defined on the perf command line. Sample(n) is being the
+		 * last sample before the current etm packet, n+1 to n+3
+		 * samples are generated from the current etm packet.
+		 *
+		 * tidq->packet->instr_count represents the number of
+		 * instructions in the current etm packet.
+		 *
+		 * Period instructions (Pi) contains the the number of
+		 * instructions executed after the sample point(n) from the
+		 * previous etm packet. This will always be less than
+		 * etm->instructions_sample_period.
+		 *
+		 * When generate new samples, it combines with two parts
+		 * instructions, one is the tail of the old packet and another
+		 * is the head of the new coming packet, to generate
+		 * sample(n+1); sample(n+2) and sample(n+3) consume the
+		 * instructions with sample period. After sample(n+3), the rest
+		 * instructions will be used by later packet and it is assigned
+		 * to tidq->period_instructions for next round calculation.
+		 */

 		/*
-		 * Calculate the address of the sampled instruction (-1 as
-		 * sample is reported as though instruction has just been
-		 * executed, but PC has not advanced to next instruction)
+		 * Get the initial offset into the current packet instructions;
+		 * entry conditions ensure that instrs_prev is less than
+		 * etm->instructions_sample_period.
 		 */
-		u64 offset = (instrs_executed - instrs_over - 1);
-		u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
-					      tidq->packet, offset);
+		u64 offset = etm->instructions_sample_period - instrs_prev;
+		u64 addr;

-		ret = cs_etm__synth_instruction_sample(
-			etmq, tidq, addr, etm->instructions_sample_period);
-		if (ret)
-			return ret;
+		while (tidq->period_instructions >=
+				etm->instructions_sample_period) {
+			/*
+			 * Calculate the address of the sampled instruction (-1
+			 * as sample is reported as though instruction has just
+			 * been executed, but PC has not advanced to next
+			 * instruction)
+			 */
+			addr = cs_etm__instr_addr(etmq, trace_chan_id,
+						  tidq->packet, offset - 1);
+			ret = cs_etm__synth_instruction_sample(
+				etmq, tidq, addr,
+				etm->instructions_sample_period);
+			if (ret)
+				return ret;

-		/* Carry remaining instructions into next sample period */
-		tidq->period_instructions = instrs_over;
+			offset += etm->instructions_sample_period;
+			tidq->period_instructions -=
+				etm->instructions_sample_period;
+		}
 	}

 	if (etm->sample_branches) {
The following commit has been merged into the perf/core branch of tip:
Commit-ID:     c9f5baa136777b2c982f6f7a90c9da69a88be148
Gitweb:        https://git.kernel.org/tip/c9f5baa136777b2c982f6f7a90c9da69a88be148
Author:        Leo Yan leo.yan@linaro.org
AuthorDate:    Wed, 19 Feb 2020 10:18:09 +08:00
Committer:     Arnaldo Carvalho de Melo acme@redhat.com
CommitterDate: Wed, 11 Mar 2020 10:48:44 -03:00
perf cs-etm: Correct synthesizing instruction samples
Suggested-by: Mike Leach mike.leach@linaro.org
Signed-off-by: Leo Yan leo.yan@linaro.org
Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org
Reviewed-by: Mike Leach mike.leach@linaro.org
Cc: Alexander Shishkin alexander.shishkin@linux.intel.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Robert Walker robert.walker@arm.com
Cc: Suzuki Poulouse suzuki.poulose@arm.com
Cc: coresight ml coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200219021811.20067-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
---
 tools/perf/util/cs-etm.c | 87 +++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 17 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 2c4156c..1ddcc67 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1358,9 +1358,12 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	int ret;
 	u8 trace_chan_id = tidq->trace_chan_id;
-	u64 instrs_executed = tidq->packet->instr_count;
+	u64 instrs_prev;

-	tidq->period_instructions += instrs_executed;
+	/* Get instructions remainder from previous packet */
+	instrs_prev = tidq->period_instructions;
+
+	tidq->period_instructions += tidq->packet->instr_count;

 	/*
 	 * Record a branch when the last instruction in
@@ -1378,26 +1381,76 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		 * TODO: allow period to be defined in cycles and clock time
 		 */

-		/* Get number of instructions executed after the sample point */
-		u64 instrs_over = tidq->period_instructions -
-			etm->instructions_sample_period;
+		/*
+		 * Below diagram demonstrates the instruction samples
+		 * generation flows:
+		 *
+		 *    Instrs     Instrs       Instrs       Instrs
+		 *   Sample(n)  Sample(n+1)  Sample(n+2)  Sample(n+3)
+		 *    |            |            |            |
+		 *    V            V            V            V
+		 *   --------------------------------------------------
+		 *            ^                                  ^
+		 *            |                                  |
+		 *         Period                             Period
+		 *    instructions(Pi)                   instructions(Pi')
+		 *
+		 *            |                                  |
+		 *            \---------------- -----------------/
+		 *                             V
+		 *                 tidq->packet->instr_count
+		 *
+		 * Instrs Sample(n...) are the synthesised samples occurring
+		 * every etm->instructions_sample_period instructions - as
+		 * defined on the perf command line. Sample(n) is being the
+		 * last sample before the current etm packet, n+1 to n+3
+		 * samples are generated from the current etm packet.
+		 *
+		 * tidq->packet->instr_count represents the number of
+		 * instructions in the current etm packet.
+		 *
+		 * Period instructions (Pi) contains the the number of
+		 * instructions executed after the sample point(n) from the
+		 * previous etm packet. This will always be less than
+		 * etm->instructions_sample_period.
+		 *
+		 * When generate new samples, it combines with two parts
+		 * instructions, one is the tail of the old packet and another
+		 * is the head of the new coming packet, to generate
+		 * sample(n+1); sample(n+2) and sample(n+3) consume the
+		 * instructions with sample period. After sample(n+3), the rest
+		 * instructions will be used by later packet and it is assigned
+		 * to tidq->period_instructions for next round calculation.
+		 */

 		/*
-		 * Calculate the address of the sampled instruction (-1 as
-		 * sample is reported as though instruction has just been
-		 * executed, but PC has not advanced to next instruction)
+		 * Get the initial offset into the current packet instructions;
+		 * entry conditions ensure that instrs_prev is less than
+		 * etm->instructions_sample_period.
 		 */
-		u64 offset = (instrs_executed - instrs_over - 1);
-		u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
-					      tidq->packet, offset);
+		u64 offset = etm->instructions_sample_period - instrs_prev;
+		u64 addr;

-		ret = cs_etm__synth_instruction_sample(
-			etmq, tidq, addr, etm->instructions_sample_period);
-		if (ret)
-			return ret;
+		while (tidq->period_instructions >=
+				etm->instructions_sample_period) {
+			/*
+			 * Calculate the address of the sampled instruction (-1
+			 * as sample is reported as though instruction has just
+			 * been executed, but PC has not advanced to next
+			 * instruction)
+			 */
+			addr = cs_etm__instr_addr(etmq, trace_chan_id,
+						  tidq->packet, offset - 1);
+			ret = cs_etm__synth_instruction_sample(
+				etmq, tidq, addr,
+				etm->instructions_sample_period);
+			if (ret)
+				return ret;

-		/* Carry remaining instructions into next sample period */
-		tidq->period_instructions = instrs_over;
+			offset += etm->instructions_sample_period;
+			tidq->period_instructions -=
+				etm->instructions_sample_period;
+		}
 	}

 	if (etm->sample_branches) {
If an instruction range packet generates multiple instruction samples, these samples share the same last branches, so there is no need to copy the same last branches repeatedly for each sample within the packet.
This patch moves the copying of the last branches out of cs_etm__synth_instruction_sample() and performs it once, before the instruction samples are generated.
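A sketch of why hoisting the copy is both safe and cheaper, assuming a simplified circular buffer and a linear branch stack that every sample of the packet points to (all names and the 4-entry capacity are hypothetical). The linearization in this sketch copies newest-first; because no branches are pushed while one packet's samples are being generated, copying once before the loop gives every sample exactly the stack a per-sample copy would have produced.

```c
#include <assert.h>
#include <stddef.h>

#define RB_SZ 4 /* hypothetical ring capacity */

struct ring {
	unsigned long to[RB_SZ];
	size_t pos; /* slot of the newest entry */
	size_t nr;  /* valid entries, up to RB_SZ */
};

struct stack {
	unsigned long to[RB_SZ];
	size_t nr;
};

static size_t nr_copies; /* counts linearizations, for illustration */

static void ring_push(struct ring *rb, unsigned long to)
{
	rb->pos = (rb->pos + 1) % RB_SZ;
	rb->to[rb->pos] = to;
	if (rb->nr < RB_SZ)
		rb->nr++;
}

/* Linearize the ring into 'st', newest entry first. */
static void copy_last_branches(const struct ring *rb, struct stack *st)
{
	assert(rb->nr <= RB_SZ);
	for (size_t i = 0; i < rb->nr; i++)
		st->to[i] = rb->to[(rb->pos + RB_SZ - i) % RB_SZ];
	st->nr = rb->nr;
	nr_copies++;
}

/* Generate 'n' samples for one packet with the copy hoisted out. */
static void sample_packet(const struct ring *rb, struct stack *st,
			  const struct stack **samples, size_t n)
{
	copy_last_branches(rb, st); /* once per packet */
	for (size_t i = 0; i < n; i++)
		samples[i] = st;    /* every sample shares the stack */
}
```

After pushing five branches into the 4-entry ring and generating three samples, a single copy has been made, all three samples point to the same stack, and that stack holds the four newest targets in newest-first order.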
Signed-off-by: Leo Yan leo.yan@linaro.org
Reviewed-by: Mike Leach mike.leach@linaro.org
Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org
---
 tools/perf/util/cs-etm.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 4b7d6c36ce3c..aa4b6d060ebb 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1151,10 +1151,8 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,

 	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);

-	if (etm->synth_opts.last_branch) {
-		cs_etm__copy_last_branch_rb(etmq, tidq);
+	if (etm->synth_opts.last_branch)
 		sample.branch_stack = tidq->last_branch;
-	}

 	if (etm->synth_opts.inject) {
 		ret = cs_etm__inject_event(event, &sample,
@@ -1429,6 +1427,10 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		u64 offset = etm->instructions_sample_period - instrs_prev;
 		u64 addr;

+		/* Prepare last branches for instruction sample */
+		if (etm->synth_opts.last_branch)
+			cs_etm__copy_last_branch_rb(etmq, tidq);
+
 		while (tidq->period_instructions >=
 				etm->instructions_sample_period) {
 			/*
@@ -1506,6 +1508,11 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,

 	if (etmq->etm->synth_opts.last_branch &&
 	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
+		u64 addr;
+
+		/* Prepare last branches for instruction sample */
+		cs_etm__copy_last_branch_rb(etmq, tidq);
+
 		/*
 		 * Generate a last branch event for the branches left in the
 		 * circular buffer at the end of the trace.
@@ -1513,7 +1520,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 		 * Use the address of the end of the last reported execution
 		 * range
 		 */
-		u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
+		addr = cs_etm__last_executed_instr(tidq->prev_packet);

 		err = cs_etm__synth_instruction_sample(
 			etmq, tidq, addr,
@@ -1558,11 +1565,16 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
 	 */
 	if (etmq->etm->synth_opts.last_branch &&
 	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
+		u64 addr;
+
+		/* Prepare last branches for instruction sample */
+		cs_etm__copy_last_branch_rb(etmq, tidq);
+
 		/*
 		 * Use the address of the end of the last reported execution
 		 * range.
 		 */
-		u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
+		addr = cs_etm__last_executed_instr(tidq->prev_packet);

 		err = cs_etm__synth_instruction_sample(
 			etmq, tidq, addr,
The following commit has been merged into the perf/core branch of tip:
Commit-ID:     695378b567df1fe6631c6684fcc9eeb4257df70f
Gitweb:        https://git.kernel.org/tip/695378b567df1fe6631c6684fcc9eeb4257df70f
Author:        Leo Yan leo.yan@linaro.org
AuthorDate:    Wed, 19 Feb 2020 10:18:10 +08:00
Committer:     Arnaldo Carvalho de Melo acme@redhat.com
CommitterDate: Wed, 11 Mar 2020 10:48:44 -03:00
perf cs-etm: Optimize copying last branches
Signed-off-by: Leo Yan leo.yan@linaro.org
Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org
Reviewed-by: Mike Leach mike.leach@linaro.org
Cc: Alexander Shishkin alexander.shishkin@linux.intel.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Robert Walker robert.walker@arm.com
Cc: Suzuki Poulouse suzuki.poulose@arm.com
Cc: coresight ml coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200219021811.20067-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
---
 tools/perf/util/cs-etm.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 1ddcc67..87d9943 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1151,10 +1151,8 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,

 	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);

-	if (etm->synth_opts.last_branch) {
-		cs_etm__copy_last_branch_rb(etmq, tidq);
+	if (etm->synth_opts.last_branch)
 		sample.branch_stack = tidq->last_branch;
-	}

 	if (etm->synth_opts.inject) {
 		ret = cs_etm__inject_event(event, &sample,
@@ -1431,6 +1429,10 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		u64 offset = etm->instructions_sample_period - instrs_prev;
 		u64 addr;

+		/* Prepare last branches for instruction sample */
+		if (etm->synth_opts.last_branch)
+			cs_etm__copy_last_branch_rb(etmq, tidq);
+
 		while (tidq->period_instructions >=
 				etm->instructions_sample_period) {
 			/*
@@ -1508,6 +1510,11 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,

 	if (etmq->etm->synth_opts.last_branch &&
 	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
+		u64 addr;
+
+		/* Prepare last branches for instruction sample */
+		cs_etm__copy_last_branch_rb(etmq, tidq);
+
 		/*
 		 * Generate a last branch event for the branches left in the
 		 * circular buffer at the end of the trace.
@@ -1515,7 +1522,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 		 * Use the address of the end of the last reported execution
 		 * range
 		 */
-		u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
+		addr = cs_etm__last_executed_instr(tidq->prev_packet);

 		err = cs_etm__synth_instruction_sample(
 			etmq, tidq, addr,
@@ -1560,11 +1567,16 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
 	 */
 	if (etmq->etm->synth_opts.last_branch &&
 	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
+		u64 addr;
+
+		/* Prepare last branches for instruction sample */
+		cs_etm__copy_last_branch_rb(etmq, tidq);
+
 		/*
 		 * Use the address of the end of the last reported execution
 		 * range.
 		 */
-		u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
+		addr = cs_etm__last_executed_instr(tidq->prev_packet);

 		err = cs_etm__synth_instruction_sample(
 			etmq, tidq, addr,
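The following is a minimal, self-contained C sketch of the idea behind this patch. It uses a simplified model, not perf's real data structures. Branches are recorded newest-first into a fixed-size circular buffer, in the spirit of tidq->last_branch_rb. Once per range packet, before the sampling loop, the buffer is linearized into a separate array, and every sample from that packet then references the same copy. The names struct lb_queue, lb_record(), lb_copy(), sample_packet() and LB_SZ are hypothetical stand-ins, not perf APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffer size, playing the role of synth_opts.last_branch_sz. */
#define LB_SZ 4

/* Simplified, hypothetical model; not perf's struct branch_stack. */
struct lb_queue {
	uint64_t rb[LB_SZ];    /* circular buffer; newest entry is rb[pos] */
	size_t rb_nr;          /* valid entries in rb, capped at LB_SZ */
	size_t pos;            /* insert position, moves downwards and wraps */
	uint64_t stack[LB_SZ]; /* linear copy shared by samples, newest first */
	size_t stack_nr;
	unsigned int copies;   /* number of ring buffer -> stack copies */
};

/* Record one taken branch, identified here only by its target address. */
static void lb_record(struct lb_queue *q, uint64_t target)
{
	if (q->pos == 0)
		q->pos = LB_SZ;
	q->pos--;
	q->rb[q->pos] = target;
	if (q->rb_nr < LB_SZ)
		q->rb_nr++;
}

/* Linearize the circular buffer into q->stack, newest branch first. */
static void lb_copy(struct lb_queue *q)
{
	size_t first;

	q->copies++;
	q->stack_nr = q->rb_nr;
	if (q->rb_nr == 0)
		return;

	assert(q->pos < LB_SZ);
	/* Newest entries: from pos to the end of the buffer. */
	first = LB_SZ - q->pos;
	memcpy(&q->stack[0], &q->rb[q->pos], first * sizeof(q->rb[0]));
	/* After a wrap, rb[0..pos) holds older entries that are still valid. */
	if (q->rb_nr >= LB_SZ && q->pos > 0)
		memcpy(&q->stack[first], &q->rb[0], q->pos * sizeof(q->rb[0]));
}

/*
 * Emit 'nsamples' instruction samples for one range packet.  All samples
 * in the packet share the same last branches, so the copy is done once,
 * before the loop, instead of once per sample.
 */
static unsigned int sample_packet(struct lb_queue *q, unsigned int nsamples)
{
	unsigned int emitted = 0;

	lb_copy(q);
	while (emitted < nsamples) {
		/* A real sample would point its branch stack at q->stack here. */
		emitted++;
	}
	return emitted;
}
```

For example, after branch targets 1 to 6 are recorded into the 4-entry buffer, sample_packet(&q, 3) emits three samples. During that call, q.copies stays at 1 and q.stack holds 6, 5, 4, 3.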
The variable 'offset' in function cs_etm__instr_addr() is of type u64, so it's not appropriate to check it with 'while (offset > 0)': for an unsigned variable, '> 0' is the same test as '!= 0'. This patch changes the check to 'while (offset)'.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index aa4b6d060ebb..bba969d48076 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -962,7 +962,7 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
 	if (packet->isa == CS_ETM_ISA_T32) {
 		u64 addr = packet->start_addr;

-		while (offset > 0) {
+		while (offset) {
 			addr += cs_etm__t32_instr_size(etmq,
 						       trace_chan_id, addr);
 			offset--;
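The following small C sketch shows why this change does not alter behavior. For an unsigned type such as u64 (uint64_t below), 'offset > 0' and 'offset != 0' are the same test. Decrementing an unsigned zero wraps around to the maximum value instead of going negative, so the non-zero check is the natural loop guard. The helper names are hypothetical and only illustrate the C semantics; they are not taken from cs-etm.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Guard in the form used before the patch. */
static bool guard_gt_zero(uint64_t offset)
{
	return offset > 0;
}

/* Guard in the form used after the patch: same truth value as 'while (offset)'. */
static bool guard_nonzero(uint64_t offset)
{
	return offset != 0;
}

/* Count loop iterations using the post-patch guard. */
static uint64_t count_steps(uint64_t offset)
{
	uint64_t steps = 0;

	while (offset) {
		steps++;
		offset--; /* the guard stops the loop at 0, so this never wraps */
	}
	return steps;
}
```

By contrast, a guard written as 'offset >= 0' would be a real bug for an unsigned type, because it is always true. GCC can warn about that pattern with -Wtype-limits.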
The following commit has been merged into the perf/core branch of tip:
Commit-ID:     bc010dd657ee0309276c88ab828b9ad156f75b31
Gitweb:        https://git.kernel.org/tip/bc010dd657ee0309276c88ab828b9ad156f75b31
Author:        Leo Yan <leo.yan@linaro.org>
AuthorDate:    Wed, 19 Feb 2020 10:18:11 +08:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Wed, 11 Mar 2020 10:48:44 -03:00
perf cs-etm: Fix unsigned variable comparison to zero
The variable 'offset' in function cs_etm__instr_addr() is of type u64, so it's not appropriate to check it with 'while (offset > 0)': for an unsigned variable, '> 0' is the same test as '!= 0'. This patch changes the check to 'while (offset)'.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight ml <coresight@lists.linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200219021811.20067-6-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cs-etm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 87d9943..62d2f9b 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -962,7 +962,7 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
 	if (packet->isa == CS_ETM_ISA_T32) {
 		u64 addr = packet->start_addr;

-		while (offset > 0) {
+		while (offset) {
 			addr += cs_etm__t32_instr_size(etmq,
 						       trace_chan_id, addr);
 			offset--;
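The hedged C sketch below shows what the patched loop computes, modeling the address calculation in cs_etm__instr_addr(). It assumes the range's code is available as an array of halfwords, whereas perf reads it from the traced image. In T32, a first halfword whose bits[15:11] are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction. Any other value is a 16-bit instruction, so the address of the Nth instruction has to be found by walking the range. For A32/A64 every instruction is 4 bytes, so no walk is needed. The functions t32_instr_size(), t32_instr_addr() and a64_instr_addr() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the T32 instruction whose first halfword is 'hw'. */
static unsigned int t32_instr_size(uint16_t hw)
{
	/* bits[15:11] >= 0b11101 (0x1D) select a 32-bit encoding */
	return ((hw >> 11) >= 0x1D) ? 4 : 2;
}

/*
 * Address of the instruction 'offset' instructions after 'start'.
 * 'code' holds the halfwords of the range beginning at 'start'
 * (a hypothetical stand-in for reading traced memory).
 */
static uint64_t t32_instr_addr(uint64_t start, const uint16_t *code,
			       size_t nr_hw, uint64_t offset)
{
	uint64_t addr = start;

	while (offset) {
		size_t idx = (size_t)((addr - start) / 2);

		assert(idx < nr_hw);
		addr += t32_instr_size(code[idx]);
		offset--;
	}
	return addr;
}

/* A32/A64: fixed 4-byte instructions. */
static uint64_t a64_instr_addr(uint64_t start, uint64_t offset)
{
	return start + offset * 4;
}
```

In this model, the halfwords 0x4770 and 0xBF00 are 16-bit instructions and 0xF000 begins a 32-bit one. For the sequence { 0x4770, 0xF000, 0x0000, 0xBF00 }, offset 2 therefore lands at start + 6 rather than start + 4.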
Hi Arnaldo,
On Wed, Feb 19, 2020 at 10:18:06AM +0800, Leo Yan wrote:
> This patch series addresses issues with synthesizing instruction samples: when the instruction sample period is small, the current logic cannot synthesize multiple instruction samples within one instruction range packet.
> Patch 0001 swaps packets for instruction samples, so that the '--itrace=iNNN' option works correctly.
> Patch 0002 avoids resetting the last branches for every instruction sample; if the last branches are reset every time a sample is generated, the later samples in the same range packet can no longer use them.
> Patch 0003 fixes the handling of different instruction periods, especially small sample periods.
> Patch 0004 optimizes copying of the last branches: it copies them only once when the instruction samples share the same last branches.
> Patch 0005 is a minor fix for comparing an unsigned variable to zero.
> This patch set has been rebased on the latest perf/core branch and verified on a Juno board with the commands below:
>   # perf script --itrace=i2
>   # perf script --itrace=i2il16
>   # perf inject --itrace=i2il16 -i perf.data -o perf.data.new
>   # perf inject --itrace=i100il16 -i perf.data -o perf.data.new
Could you pick up this patch set? I confirmed that it applies cleanly on top of the latest mainline kernel (5.6-rc5).
Or, if you want me to resend this patch set, please feel free to let me know. Thanks!
Leo
> Changes from v4:
> - Added Mike's review tag for patch 03;
> - Added Mathieu's review tags for all patches.
> Changes from v3:
> - Refactored patch 0001 with new function cs_etm__packet_swap() (Mike);
> - Refined the instruction sample generation flow into a single while loop, fully adopting Mike's suggestions (Mike);
> - Added Mike's review tags for patch 01/02/04/05.
> Changes from v2:
> - Added patch 0001 to fix swapping packets for instruction samples;
> - Refined minor commit logs and comments;
> - Rebased on the latest perf/core branch.
> Changes from v1:
> - Rebased the patch set on the perf/core branch with latest commit 9fec3cd5fa4a ("perf map: Check if the map still has some refcounts on exit").
> Leo Yan (5):
>   perf cs-etm: Swap packets for instruction samples
>   perf cs-etm: Continuously record last branch
>   perf cs-etm: Correct synthesizing instruction samples
>   perf cs-etm: Optimize copying last branches
>   perf cs-etm: Fix unsigned variable comparison to zero
>  tools/perf/util/cs-etm.c | 157 +++++++++++++++++++++++++++------------
>  1 file changed, 111 insertions(+), 46 deletions(-)
> -- 
> 2.17.1
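The cover letter describes a sampling flow in which one range packet can produce several instruction samples when the period is small, with the remainder carried to the next packet. It can be sketched in C as follows. This is a simplified model based on the context lines visible in the patch 0004 diff: 'offset = etm->instructions_sample_period - instrs_prev' and the 'while (tidq->period_instructions >= etm->instructions_sample_period)' loop. struct sampler and sample_range() are hypothetical. Address calculation and sample emission are reduced to recording the 0-based index of each sampled instruction within the packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-trace-queue sampling state. */
struct sampler {
	uint64_t period;              /* instructions per sample */
	uint64_t period_instructions; /* instructions not yet covered by a sample */
};

/*
 * Account for one range packet of 'executed' instructions.  Stores the
 * 0-based index (within this packet) of each sampled instruction in
 * idx[], up to 'max' entries, and returns the number of samples.
 */
static size_t sample_range(struct sampler *s, uint64_t executed,
			   uint64_t *idx, size_t max)
{
	uint64_t instrs_prev = s->period_instructions;
	uint64_t offset;
	size_t n = 0;

	assert(s->period > 0);
	s->period_instructions += executed;

	/*
	 * The first sample completes the period already started by earlier
	 * packets.  instrs_prev < period holds here, because the loop below
	 * always leaves period_instructions below the period.
	 */
	offset = s->period - instrs_prev;

	while (s->period_instructions >= s->period) {
		/* offset counts instructions, so offset - 1 is an index */
		if (n < max)
			idx[n] = offset - 1;
		n++;
		offset += s->period;
		s->period_instructions -= s->period;
	}
	return n;
}
```

Consider a period of 2 and a first packet of 5 instructions. The sketch samples indices 1 and 3 and carries 1 instruction over, so a following 3-instruction packet is sampled at indices 0 and 2. A single loop handles both the carried-over remainder and any number of samples per packet.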
Em Tue, Mar 10, 2020 at 01:43:05PM +0800, Leo Yan escreveu:
> Hi Arnaldo,
> On Wed, Feb 19, 2020 at 10:18:06AM +0800, Leo Yan wrote:
[...]
> Could you pick up this patch set? I confirmed that it applies cleanly on top of the latest mainline kernel (5.6-rc5).
> Or, if you want me to resend this patch set, please feel free to let me know. Thanks!
Thanks, all build tested on x86 and arm64 (with CORESIGHT=1, etc), applied.
- Arnaldo
On Tue, Mar 10, 2020 at 08:45:03AM -0300, Arnaldo Carvalho de Melo wrote:
[...]
> > Could you pick up this patch set? I confirmed that it applies cleanly on top of the latest mainline kernel (5.6-rc5).
> > Or, if you want me to resend this patch set, please feel free to let me know. Thanks!
> Thanks, all build tested on x86 and arm64 (with CORESIGHT=1, etc), applied.
Thank you, Arnaldo.
Leo