Re: [PATCH 1/2] perf inject: correct recording of branch address and destination

25 May 2017


On Wed, 24 May 2017 12:48:04 -0500
Sebastian Pop <sebpop@gmail.com> wrote:
...
On Wed, May 24, 2017 at 11:36 AM, Mathieu Poirier
<mathieu.poirier@linaro.org> wrote:
...
Are the instructions in the AutoFDO section of HOWTO.md on GitHub sufficient
to test this, or is there another way?
Here is how I tested it (assuming that perf.data contains an ETM trace):
# perf inject -i perf.data -o inj --itrace=il64 --strip
# perf report -i inj -D &> dump
and then I compared the addresses in the last branch stack of the output dump
against the addresses of the disassembled program from:
# objdump -d sort
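The manual check above (matching branch-stack addresses in the dump against `objdump -d`) can be partly automated. Below is a minimal Python sketch; both regular expressions encode assumptions about the text layout of the two tools (in particular the `<from> -> <to>` form of branch-stack entries in `perf report -D`), and the sample lines are hypothetical, not taken from this trace. It only flags entries whose ends do not fall on an instruction boundary; it does not check that the source instruction is actually a branch.

```python
import re

# objdump -d prints each instruction as "  <hex address>:\t<encoding>\t<mnemonic>..."
OBJDUMP_INSN = re.compile(r"^\s*([0-9a-f]+):\s", re.MULTILINE)
# ASSUMPTION: each branch-stack entry in the `perf report -D` dump contains
# "<from> -> <to>" as two hex addresses; adjust the pattern for your perf version.
BRANCH_ENTRY = re.compile(r"\b(?:0x)?([0-9a-f]+)\s*->\s*(?:0x)?([0-9a-f]+)\b")


def insn_addresses(objdump_text):
    """Collect every instruction address listed by objdump -d."""
    return {int(a, 16) for a in OBJDUMP_INSN.findall(objdump_text)}


def bad_branches(dump_text, objdump_text):
    """Return (from, to) pairs where either end is not a known instruction address."""
    known = insn_addresses(objdump_text)
    pairs = [(int(s, 16), int(d, 16)) for s, d in BRANCH_ENTRY.findall(dump_text)]
    return [(s, d) for s, d in pairs if s not in known or d not in known]


# HYPOTHETICAL sample output, for illustration only.
OBJDUMP_SAMPLE = (
    "0000000000400554 <bubble_sort>:\n"
    "  400554:\td10083ff \tsub\tsp, sp, #0x20\n"
    "  400558:\tf9000fe0 \tstr\tx0, [sp, #24]\n"
    "  40055c:\t54000081 \tb.ne\t40056c <bubble_sort+0x18>\n"
    "  40056c:\td65f03c0 \tret\n"
)
DUMP_SAMPLE = (
    "... branch stack: nr:2\n"
    ".....  0: 000000000040055c -> 000000000040056c 0 cycles  P   0\n"
    ".....  1: 0000000000400558 -> 0000000000400564 0 cycles  P   0\n"
)

if __name__ == "__main__":
    for s, d in bad_branches(DUMP_SAMPLE, OBJDUMP_SAMPLE):
        print(f"suspicious branch: {s:#x} -> {d:#x}")
```

With the sample data, only the second entry is reported, because 0x400564 is not an instruction address in the listing.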
Re-running the AutoFDO process with these two patches still makes
the resulting executable perform worse, however:
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5306 ms
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5304 ms
$ taskset -c 2 ./sort-O3-autofdo 
Bubble sorting array of 30000 elements
5851 ms
$ taskset -c 2 ./sort-O3-autofdo 
Bubble sorting array of 30000 elements
5889 ms
$ taskset -c 2 ./sort-O3-autofdo 
Bubble sorting array of 30000 elements
5888 ms
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5318 ms
The gcov file generated from inj.data (whether with --itrace=il64
or --itrace=i100usle) still looks wrong:
$ ~/git/autofdo/dump_gcov  -gcov_version=1 sort-O3.gcov 
sort_array total:19309128 head:0
  0: 0
  1: 0
  5: 0
  6: 0
  7.1: 0
  7.3: 0
  8.3: 0
  15: 2
  16: 2
  17: 2
  10: start total:0
    1: 0
  11: bubble_sort total:19309119
    2: 1566
    4: 6266668
    5: 6071341
    7: 6266668
    9: 702876
  12: stop total:3
    2: 0
    3: 1
    4: 1
    5: 1
main total:1 head:0
  0: 0
  2: 0
  4: 1
  1: cmd_line total:0
    3: 0
    4: 0
    5: 0
    6: 0
Whereas the one generated from an intel-pt run looks correct, showing the
swap (11: bubble_sort 7,8) as executed fewer times:
kim@juno sort-etm$ ~/git/autofdo/dump_gcov  -gcov_version=1 ../sort-O3.gcov 
sort_array total:105658 head:0
  0: 0
  5: 0
  6: 0
  7.1: 0
  7.3: 0
  8.3: 0
  16: 0
  17: 0
  1: printf total:0
    2: 0
  10: start total:0
    1: 0
  11: bubble_sort total:105658
    2: 14
    4: 28740
    5: 28628
    7: 9768
    8: 9768
    9: 28740
  12: stop total:0
    2: 0
    3: 0
    4: 0
    5: printf total:0
      2: 0
  15: printf total:0
    2: 0
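To make the comparison of the two dumps less eyeball-dependent, here is a minimal Python sketch that parses dump_gcov's indented text into nested dicts and divides the count of the swap line (offset 7 of the inlined bubble_sort) by the count at offset 4. The layout rules it relies on (two-space indent per inline level, `offset: count` and `offset: callee total:N` lines) are inferred from the two dumps above rather than from AutoFDO documentation, and using offset 4 as the reference count is an arbitrary choice for illustration. The embedded strings are excerpts of the dumps quoted above.

```python
import re

FUNC = re.compile(r"^(\S+) total:(\d+) head:(\d+)$")       # "sort_array total:105658 head:0"
CALLSITE = re.compile(r"^( *)(\S+): (\S+) total:(\d+)$")  # "  11: bubble_sort total:105658"
COUNT = re.compile(r"^( *)(\S+): (\d+)$")                  # "    7: 9768"


def new_node(total):
    return {"total": total, "lines": {}, "inlined": {}}


def parse_dump_gcov(text):
    """Parse dump_gcov text into {function: node}; nesting follows indentation."""
    functions, stack = {}, []  # stack holds (indent, node) for the open scopes
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        m = FUNC.match(line)
        if m:
            functions[m.group(1)] = new_node(int(m.group(2)))
            stack = [(-1, functions[m.group(1)])]
            continue
        indent = len(line) - len(line.lstrip(" "))
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if not stack:
            continue  # line outside any function: ignore it
        parent = stack[-1][1]
        if m := CALLSITE.match(line):
            child = new_node(int(m.group(4)))
            # Key by (offset, callee): sort_array inlines printf at several offsets.
            parent["inlined"][(m.group(2), m.group(3))] = child
            stack.append((indent, child))
        elif m := COUNT.match(line):
            parent["lines"][m.group(2)] = int(m.group(3))
    return functions


def inlined(node, callee):
    """Return the first inlined call site of `callee` inside `node`."""
    return next(c for (_, name), c in node["inlined"].items() if name == callee)


def swap_ratio(text):
    bubble = inlined(parse_dump_gcov(text)["sort_array"], "bubble_sort")
    return bubble["lines"]["7"] / bubble["lines"]["4"]


ETM_EXCERPT = """\
sort_array total:19309128 head:0
  0: 0
  11: bubble_sort total:19309119
    2: 1566
    4: 6266668
    5: 6071341
    7: 6266668
    9: 702876
"""

PT_EXCERPT = """\
sort_array total:105658 head:0
  0: 0
  1: printf total:0
    2: 0
  11: bubble_sort total:105658
    2: 14
    4: 28740
    5: 28628
    7: 9768
    8: 9768
    9: 28740
  12: stop total:0
    2: 0
    5: printf total:0
      2: 0
  15: printf total:0
    2: 0
"""

if __name__ == "__main__":
    print(f"ETM: {swap_ratio(ETM_EXCERPT):.2f}  intel-pt: {swap_ratio(PT_EXCERPT):.2f}")
```

On these excerpts it prints 1.00 for ETM and 0.34 for intel-pt: the ETM profile claims the swap runs as often as offset 4 and has no count at all for offset 8.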
I have to run 'perf inject' on the x86 host because of the
aforementioned error when running it natively on the aarch64 target:
0x350 [0x50]: failed to process type: 1
However, it doesn't matter whether I run create_gcov (like so, btw):
~/git/autofdo/create_gcov --binary=sort-O3 --profile=inj.data --gcov=sort-O3.gcov -gcov_version=1
on the x86 host or on the aarch64 target: I still get the same (negative
performance) results.
As Sebastian asked: if I take the .gcov generated from the intel-pt-sourced
inject onto the target and rebuild sort, performance improves:
$ gcc -g -O3 -fauto-profile=../sort-O3.gcov ./sort.c -o ./sort-O3-autofdo
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5309 ms
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5310 ms
$ taskset -c 2 ./sort-O3-autofdo 
Bubble sorting array of 30000 elements
4443 ms
$ taskset -c 2 ./sort-O3-autofdo 
Bubble sorting array of 30000 elements
4443 ms
And if I take the ETM-generated gcov and use it to build a new x86_64
binary, it indeed performs worse on x86_64 too:
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
1502 ms
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
1500 ms
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
1501 ms
$ taskset -c 2 ./sort-O3-autofdo-etmgcov 
Bubble sorting array of 30000 elements
1907 ms
$ taskset -c 2 ./sort-O3-autofdo-etmgcov 
Bubble sorting array of 30000 elements
1893 ms
$ taskset -c 2 ./sort-O3-autofdo-etmgcov 
Bubble sorting array of 30000 elements
1907 ms
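For a quick numeric summary of the timings above, this short Python snippet averages the runs quoted in this mail and reports the relative change of each -fauto-profile build against its plain -O3 baseline. It uses only the figures from this thread; with two or three runs per binary it is a rough indicator, not a rigorous benchmark.

```python
from statistics import mean


def relative_change(baseline_ms, candidate_ms):
    """Relative change in mean run time; positive means the candidate is slower."""
    return mean(candidate_ms) / mean(baseline_ms) - 1.0


# (sort-O3 runs, AutoFDO-built runs) in ms, as quoted in this mail.
RUNS = {
    "aarch64, ETM gcov": ([5306, 5304, 5318], [5851, 5889, 5888]),
    "aarch64, intel-pt gcov": ([5309, 5310], [4443, 4443]),
    "x86_64, ETM gcov": ([1502, 1500, 1501], [1907, 1893, 1907]),
}

for label, (baseline, autofdo) in RUNS.items():
    print(f"{label}: {relative_change(baseline, autofdo):+.1%}")
```

This prints +10.7% and +26.7% for the two ETM-profiled builds and -16.3% for the intel-pt-profiled one.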
Kim
