This patch series is to support for using 'perf script' for CoreSight trace disassembler, for this purpose this patch series adds a new python script to parse CoreSight tracing event and use command 'objdump' for disassembled lines, finally this can generate readable program execution flow for reviewing tracing data.
Patch 0001 is the prerequisite to add addr into sample dict, so this value can be used by python script to analyze instruction range.
Patch 0002 is to add python script for trace disassembler.
Patch 0003 is to add doc to explain python script usage and give example for it.
This patch series has been tested on Hikey (ARM64 octa CA53 cores). The script is expected to extend a bit to support ARM32 platform, but I am lacking ARM32 platform for testing on it, so firstly upstream to support ARM64 platform firstly.
You are very welcome to test the script in this patch series, your testing result and suggestion are very valuable to perfect this script to cover more cases, at the end we will have more confidence to upstream into mainline kernel.
Leo Yan (3): perf script python: Add addr into perf sample dict perf script python: Add script for CoreSight trace disassembler coresight: Document for CoreSight trace disassembler
Documentation/trace/coresight.txt | 52 ++++ tools/perf/scripts/python/arm-cs-trace-disasm.py | 324 +++++++++++++++++++++ .../util/scripting-engines/trace-event-python.c | 2 + 3 files changed, 378 insertions(+) create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py
ARM CoreSight auxtrace uses 'sample->addr' to record the target address for branch instructions, so the data of 'sample->addr' is required for tracing data analysis.
This commit collects data of 'sample->addr' into perf sample dict, finally can be used for python script for parsing event.
Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/util/scripting-engines/trace-event-python.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index 10dd5fc..7f8afac 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -531,6 +531,8 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample, PyLong_FromUnsignedLongLong(sample->period)); pydict_set_item_string_decref(dict_sample, "phys_addr", PyLong_FromUnsignedLongLong(sample->phys_addr)); + pydict_set_item_string_decref(dict_sample, "addr", + PyLong_FromUnsignedLongLong(sample->addr)); set_sample_read_in_dict(dict_sample, sample, evsel); pydict_set_item_string_decref(dict, "sample", dict_sample);
This commit adds python script to parse CoreSight tracing event and use command 'objdump' for disassembled lines, finally we can generate readable program execution flow for reviewing tracing data.
The script receives CoreSight tracing packet with below format:
+------------+------------+------------+ packet(n): | addr | ip | cpu | +------------+------------+------------+ packet(n+1): | addr | ip | cpu | +------------+------------+------------+
packet::ip is the last address of current branch instruction and packet::addr presents the start address of the next coming branch instruction. So for one branch instruction which starts in packet(n), its execution flow starts from packet(n)::addr and it stops at packet(n+1)::ip. As results we need to combine the two continuous packets to generate the instruction range, this is the rationale for the script implementation:
[ sample(n)::addr .. sample(n+1)::ip ]
Except for the normal branch flow, this script also parses program flow for exception handling and for the exception level change flow from user space to kernel:
- We can easily decide it's an exception handling if the stop address is in kernel's vector table. For this case, we need to divide the program range into two parts, the first part in before exception handling, and the second part is for exception handling.
- For the program flow from user space to kernel, it might be a system call or a trap, so the script reads ahead 16 instructions to check if includes 'svc' instruction or not. If the prediction includes 'svc' instruction the script prints out the flow until 'svc' instruction, otherwise it only prints the instruction which causes trap.
Credits to Tor Jeremiassen who have written the script skeleton and provides the ideas for reading symbol file according to build-id, creating memory map for dso and basic packet handling. Mathieu Poirier contributed fixes for build-id and memory map bugs. The detailed development history for this script you can find from [1]. Based on Tor and Mathieu work, the script is enhanced to support exception handling and program flow from user space to kernel as described above. Another minor enhancement is to support for without build-id case, the script can parse kernel symbols with option '-k' for vmlinux file path.
[1] https://github.com/Linaro/perf-opencsd/commits/perf-opencsd-v4.15/tools/perf...
Co-authored-by: Tor Jeremiassen tor@ti.com Co-authored-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/scripts/python/arm-cs-trace-disasm.py | 324 +++++++++++++++++++++++ 1 file changed, 324 insertions(+) create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py new file mode 100644 index 0000000..4f71ee6 --- /dev/null +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py @@ -0,0 +1,324 @@ +# arm-cs-trace-disasm.py: ARM CoreSight Trace Dump With Disassember +# SPDX-License-Identifier: GPL-2.0 +# +# Tor Jeremiassen tor@ti.com is original author who wrote script +# skeleton, Mathieu Poirier mathieu.poirier@linaro.org contributed +# fixes for build-id and memory map; Leo Yan leo.yan@linaro.org +# enhanced the packet parsing to support exception handling and the +# flow from user space to kernel switching. + +import os +import sys +import re +from subprocess import * +from optparse import OptionParser, make_option + +# Command line parsing + +option_list = [ + # formatting options for the bottom entry of the stack + make_option("-k", "--vmlinux", dest="vmlinux_name", + help="Set path to vmlinux file"), + make_option("-d", "--objdump", dest="objdump_name", + help="Set path to objdump executable file"), + make_option("-v", "--verbose", dest="verbose", + action="store_true", default=False, + help="Enable debugging log") +] + +parser = OptionParser(option_list=option_list) +(options, args) = parser.parse_args() + +if (options.objdump_name == None): + sys.exit("No objdump executable file specified - use -d or --objdump option") + +# Initialize global dicts and regular expression + +build_ids = dict() +mmaps = dict() +disasm_cache = dict() +cpu_data = dict() +disasm_re = re.compile("^\s*([0-9a-fA-F]+):") +disasm_func_re = re.compile("^\s*([0-9a-fA-F]+)\s<.*>:") +cache_size = 32*1024 +vectors_start = 0 +vectors_end = 0 +prev_cpu = -1 + +def parse_buildid(): + global build_ids + + buildid_regex = "([a-fA-f0-9]+)[ \t]([^ \n]+)" + buildid_re = re.compile(buildid_regex) + + results = check_output(["perf", "buildid-list"]).split('\n'); + for line in results: + m = buildid_re.search(line) + if (m == None): + continue; + + id_name = m.group(2) + id_num = m.group(1) + + if (id_name == "[kernel.kallsyms]") : + append = "/kallsyms" + elif (id_name == "[vdso]") : + append = "/vdso" + else: + append = "/elf" + + build_ids[id_name] = os.environ['PERF_BUILDID_DIR'] + \ + "/" + id_name + "/" + id_num + append; + # Replace duplicate slash chars to single slash char + build_ids[id_name] = build_ids[id_name].replace('//', '/', 1) + + if ((options.vmlinux_name == None) and ("[kernel.kallsyms]" in build_ids)): + print "kallsyms cannot be used to dump assembler" + + # Set vmlinux path to replace kallsyms file, if without buildid we still + # can use vmlinux to prase kernel symbols + if ((options.vmlinux_name != None)): + build_ids['[kernel.kallsyms]'] = options.vmlinux_name; + +def parse_mmap(): + global mmaps + + # Check mmap for PERF_RECORD_MMAP and PERF_RECORD_MMAP2 + mmap_regex = "PERF_RECORD_MMAP.* -?[0-9]+/[0-9]+: [(0x[0-9a-fA-F]+)((0x[0-9a-fA-F]+)).*:\s.*\s(\S*)" + mmap_re = re.compile(mmap_regex) + + results = check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP", shell=True).split('\n') + for line in results: + m = mmap_re.search(line) + if (m != None): + if (m.group(3) == '[kernel.kallsyms]_text'): + dso = '[kernel.kallsyms]' + else: + dso = m.group(3) + + start = int(m.group(1),0) + end = int(m.group(1),0) + int(m.group(2),0) + mmaps[dso] = [start, end] + +def parse_vectors(): + global vectors_start + global vectors_end + + if (options.vmlinux_name == None): + return + + vectors_regex = "([0-9a-fA-F]+)\sg.*.text\s*[0-9a-fA-F]+\s(vectors)" + vectors_re = re.compile(vectors_regex) + + fname = options.vmlinux_name + disasm = [ options.objdump_name, "-t", fname ] + disasm_output = check_output(disasm).split('\n') + + for line in disasm_output: + m = vectors_re.search(line) + if (m != None): + vectors_start = int("0x"+m.group(1), 0) + vectors_end = vectors_start + 0x800 + +def find_dso_mmap(addr): + global mmaps + + for key, value in mmaps.items(): + if (addr >= value[0] and addr < value[1]): + return key + + return None + +def read_disam(dso, start_addr, stop_addr): + global mmaps + global build_ids + + addr_range = start_addr + ":" + stop_addr; + + # Don't let the cache get too big, clear it when it hits max size + if (len(disasm_cache) > cache_size): + disasm_cache.clear(); + + try: + disasm_output = disasm_cache[addr_range]; + except: + try: + fname = build_ids[dso]; + except KeyError: + sys.exit("cannot find symbol file for " + dso) + + disasm = [ options.objdump_name, "-d", "-z", + "--start-address="+start_addr, + "--stop-address="+stop_addr, fname ] + + disasm_output = check_output(disasm).split('\n') + disasm_cache[addr_range] = disasm_output; + + return disasm_output + +def dump_disam(dso, start_addr, stop_addr, check_svc): + for line in read_disam(dso, start_addr, stop_addr): + m = disasm_func_re.search(line) + if (m != None): + print "\t",line + continue + + m = disasm_re.search(line) + if (m == None): + continue; + + print "\t",line + + if ((check_svc == True) and "svc" in line): + return + +def is_svc_call(dso, start_addr, stop_addr): + for line in read_disam(dso, start_addr, stop_addr): + m = disasm_re.search(line) + if (m == None): + continue; + + if "svc" in line: + return True + + return False + +def in_vector_table(addr): + global vectors_start + global vectors_end + + if (addr >= vectors_start and addr < vectors_end): + return True + + return False + +def is_exception_handling(start_addr, stop_addr): + if (in_vector_table(int(start_addr, 0)) == False and + in_vector_table(int(stop_addr, 0)) == True): + return True + + return False + +def dump_packet(sample): + print "Packet = { cpu: 0x%d addr: 0x%x phys_addr: 0x%x ip: 0x%x " \ + "pid: %d tid: %d period: %d time: %d }" % \ + (sample['cpu'], sample['addr'], sample['phys_addr'], \ + sample['ip'], sample['pid'], sample['tid'], \ + sample['period'], sample['time']) + +def trace_begin(): + print 'CoreSight Trace Data Assembler Dump (ARM64)' + parse_buildid() + parse_mmap() + parse_vectors() + +def trace_end(): + print 'End' + +def trace_unhandled(event_name, context, event_fields_dict): + print ' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())]) + +def process_event(param_dict): + global cache_size + global options + global prev_cpu + + sample = param_dict["sample"] + + if (options.verbose == True): + dump_packet(sample) + + # If period doesn't equal to 1, this packet is for instruction sample + # packet, we need drop this synthetic packet. + if (sample['period'] != 1): + print "Skip synthetic instruction sample" + return + + cpu = format(sample['cpu'], "d"); + + # Initialize CPU data if it's empty, and directly return back + # if this is the first tracing event for this CPU. + if (cpu_data.get(str(cpu) + 'addr') == None): + cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x") + prev_cpu = cpu + return + + # The format for packet is: + # + # +------------+------------+------------+ + # sample_prev: | addr | ip | cpu | + # +------------+------------+------------+ + # sample_next: | addr | ip | cpu | + # +------------+------------+------------+ + # + # We need to combine the two continuous packets to get the instruction + # range for sample_prev::cpu: + # + # [ sample_prev::addr .. sample_next::ip ] + # + # For this purose, sample_prev::addr is stored into cpu_data structure + # and read back for 'start_addr' when the new packet comes, and we need + # to use sample_next::ip to calculate 'stop_addr', plusing extra 4 for + # 'stop_addr' is for the sake of objdump so the final assembler dump can + # include last instruction for sample_next::ip. + + start_addr = cpu_data[str(prev_cpu) + 'addr'] + stop_addr = format(sample['ip'] + 4, "#x") + + # Find dso for start_addr and stop_addr, if these two addresses have + # different dso that means there have switch between kernel space + # and user space. + + prev_dso = find_dso_mmap(int(start_addr, 0)) + next_dso = find_dso_mmap(int(stop_addr, 0)) + + # If cannot find dso so cannot dump assembler, bail out + if (prev_dso == None or next_dso == None): + print "Address range [ %s .. %s ]: dso is not found" % (start_addr, stop_addr) + prev_cpu = cpu + return + + if (prev_dso != next_dso): + + # If 'prev_dso' doesn't equal to 'next_dso', this might be below cases: + # - synchronous exception with 'svc' instruction; + # - synchronous exception and trap to kernel (data abort, etc); + # - asynchronous exception for interrupt. + # + # Check if the next 16 insts (16 insts x 4 bytes =64 bytes) have included + # 'svc' insctuction and for 'svc' instruction, we print out the flow from + # start address until to 'svc' instruction. + + stop_addr = format((int(start_addr, 0) + 64), "#x") + + is_svc = is_svc_call(prev_dso, start_addr, stop_addr) + + # Otherwise we only print single instruction. + if (is_svc == False): + stop_addr = format((int(start_addr, 0) + 4), "#x") + + dump_disam(prev_dso, start_addr, stop_addr, is_svc) + + elif (is_exception_handling(start_addr, stop_addr) == True): + + # If this flow is to trigger exception handling, we need divide + # to print the flow into two parts: + # - the flow before exception; + # - the flow for exception handling, every exception handling entry + # is aligned with 0x80. + + stop_addr = format((int(start_addr, 0) + 4), "#x") + + dump_disam(prev_dso, start_addr, stop_addr, False) + + start_addr = format(sample['ip'] & ~(0x80 - 1), "#x") + stop_addr = format(sample['ip'] + 4, "#x") + + dump_disam(prev_dso, start_addr, stop_addr, False) + + else: + # This is normal case to dump range [ start_addr .. stop_addr ] + dump_disam(prev_dso, start_addr, stop_addr, False) + + cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x") + prev_cpu = cpu
Hi Leo,
I've taken some time to look at this an figure out why your solution is so different from the previous version by Tor. This is based on analysis of the code not a direct test, so I am happy to be proven wrong.
Unfortunately I have had to conclude that the branch sample "fixes" introduced by Rob have broken the ability of the current system to support full disassembly. "branches" to exceptions - e.g. IRQ do appear to being missed.
On 12 May 2018 at 06:56, Leo Yan leo.yan@linaro.org wrote:
This commit adds python script to parse CoreSight tracing event and use command 'objdump' for disassembled lines, finally we can generate readable program execution flow for reviewing tracing data.
The script receives CoreSight tracing packet with below format:
+------------+------------+------------+
packet(n): | addr | ip | cpu | +------------+------------+------------+ packet(n+1): | addr | ip | cpu | +------------+------------+------------+
packet::ip is the last address of current branch instruction and packet::addr presents the start address of the next coming branch instruction. So for one branch instruction which starts in packet(n), its execution flow starts from packet(n)::addr and it stops at packet(n+1)::ip. As results we need to combine the two continuous packets to generate the instruction range, this is the rationale for the script implementation:
[ sample(n)::addr .. sample(n+1)::ip ]
Except for the normal branch flow, this script also parses program flow for exception handling and for the exception level change flow from user space to kernel:
- We can easily decide it's an exception handling if the stop address is in kernel's vector table. For this case, we need to divide the program range into two parts, the first part in before exception handling, and the second part is for exception handling.
the trace contains explicit "exception" packets. The packet buffer within perf is marking these (cs_etm_decoder.c) but it does not appear that the information is being further used.
- For the program flow from user space to kernel, it might be a system call or a trap, so the script reads ahead 16 instructions to check if includes 'svc' instruction or not. If the prediction includes 'svc' instruction the script prints out the flow until 'svc' instruction, otherwise it only prints the instruction which causes trap.
I do not see how this can be safe.
e.g. tracing a sequence (addresses simplified for illustration:)
0x1000: <instr_block....> 0x1080: <instr_not_br> <irq> 0x1084: <instr_not_br> <instr_not_br> <instr_svc> <instr_block>
irq: 0x10001000: <instr_block> 0x10001060: <eret>
svc: 0x20001000: <instr_block> 0x20001100 <ret>
This will result in the following trace packets: TRACE_ON<context> RANGE<0x1000,0x1080, last instr type = not BR> // PE has executed this range EXCEPTION<IRQ, 0x1084> // PE takes an exception, preferred return address is 0x1084 RANGE<0x10001000,0x10001060, lst instr type = BR, taken> // run irq routine, take branch at end (eret) RETURN_FROM_EXCEPTION<> RANGE<0x1084,0x1088, last instr type = BR taken (svc)> RANGE<20001000,0x20001100, last instr type = br taken (ret)>
Best I can tell, looking at the current branch sample code in perf this results in:- 1) first range is ignored as start of trace (no previous packet + taken branch) 2) second range would not result in a branch either - previous packet did not have a taken branch at the end. 3) 3rd range would result in some branch from 0x10001060 to 0x1084
Furthermore, should the instruction @ 0x1080 be a branch that was taken, then the disassembler look-ahead heuristic above would still fail as the exeception is missed in favour of the svc which has not been hit yet,.
The "fix" that was undertaken for generating the branch samples is dropping information resulting in a complete trace disassembly to be impossible as far as I can tell.
Prior to the "fix", the branch packets contained the ranges in single packets - which is why the previous perf script could correctly decode trace. In this case the exception and start of code in the above example would not be missed.
In conclusion some adjustment to the existing C code is required to ensure complete information on none branch based flow changes is correctly passed to the script code.
Regards
Mike.
Credits to Tor Jeremiassen who have written the script skeleton and provides the ideas for reading symbol file according to build-id, creating memory map for dso and basic packet handling. Mathieu Poirier contributed fixes for build-id and memory map bugs. The detailed development history for this script you can find from [1]. Based on Tor and Mathieu work, the script is enhanced to support exception handling and program flow from user space to kernel as described above. Another minor enhancement is to support for without build-id case, the script can parse kernel symbols with option '-k' for vmlinux file path.
[1] https://github.com/Linaro/perf-opencsd/commits/perf-opencsd-v4.15/tools/perf...
Co-authored-by: Tor Jeremiassen tor@ti.com Co-authored-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Leo Yan leo.yan@linaro.org
tools/perf/scripts/python/arm-cs-trace-disasm.py | 324 +++++++++++++++++++++++ 1 file changed, 324 insertions(+) create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py new file mode 100644 index 0000000..4f71ee6 --- /dev/null +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py @@ -0,0 +1,324 @@ +# arm-cs-trace-disasm.py: ARM CoreSight Trace Dump With Disassember +# SPDX-License-Identifier: GPL-2.0 +# +# Tor Jeremiassen tor@ti.com is original author who wrote script +# skeleton, Mathieu Poirier mathieu.poirier@linaro.org contributed +# fixes for build-id and memory map; Leo Yan leo.yan@linaro.org +# enhanced the packet parsing to support exception handling and the +# flow from user space to kernel switching.
+import os +import sys +import re +from subprocess import * +from optparse import OptionParser, make_option
+# Command line parsing
+option_list = [
# formatting options for the bottom entry of the stack
make_option("-k", "--vmlinux", dest="vmlinux_name",
help="Set path to vmlinux file"),
make_option("-d", "--objdump", dest="objdump_name",
help="Set path to objdump executable file"),
make_option("-v", "--verbose", dest="verbose",
action="store_true", default=False,
help="Enable debugging log")
+]
+parser = OptionParser(option_list=option_list) +(options, args) = parser.parse_args()
+if (options.objdump_name == None):
sys.exit("No objdump executable file specified - use -d or --objdump option")
+# Initialize global dicts and regular expression
+build_ids = dict() +mmaps = dict() +disasm_cache = dict() +cpu_data = dict() +disasm_re = re.compile("^\s*([0-9a-fA-F]+):") +disasm_func_re = re.compile("^\s*([0-9a-fA-F]+)\s<.*>:") +cache_size = 32*1024 +vectors_start = 0 +vectors_end = 0 +prev_cpu = -1
+def parse_buildid():
global build_ids
buildid_regex = "([a-fA-f0-9]+)[ \t]([^ \n]+)"
buildid_re = re.compile(buildid_regex)
results = check_output(["perf", "buildid-list"]).split('\n');
for line in results:
m = buildid_re.search(line)
if (m == None):
continue;
id_name = m.group(2)
id_num = m.group(1)
if (id_name == "[kernel.kallsyms]") :
append = "/kallsyms"
elif (id_name == "[vdso]") :
append = "/vdso"
else:
append = "/elf"
build_ids[id_name] = os.environ['PERF_BUILDID_DIR'] + \
"/" + id_name + "/" + id_num + append;
# Replace duplicate slash chars to single slash char
build_ids[id_name] = build_ids[id_name].replace('//', '/', 1)
if ((options.vmlinux_name == None) and ("[kernel.kallsyms]" in build_ids)):
print "kallsyms cannot be used to dump assembler"
# Set vmlinux path to replace kallsyms file, if without buildid we still
# can use vmlinux to prase kernel symbols
if ((options.vmlinux_name != None)):
build_ids['[kernel.kallsyms]'] = options.vmlinux_name;
+def parse_mmap():
global mmaps
# Check mmap for PERF_RECORD_MMAP and PERF_RECORD_MMAP2
mmap_regex = "PERF_RECORD_MMAP.* -?[0-9]+/[0-9]+: \[(0x[0-9a-fA-F]+)\((0x[0-9a-fA-F]+)\).*:\s.*\s(\S*)"
mmap_re = re.compile(mmap_regex)
results = check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP", shell=True).split('\n')
for line in results:
m = mmap_re.search(line)
if (m != None):
if (m.group(3) == '[kernel.kallsyms]_text'):
dso = '[kernel.kallsyms]'
else:
dso = m.group(3)
start = int(m.group(1),0)
end = int(m.group(1),0) + int(m.group(2),0)
mmaps[dso] = [start, end]
+def parse_vectors():
global vectors_start
global vectors_end
if (options.vmlinux_name == None):
return
vectors_regex = "([0-9a-fA-F]+)\sg.*\.text\s*[0-9a-fA-F]+\s(vectors)"
vectors_re = re.compile(vectors_regex)
fname = options.vmlinux_name
disasm = [ options.objdump_name, "-t", fname ]
disasm_output = check_output(disasm).split('\n')
for line in disasm_output:
m = vectors_re.search(line)
if (m != None):
vectors_start = int("0x"+m.group(1), 0)
vectors_end = vectors_start + 0x800
+def find_dso_mmap(addr):
global mmaps
for key, value in mmaps.items():
if (addr >= value[0] and addr < value[1]):
return key
return None
+def read_disam(dso, start_addr, stop_addr):
global mmaps
global build_ids
addr_range = start_addr + ":" + stop_addr;
# Don't let the cache get too big, clear it when it hits max size
if (len(disasm_cache) > cache_size):
disasm_cache.clear();
try:
disasm_output = disasm_cache[addr_range];
except:
try:
fname = build_ids[dso];
except KeyError:
sys.exit("cannot find symbol file for " + dso)
disasm = [ options.objdump_name, "-d", "-z",
"--start-address="+start_addr,
"--stop-address="+stop_addr, fname ]
disasm_output = check_output(disasm).split('\n')
disasm_cache[addr_range] = disasm_output;
return disasm_output
+def dump_disam(dso, start_addr, stop_addr, check_svc):
for line in read_disam(dso, start_addr, stop_addr):
m = disasm_func_re.search(line)
if (m != None):
print "\t",line
continue
m = disasm_re.search(line)
if (m == None):
continue;
print "\t",line
if ((check_svc == True) and "svc" in line):
return
+def is_svc_call(dso, start_addr, stop_addr):
for line in read_disam(dso, start_addr, stop_addr):
m = disasm_re.search(line)
if (m == None):
continue;
if "svc" in line:
return True
return False
+def in_vector_table(addr):
global vectors_start
global vectors_end
if (addr >= vectors_start and addr < vectors_end):
return True
return False
+def is_exception_handling(start_addr, stop_addr):
if (in_vector_table(int(start_addr, 0)) == False and
in_vector_table(int(stop_addr, 0)) == True):
return True
return False
+def dump_packet(sample):
print "Packet = { cpu: 0x%d addr: 0x%x phys_addr: 0x%x ip: 0x%x " \
"pid: %d tid: %d period: %d time: %d }" % \
(sample['cpu'], sample['addr'], sample['phys_addr'], \
sample['ip'], sample['pid'], sample['tid'], \
sample['period'], sample['time'])
+def trace_begin():
print 'CoreSight Trace Data Assembler Dump (ARM64)'
parse_buildid()
parse_mmap()
parse_vectors()
+def trace_end():
print 'End'
+def trace_unhandled(event_name, context, event_fields_dict):
print ' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())])
+def process_event(param_dict):
global cache_size
global options
global prev_cpu
sample = param_dict["sample"]
if (options.verbose == True):
dump_packet(sample)
# If period doesn't equal to 1, this packet is for instruction sample
# packet, we need drop this synthetic packet.
if (sample['period'] != 1):
print "Skip synthetic instruction sample"
return
cpu = format(sample['cpu'], "d");
# Initialize CPU data if it's empty, and directly return back
# if this is the first tracing event for this CPU.
if (cpu_data.get(str(cpu) + 'addr') == None):
cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
prev_cpu = cpu
return
# The format for packet is:
#
# +------------+------------+------------+
# sample_prev: | addr | ip | cpu |
# +------------+------------+------------+
# sample_next: | addr | ip | cpu |
# +------------+------------+------------+
#
# We need to combine the two continuous packets to get the instruction
# range for sample_prev::cpu:
#
# [ sample_prev::addr .. sample_next::ip ]
#
# For this purose, sample_prev::addr is stored into cpu_data structure
# and read back for 'start_addr' when the new packet comes, and we need
# to use sample_next::ip to calculate 'stop_addr', plusing extra 4 for
# 'stop_addr' is for the sake of objdump so the final assembler dump can
# include last instruction for sample_next::ip.
start_addr = cpu_data[str(prev_cpu) + 'addr']
stop_addr = format(sample['ip'] + 4, "#x")
# Find dso for start_addr and stop_addr, if these two addresses have
# different dso that means there have switch between kernel space
# and user space.
prev_dso = find_dso_mmap(int(start_addr, 0))
next_dso = find_dso_mmap(int(stop_addr, 0))
# If cannot find dso so cannot dump assembler, bail out
if (prev_dso == None or next_dso == None):
print "Address range [ %s .. %s ]: dso is not found" % (start_addr, stop_addr)
prev_cpu = cpu
return
if (prev_dso != next_dso):
# If 'prev_dso' doesn't equal to 'next_dso', this might be below cases:
# - synchronous exception with 'svc' instruction;
# - synchronous exception and trap to kernel (data abort, etc);
# - asynchronous exception for interrupt.
#
# Check if the next 16 insts (16 insts x 4 bytes =64 bytes) have included
# 'svc' insctuction and for 'svc' instruction, we print out the flow from
# start address until to 'svc' instruction.
stop_addr = format((int(start_addr, 0) + 64), "#x")
is_svc = is_svc_call(prev_dso, start_addr, stop_addr)
# Otherwise we only print single instruction.
if (is_svc == False):
stop_addr = format((int(start_addr, 0) + 4), "#x")
dump_disam(prev_dso, start_addr, stop_addr, is_svc)
elif (is_exception_handling(start_addr, stop_addr) == True):
# If this flow is to trigger exception handling, we need divide
# to print the flow into two parts:
# - the flow before exception;
# - the flow for exception handling, every exception handling entry
# is aligned with 0x80.
stop_addr = format((int(start_addr, 0) + 4), "#x")
dump_disam(prev_dso, start_addr, stop_addr, False)
start_addr = format(sample['ip'] & ~(0x80 - 1), "#x")
stop_addr = format(sample['ip'] + 4, "#x")
dump_disam(prev_dso, start_addr, stop_addr, False)
else:
# This is normal case to dump range [ start_addr .. stop_addr ]
dump_disam(prev_dso, start_addr, stop_addr, False)
cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
prev_cpu = cpu
-- 2.7.4
Hi Mike,
On Tue, May 15, 2018 at 04:37:01PM +0100, Mike Leach wrote:
Hi Leo,
I've taken some time to look at this an figure out why your solution is so different from the previous version by Tor. This is based on analysis of the code not a direct test, so I am happy to be proven wrong.
Thanks a lot for reviewing. Be honest, I went through your comments, but many of them are unclear for me, I believe I miss some important background info. So please see below inline questions I have.
Unfortunately I have had to conclude that the branch sample "fixes" introduced by Rob have broken the ability of the current system to support full disassembly. "branches" to exceptions - e.g. IRQ do appear to being missed.
So do you means after applying Rob's patches for branch sample "fixes" then we cannot have packets for "branches" to exceptions?
Does Rob's patches have been merged into mainline kernel? I searched in Coresight mailing list, but I don't find any related patch for this.
On 12 May 2018 at 06:56, Leo Yan leo.yan@linaro.org wrote:
This commit adds python script to parse CoreSight tracing event and use command 'objdump' for disassembled lines, finally we can generate readable program execution flow for reviewing tracing data.
The script receives CoreSight tracing packet with below format:
+------------+------------+------------+
packet(n): | addr | ip | cpu | +------------+------------+------------+ packet(n+1): | addr | ip | cpu | +------------+------------+------------+
packet::ip is the last address of current branch instruction and packet::addr presents the start address of the next coming branch instruction. So for one branch instruction which starts in packet(n), its execution flow starts from packet(n)::addr and it stops at packet(n+1)::ip. As results we need to combine the two continuous packets to generate the instruction range, this is the rationale for the script implementation:
[ sample(n)::addr .. sample(n+1)::ip ]
Except for the normal branch flow, this script also parses program flow for exception handling and for the exception level change flow from user space to kernel:
- We can easily decide it's an exception handling if the stop address is in kernel's vector table. For this case, we need to divide the program range into two parts, the first part in before exception handling, and the second part is for exception handling.
the trace contains explicit "exception" packets. The packet buffer within perf is marking these (cs_etm_decoder.c) but it does not appear that the information is being further used.
I never dig into the code in util/cs-etm-decoder/cs-etm-decoder.c, the python script receives packet info from file util/cs-etm.c, but I can confirm that the python script doesn't receive the explicit "exception" packets.
- For the program flow from user space to kernel, it might be a system call or a trap, so the script reads ahead 16 instructions to check if includes 'svc' instruction or not. If the prediction includes 'svc' instruction the script prints out the flow until 'svc' instruction, otherwise it only prints the instruction which causes trap.
I do not see how this can be safe.
e.g. tracing a sequence (addresses simplified for illustration:)
0x1000: <instr_block....> 0x1080:
<instr_not_br> <irq> 0x1084: <instr_not_br> <instr_not_br> <instr_svc> <instr_block>
irq: 0x10001000:
<instr_block> 0x10001060: <eret>
svc: 0x20001000:
<instr_block> 0x20001100 <ret>
This will result in the following trace packets: TRACE_ON<context> RANGE<0x1000,0x1080, last instr type = not BR> // PE has executed this range EXCEPTION<IRQ, 0x1084> // PE takes an exception, preferred return address is 0x1084 RANGE<0x10001000,0x10001060, lst instr type = BR, taken> // run irq routine, take branch at end (eret) RETURN_FROM_EXCEPTION<> RANGE<0x1084,0x1088, last instr type = BR taken (svc)> RANGE<20001000,0x20001100, last instr type = br taken (ret)>
Best I can tell, looking at the current branch sample code in perf this results in:-
- first range is ignored as start of trace (no previous packet + taken branch)
- second range would not result in a branch either - previous packet
did not have a taken branch at the end. 3) 3rd range would result in some branch from 0x10001060 to 0x1084
I need to take time to read code to match with this issue, but I did see some suspicious lost packets from disassembler flow.
Furthermore, should the instruction @ 0x1080 be a branch that was taken, then the disassembler look-ahead heuristic above would still fail as the exeception is missed in favour of the svc which has not been hit yet,.
Sorry, I don't understand for this point. Could you rephrase for this?
The "fix" that was undertaken for generating the branch samples is dropping information resulting in a complete trace disassembly to be impossible as far as I can tell.
Prior to the "fix", the branch packets contained the ranges in single packets - which is why the previous perf script could correctly decode trace. In this case the exception and start of code in the above example would not be missed.
Okay, so this is why Tor's code is much simple than mine :)
In conclusion some adjustment to the existing C code is required to ensure complete information on none branch based flow changes is correctly passed to the script code.
Thanks a lot for guidance. I need firstly get clear the issue you elaborate above, then then can fix it.
Do you know anyone else is working on this? I try to avoid duplicate works on this.
Thanks, Leo Yan
Credits to Tor Jeremiassen who have written the script skeleton and provides the ideas for reading symbol file according to build-id, creating memory map for dso and basic packet handling. Mathieu Poirier contributed fixes for build-id and memory map bugs. The detailed development history for this script you can find from [1]. Based on Tor and Mathieu work, the script is enhanced to support exception handling and program flow from user space to kernel as described above. Another minor enhancement is to support for without build-id case, the script can parse kernel symbols with option '-k' for vmlinux file path.
[1] https://github.com/Linaro/perf-opencsd/commits/perf-opencsd-v4.15/tools/perf...
Co-authored-by: Tor Jeremiassen tor@ti.com Co-authored-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Leo Yan leo.yan@linaro.org
tools/perf/scripts/python/arm-cs-trace-disasm.py | 324 +++++++++++++++++++++++ 1 file changed, 324 insertions(+) create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py new file mode 100644 index 0000000..4f71ee6 --- /dev/null +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py @@ -0,0 +1,324 @@ +# arm-cs-trace-disasm.py: ARM CoreSight Trace Dump With Disassember +# SPDX-License-Identifier: GPL-2.0 +# +# Tor Jeremiassen tor@ti.com is original author who wrote script +# skeleton, Mathieu Poirier mathieu.poirier@linaro.org contributed +# fixes for build-id and memory map; Leo Yan leo.yan@linaro.org +# enhanced the packet parsing to support exception handling and the +# flow from user space to kernel switching.
+import os +import sys +import re +from subprocess import * +from optparse import OptionParser, make_option
+# Command line parsing
+option_list = [
# formatting options for the bottom entry of the stack
make_option("-k", "--vmlinux", dest="vmlinux_name",
help="Set path to vmlinux file"),
make_option("-d", "--objdump", dest="objdump_name",
help="Set path to objdump executable file"),
make_option("-v", "--verbose", dest="verbose",
action="store_true", default=False,
help="Enable debugging log")
+]
+parser = OptionParser(option_list=option_list) +(options, args) = parser.parse_args()
+if (options.objdump_name == None):
sys.exit("No objdump executable file specified - use -d or --objdump option")
+# Initialize global dicts and regular expression
+build_ids = dict() +mmaps = dict() +disasm_cache = dict() +cpu_data = dict() +disasm_re = re.compile("^\s*([0-9a-fA-F]+):") +disasm_func_re = re.compile("^\s*([0-9a-fA-F]+)\s<.*>:") +cache_size = 32*1024 +vectors_start = 0 +vectors_end = 0 +prev_cpu = -1
+def parse_buildid():
global build_ids
buildid_regex = "([a-fA-f0-9]+)[ \t]([^ \n]+)"
buildid_re = re.compile(buildid_regex)
results = check_output(["perf", "buildid-list"]).split('\n');
for line in results:
m = buildid_re.search(line)
if (m == None):
continue;
id_name = m.group(2)
id_num = m.group(1)
if (id_name == "[kernel.kallsyms]") :
append = "/kallsyms"
elif (id_name == "[vdso]") :
append = "/vdso"
else:
append = "/elf"
build_ids[id_name] = os.environ['PERF_BUILDID_DIR'] + \
"/" + id_name + "/" + id_num + append;
# Replace duplicate slash chars to single slash char
build_ids[id_name] = build_ids[id_name].replace('//', '/', 1)
if ((options.vmlinux_name == None) and ("[kernel.kallsyms]" in build_ids)):
print "kallsyms cannot be used to dump assembler"
# Set vmlinux path to replace kallsyms file, if without buildid we still
# can use vmlinux to prase kernel symbols
if ((options.vmlinux_name != None)):
build_ids['[kernel.kallsyms]'] = options.vmlinux_name;
+def parse_mmap():
global mmaps
# Check mmap for PERF_RECORD_MMAP and PERF_RECORD_MMAP2
mmap_regex = "PERF_RECORD_MMAP.* -?[0-9]+/[0-9]+: \[(0x[0-9a-fA-F]+)\((0x[0-9a-fA-F]+)\).*:\s.*\s(\S*)"
mmap_re = re.compile(mmap_regex)
results = check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP", shell=True).split('\n')
for line in results:
m = mmap_re.search(line)
if (m != None):
if (m.group(3) == '[kernel.kallsyms]_text'):
dso = '[kernel.kallsyms]'
else:
dso = m.group(3)
start = int(m.group(1),0)
end = int(m.group(1),0) + int(m.group(2),0)
mmaps[dso] = [start, end]
+def parse_vectors():
global vectors_start
global vectors_end
if (options.vmlinux_name == None):
return
vectors_regex = "([0-9a-fA-F]+)\sg.*\.text\s*[0-9a-fA-F]+\s(vectors)"
vectors_re = re.compile(vectors_regex)
fname = options.vmlinux_name
disasm = [ options.objdump_name, "-t", fname ]
disasm_output = check_output(disasm).split('\n')
for line in disasm_output:
m = vectors_re.search(line)
if (m != None):
vectors_start = int("0x"+m.group(1), 0)
vectors_end = vectors_start + 0x800
+def find_dso_mmap(addr):
global mmaps
for key, value in mmaps.items():
if (addr >= value[0] and addr < value[1]):
return key
return None
+def read_disam(dso, start_addr, stop_addr):
global mmaps
global build_ids
addr_range = start_addr + ":" + stop_addr;
# Don't let the cache get too big, clear it when it hits max size
if (len(disasm_cache) > cache_size):
disasm_cache.clear();
try:
disasm_output = disasm_cache[addr_range];
except:
try:
fname = build_ids[dso];
except KeyError:
sys.exit("cannot find symbol file for " + dso)
disasm = [ options.objdump_name, "-d", "-z",
"--start-address="+start_addr,
"--stop-address="+stop_addr, fname ]
disasm_output = check_output(disasm).split('\n')
disasm_cache[addr_range] = disasm_output;
return disasm_output
+def dump_disam(dso, start_addr, stop_addr, check_svc):
for line in read_disam(dso, start_addr, stop_addr):
m = disasm_func_re.search(line)
if (m != None):
print "\t",line
continue
m = disasm_re.search(line)
if (m == None):
continue;
print "\t",line
if ((check_svc == True) and "svc" in line):
return
+def is_svc_call(dso, start_addr, stop_addr):
for line in read_disam(dso, start_addr, stop_addr):
m = disasm_re.search(line)
if (m == None):
continue;
if "svc" in line:
return True
return False
+def in_vector_table(addr):
global vectors_start
global vectors_end
if (addr >= vectors_start and addr < vectors_end):
return True
return False
+def is_exception_handling(start_addr, stop_addr):
if (in_vector_table(int(start_addr, 0)) == False and
in_vector_table(int(stop_addr, 0)) == True):
return True
return False
+def dump_packet(sample):
print "Packet = { cpu: 0x%d addr: 0x%x phys_addr: 0x%x ip: 0x%x " \
"pid: %d tid: %d period: %d time: %d }" % \
(sample['cpu'], sample['addr'], sample['phys_addr'], \
sample['ip'], sample['pid'], sample['tid'], \
sample['period'], sample['time'])
+def trace_begin():
print 'CoreSight Trace Data Assembler Dump (ARM64)'
parse_buildid()
parse_mmap()
parse_vectors()
+def trace_end():
print 'End'
+def trace_unhandled(event_name, context, event_fields_dict):
print ' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())])
+def process_event(param_dict):
global cache_size
global options
global prev_cpu
sample = param_dict["sample"]
if (options.verbose == True):
dump_packet(sample)
# If period doesn't equal to 1, this packet is for instruction sample
# packet, we need drop this synthetic packet.
if (sample['period'] != 1):
print "Skip synthetic instruction sample"
return
cpu = format(sample['cpu'], "d");
# Initialize CPU data if it's empty, and directly return back
# if this is the first tracing event for this CPU.
if (cpu_data.get(str(cpu) + 'addr') == None):
cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
prev_cpu = cpu
return
# The format for packet is:
#
# +------------+------------+------------+
# sample_prev: | addr | ip | cpu |
# +------------+------------+------------+
# sample_next: | addr | ip | cpu |
# +------------+------------+------------+
#
# We need to combine the two continuous packets to get the instruction
# range for sample_prev::cpu:
#
# [ sample_prev::addr .. sample_next::ip ]
#
# For this purose, sample_prev::addr is stored into cpu_data structure
# and read back for 'start_addr' when the new packet comes, and we need
# to use sample_next::ip to calculate 'stop_addr', plusing extra 4 for
# 'stop_addr' is for the sake of objdump so the final assembler dump can
# include last instruction for sample_next::ip.
start_addr = cpu_data[str(prev_cpu) + 'addr']
stop_addr = format(sample['ip'] + 4, "#x")
# Find dso for start_addr and stop_addr, if these two addresses have
# different dso that means there have switch between kernel space
# and user space.
prev_dso = find_dso_mmap(int(start_addr, 0))
next_dso = find_dso_mmap(int(stop_addr, 0))
# If cannot find dso so cannot dump assembler, bail out
if (prev_dso == None or next_dso == None):
print "Address range [ %s .. %s ]: dso is not found" % (start_addr, stop_addr)
prev_cpu = cpu
return
if (prev_dso != next_dso):
# If 'prev_dso' doesn't equal to 'next_dso', this might be below cases:
# - synchronous exception with 'svc' instruction;
# - synchronous exception and trap to kernel (data abort, etc);
# - asynchronous exception for interrupt.
#
# Check if the next 16 insts (16 insts x 4 bytes =64 bytes) have included
# 'svc' insctuction and for 'svc' instruction, we print out the flow from
# start address until to 'svc' instruction.
stop_addr = format((int(start_addr, 0) + 64), "#x")
is_svc = is_svc_call(prev_dso, start_addr, stop_addr)
# Otherwise we only print single instruction.
if (is_svc == False):
stop_addr = format((int(start_addr, 0) + 4), "#x")
dump_disam(prev_dso, start_addr, stop_addr, is_svc)
elif (is_exception_handling(start_addr, stop_addr) == True):
# If this flow is to trigger exception handling, we need divide
# to print the flow into two parts:
# - the flow before exception;
# - the flow for exception handling, every exception handling entry
# is aligned with 0x80.
stop_addr = format((int(start_addr, 0) + 4), "#x")
dump_disam(prev_dso, start_addr, stop_addr, False)
start_addr = format(sample['ip'] & ~(0x80 - 1), "#x")
stop_addr = format(sample['ip'] + 4, "#x")
dump_disam(prev_dso, start_addr, stop_addr, False)
else:
# This is normal case to dump range [ start_addr .. stop_addr ]
dump_disam(prev_dso, start_addr, stop_addr, False)
cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
prev_cpu = cpu
-- 2.7.4
-- Mike Leach Principal Engineer, ARM Ltd. Blackburn Design Centre. UK
Hi Leo,
-----Original Message----- From: Leo Yan leo.yan@linaro.org Sent: 15 May 2018 17:41 To: Mike Leach mike.leach@linaro.org Cc: Robert Walker Robert.Walker@arm.com; Mathieu Poirier mathieu.poirier@linaro.org; Tor Jeremiassen tor@ti.com; Kim Phillips Kim.Phillips@arm.com; coresight@lists.linaro.org Subject: Re: [RFT 2/3] perf script python: Add script for CoreSight trace disassembler
Hi Mike,
On Tue, May 15, 2018 at 04:37:01PM +0100, Mike Leach wrote:
Hi Leo,
I've taken some time to look at this an figure out why your solution is so different from the previous version by Tor. This is based on analysis of the code not a direct test, so I am happy to be proven wrong.
Thanks a lot for reviewing. Be honest, I went through your comments, but many of them are unclear for me, I believe I miss some important background info. So please see below inline questions I have.
Unfortunately I have had to conclude that the branch sample "fixes" introduced by Rob have broken the ability of the current system to support full disassembly. "branches" to exceptions - e.g. IRQ do appear to being missed.
So do you means after applying Rob's patches for branch sample "fixes" then we cannot have packets for "branches" to exceptions?
Does Rob's patches have been merged into mainline kernel? I searched in Coresight mailing list, but I don't find any related patch for this.
The intention of my changes was to make the samples produced from CoreSight trace match the format of the samples produced for Intel PT and LBR, allowing CoreSight traces to be used with the AutoFDO tool for profile guided optimization. These changes have been merged into the upstream kernel for 4.17 - the initial rounds of review were done via the CoreSight mailing list, but I think the final patches went directly to the maintainer, main linux kernel mailing list and arm linux mailing list.
Before my changes, the "branch" samples emitted from CoreSight trace were effectively execution range samples - the from and to addresses were taken directly from the packet produced by the decoder. This meant that the samples were not correctly displayed by the perf report tool and did not work with other tools that consume the branch samples, such as the AutoFDO tool.
The code in cs-etm.c was changed to match the branch samples produced from Intel PT where the branch sample represents a discontinuity in execution flow. I appear to have forgotten to handle changes in program flow due to interrupts. I think the code in cs_etm__sample() that decides whether to emit a branch sample needs extending to look at whether the previous range ended because of an exception and not just because of a taken branch. We may also need to emit a sample at the start of trace to allow disassembly up to the first branch.
Regards
Rob
On 12 May 2018 at 06:56, Leo Yan leo.yan@linaro.org wrote:
This commit adds python script to parse CoreSight tracing event and use command 'objdump' for disassembled lines, finally we can generate readable program execution flow for reviewing tracing data.
The script receives CoreSight tracing packet with below format:
+------------+------------+------------+
packet(n): | addr | ip | cpu | +------------+------------+------------+ packet(n+1): | addr | ip | cpu | +------------+------------+------------+
packet::ip is the last address of current branch instruction and packet::addr presents the start address of the next coming branch instruction. So for one branch instruction which starts in packet(n), its execution flow starts from packet(n)::addr and it stops at packet(n+1)::ip. As results we need to combine the two continuous packets to generate the instruction range, this is the rationale for the script implementation:
[ sample(n)::addr .. sample(n+1)::ip ]
Except for the normal branch flow, this script also parses program flow for exception handling and for the exception level change flow from user space to kernel:
- We can easily decide it's an exception handling if the stop address is in kernel's vector table. For this case, we need to divide the program range into two parts, the first part in before exception handling, and the second part is for exception handling.
the trace contains explicit "exception" packets. The packet buffer within perf is marking these (cs_etm_decoder.c) but it does not appear that the information is being further used.
I never dig into the code in util/cs-etm-decoder/cs-etm-decoder.c, the python script receives packet info from file util/cs-etm.c, but I can confirm that the python script doesn't receive the explicit "exception" packets.
- For the program flow from user space to kernel, it might be a system call or a trap, so the script reads ahead 16 instructions to check if includes 'svc' instruction or not. If the prediction includes 'svc' instruction the script prints out the flow until 'svc' instruction, otherwise it only prints the instruction which causes trap.
I do not see how this can be safe.
e.g. tracing a sequence (addresses simplified for illustration:)
0x1000: <instr_block....> 0x1080:
<instr_not_br> <irq> 0x1084: <instr_not_br> <instr_not_br> <instr_svc> <instr_block>
irq: 0x10001000:
<instr_block> 0x10001060: <eret>
svc: 0x20001000:
<instr_block> 0x20001100 <ret>
This will result in the following trace packets: TRACE_ON<context> RANGE<0x1000,0x1080, last instr type = not BR> // PE has executed this range EXCEPTION<IRQ, 0x1084> // PE takes an exception, preferred return address is 0x1084 RANGE<0x10001000,0x10001060, lst instr type = BR, taken> // run irq routine, take branch at end (eret) RETURN_FROM_EXCEPTION<> RANGE<0x1084,0x1088, last instr type = BR taken (svc)> RANGE<20001000,0x20001100, last instr type = br taken (ret)>
Best I can tell, looking at the current branch sample code in perf this results in:-
- first range is ignored as start of trace (no previous packet +
taken branch) 2) second range would not result in a branch either - previous packet did not have a taken branch at the end. 3) 3rd range would result in some branch from 0x10001060 to 0x1084
I need to take time to read code to match with this issue, but I did see some suspicious lost packets from disassembler flow.
Furthermore, should the instruction @ 0x1080 be a branch that was taken, then the disassembler look-ahead heuristic above would still fail as the exeception is missed in favour of the svc which has not been hit yet,.
Sorry, I don't understand for this point. Could you rephrase for this?
The "fix" that was undertaken for generating the branch samples is dropping information resulting in a complete trace disassembly to be impossible as far as I can tell.
Prior to the "fix", the branch packets contained the ranges in single packets - which is why the previous perf script could correctly decode trace. In this case the exception and start of code in the above example would not be missed.
Okay, so this is why Tor's code is much simple than mine :)
In conclusion some adjustment to the existing C code is required to ensure complete information on none branch based flow changes is correctly passed to the script code.
Thanks a lot for guidance. I need firstly get clear the issue you elaborate above, then then can fix it.
Do you know anyone else is working on this? I try to avoid duplicate works on this.
Thanks, Leo Yan
Credits to Tor Jeremiassen who have written the script skeleton and provides the ideas for reading symbol file according to build-id, creating memory map for dso and basic packet handling. Mathieu Poirier contributed fixes for build-id and memory map bugs. The detailed development history for this script you can find from [1]. Based on Tor and Mathieu work, the script is enhanced to support exception handling and program flow from user space to kernel as described above. Another minor enhancement is to support for without build-id case, the script can parse kernel symbols with option '-k'
for vmlinux file path.
[1] https://github.com/Linaro/perf-opencsd/commits/perf-opencsd-
v4.15/to
ols/perf/scripts/python/cs-trace-disasm.py
Co-authored-by: Tor Jeremiassen tor@ti.com Co-authored-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Leo Yan leo.yan@linaro.org
tools/perf/scripts/python/arm-cs-trace-disasm.py | 324 +++++++++++++++++++++++ 1 file changed, 324 insertions(+) create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py new file mode 100644 index 0000000..4f71ee6 --- /dev/null +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py @@ -0,0 +1,324 @@ +# arm-cs-trace-disasm.py: ARM CoreSight Trace Dump With
Disassember
+# SPDX-License-Identifier: GPL-2.0 # # Tor Jeremiassen tor@ti.com +is original author who wrote script # skeleton, Mathieu Poirier +mathieu.poirier@linaro.org contributed # fixes for build-id and +memory map; Leo Yan leo.yan@linaro.org # enhanced the packet +parsing to support exception handling and the # flow from user +space to kernel switching.
+import os +import sys +import re +from subprocess import * +from optparse import OptionParser, make_option
+# Command line parsing
+option_list = [
# formatting options for the bottom entry of the stack
make_option("-k", "--vmlinux", dest="vmlinux_name",
help="Set path to vmlinux file"),
make_option("-d", "--objdump", dest="objdump_name",
help="Set path to objdump executable file"),
make_option("-v", "--verbose", dest="verbose",
action="store_true", default=False,
help="Enable debugging log") ]
+parser = OptionParser(option_list=option_list) +(options, args) = parser.parse_args()
+if (options.objdump_name == None):
sys.exit("No objdump executable file specified - use -d or
+--objdump option")
+# Initialize global dicts and regular expression
+build_ids = dict() +mmaps = dict() +disasm_cache = dict() +cpu_data = dict() +disasm_re = re.compile("^\s*([0-9a-fA-F]+):") disasm_func_re = +re.compile("^\s*([0-9a-fA-F]+)\s<.*>:") +cache_size = 32*1024 +vectors_start = 0 +vectors_end = 0 +prev_cpu = -1
+def parse_buildid():
global build_ids
buildid_regex = "([a-fA-f0-9]+)[ \t]([^ \n]+)"
buildid_re = re.compile(buildid_regex)
results = check_output(["perf", "buildid-list"]).split('\n');
for line in results:
m = buildid_re.search(line)
if (m == None):
continue;
id_name = m.group(2)
id_num = m.group(1)
if (id_name == "[kernel.kallsyms]") :
append = "/kallsyms"
elif (id_name == "[vdso]") :
append = "/vdso"
else:
append = "/elf"
build_ids[id_name] = os.environ['PERF_BUILDID_DIR'] + \
"/" + id_name + "/" + id_num + append;
# Replace duplicate slash chars to single slash char
build_ids[id_name] =
- build_ids[id_name].replace('//', '/', 1)
if ((options.vmlinux_name == None) and ("[kernel.kallsyms]" in
build_ids)):
print "kallsyms cannot be used to dump assembler"
# Set vmlinux path to replace kallsyms file, if without buildid we still
# can use vmlinux to prase kernel symbols
if ((options.vmlinux_name != None)):
build_ids['[kernel.kallsyms]'] =
- options.vmlinux_name;
+def parse_mmap():
global mmaps
# Check mmap for PERF_RECORD_MMAP and
PERF_RECORD_MMAP2
mmap_regex = "PERF_RECORD_MMAP.* -?[0-9]+/[0-9]+: \[(0x[0-
9a-fA-F]+)((0x[0-9a-fA-F]+)).*:\s.*\s(\S*)"
mmap_re = re.compile(mmap_regex)
results = check_output("perf script --show-mmap-events | fgrep
PERF_RECORD_MMAP", shell=True).split('\n')
for line in results:
m = mmap_re.search(line)
if (m != None):
if (m.group(3) == '[kernel.kallsyms]_text'):
dso = '[kernel.kallsyms]'
else:
dso = m.group(3)
start = int(m.group(1),0)
end = int(m.group(1),0) + int(m.group(2),0)
mmaps[dso] = [start, end]
+def parse_vectors():
global vectors_start
global vectors_end
if (options.vmlinux_name == None):
return
vectors_regex = "([0-9a-fA-F]+)\sg.*\.text\s*[0-9a-fA-
F]+\s(vectors)"
vectors_re = re.compile(vectors_regex)
fname = options.vmlinux_name
disasm = [ options.objdump_name, "-t", fname ]
disasm_output = check_output(disasm).split('\n')
for line in disasm_output:
m = vectors_re.search(line)
if (m != None):
vectors_start = int("0x"+m.group(1), 0)
vectors_end = vectors_start + 0x800
+def find_dso_mmap(addr):
global mmaps
for key, value in mmaps.items():
if (addr >= value[0] and addr < value[1]):
return key
return None
+def read_disam(dso, start_addr, stop_addr):
global mmaps
global build_ids
addr_range = start_addr + ":" + stop_addr;
# Don't let the cache get too big, clear it when it hits max size
if (len(disasm_cache) > cache_size):
disasm_cache.clear();
try:
disasm_output = disasm_cache[addr_range];
except:
try:
fname = build_ids[dso];
except KeyError:
sys.exit("cannot find symbol file for " +
- dso)
disasm = [ options.objdump_name, "-d", "-z",
"--start-address="+start_addr,
"--stop-address="+stop_addr, fname ]
disasm_output = check_output(disasm).split('\n')
disasm_cache[addr_range] = disasm_output;
return disasm_output
+def dump_disam(dso, start_addr, stop_addr, check_svc):
for line in read_disam(dso, start_addr, stop_addr):
m = disasm_func_re.search(line)
if (m != None):
print "\t",line
continue
m = disasm_re.search(line)
if (m == None):
continue;
print "\t",line
if ((check_svc == True) and "svc" in line):
return
+def is_svc_call(dso, start_addr, stop_addr):
for line in read_disam(dso, start_addr, stop_addr):
m = disasm_re.search(line)
if (m == None):
continue;
if "svc" in line:
return True
return False
+def in_vector_table(addr):
global vectors_start
global vectors_end
if (addr >= vectors_start and addr < vectors_end):
return True
return False
+def is_exception_handling(start_addr, stop_addr):
if (in_vector_table(int(start_addr, 0)) == False and
in_vector_table(int(stop_addr, 0)) == True):
return True
return False
+def dump_packet(sample):
print "Packet = { cpu: 0x%d addr: 0x%x phys_addr: 0x%x ip: 0x%x " \
"pid: %d tid: %d period: %d time: %d }" % \
(sample['cpu'], sample['addr'], sample['phys_addr'], \
sample['ip'], sample['pid'], sample['tid'], \
sample['period'], sample['time'])
+def trace_begin():
print 'CoreSight Trace Data Assembler Dump (ARM64)'
parse_buildid()
parse_mmap()
parse_vectors()
+def trace_end():
print 'End'
+def trace_unhandled(event_name, context, event_fields_dict):
print ' '.join(['%s=%s'%(k,str(v))for k,v in
+sorted(event_fields_dict.items())])
+def process_event(param_dict):
global cache_size
global options
global prev_cpu
sample = param_dict["sample"]
if (options.verbose == True):
dump_packet(sample)
# If period doesn't equal to 1, this packet is for instruction sample
# packet, we need drop this synthetic packet.
if (sample['period'] != 1):
print "Skip synthetic instruction sample"
return
cpu = format(sample['cpu'], "d");
# Initialize CPU data if it's empty, and directly return back
# if this is the first tracing event for this CPU.
if (cpu_data.get(str(cpu) + 'addr') == None):
cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
prev_cpu = cpu
return
# The format for packet is:
#
# +------------+------------+------------+
# sample_prev: | addr | ip | cpu |
# +------------+------------+------------+
# sample_next: | addr | ip | cpu |
# +------------+------------+------------+
#
# We need to combine the two continuous packets to get the
instruction
# range for sample_prev::cpu:
#
# [ sample_prev::addr .. sample_next::ip ]
#
# For this purose, sample_prev::addr is stored into cpu_data
structure
# and read back for 'start_addr' when the new packet comes, and
we need
# to use sample_next::ip to calculate 'stop_addr', plusing extra 4 for
# 'stop_addr' is for the sake of objdump so the final assembler dump
can
# include last instruction for sample_next::ip.
start_addr = cpu_data[str(prev_cpu) + 'addr']
stop_addr = format(sample['ip'] + 4, "#x")
# Find dso for start_addr and stop_addr, if these two addresses
have
# different dso that means there have switch between kernel space
# and user space.
prev_dso = find_dso_mmap(int(start_addr, 0))
next_dso = find_dso_mmap(int(stop_addr, 0))
# If cannot find dso so cannot dump assembler, bail out
if (prev_dso == None or next_dso == None):
print "Address range [ %s .. %s ]: dso is not found" % (start_addr,
stop_addr)
prev_cpu = cpu
return
if (prev_dso != next_dso):
# If 'prev_dso' doesn't equal to 'next_dso', this might be below
cases:
# - synchronous exception with 'svc' instruction;
# - synchronous exception and trap to kernel (data abort, etc);
# - asynchronous exception for interrupt.
#
# Check if the next 16 insts (16 insts x 4 bytes =64 bytes) have
included
# 'svc' insctuction and for 'svc' instruction, we print out the flow
from
# start address until to 'svc' instruction.
stop_addr = format((int(start_addr, 0) + 64), "#x")
is_svc = is_svc_call(prev_dso, start_addr,
- stop_addr)
# Otherwise we only print single instruction.
if (is_svc == False):
stop_addr = format((int(start_addr, 0) + 4),
- "#x")
dump_disam(prev_dso, start_addr, stop_addr, is_svc)
elif (is_exception_handling(start_addr, stop_addr) == True):
# If this flow is to trigger exception handling, we need divide
# to print the flow into two parts:
# - the flow before exception;
# - the flow for exception handling, every exception handling
entry
# is aligned with 0x80.
stop_addr = format((int(start_addr, 0) + 4), "#x")
dump_disam(prev_dso, start_addr, stop_addr, False)
start_addr = format(sample['ip'] & ~(0x80 - 1), "#x")
stop_addr = format(sample['ip'] + 4, "#x")
dump_disam(prev_dso, start_addr, stop_addr, False)
else:
# This is normal case to dump range [ start_addr .. stop_addr ]
dump_disam(prev_dso, start_addr, stop_addr, False)
cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
prev_cpu = cpu
-- 2.7.4
-- Mike Leach Principal Engineer, ARM Ltd. Blackburn Design Centre. UK
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Leo
On 15 May 2018 at 17:41, Leo Yan leo.yan@linaro.org wrote:
Hi Mike,
On Tue, May 15, 2018 at 04:37:01PM +0100, Mike Leach wrote:
Hi Leo,
I've taken some time to look at this an figure out why your solution is so different from the previous version by Tor. This is based on analysis of the code not a direct test, so I am happy to be proven wrong.
Thanks a lot for reviewing. Be honest, I went through your comments, but many of them are unclear for me, I believe I miss some important background info. So please see below inline questions I have.
Unfortunately I have had to conclude that the branch sample "fixes" introduced by Rob have broken the ability of the current system to support full disassembly. "branches" to exceptions - e.g. IRQ do appear to being missed.
So do you means after applying Rob's patches for branch sample "fixes" then we cannot have packets for "branches" to exceptions?
These get missed by Robs current changes - hence you will see unexpected changes from application to kernel code.
Does Rob's patches have been merged into mainline kernel? I searched in Coresight mailing list, but I don't find any related patch for this.
As Rob has mentioned, these are looking to be included in the latest kernel.
On 12 May 2018 at 06:56, Leo Yan leo.yan@linaro.org wrote:
This commit adds python script to parse CoreSight tracing event and use command 'objdump' for disassembled lines, finally we can generate readable program execution flow for reviewing tracing data.
The script receives CoreSight tracing packet with below format:
+------------+------------+------------+
packet(n): | addr | ip | cpu | +------------+------------+------------+ packet(n+1): | addr | ip | cpu | +------------+------------+------------+
packet::ip is the last address of current branch instruction and packet::addr presents the start address of the next coming branch instruction. So for one branch instruction which starts in packet(n), its execution flow starts from packet(n)::addr and it stops at packet(n+1)::ip. As results we need to combine the two continuous packets to generate the instruction range, this is the rationale for the script implementation:
[ sample(n)::addr .. sample(n+1)::ip ]
Except for the normal branch flow, this script also parses program flow for exception handling and for the exception level change flow from user space to kernel:
- We can easily decide it's an exception handling if the stop address is in kernel's vector table. For this case, we need to divide the program range into two parts, the first part in before exception handling, and the second part is for exception handling.
the trace contains explicit "exception" packets. The packet buffer within perf is marking these (cs_etm_decoder.c) but it does not appear that the information is being further used.
I never dig into the code in util/cs-etm-decoder/cs-etm-decoder.c, the python script receives packet info from file util/cs-etm.c, but I can confirm that the python script doesn't receive the explicit "exception" packets.
This will need to change if that information is required for your script. I know that the information is passed through to the cs-etm* files in perf - but there may be more work required here.
- For the program flow from user space to kernel, it might be a system call or a trap, so the script reads ahead 16 instructions to check if includes 'svc' instruction or not. If the prediction includes 'svc' instruction the script prints out the flow until 'svc' instruction, otherwise it only prints the instruction which causes trap.
I do not see how this can be safe.
e.g. tracing a sequence (addresses simplified for illustration:)
0x1000: <instr_block....> 0x1080:
<instr_not_br> <irq> 0x1084: <instr_not_br> <instr_not_br> <instr_svc> <instr_block>
irq: 0x10001000:
<instr_block> 0x10001060: <eret>
svc: 0x20001000:
<instr_block> 0x20001100 <ret>
This will result in the following trace packets: TRACE_ON<context> RANGE<0x1000,0x1080, last instr type = not BR> // PE has executed this range EXCEPTION<IRQ, 0x1084> // PE takes an exception, preferred return address is 0x1084 RANGE<0x10001000,0x10001060, lst instr type = BR, taken> // run irq routine, take branch at end (eret) RETURN_FROM_EXCEPTION<> RANGE<0x1084,0x1088, last instr type = BR taken (svc)> RANGE<20001000,0x20001100, last instr type = br taken (ret)>
Best I can tell, looking at the current branch sample code in perf this results in:-
- first range is ignored as start of trace (no previous packet + taken branch)
- second range would not result in a branch either - previous packet
did not have a taken branch at the end. 3) 3rd range would result in some branch from 0x10001060 to 0x1084
I need to take time to read code to match with this issue, but I did see some suspicious lost packets from disassembler flow.
That is what I would expect in the currnent situation.
Furthermore, should the instruction @ 0x1080 be a branch that was taken, then the disassembler look-ahead heuristic above would still fail as the exeception is missed in favour of the svc which has not been hit yet,.
Sorry, I don't understand for this point. Could you rephrase for this?
Certainly. reading your script it appears that at present you are detecting a change from application to kernel by a change in dso. You are then idsassembling 16 instructions and trying to see if an SVC instruction is present - if it is you assume that it is the cause of the change. Otherwise you assume a trap/interrupt. This is unsafe. You could have a case where an interrupt occurs before a block containing multiple SVC instructions. You are also assuming that the SVC you find is actually being taken.
What appears to be happening is that you are trying to make up for the missing information by applying some algorithm to estimate the reason for the change.
There should be no need to do this - ETM trace has all the information that you need to create accurate trace disassembly. What is needed is code in perf to ensure that this is correctly passed though to the synthesized events processed by the script engine.
When this occurs two things can happen: 1) you will be able to simplify your scripts - removing the estimating algorithm etc. 2) they should then work extremely well in generating accurate trace disassembly.
The "fix" that was undertaken for generating the branch samples is dropping information resulting in a complete trace disassembly to be impossible as far as I can tell.
Prior to the "fix", the branch packets contained the ranges in single packets - which is why the previous perf script could correctly decode trace. In this case the exception and start of code in the above example would not be missed.
Okay, so this is why Tor's code is much simple than mine :)
In conclusion some adjustment to the existing C code is required to ensure complete information on none branch based flow changes is correctly passed to the script code.
Correct - Rob's mail makes this clear.
Thanks a lot for guidance. I need firstly get clear the issue you elaborate above, then then can fix it.
Do you know anyone else is working on this? I try to avoid duplicate works on this.
I am not sure if Rob will correct his changes - these work very well with AutoFDO but need adjusting for full trace decode.
Regards
Mike
Thanks, Leo Yan
Credits to Tor Jeremiassen who have written the script skeleton and provides the ideas for reading symbol file according to build-id, creating memory map for dso and basic packet handling. Mathieu Poirier contributed fixes for build-id and memory map bugs. The detailed development history for this script you can find from [1]. Based on Tor and Mathieu work, the script is enhanced to support exception handling and program flow from user space to kernel as described above. Another minor enhancement is to support for without build-id case, the script can parse kernel symbols with option '-k' for vmlinux file path.
[1] https://github.com/Linaro/perf-opencsd/commits/perf-opencsd-v4.15/tools/perf...
Co-authored-by: Tor Jeremiassen tor@ti.com Co-authored-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Leo Yan leo.yan@linaro.org
tools/perf/scripts/python/arm-cs-trace-disasm.py | 324 +++++++++++++++++++++++ 1 file changed, 324 insertions(+) create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py new file mode 100644 index 0000000..4f71ee6 --- /dev/null +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py @@ -0,0 +1,324 @@ +# arm-cs-trace-disasm.py: ARM CoreSight Trace Dump With Disassember +# SPDX-License-Identifier: GPL-2.0 +# +# Tor Jeremiassen tor@ti.com is original author who wrote script +# skeleton, Mathieu Poirier mathieu.poirier@linaro.org contributed +# fixes for build-id and memory map; Leo Yan leo.yan@linaro.org +# enhanced the packet parsing to support exception handling and the +# flow from user space to kernel switching.
+import os +import sys +import re +from subprocess import * +from optparse import OptionParser, make_option
+# Command line parsing
+option_list = [
# formatting options for the bottom entry of the stack
make_option("-k", "--vmlinux", dest="vmlinux_name",
help="Set path to vmlinux file"),
make_option("-d", "--objdump", dest="objdump_name",
help="Set path to objdump executable file"),
make_option("-v", "--verbose", dest="verbose",
action="store_true", default=False,
help="Enable debugging log")
+]
+parser = OptionParser(option_list=option_list) +(options, args) = parser.parse_args()
+if (options.objdump_name == None):
sys.exit("No objdump executable file specified - use -d or --objdump option")
+# Initialize global dicts and regular expression
+build_ids = dict() +mmaps = dict() +disasm_cache = dict() +cpu_data = dict() +disasm_re = re.compile("^\s*([0-9a-fA-F]+):") +disasm_func_re = re.compile("^\s*([0-9a-fA-F]+)\s<.*>:") +cache_size = 32*1024 +vectors_start = 0 +vectors_end = 0 +prev_cpu = -1
+def parse_buildid():
global build_ids
buildid_regex = "([a-fA-f0-9]+)[ \t]([^ \n]+)"
buildid_re = re.compile(buildid_regex)
results = check_output(["perf", "buildid-list"]).split('\n');
for line in results:
m = buildid_re.search(line)
if (m == None):
continue;
id_name = m.group(2)
id_num = m.group(1)
if (id_name == "[kernel.kallsyms]") :
append = "/kallsyms"
elif (id_name == "[vdso]") :
append = "/vdso"
else:
append = "/elf"
build_ids[id_name] = os.environ['PERF_BUILDID_DIR'] + \
"/" + id_name + "/" + id_num + append;
# Replace duplicate slash chars to single slash char
build_ids[id_name] = build_ids[id_name].replace('//', '/', 1)
if ((options.vmlinux_name == None) and ("[kernel.kallsyms]" in build_ids)):
print "kallsyms cannot be used to dump assembler"
# Set vmlinux path to replace kallsyms file, if without buildid we still
# can use vmlinux to prase kernel symbols
if ((options.vmlinux_name != None)):
build_ids['[kernel.kallsyms]'] = options.vmlinux_name;
+def parse_mmap():
global mmaps
# Check mmap for PERF_RECORD_MMAP and PERF_RECORD_MMAP2
mmap_regex = "PERF_RECORD_MMAP.* -?[0-9]+/[0-9]+: \[(0x[0-9a-fA-F]+)\((0x[0-9a-fA-F]+)\).*:\s.*\s(\S*)"
mmap_re = re.compile(mmap_regex)
results = check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP", shell=True).split('\n')
for line in results:
m = mmap_re.search(line)
if (m != None):
if (m.group(3) == '[kernel.kallsyms]_text'):
dso = '[kernel.kallsyms]'
else:
dso = m.group(3)
start = int(m.group(1),0)
end = int(m.group(1),0) + int(m.group(2),0)
mmaps[dso] = [start, end]
+def parse_vectors():
global vectors_start
global vectors_end
if (options.vmlinux_name == None):
return
vectors_regex = "([0-9a-fA-F]+)\sg.*\.text\s*[0-9a-fA-F]+\s(vectors)"
vectors_re = re.compile(vectors_regex)
fname = options.vmlinux_name
disasm = [ options.objdump_name, "-t", fname ]
disasm_output = check_output(disasm).split('\n')
for line in disasm_output:
m = vectors_re.search(line)
if (m != None):
vectors_start = int("0x"+m.group(1), 0)
vectors_end = vectors_start + 0x800
+def find_dso_mmap(addr):
global mmaps
for key, value in mmaps.items():
if (addr >= value[0] and addr < value[1]):
return key
return None
+def read_disam(dso, start_addr, stop_addr):
global mmaps
global build_ids
addr_range = start_addr + ":" + stop_addr;
# Don't let the cache get too big, clear it when it hits max size
if (len(disasm_cache) > cache_size):
disasm_cache.clear();
try:
disasm_output = disasm_cache[addr_range];
except:
try:
fname = build_ids[dso];
except KeyError:
sys.exit("cannot find symbol file for " + dso)
disasm = [ options.objdump_name, "-d", "-z",
"--start-address="+start_addr,
"--stop-address="+stop_addr, fname ]
disasm_output = check_output(disasm).split('\n')
disasm_cache[addr_range] = disasm_output;
return disasm_output
+def dump_disam(dso, start_addr, stop_addr, check_svc):
for line in read_disam(dso, start_addr, stop_addr):
m = disasm_func_re.search(line)
if (m != None):
print "\t",line
continue
m = disasm_re.search(line)
if (m == None):
continue;
print "\t",line
if ((check_svc == True) and "svc" in line):
return
+def is_svc_call(dso, start_addr, stop_addr):
for line in read_disam(dso, start_addr, stop_addr):
m = disasm_re.search(line)
if (m == None):
continue;
if "svc" in line:
return True
return False
+def in_vector_table(addr):
global vectors_start
global vectors_end
if (addr >= vectors_start and addr < vectors_end):
return True
return False
+def is_exception_handling(start_addr, stop_addr):
if (in_vector_table(int(start_addr, 0)) == False and
in_vector_table(int(stop_addr, 0)) == True):
return True
return False
+def dump_packet(sample):
print "Packet = { cpu: 0x%d addr: 0x%x phys_addr: 0x%x ip: 0x%x " \
"pid: %d tid: %d period: %d time: %d }" % \
(sample['cpu'], sample['addr'], sample['phys_addr'], \
sample['ip'], sample['pid'], sample['tid'], \
sample['period'], sample['time'])
+def trace_begin():
print 'CoreSight Trace Data Assembler Dump (ARM64)'
parse_buildid()
parse_mmap()
parse_vectors()
+def trace_end():
print 'End'
+def trace_unhandled(event_name, context, event_fields_dict):
print ' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())])
+def process_event(param_dict):
global cache_size
global options
global prev_cpu
sample = param_dict["sample"]
if (options.verbose == True):
dump_packet(sample)
# If period doesn't equal to 1, this packet is for instruction sample
# packet, we need drop this synthetic packet.
if (sample['period'] != 1):
print "Skip synthetic instruction sample"
return
cpu = format(sample['cpu'], "d");
# Initialize CPU data if it's empty, and directly return back
# if this is the first tracing event for this CPU.
if (cpu_data.get(str(cpu) + 'addr') == None):
cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
prev_cpu = cpu
return
# The format for packet is:
#
# +------------+------------+------------+
# sample_prev: | addr | ip | cpu |
# +------------+------------+------------+
# sample_next: | addr | ip | cpu |
# +------------+------------+------------+
#
# We need to combine the two continuous packets to get the instruction
# range for sample_prev::cpu:
#
# [ sample_prev::addr .. sample_next::ip ]
#
# For this purose, sample_prev::addr is stored into cpu_data structure
# and read back for 'start_addr' when the new packet comes, and we need
# to use sample_next::ip to calculate 'stop_addr', plusing extra 4 for
# 'stop_addr' is for the sake of objdump so the final assembler dump can
# include last instruction for sample_next::ip.
start_addr = cpu_data[str(prev_cpu) + 'addr']
stop_addr = format(sample['ip'] + 4, "#x")
# Find dso for start_addr and stop_addr, if these two addresses have
# different dso that means there have switch between kernel space
# and user space.
prev_dso = find_dso_mmap(int(start_addr, 0))
next_dso = find_dso_mmap(int(stop_addr, 0))
# If cannot find dso so cannot dump assembler, bail out
if (prev_dso == None or next_dso == None):
print "Address range [ %s .. %s ]: dso is not found" % (start_addr, stop_addr)
prev_cpu = cpu
return
if (prev_dso != next_dso):
# If 'prev_dso' doesn't equal to 'next_dso', this might be below cases:
# - synchronous exception with 'svc' instruction;
# - synchronous exception and trap to kernel (data abort, etc);
# - asynchronous exception for interrupt.
#
# Check if the next 16 insts (16 insts x 4 bytes =64 bytes) have included
# 'svc' insctuction and for 'svc' instruction, we print out the flow from
# start address until to 'svc' instruction.
stop_addr = format((int(start_addr, 0) + 64), "#x")
is_svc = is_svc_call(prev_dso, start_addr, stop_addr)
# Otherwise we only print single instruction.
if (is_svc == False):
stop_addr = format((int(start_addr, 0) + 4), "#x")
dump_disam(prev_dso, start_addr, stop_addr, is_svc)
elif (is_exception_handling(start_addr, stop_addr) == True):
# If this flow is to trigger exception handling, we need divide
# to print the flow into two parts:
# - the flow before exception;
# - the flow for exception handling, every exception handling entry
# is aligned with 0x80.
stop_addr = format((int(start_addr, 0) + 4), "#x")
dump_disam(prev_dso, start_addr, stop_addr, False)
start_addr = format(sample['ip'] & ~(0x80 - 1), "#x")
stop_addr = format(sample['ip'] + 4, "#x")
dump_disam(prev_dso, start_addr, stop_addr, False)
else:
# This is normal case to dump range [ start_addr .. stop_addr ]
dump_disam(prev_dso, start_addr, stop_addr, False)
cpu_data[str(cpu) + 'addr'] = format(sample['addr'], "#x")
prev_cpu = cpu
-- 2.7.4
-- Mike Leach Principal Engineer, ARM Ltd. Blackburn Design Centre. UK
On Wed, May 16, 2018 at 04:04:00PM +0100, Mike Leach wrote:
[...]
- For the program flow from user space to kernel, it might be a system call or a trap, so the script reads ahead 16 instructions to check if includes 'svc' instruction or not. If the prediction includes 'svc' instruction the script prints out the flow until 'svc' instruction, otherwise it only prints the instruction which causes trap.
I do not see how this can be safe.
e.g. tracing a sequence (addresses simplified for illustration:)
0x1000: <instr_block....> 0x1080:
<instr_not_br> <irq> 0x1084: <instr_not_br> <instr_not_br> <instr_svc> <instr_block>
irq: 0x10001000:
<instr_block> 0x10001060: <eret>
svc: 0x20001000:
<instr_block> 0x20001100 <ret>
This will result in the following trace packets: TRACE_ON<context> RANGE<0x1000,0x1080, last instr type = not BR> // PE has executed this range EXCEPTION<IRQ, 0x1084> // PE takes an exception, preferred return address is 0x1084 RANGE<0x10001000,0x10001060, lst instr type = BR, taken> // run irq routine, take branch at end (eret) RETURN_FROM_EXCEPTION<> RANGE<0x1084,0x1088, last instr type = BR taken (svc)> RANGE<20001000,0x20001100, last instr type = br taken (ret)>
Best I can tell, looking at the current branch sample code in perf this results in:-
- first range is ignored as start of trace (no previous packet + taken branch)
- second range would not result in a branch either - previous packet
did not have a taken branch at the end. 3) 3rd range would result in some branch from 0x10001060 to 0x1084
I need to take time to read code to match with this issue, but I did see some suspicious lost packets from disassembler flow.
That is what I would expect in the currnent situation.
Thanks a lot for your explanation, Rob & Mike.
Furthermore, should the instruction @ 0x1080 be a branch that was taken, then the disassembler look-ahead heuristic above would still fail as the exeception is missed in favour of the svc which has not been hit yet,.
Sorry, I don't understand for this point. Could you rephrase for this?
Certainly. reading your script it appears that at present you are detecting a change from application to kernel by a change in dso. You are then idsassembling 16 instructions and trying to see if an SVC instruction is present - if it is you assume that it is the cause of the change. Otherwise you assume a trap/interrupt. This is unsafe. You could have a case where an interrupt occurs before a block containing multiple SVC instructions. You are also assuming that the SVC you find is actually being taken.
Now understand and agree the svc prediction in my code is unsafe.
What appears to be happening is that you are trying to make up for the missing information by applying some algorithm to estimate the reason for the change.
There should be no need to do this - ETM trace has all the information that you need to create accurate trace disassembly. What is needed is code in perf to ensure that this is correctly passed though to the synthesized events processed by the script engine.
Yes, I did experiment to dump the packet info in function cs_etm_decoder__gen_trace_elem_printer() and all packets can be parsed in this function. But as you two suggested, the packet for start tracing and exception packet have been ignored in the function cs_etm__sample() so these packets will _not_ be passed to python script.
When this occurs two things can happen:
- you will be able to simplify your scripts - removing the estimating
algorithm etc. 2) they should then work extremely well in generating accurate trace disassembly.
Totally agree.
I did very little tweaking in the function cs_etm__sample() and now the python script can receive start packet and exception packet, so we can remove the estimating related code in python script.
Will send out new patch series soon for reviewing in today. Please review and if you have better idea for related fixing, I am very glad to test it and use your patch :) I am not sure my fixing can cover all cases.
Thanks a lot for suggestions.
[...]
Thanks, Leo Yan
This commit documents CoreSight trace disassembler usage and gives example for it.
Signed-off-by: Leo Yan leo.yan@linaro.org --- Documentation/trace/coresight.txt | 52 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+)
diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt index 6f0120c..90d4fa2 100644 --- a/Documentation/trace/coresight.txt +++ b/Documentation/trace/coresight.txt @@ -381,3 +381,55 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto $ taskset -c 2 ./sort_autofdo Bubble sorting array of 30000 elements 5806 ms + + +Tracing data disassembler +------------------------- + +'perf script' supports to use script to parse tracing packet and rely on +'objdump' for disassembled lines, this can convert tracing data to readable +program execution flow for easily reviewing tracing data. + +The CoreSight trace disassembler is located in the folder: +tools/perf/scripts/python/arm-cs-trace-disasm.py. This script support below +options: + + -d, --objdump: Set path to objdump executable, this option is + mandatory. + -k, --vmlinux: Set path to vmlinux file. + -v, --verbose: Enable debugging log, after enable this option the + script dumps every event data. + +Below is one example for using python script to dump CoreSight trace +disassembler: + + $ perf script -s arm-cs-trace-disasm.py -i perf.data \ + -F cpu,event,ip,addr,sym -- -d objdump -k ./vmlinux > cs-disasm.log + +Below is one example for the disassembler log: + +CoreSight Trace Data Assembler Dump (ARM64) + ffff000008a5f2dc <etm4_enable_hw+0x344>: + ffff000008a5f2dc: 340000a0 cbz w0, ffff000008a5f2f0 <etm4_enable_hw+0x358> + ffff000008a5f2f0 <etm4_enable_hw+0x358>: + ffff000008a5f2f0: f9400260 ldr x0, [x19] + ffff000008a5f2f4: d5033f9f dsb sy + ffff000008a5f2f8: 913ec000 add x0, x0, #0xfb0 + ffff000008a5f2fc: b900001f str wzr, [x0] + ffff000008a5f300: f9400bf3 ldr x19, [sp, #16] + ffff000008a5f304: a8c27bfd ldp x29, x30, [sp], #32 + ffff000008a5f308: d65f03c0 ret + ffff000008a5fa18 <etm4_enable+0x1b0>: + ffff000008a5fa18: 14000025 b ffff000008a5faac <etm4_enable+0x244> + ffff000008a5faac <etm4_enable+0x244>: + ffff000008a5faac: b9406261 ldr w1, [x19, #96] + ffff000008a5fab0: 52800015 mov w21, #0x0 // #0 + ffff000008a5fab4: f901ca61 str x1, [x19, #912] + ffff000008a5fab8: 2a1503e0 mov w0, w21 + ffff000008a5fabc: 3940e261 ldrb w1, [x19, #56] + ffff000008a5fac0: f901ce61 str x1, [x19, #920] + ffff000008a5fac4: a94153f3 ldp x19, x20, [sp, #16] + ffff000008a5fac8: a9425bf5 ldp x21, x22, [sp, #32] + ffff000008a5facc: a94363f7 ldp x23, x24, [sp, #48] + ffff000008a5fad0: a8c47bfd ldp x29, x30, [sp], #64 + ffff000008a5fad4: d65f03c0 ret