README.md

# Simpleperf

Simpleperf is a native profiling tool for Android. It can be used to profile
both Android applications and native processes running on Android, and can
profile both Java and C++ code. It works on Android L and above.

Simpleperf is part of the Android Open Source Project. The source code is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/).
The latest document is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/README.md).
Bugs and feature requests can be submitted at http://github.com/android-ndk/ndk/issues.


## Table of Contents

- [Simpleperf introduction](#simpleperf-introduction)
    - [Why simpleperf](#why-simpleperf)
    - [Tools in simpleperf](#tools-in-simpleperf)
    - [Simpleperf's profiling principle](#simpleperfs-profiling-principle)
    - [Main simpleperf commands](#main-simpleperf-commands)
        - [Simpleperf list](#simpleperf-list)
        - [Simpleperf stat](#simpleperf-stat)
        - [Simpleperf record](#simpleperf-record)
        - [Simpleperf report](#simpleperf-report)
- [Android application profiling](#android-application-profiling)
    - [Prepare an Android application](#prepare-an-android-application)
    - [Record and report profiling data (using command-lines)](#record-and-report-profiling-data-using-command-lines)
    - [Record and report profiling data (using python scripts)](#record-and-report-profiling-data-using-python-scripts)
    - [Record and report call graph](#record-and-report-call-graph)
    - [Visualize profiling data](#visualize-profiling-data)
    - [Annotate source code](#annotate-source-code)
- [Answers to common issues](#answers-to-common-issues)
    - [The correct way to pull perf.data on host](#the-correct-way-to-pull-perfdata-on-host)
## Simpleperf introduction

### Why simpleperf

Simpleperf works similarly to linux-tools-perf, but it has some specific features for
Android profiling:

1. Aware of the Android environment

    a. It can profile embedded shared libraries in an apk.

    b. It reads symbols and debug information from the .gnu_debugdata section.

    c. It gives suggestions when errors occur.

    d. When recording with the -g option, it unwinds the stack before writing to
    file, to save storage space.

    e. It supports adding additional information (like symbols) in perf.data, to
    support recording on device and reporting on host.

2. Python scripts for profiling tasks

3. Easy to release

    a. Simpleperf executables on device are built as static binaries. They can be
    pushed to any Android device and run.

    b. Simpleperf executables on host are built as static binaries, and support
    different hosts: Mac, Linux and Windows.


### Tools in simpleperf

Simpleperf is periodically released with the Android NDK, located at `simpleperf/`.
The latest release can be found [here](https://android.googlesource.com/platform/prebuilts/simpleperf/).
Simpleperf tools contain executables, shared libraries and Python scripts.

**Simpleperf executables running on Android device**
Simpleperf executables running on an Android device are located at `bin/android/`.
Each architecture has one executable, like `bin/android/arm64/simpleperf`. It
can record and report profiling data. It provides a command-line interface
broadly the same as linux-tools-perf, and also supports some additional
features for Android-specific profiling.

**Simpleperf executables running on hosts**
Simpleperf executables running on hosts are located at `bin/darwin`, `bin/linux`
and `bin/windows`. Each host and architecture has one executable, like
`bin/linux/x86_64/simpleperf`. It provides a command-line interface for
reporting profiling data on hosts.

**Simpleperf report shared libraries used on host**
Simpleperf report shared libraries used on host are located at `bin/darwin`,
`bin/linux` and `bin/windows`. Each host and architecture has one library, like
`bin/linux/x86_64/libsimpleperf_report.so`. It is a library for parsing
profiling data.

**Python scripts**
Python scripts are provided to help with different profiling tasks.

`annotate.py` is used to annotate source files based on profiling data.

`app_profiler.py` is used to profile Android applications.

`binary_cache_builder.py` is used to pull libraries from Android devices.

`pprof_proto_generator.py` is used to convert profiling data to the format used by pprof.

`report.py` provides a GUI interface to report profiling results.

`report_sample.py` is used to generate flamegraphs.

`simpleperf_report_lib.py` provides a Python interface for parsing profiling data.


### Simpleperf's profiling principle

Modern CPUs have a hardware component called the performance monitoring unit
(PMU). The PMU has several hardware counters, counting events like how many
CPU cycles have happened, how many instructions have been executed, or how many
cache misses have happened.

The Linux kernel wraps these hardware counters into hardware perf events. In
addition, the Linux kernel also provides hardware-independent software events
and tracepoint events. The Linux kernel exposes all this to userspace via the
perf_event_open system call, which simpleperf uses.

Simpleperf has three main functions: stat, record and report.

The stat command gives a summary of how many events have happened in the
profiled processes in a time period. Here's how it works:
1. Given user options, simpleperf enables profiling by making a system call to
the Linux kernel.
2. The Linux kernel enables counters while scheduling on the profiled processes.
3. After profiling, simpleperf reads counters from the Linux kernel, and reports a
counter summary.

The record command records samples of the profiled processes in a time period.
Here's how it works:
1. Given user options, simpleperf enables profiling by making a system call to
the Linux kernel.
2. Simpleperf creates mapped buffers between simpleperf and the Linux kernel.
3. The Linux kernel enables counters while scheduling on the profiled processes.
4. Each time a given number of events happen, the Linux kernel dumps a sample to a
mapped buffer.
5. Simpleperf reads samples from the mapped buffers and generates perf.data.

The report command reads a "perf.data" file and any shared libraries used by
the profiled processes, and outputs a report showing where the time was spent.


### Main simpleperf commands

Simpleperf supports several subcommands, including list, stat, record and report.
Each subcommand supports different options. This section only covers the most
important subcommands and options. To see all subcommands and options,
use --help.

    # List all subcommands.
    $ simpleperf --help

    # Print help message for record subcommand.
    $ simpleperf record --help


#### Simpleperf list

simpleperf list is used to list all events available on the device. Different
devices may support different events because of differences in hardware and
kernel.

    $ simpleperf list
    List of hw-cache events:
      branch-loads
      ...
    List of hardware events:
      cpu-cycles
      instructions
      ...
    List of software events:
      cpu-clock
      task-clock
      ...


#### Simpleperf stat

simpleperf stat is used to get raw event counter information about the profiled program
or the whole system. By passing options, we can select which events to use, which
processes/threads to monitor, how long to monitor and the print interval.
Below is an example.

    # Stat using default events (cpu-cycles,instructions,...), and monitor
    # process 7394 for 10 seconds.
    $ simpleperf stat -p 7394 --duration 10
    Performance counter statistics:

     1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
       510,426,028  instructions       # 2.587047 cycles per instruction  (100%)
         4,692,338  branch-misses      # 468.118 K/sec                    (100%)
    886.008130(ms)  task-clock         # 0.088390 cpus used               (100%)
               753  context-switches   # 75.121 /sec                      (100%)
               870  page-faults        # 86.793 /sec                      (100%)

    Total test time: 10.023829 seconds.

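The rate figures after `#` can be reproduced from the raw counters. The sketch below infers the formulas from the numbers in the output above (it is not taken from simpleperf's source): the GHz figure divides cpu-cycles by the total wall-clock test time, and "cpus used" divides task-clock time by wall-clock time.

```python
# Reproduce the derived metrics in the stat output above from the raw counters.
cycles        = 1_320_496_145
instructions  = 510_426_028
task_clock_ms = 886.008130
wall_sec      = 10.023829

ghz       = cycles / wall_sec / 1e9            # ~0.131736 GHz
cpi       = cycles / instructions              # ~2.587047 cycles per instruction
cpus_used = (task_clock_ms / 1000) / wall_sec  # ~0.088390 cpus used

print(f"{ghz:.6f} GHz, {cpi:.6f} CPI, {cpus_used:.6f} cpus used")
```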
**Select events**
We can select which events to use via the -e option. Below are examples:

    # Stat event cpu-cycles.
    $ simpleperf stat -e cpu-cycles -p 11904 --duration 10

    # Stat events cache-references and cache-misses.
    $ simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10

When running the stat command, if the number of hardware events is larger than
the number of hardware counters available in the PMU, the kernel shares hardware
counters between events, so each event is only monitored for part of the total
time. In the example below, there is a percentage at the end of each row,
showing the percentage of the total time that each event was actually monitored.

    # Stat using events cache-references, cache-references:u,....
    $ simpleperf stat -p 7394 -e cache-references,cache-references:u,cache-references:k,cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
    Performance counter statistics:

    4,331,018  cache-references     # 4.861 M/sec    (87%)
    3,064,089  cache-references:u   # 3.439 M/sec    (87%)
    1,364,959  cache-references:k   # 1.532 M/sec    (87%)
       91,721  cache-misses         # 102.918 K/sec  (87%)
       45,735  cache-misses:u       # 51.327 K/sec   (87%)
       38,447  cache-misses:k       # 43.131 K/sec   (87%)
    9,688,515  instructions         # 10.561 M/sec   (89%)

    Total test time: 1.026802 seconds.

In the example above, each event is monitored about 87% of the total time. But
there is no guarantee that any pair of events is always monitored at the same
time. If we want some events to be monitored at the same time, we can use the
--group option. Below is an example.

    # Stat using groups of events: cache-references with cache-misses, ....
    $ simpleperf stat -p 7394 --group cache-references,cache-misses --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k -e instructions --duration 1
    Performance counter statistics:

    3,638,900  cache-references     # 4.786 M/sec          (74%)
       65,171  cache-misses         # 1.790953% miss rate  (74%)
    2,390,433  cache-references:u   # 3.153 M/sec          (74%)
       32,280  cache-misses:u       # 1.350383% miss rate  (74%)
      879,035  cache-references:k   # 1.251 M/sec          (68%)
       30,303  cache-misses:k       # 3.447303% miss rate  (68%)
    8,921,161  instructions         # 10.070 M/sec         (86%)

    Total test time: 1.029843 seconds.
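When counters are shared, the raw count only covers the fraction of time the event was monitored. A common way to estimate the full-run count is the usual perf scaling rule, count * time_enabled / time_running. This is a minimal illustration of that rule, not what simpleperf itself prints (it reports raw counts plus the percentage):

```python
# Estimate the full-duration count of a multiplexed event by scaling the
# raw count with the fraction of time the event was actually monitored.
def scale_count(raw_count, monitored_fraction):
    """Scale a raw counter value up to an estimate for the whole run."""
    if not 0 < monitored_fraction <= 1:
        raise ValueError("monitored_fraction must be in (0, 1]")
    return round(raw_count / monitored_fraction)

# Numbers from the stat output above: cache-references monitored ~87% of the time.
print(scale_count(4_331_018, 0.87))  # 4978182, an estimate over the full second
```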

**Select target to monitor**
We can select which processes or threads to monitor via the -p or -t option.
Monitoring a process is the same as monitoring all threads in the process.
Simpleperf can also fork a child process to run a new command and then monitor
the child process. Below are examples.

    # Stat processes 11904 and 11905.
    $ simpleperf stat -p 11904,11905 --duration 10

    # Stat threads 11904 and 11905.
    $ simpleperf stat -t 11904,11905 --duration 10

    # Start a child process running `ls`, and stat it.
    $ simpleperf stat ls

**Decide how long to monitor**
When monitoring existing threads, we can use the --duration option to decide how long
to monitor. When monitoring a child process running a new command, simpleperf
monitors until the child process ends. In either case, we can use Ctrl-C to stop
monitoring at any time. Below are examples.

    # Stat process 11904 for 10 seconds.
    $ simpleperf stat -p 11904 --duration 10

    # Stat until the child process running `ls` finishes.
    $ simpleperf stat ls

    # Stop monitoring using Ctrl-C.
    $ simpleperf stat -p 11904 --duration 10
    ^C

**Decide the print interval**
When monitoring perf counters, we can also use the --interval option to decide the print
interval. Below are examples.

    # Print stat for process 11904 every 300ms.
    $ simpleperf stat -p 11904 --duration 10 --interval 300

    # Print system-wide stat at an interval of 300ms for 10 seconds (rooted device only).
    # System-wide profiling needs root privilege.
    $ su 0 simpleperf stat -a --duration 10 --interval 300

**Display counters in systrace**
simpleperf can also work with systrace to dump counters in the collected trace.
Below is an example of doing a system-wide stat.

    # Capture instructions (kernel only) and cache misses with an interval of 300 milliseconds for 15 seconds.
    $ su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
    # On the host, launch systrace to collect a trace for 10 seconds.
    (HOST)$ external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
    # Open the collected new.html in a browser and the perf counters will show up.


#### Simpleperf record

simpleperf record is used to dump records of the profiled program. By passing
options, we can select which events to use, which processes/threads to monitor,
what frequency to dump records at, how long to monitor, and where to store records.

    # Record on process 7394 for 10 seconds, using the default event (cpu-cycles),
    # using the default sample frequency (4000 samples per second), writing records
    # to perf.data.
    $ simpleperf record -p 7394 --duration 10
    simpleperf I 07-11 21:44:11 17522 17522 cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.

**Select events**
In most cases, the cpu-cycles event is used to evaluate consumed CPU time.
As a hardware event, it is both accurate and efficient. We can also use other
events via the -e option. Below is an example.

    # Record using the instructions event.
    $ simpleperf record -e instructions -p 11904 --duration 10

**Select target to monitor**
The way to select targets in the record command is similar to that in the stat command.
Below are examples.

    # Record processes 11904 and 11905.
    $ simpleperf record -p 11904,11905 --duration 10

    # Record threads 11904 and 11905.
    $ simpleperf record -t 11904,11905 --duration 10

    # Record a child process running `ls`.
    $ simpleperf record ls

**Set the frequency to record**
We can set the frequency to dump records via the -f or -c options. For example,
-f 4000 means dumping approximately 4000 records every second that the monitored
thread runs. If a monitored thread runs 0.2s in one second (it can be preempted
or blocked at other times), simpleperf dumps about 4000 * 0.2 / 1.0 = 800
records every second. Another way is to use the -c option. For example, -c 10000
means dumping one record whenever 10000 events happen. Below are examples.

    # Record with sample frequency 1000: sample 1000 times per second of running time.
    $ simpleperf record -f 1000 -p 11904,11905 --duration 10

    # Record with sample period 100000: sample 1 time every 100000 events.
    $ simpleperf record -c 100000 -t 11904,11905 --duration 10
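The arithmetic for -f above can be sketched as a rough model: the kernel aims for FREQ samples per second of running time, not per second of wall time.

```python
# Rough model of how many samples `simpleperf record -f FREQ` produces.
def expected_samples(freq, running_fraction, duration_sec):
    """Approximate sample count for a thread that runs `running_fraction`
    of each second, monitored for `duration_sec` seconds."""
    return int(freq * running_fraction * duration_sec)

# The example above: -f 4000 with a thread running 0.2s of every second.
print(expected_samples(4000, 0.2, 1))   # 800 samples in one second
print(expected_samples(4000, 0.2, 10))  # about 8000 samples over 10 seconds
```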

**Decide how long to monitor**
The way to decide how long to monitor in the record command is similar to that in
the stat command. Below are examples.

    # Record process 11904 for 10 seconds.
    $ simpleperf record -p 11904 --duration 10

    # Record until the child process running `ls` finishes.
    $ simpleperf record ls

    # Stop monitoring using Ctrl-C.
    $ simpleperf record -p 11904 --duration 10
    ^C

**Set the path to store records**
By default, simpleperf stores records in perf.data in the current directory. We can
use the -o option to set the path to store records. Below is an example.

    # Write records to data/perf2.data.
    $ simpleperf record -p 11904 -o data/perf2.data --duration 10


#### Simpleperf report

simpleperf report is used to report based on the perf.data generated by the simpleperf
record command. The report command groups records into different sample entries,
sorts sample entries based on how many events each sample entry contains, and
prints out each sample entry. By passing options, we can select where to find
perf.data and the executable binaries used by the monitored program, filter out
uninteresting records, and decide how to group records.

Below is an example. Records are grouped into 4 sample entries, each entry being
a row. There are several columns, each showing a piece of information
belonging to a sample entry. The first column is Overhead, which shows the
percentage of events in the current sample entry out of total events. As the
perf event is cpu-cycles, the overhead can be seen as the percentage of CPU
time used in each function.

    # Report perf.data, using only records sampled in libsudo-game-jni.so,
    # grouping records using thread name(comm), process id(pid), thread id(tid),
    # function name(symbol), and showing the sample count for each row.
    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so --sort comm,pid,tid,symbol -n
    Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
    Arch: arm64
    Event: cpu-cycles (type 0, config 0)
    Samples: 28235
    Event count: 546356211

    Overhead  Sample  Command    Pid   Tid   Symbol
    59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
    20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
    13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
    6.24%     1756    sudogame  7394  7394  @plt

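The Overhead column is each sample entry's share of the total event count. The sketch below recomputes it from the report above; note it uses sample counts as a stand-in for per-entry event counts, so the percentages only approximate what simpleperf prints (which is based on event counts):

```python
# Recompute the Overhead column: each entry's share of the total.
def overhead(entries):
    total = sum(entries.values())
    return {name: 100.0 * count / total for name, count in entries.items()}

samples = {
    "checkValid": 16680,
    "canFindSolution_r": 5620,
    "randomBlock_r": 4088,
    "@plt": 1756,
}
for name, pct in sorted(overhead(samples).items(), key=lambda kv: -kv[1]):
    print(f"{pct:6.2f}%  {name}")
```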
**Set the path to read records**
By default, simpleperf reads perf.data in the current directory. We can use the -i
option to select another file to read records from.

    $ simpleperf report -i data/perf2.data

**Set the path to find executable binaries**
To report function symbols, simpleperf needs to read the executable binaries
used by the monitored processes to get symbol tables and debug information. By
default, the paths are those of the executable binaries used by the monitored
processes while recording. However, these binaries may not exist when reporting,
or may not contain symbol tables and debug information. So we can use --symfs
to redirect the paths. Below is an example.

    $ simpleperf report
    # In this case, when simpleperf wants to read executable binary /A/b,
    # it reads the file at /A/b.

    $ simpleperf report --symfs /debug_dir
    # In this case, when simpleperf wants to read executable binary /A/b,
    # it prefers the file at /debug_dir/A/b to the file at /A/b.
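The lookup rule described above can be sketched in a few lines. This is an illustration of the rule, not simpleperf's actual implementation:

```python
import os

def find_binary(path, symfs=None):
    """Return the path to read for executable binary `path`: prefer the copy
    under `symfs` if it exists, otherwise fall back to the original path."""
    if symfs:
        candidate = os.path.join(symfs, path.lstrip("/"))
        if os.path.exists(candidate):
            return candidate
    return path

print(find_binary("/A/b"))                      # no symfs: reads /A/b
print(find_binary("/A/b", symfs="/debug_dir"))  # /debug_dir/A/b if it exists, else /A/b
```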

**Filter records**
When reporting, often not all records are of interest. Simpleperf
supports five filters to select records of interest. Below are examples.

    # Report records in threads having the name sudogame.
    $ simpleperf report --comms sudogame

    # Report records in process 7394 or 7395.
    $ simpleperf report --pids 7394,7395

    # Report records in thread 7394 or 7395.
    $ simpleperf report --tids 7394,7395

    # Report records in libsudo-game-jni.so.
    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so

    # Report records in function checkValid or canFindSolution_r.
    $ simpleperf report --symbols "checkValid(Board const&, int, int);canFindSolution_r(Board&, int, int)"
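Conceptually, each of the five filters keeps a record only if the corresponding field is in the given set. A small sketch of that logic over in-memory records (the dict fields are hypothetical names for illustration):

```python
# A record passes when every supplied filter set contains its field value;
# filters left as None are ignored (i.e. not restricted).
def match(record, comms=None, pids=None, tids=None, dsos=None, symbols=None):
    checks = [("comm", comms), ("pid", pids), ("tid", tids),
              ("dso", dsos), ("symbol", symbols)]
    return all(wanted is None or record[key] in wanted
               for key, wanted in checks)

record = {"comm": "sudogame", "pid": 7394, "tid": 7394,
          "dso": "libsudo-game-jni.so", "symbol": "checkValid"}
print(match(record, comms={"sudogame"}))             # True
print(match(record, pids={7394, 7395}))              # True
print(match(record, symbols={"canFindSolution_r"}))  # False
```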

**Decide how to group records into sample entries**
Simpleperf uses the --sort option to decide how to group sample entries. Below are
examples.

    # Group records based on their process id: records having the same process
    # id are in the same sample entry.
    $ simpleperf report --sort pid

    # Group records based on their thread id and thread comm: records having
    # the same thread id and thread name are in the same sample entry.
    $ simpleperf report --sort tid,comm

    # Group records based on their binary and function: records in the same
    # binary and function are in the same sample entry.
    $ simpleperf report --sort dso,symbol

    # Default option: --sort comm,pid,tid,dso,symbol. Group records in the same
    # thread, and belonging to the same function in the same binary.
    $ simpleperf report

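The grouping above amounts to keying each record by its values for the sort keys: records sharing a key tuple land in one sample entry. A minimal sketch (illustrative, not simpleperf's implementation):

```python
from collections import Counter

def group_records(records, sort_keys):
    """Group records into sample entries keyed by the chosen sort keys,
    counting how many records fall into each entry."""
    entries = Counter()
    for rec in records:
        entries[tuple(rec[k] for k in sort_keys)] += 1
    return entries

records = [
    {"pid": 1, "tid": 1, "symbol": "f"},
    {"pid": 1, "tid": 2, "symbol": "f"},
    {"pid": 1, "tid": 2, "symbol": "g"},
]
print(group_records(records, ["pid"]))            # one entry holding 3 records
print(group_records(records, ["tid", "symbol"]))  # three separate entries
```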

## Android application profiling

This section shows how to profile an Android application.
[Here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/README.md) are examples. And we use the
[SimpleperfExamplePureJava](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava) project to show the profiling results.

Simpleperf only supports profiling native instructions in binaries in ELF
format. If the Java code is executed by the interpreter, or with a JIT cache, it
can't be profiled by simpleperf. As Android supports ahead-of-time compilation,
it can compile Java bytecode into native instructions with debug information.
On devices with Android version <= M, we need root privilege to compile Java
bytecode with debug information. However, on devices with Android version >= N,
we don't need root privilege to do so.

Profiling an Android application involves three steps:
1. Prepare the application.
2. Record profiling data.
3. Report profiling data.

To profile, we can use either command lines or Python scripts. Below shows both.


### Prepare an Android application

Before profiling, we need to install the application to be profiled on an Android device.
To get valid profiling results, please check the following points:

**1. The application should be debuggable.**
This means [android:debuggable](https://developer.android.com/guide/topics/manifest/application-element.html#debug)
should be true. So we need to use the debug [build type](https://developer.android.com/studio/build/build-variants.html#build-types)
instead of the release build type. This is understandable because we can't profile other people's apps.
However, on a rooted Android device, the application doesn't need to be debuggable.

**2. Run on an Android device >= L.**
Profiling on emulators is not yet supported. And to profile Java code, we need
the JVM running in oat mode, which is only available on Android >= L.

**3. On Android O, add `wrap.sh` in the apk.**
To profile Java code, we need the JVM running in oat mode. But on Android O,
debuggable applications are forced to run in JIT mode. To work around this,
we need to add a `wrap.sh` in the apk. So if you are running on an Android O device,
check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to add `wrap.sh` in the apk.

**4. Make sure C++ code is compiled with optimizing flags.**
If the application contains C++ code, it can be compiled with the -O0 flag in the debug build type.
This makes C++ code slow. Check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to avoid that.

**5. Use native libraries with debug info in the apk when possible.**
If the application contains C++ code or pre-compiled native libraries, try to use
unstripped libraries in the apk. This helps simpleperf generate better profiling
results. Check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to use unstripped libraries.

Here we use [SimpleperfExamplePureJava](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava) as an example.
It builds an app-profiling.apk for profiling.

    $ git clone https://android.googlesource.com/platform/system/extras
    $ cd extras/simpleperf/demo
    # Open the SimpleperfExamplePureJava project with Android Studio,
    # and build this project successfully, otherwise the `./gradlew` command below will fail.
    $ cd SimpleperfExamplePureJava

    # On Windows, use "gradlew" instead.
    $ ./gradlew clean assemble
    $ adb install -r app/build/outputs/apk/app-profiling.apk

    532 
    533 ### Record and report profiling data (using command-lines)
    534 
    535 We recommend using python scripts for profiling because they are more convenient.
    536 But using command-line will give us a better understanding of the profile process
    537 step by step. So we first show how to use command lines.
    538 
    539 **1. Enable profiling**
    540 
    541     $ adb shell setprop security.perf_harden 0
    542 
**2. Fully compile the app**

We need to compile the Java bytecode into native instructions to profile the Java
code in the application. This requires different commands on different Android versions.

On Android >= N:

    $ adb shell setprop debug.generate-debug-info true
    $ adb shell cmd package compile -f -m speed com.example.simpleperf.simpleperfexamplepurejava
    # Restart the app to take effect
    $ adb shell am force-stop com.example.simpleperf.simpleperfexamplepurejava

On Android M devices, we need root privilege to force Android to fully compile
Java code into native instructions in ELF binaries with debug information. We
also need root privilege to read the compiled native binaries (because installd
writes them to a directory whose uid/gid is system:install). So profiling Java
code can only be done on rooted devices.

    $ adb root
    $ adb shell setprop dalvik.vm.dex2oat-flags -g

    # Reinstall the app.
    $ adb install -r app/build/outputs/apk/app-profiling.apk

On Android L devices, we also need root privilege to compile the app with debug info
and access the native binaries.

    $ adb root
    $ adb shell setprop dalvik.vm.dex2oat-flags --include-debug-symbols

    # Reinstall the app.
    $ adb install -r app/build/outputs/apk/app-profiling.apk


**3. Find the app process**

    # Start the app if needed.
    $ adb shell am start -n com.example.simpleperf.simpleperfexamplepurejava/.MainActivity

    # Run `ps` in the app's context. On Android >= O devices, run `ps -e` instead.
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ps | grep simpleperf
    u0_a151   6885  3346  1590504 53980 SyS_epoll_ 6fc2024b6c S com.example.simpleperf.simpleperfexamplepurejava

So the pid of the app process is `6885`. We will use this number in the command lines
below; please replace it with the pid you get from running the `ps` command.
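
As an aside, the pid column can also be picked out programmatically rather than
by eye. The sketch below parses a `ps` output line of the shape shown above; the
helper name and the sample line are illustrative, not part of simpleperf:

```python
# Hypothetical helper: extract the pid (second column) from a `ps` output line.
def parse_pid(ps_line):
    return int(ps_line.split()[1])

line = ("u0_a151   6885  3346  1590504 53980 SyS_epoll_ 6fc2024b6c S "
        "com.example.simpleperf.simpleperfexamplepurejava")
print(parse_pid(line))  # 6885
```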

**4. Download simpleperf to the app's data directory**

    # Find which architecture the app is using.
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava cat /proc/6885/maps | grep boot.oat
    708e6000-70e33000 r--p 00000000 103:09 1214                              /system/framework/arm64/boot.oat

    # The app uses /arm64/boot.oat, so push simpleperf in bin/android/arm64/ to the device.
    $ cd ../../scripts/
    $ adb push bin/android/arm64/simpleperf /data/local/tmp
    $ adb shell chmod a+x /data/local/tmp/simpleperf
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava cp /data/local/tmp/simpleperf .


**5. Record perf.data**

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record -p 6885 --duration 10
    simpleperf I 04-27 20:41:11  6940  6940 cmd_record.cpp:357] Samples recorded: 40008. Samples lost: 0.

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ls -lh perf.data

The profiling data is recorded in perf.data.

Normally we need to use the app while profiling, otherwise we may record no samples.
But in this case, the MainActivity starts a busy thread, so we don't need to use
the app while profiling.

There are many options for recording profiling data; check the [record command](#simpleperf-record) for details.

**6. Report perf.data**

    # Pull perf.data to the host.
    $ adb shell "run-as com.example.simpleperf.simpleperfexamplepurejava cat perf.data | tee /data/local/tmp/perf.data >/dev/null"
    $ adb pull /data/local/tmp/perf.data

    # Report samples using the corresponding simpleperf executable on the host.
    # On Windows, use "bin\windows\x86_64\simpleperf" instead.
    $ bin/linux/x86_64/simpleperf report
    ...
    Overhead  Command   Pid   Tid   Shared Object                                                                     Symbol
    83.54%    Thread-2  6885  6900  /data/app/com.example.simpleperf.simpleperfexamplepurejava-2/oat/arm64/base.odex  void com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.run()
    16.11%    Thread-2  6885  6900  /data/app/com.example.simpleperf.simpleperfexamplepurejava-2/oat/arm64/base.odex  int com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.callFunction(int)

See [here](#the-correct-way-to-pull-perfdata-on-host) for why we use tee rather than just >.
There are many ways to show reports; check the [report command](#simpleperf-report) for details.


### Record and report profiling data (using python scripts)

Besides command lines, we can use `app_profiler.py` to profile Android applications.
It downloads simpleperf onto the device, records perf.data, and collects the profiling
results and native binaries on the host. It is configured by `app_profiler.config`.

**1. Fill `app_profiler.config`**

    Change the `app_package_name` line to: app_package_name = "com.example.simpleperf.simpleperfexamplepurejava"
    Change the `apk_file_path` line to: apk_file_path = "../SimpleperfExamplePureJava/app/build/outputs/apk/app-profiling.apk"
    Change the `android_studio_project_dir` line to: android_studio_project_dir = "../SimpleperfExamplePureJava/"
    Change the `record_options` line to: record_options = "--duration 10"

`apk_file_path` is needed to fully compile the application on Android L/M. It is
not necessary on Android >= N.

`android_studio_project_dir` is used to search for native libraries used by the
application. It is not necessary for profiling.

`record_options` can be set to any option accepted by the simpleperf record command.

**2. Run `app_profiler.py`**

    $ python app_profiler.py


If it runs successfully, it collects profiling data in perf.data in the current
directory, and the related native binaries in binary_cache/.

**3. Report perf.data**

We can use `report.py` to report perf.data.

    $ python report.py

We can pass any option accepted by the `simpleperf report` command to `report.py`.


### Record and report call graph

A call graph is a tree showing function call relations. Below is an example.

    main() {
        FunctionOne();
        FunctionTwo();
    }
    FunctionOne() {
        FunctionTwo();
        FunctionThree();
    }
    callgraph:
        main-> FunctionOne
           |    |
           |    |-> FunctionTwo
           |    |-> FunctionThree
           |
           |-> FunctionTwo


#### Record dwarf based call graph

When using command lines, add the `-g` option like below:

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record -g -p 6885 --duration 10

When using python scripts, change `app_profiler.config` as below:

    Change the `record_options` line to: record_options = "--duration 10 -g"

Recording a dwarf based call graph requires debug information in the native
binaries. So if the application uses native libraries, it is better to include
unstripped native libraries in the apk.


#### Record stack frame based call graph

When using command lines, add the `--call-graph fp` option like below:

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record --call-graph fp -p 6885 --duration 10

When using python scripts, change `app_profiler.config` as below:

    Change the `record_options` line to: record_options = "--duration 10 --call-graph fp"

Recording stack frame based call graphs requires support for the stack frame
register. Note that on the arm architecture, the stack frame register is not
well supported, even when code is compiled with -O0 -g -fno-omit-frame-pointer.
This is because the kernel can't unwind user stacks containing a mix of
arm/thumb code. **So please consider using dwarf based call graphs on the arm
architecture, or profile in an arm64 environment.**


#### Report call graph

To report a call graph using command lines, add the `-g` option.

    $ bin/linux/x86_64/simpleperf report -g
    ...
    Children  Self    Command          Pid    Tid    Shared Object                                                                     Symbol
    99.97%    0.00%   Thread-2         10859  10876  /system/framework/arm64/boot.oat                                                  java.lang.Thread.run
       |
       -- java.lang.Thread.run
          |
           -- void com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.run()
               |--83.66%-- [hit in function]
               |
               |--16.22%-- int com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.callFunction(int)
               |    |--99.97%-- [hit in function]

To report a call graph using python scripts, add the `-g` option.

    $ python report.py -g
    # Double-click an item starting with '+' to show its callgraph.

### Visualize profiling data

`simpleperf_report_lib.py` provides an interface for reading samples from perf.data.
Using it, you can write python scripts to read perf.data or convert perf.data
to other formats. Below are two examples.

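Before the examples, here is a minimal read-loop sketch. It assumes the
`ReportLib` class exported by `scripts/simpleperf_report_lib.py` (with
`SetRecordFile`, `GetNextSample`, and `GetSymbolOfCurrentSample` methods, as
used by `report_sample.py`), and that the `scripts/` directory is on
`PYTHONPATH`; treat it as a sketch, not a definitive implementation:

```python
# Iterate over all samples in perf.data and print the symbol each sample hit.
try:
    from simpleperf_report_lib import ReportLib
except ImportError:
    ReportLib = None  # scripts/ is not on PYTHONPATH

def dump_symbols(record_file="perf.data"):
    lib = ReportLib()
    lib.SetRecordFile(record_file)
    while True:
        sample = lib.GetNextSample()
        if sample is None:
            break
        print(lib.GetSymbolOfCurrentSample().symbol_name)
    lib.Close()
```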
### Show flamegraph

    $ python report_sample.py >out.perf
    $ stackcollapse-perf.pl out.perf >out.folded
    $ ./flamegraph.pl out.folded >a.svg


### Visualize using pprof

pprof is a tool for visualization and analysis of profiling data. It can
be obtained from https://github.com/google/pprof. pprof_proto_generator.py can
generate profiling data in a format accepted by pprof.

    $ python pprof_proto_generator.py
    $ pprof -pdf pprof.profile


### Annotate source code

`annotate.py` reads perf.data, the binaries in `binary_cache/` (collected by `app_profiler.py`)
and the source code, and generates annotated source code in `annotated_files/`.

**1. Run annotate.py**

    $ python annotate.py -s ../SimpleperfExamplePureJava

`addr2line` is needed to annotate source code. It can be found in an Android NDK
release, in paths like toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin/aarch64-linux-android-addr2line.
Please use the `--addr2line` option to set the path of `addr2line` if annotate.py
can't find it.

**2. Read annotated code**

The annotated source code is located at `annotated_files/`.
`annotated_files/summary` shows how each source file is annotated.

One annotated source file is `annotated_files/java/com/example/simpleperf/simpleperfexamplepurejava/MainActivity.java`.
Its content is similar to below:

    // [file] shows how much time is spent in the current file.
    /* [file] acc_p: 99.966552%, p: 99.837438% */package com.example.simpleperf.simpleperfexamplepurejava;
    ...
    // [func] shows how much time is spent in the current function.
    /* [func] acc_p: 16.213395%, p: 16.209250% */            private int callFunction(int a) {
    ...
    // This shows how much time is spent in the current line.
    // The acc_p field means how much time is spent in the current line and in functions called by the current line.
    // The p field means how much time is spent just in the current line.
    /* acc_p: 99.966552%, p: 83.628188%        */                    i = callFunction(i);

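The relation between `p` and `acc_p` can be sketched with a toy computation. The
sample stacks and line numbers below are invented purely for illustration:

```python
# Toy model: each sample is a callstack of source lines, listed caller-first,
# so the last element is the line executing when the sample was taken.
samples = [
    (805,),        # sample hit directly on line 805
    (805, 800),    # line 800 was executing, called from line 805
    (805, 800),
    (805,),
]

def p(line):
    """Percentage of samples whose executing (leaf) line is `line`."""
    return 100.0 * sum(1 for s in samples if s[-1] == line) / len(samples)

def acc_p(line):
    """Percentage of samples with `line` anywhere on the stack."""
    return 100.0 * sum(1 for s in samples if line in s) / len(samples)

print(p(805), acc_p(805))  # 50.0 100.0
```

Line 805 is on every stack (`acc_p` = 100%) but only executing half of the time
(`p` = 50%), which mirrors the gap between the two fields in the real output.
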
## Answers to common issues

### The correct way to pull perf.data on host

As perf.data is generated in the app's context, it can't be pulled directly to the
host. One way is to `adb shell run-as xxx cat perf.data >perf.data`. However, it
doesn't work well on Windows, because the content can be modified when it goes
through the pipe. So we first copy it from the app's context to the shell's context,
then pull it to the host. The commands are as below:

    $ adb shell "run-as xxx cat perf.data | tee /data/local/tmp/perf.data >/dev/null"
    $ adb pull /data/local/tmp/perf.data

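The kind of modification involved can be simulated in a few lines: a text-mode
pipe may apply newline translation, rewriting 0x0a bytes inside binary data.
This is a simulation of the effect, not a reproduction of the Windows console:

```python
# Simulate LF -> CRLF text-mode translation applied to binary data.
raw = b"SIMPLEPERF\x0a\x00\x01\x0a\x02"   # arbitrary binary payload
translated = raw.replace(b"\n", b"\r\n")  # what the translation does

print(len(raw), len(translated))  # 15 17: the payload grew, the file is corrupt
```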
## Inferno

![logo](./inferno/inferno_small.png)

### Description

Inferno is a flamegraph generator for native (C/C++) Android apps. It was
originally written to profile and improve surfaceflinger performance
(the Android compositor), but it can be used for any native Android application.
You can see a sample report generated with Inferno
[here](./inferno/report.html). Reports are self-contained HTML files, so they can
be exchanged easily.

Notice there is no concept of time in a flame graph, since all callstacks are
merged together. As a result, the width of a flamegraph represents 100% of
the number of samples, and the height is related to the number of functions on
the stack when sampling occurred.

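The merging step can be sketched in a few lines: identical callstacks are
counted and ordering in time is discarded. The sample stacks below are invented
for illustration:

```python
# Fold callstacks into (stack, count) pairs: the input to a flamegraph.
# Widths in the rendered graph are proportional to the counts.
from collections import Counter

samples = [
    ("main", "FunctionOne", "FunctionTwo"),
    ("main", "FunctionOne", "FunctionTwo"),
    ("main", "FunctionTwo"),
]
folded = Counter(";".join(stack) for stack in samples)
for stack, count in sorted(folded.items()):
    print(stack, count)
```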
![flamegraph sample](./inferno/main_thread_flamegraph.png)

In the flamegraph featured above you can see the main thread of SurfaceFlinger.
It is immediately apparent that most of the CPU time is spent processing messages
in `android::SurfaceFlinger::onMessageReceived`. The most expensive task is asking
for the screen to be refreshed, as `android::DisplayDevice::prepare` shows in orange.
This graphical division helps to see which part of the program is costly and
where a developer's effort to improve performance should go.

### Example of bottleneck

A flamegraph gives you an instant view of the CPU cycle cost centers, but
it can also be used to find specific offenders. To find them, look for
plateaus. It is easiest to see with an example:

![flamegraph sample](./inferno/bottleneck.png)

In the previous flamegraph, two
plateaus (due to `android::BufferQueueCore::validateConsistencyLocked`)
are immediately apparent.

### How it works

Inferno relies on simpleperf to record the callstack of a native application
thousands of times per second. Simpleperf takes care of unwinding the stack,
either using frame pointers (recommended) or dwarf. At the end of the recording,
`simpleperf` also symbolizes all IPs automatically. The records are aggregated
and dumped to a file, `perf.data`. This file is pulled from the Android device
and processed on the host by Inferno. The callstacks are merged together to
visualize in which parts of the app the CPU cycles are spent.

### How to use it

Open a terminal and, from the `simpleperf` directory, type:
```
./inferno.sh  (on Linux/Mac)
./inferno.bat (on Windows)
```

Inferno will collect data, process it, and automatically open your web browser
to display the HTML report.

### Parameters

You can select how long to sample for, the color of the nodes, and many other
things. Use `-h` to get a list of all supported parameters.

```
./inferno.sh -h
```

### Troubleshooting

#### Messy flame graph
A healthy flame graph features a single call site at its base
(see `inferno/report.html`).
If you don't see a unique call site like `_start` or `_start_thread` at the base
from which all flames originate, something went wrong: stack unwinding may have
failed to reach the root callsite. These incomplete callstacks are impossible to
merge properly. By default, Inferno asks `simpleperf` to unwind the stack via the
kernel and frame pointers. Try performing unwinding with dwarf instead (`-du`);
you can further tune this setting.


#### No flames
If you see no flames at all, or a mess of one-level flames without a common base,
this may be because you compiled without frame pointers. Make sure there is no
`-fomit-frame-pointer` in your build config. Alternatively, ask simpleperf to
collect data with dwarf unwinding (`-du`).


#### High percentage of lost samples

If simpleperf reports a lot of lost samples, it is probably because you are
unwinding with `dwarf`. Dwarf unwinding involves copying the stack before it is
processed. Try to use frame pointer unwinding, which can be done by the kernel
and is much faster.

The cost of frame pointers is negligible on arm64 but considerable on the
32-bit arm architecture (due to register pressure). Use a 64-bit build for
better profiling.

#### run-as: package not debuggable
If you cannot run as root, make sure the app is debuggable; otherwise simpleperf
will not be able to profile it.