Home | History | Annotate | only in /system/extras/simpleperf
Up to higher level directory
NameDateSize
.gitignore05-Oct-20175
Android.mk05-Oct-201713.5K
build_id.h05-Oct-20172.5K
callchain.h05-Oct-20174.9K
cmd_dumprecord.cpp05-Oct-20177.6K
cmd_dumprecord_test.cpp05-Oct-20171,017
cmd_help.cpp05-Oct-20172.7K
cmd_kmem.cpp05-Oct-201725.7K
cmd_kmem_test.cpp05-Oct-20174.6K
cmd_list.cpp05-Oct-20173K
cmd_list_test.cpp05-Oct-20171.1K
cmd_record.cpp05-Oct-201737.5K
cmd_record_test.cpp05-Oct-201713.5K
cmd_report.cpp05-Oct-201729K
cmd_report_sample.cpp05-Oct-201719K
cmd_report_sample_test.cpp05-Oct-20171.9K
cmd_report_test.cpp05-Oct-201716.6K
cmd_stat.cpp05-Oct-201721.3K
cmd_stat_test.cpp05-Oct-20175K
command.cpp05-Oct-20172.9K
command.h05-Oct-20172K
command_test.cpp05-Oct-20171.5K
cpu_hotplug_test.cpp05-Oct-201715.4K
dso.cpp05-Oct-201714.5K
dso.h05-Oct-20175K
dwarf_unwind.cpp05-Oct-20176.7K
dwarf_unwind.h05-Oct-20171K
environment.cpp05-Oct-201714.1K
environment.h05-Oct-20172.4K
environment_test.cpp05-Oct-2017988
event_attr.cpp05-Oct-20179.4K
event_attr.h05-Oct-20171.5K
event_fd.cpp05-Oct-20179.2K
event_fd.h05-Oct-20174.2K
event_selection_set.cpp05-Oct-201726.7K
event_selection_set.h05-Oct-20176.6K
event_type.cpp05-Oct-20175.4K
event_type.h05-Oct-20172.2K
event_type_table.h05-Oct-20177.7K
generate_event_type_table.py05-Oct-20174.6K
get_test_data.h05-Oct-20175.3K
gtest_main.cpp05-Oct-20177K
inplace_sampler_lib.cpp05-Oct-201710.5K
inplace_sampler_lib.h05-Oct-20172.1K
InplaceSamplerClient.cpp05-Oct-20176.1K
InplaceSamplerClient.h05-Oct-20171.8K
IOEventLoop.cpp05-Oct-20174.6K
IOEventLoop.h05-Oct-20172.8K
IOEventLoop_test.cpp05-Oct-20174.7K
main.cpp05-Oct-20172.3K
nonlinux_support/05-Oct-2017
NOTICE05-Oct-201710.4K
perf_clock.cpp05-Oct-20174.9K
perf_clock.h05-Oct-20171.1K
perf_event.h05-Oct-2017843
perf_regs.cpp05-Oct-20176.7K
perf_regs.h05-Oct-20172.7K
read_apk.cpp05-Oct-20176.6K
read_apk.h05-Oct-20173.5K
read_apk_test.cpp05-Oct-20172.6K
read_elf.cpp05-Oct-201717K
read_elf.h05-Oct-20173.1K
read_elf_test.cpp05-Oct-20175.8K
README.md05-Oct-201736.6K
record.cpp05-Oct-201733.4K
record.h05-Oct-201716.5K
record_equal_test.h05-Oct-20173.1K
record_file.h05-Oct-20176.4K
record_file_format.h05-Oct-20172.4K
record_file_reader.cpp05-Oct-201714K
record_file_test.cpp05-Oct-20175.6K
record_file_writer.cpp05-Oct-201711.8K
record_test.cpp05-Oct-20175K
report_lib_interface.cpp05-Oct-201711.3K
report_sample.proto05-Oct-20171.6K
runtest/05-Oct-2017
sample_tree.h05-Oct-201711.2K
sample_tree_test.cpp05-Oct-20178.4K
SampleComparator.h05-Oct-20173.7K
SampleDisplayer.h05-Oct-20178.7K
scripts/05-Oct-2017
test_util.h05-Oct-20171.7K
testdata/05-Oct-2017
thread_tree.cpp05-Oct-201710.9K
thread_tree.h05-Oct-20175.2K
tracing.cpp05-Oct-201713.5K
tracing.h05-Oct-20172.2K
UnixSocket.cpp05-Oct-20177K
UnixSocket.h05-Oct-20177.9K
UnixSocket_test.cpp05-Oct-20177.1K
utils.cpp05-Oct-20178.6K
utils.h05-Oct-20174.3K
utils_test.cpp05-Oct-20172.4K
workload.cpp05-Oct-20175.2K
workload.h05-Oct-20172.2K
workload_test.cpp05-Oct-20172.4K

README.md

      1 # Simpleperf Introduction
      2 ## What is simpleperf
      3 Simpleperf is a native profiling tool for Android. Its command-line interface
      4 supports broadly the same options as the linux-tools perf, but also supports
      5 various Android-specific improvements.
      6 
      7 Simpleperf is part of the Android Open Source Project. The source code is at
      8 https://android.googlesource.com/platform/system/extras/+/master/simpleperf/.
      9 The latest document is at
     10 https://android.googlesource.com/platform/system/extras/+show/master/simpleperf/README.md.
     11 Bugs and feature requests can be submitted at
     12 http://github.com/android-ndk/ndk/issues.
     13 
     14 ## How simpleperf works
     15 Modern CPUs have a hardware component called the performance monitoring unit
     16 (PMU). The PMU has several hardware counters, counting events like how many cpu
     17 cycles have happened, how many instructions have executed, or how many cache
     18 misses have happened.
     19 
     20 The Linux kernel wraps these hardware counters into hardware perf events. In
     21 addition, the Linux kernel also provides hardware independent software events
     22 and tracepoint events. The Linux kernel exposes all this to userspace via the
     23 perf_event_open system call, which simpleperf uses.
     24 
     25 Simpleperf has three main functions: stat, record and report.
     26 
     27 The stat command gives a summary of how many events have happened in the
     28 profiled processes in a time period. Heres how it works:
     29 1. Given user options, simpleperf enables profiling by making a system call to
     30 linux kernel.
     31 2. Linux kernel enables counters while scheduling on the profiled processes.
     32 3. After profiling, simpleperf reads counters from linux kernel, and reports a
     33 counter summary.
     34 
     35 The record command records samples of the profiled process in a time period.
     36 Heres how it works:
     37 1. Given user options, simpleperf enables profiling by making a system call to
     38 linux kernel.
     39 2. Simpleperf creates mapped buffers between simpleperf and linux kernel.
     40 3. Linux kernel enable counters while scheduling on the profiled processes.
     41 4. Each time a given number of events happen, linux kernel dumps a sample to a
     42 mapped buffer.
     43 5. Simpleperf reads samples from the mapped buffers and generates perf.data.
     44 
     45 The report command reads a "perf.data" file and any shared libraries used by
     46 the profiled processes, and outputs a report showing where the time was spent.
     47 
     48 ## Main simpleperf commands
     49 Simpleperf supports several subcommands, including list, stat, record, report.
     50 Each subcommand supports different options. This section only covers the most
     51 important subcommands and options. To see all subcommands and options,
     52 use --help.
     53 
     54     # List all subcommands.
     55     $simpleperf --help
     56 
     57     # Print help message for record subcommand.
     58     $simpleperf record --help
     59 
     60 ### simpleperf list
     61 simpleperf list is used to list all events available on the device. Different
     62 devices may support different events because of differences in hardware and
     63 kernel.
     64 
     65     $simpleperf list
     66     List of hw-cache events:
     67       branch-loads
     68       ...
     69     List of hardware events:
     70       cpu-cycles
     71       instructions
     72       ...
     73     List of software events:
     74       cpu-clock
     75       task-clock
     76       ...
     77 
     78 ### simpleperf stat
     79 simpleperf stat is used to get a raw event counter information of the profiled program
     80 or system-wide. By passing options, we can select which events to use, which
     81 processes/threads to monitor, how long to monitor and the print interval.
     82 Below is an example.
     83 
     84     # Stat using default events (cpu-cycles,instructions,...), and monitor
     85     # process 7394 for 10 seconds.
     86     $simpleperf stat -p 7394 --duration 10
     87     Performance counter statistics:
     88 
     89      1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
     90        510,426,028  instructions       # 2.587047 cycles per instruction  (100%)
     91          4,692,338  branch-misses      # 468.118 K/sec                    (100%)
     92     886.008130(ms)  task-clock         # 0.088390 cpus used               (100%)
     93                753  context-switches   # 75.121 /sec                      (100%)
     94                870  page-faults        # 86.793 /sec                      (100%)
     95 
     96     Total test time: 10.023829 seconds.
     97 
     98 #### Select events
     99 We can select which events to use via -e option. Below are examples:
    100 
    101     # Stat event cpu-cycles.
    102     $simpleperf stat -e cpu-cycles -p 11904 --duration 10
    103 
    104     # Stat event cache-references and cache-misses.
    105     $simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10
    106 
    107 When running the stat command, if the number of hardware events is larger than
    108 the number of hardware counters available in the PMU, the kernel shares hardware
    109 counters between events, so each event is only monitored for part of the total
    110 time. In the example below, there is a percentage at the end of each row,
    111 showing the percentage of the total time that each event was actually monitored.
    112 
    113     # Stat using event cache-references, cache-references:u,....
    114     $simpleperf stat -p 7394 -e     cache-references,cache-references:u,cache-references:k,cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
    115     Performance counter statistics:
    116 
    117     4,331,018  cache-references     # 4.861 M/sec    (87%)
    118     3,064,089  cache-references:u   # 3.439 M/sec    (87%)
    119     1,364,959  cache-references:k   # 1.532 M/sec    (87%)
    120        91,721  cache-misses         # 102.918 K/sec  (87%)
    121        45,735  cache-misses:u       # 51.327 K/sec   (87%)
    122        38,447  cache-misses:k       # 43.131 K/sec   (87%)
    123     9,688,515  instructions         # 10.561 M/sec   (89%)
    124 
    125     Total test time: 1.026802 seconds.
    126 
    127 In the example above, each event is monitored about 87% of the total time. But
    128 there is no guarantee that any pair of events are always monitored at the same
    129 time. If we want to have some events monitored at the same time, we can use
    130 --group option. Below is an example.
    131 
    132     # Stat using event cache-references, cache-references:u,....
    133     $simpleperf stat -p 7394 --group cache-references,cache-misses --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k -e instructions --duration 1
    134     Performance counter statistics:
    135 
    136     3,638,900  cache-references     # 4.786 M/sec          (74%)
    137        65,171  cache-misses         # 1.790953% miss rate  (74%)
    138     2,390,433  cache-references:u   # 3.153 M/sec          (74%)
    139        32,280  cache-misses:u       # 1.350383% miss rate  (74%)
    140       879,035  cache-references:k   # 1.251 M/sec          (68%)
    141        30,303  cache-misses:k       # 3.447303% miss rate  (68%)
    142     8,921,161  instructions         # 10.070 M/sec         (86%)
    143 
    144     Total test time: 1.029843 seconds.
    145 
    146 #### Select target to monitor
    147 We can select which processes or threads to monitor via -p option or -t option.
    148 Monitoring a process is the same as monitoring all threads in the process.
    149 Simpleperf can also fork a child process to run the new command and then monitor
    150 the child process. Below are examples.
    151 
    152     # Stat process 11904 and 11905.
    153     $simpleperf stat -p 11904,11905 --duration 10
    154 
    155     # Stat thread 11904 and 11905.
    156     $simpleperf stat -t 11904,11905 --duration 10
    157 
    158     # Start a child process running `ls`, and stat it.
    159     $simpleperf stat ls
    160 
    161 #### Decide how long to monitor
    162 When monitoring existing threads, we can use --duration option to decide how long
    163 to monitor. When monitoring a child process running a new command, simpleperf
    164 monitors until the child process ends. In this case, we can use Ctrl-C to stop monitoring
    165 at any time. Below are examples.
    166 
    167     # Stat process 11904 for 10 seconds.
    168     $simpleperf stat -p 11904 --duration 10
    169 
    170     # Stat until the child process running `ls` finishes.
    171     $simpleperf stat ls
    172 
    173     # Stop monitoring using Ctrl-C.
    174     $simpleperf stat -p 11904 --duration 10
    175     ^C
    176 
    177 #### Decide the print interval
    178 When monitoring perf counters, we can also use --interval option to decide the print
    179 interval. Below are examples.
    180 
    181     # Print stat for process 11904 every 300ms.
    182     $simpleperf stat -p 11904 --duration 10 --interval 300
    183 
    184     # Print system wide stat at interval of 300ms for 10 seconds (rooted device only).
    185     # system wide profiling needs root privilege
    186     $su 0 simpleperf stat -a --duration 10 --interval 300
    187 
    188 #### Display counters in systrace
    189 simpleperf can also work with systrace to dump counters in the collected trace.
    190 Below is an example to do a system wide stat
    191 
    192     # capture instructions (kernel only) and cache misses with interval of 300 milliseconds for 15 seconds
    193     $su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
    194     # on host launch systrace to collect trace for 10 seconds
    195     (HOST)$external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
    196     # open the collected new.html in browser and perf counters will be shown up
    197 
    198 ### simpleperf record
    199 simpleperf record is used to dump records of the profiled program. By passing
    200 options, we can select which events to use, which processes/threads to monitor,
    201 what frequency to dump records, how long to monitor, and where to store records.
    202 
    203     # Record on process 7394 for 10 seconds, using default event (cpu-cycles),
    204     # using default sample frequency (4000 samples per second), writing records
    205     # to perf.data.
    206     $simpleperf record -p 7394 --duration 10
    207     simpleperf I 07-11 21:44:11 17522 17522 cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.
    208 
    209 #### Select events
    210 In most cases, the cpu-cycles event is used to evaluate consumed cpu time.
    211 As a hardware event, it is both accurate and efficient. We can also use other
    212 events via -e option. Below is an example.
    213 
    214     # Record using event instructions.
    215     $simpleperf record -e instructions -p 11904 --duration 10
    216 
    217 #### Select target to monitor
    218 The way to select target in record command is similar to that in stat command.
    219 Below are examples.
    220 
    221     # Record process 11904 and 11905.
    222     $simpleperf record -p 11904,11905 --duration 10
    223 
    224     # Record thread 11904 and 11905.
    225     $simpleperf record -t 11904,11905 --duration 10
    226 
    227     # Record a child process running `ls`.
    228     $simpleperf record ls
    229 
    230 #### Set the frequency to record
    231 We can set the frequency to dump records via the -f or -c options. For example,
    232 -f 4000 means dumping approximately 4000 records every second when the monitored
    233 thread runs. If a monitored thread runs 0.2s in one second (it can be preempted
    234 or blocked in other times), simpleperf dumps about 4000 * 0.2 / 1.0 = 800
    235 records every second. Another way is using -c option. For example, -c 10000
    236 means dumping one record whenever 10000 events happen. Below are examples.
    237 
    238     # Record with sample frequency 1000: sample 1000 times every second running.
    239     $simpleperf record -f 1000 -p 11904,11905 --duration 10
    240 
    241     # Record with sample period 100000: sample 1 time every 100000 events.
    242     $simpleperf record -c 100000 -t 11904,11905 --duration 10
    243 
    244 #### Decide how long to monitor
    245 The way to decide how long to monitor in record command is similar to that in
    246 stat command. Below are examples.
    247 
    248     # Record process 11904 for 10 seconds.
    249     $simpleperf record -p 11904 --duration 10
    250 
    251     # Record until the child process running `ls` finishes.
    252     $simpleperf record ls
    253 
    254     # Stop monitoring using Ctrl-C.
    255     $simpleperf record -p 11904 --duration 10
    256     ^C
    257 
    258 #### Set the path to store records
    259 By default, simpleperf stores records in perf.data in current directory. We can
    260 use -o option to set the path to store records. Below is an example.
    261 
    262     # Write records to data/perf2.data.
    263     $simpleperf record -p 11904 -o data/perf2.data --duration 10
    264 
    265 ### simpleperf report
    266 simpleperf report is used to report based on perf.data generated by simpleperf
    267 record command. Report command groups records into different sample entries,
    268 sorts sample entries based on how many events each sample entry contains, and
    269 prints out each sample entry. By passing options, we can select where to find
    270 perf.data and executable binaries used by the monitored program, filter out
    271 uninteresting records, and decide how to group records.
    272 
    273 Below is an example. Records are grouped into 4 sample entries, each entry is
    274 a row. There are several columns, each column shows piece of information
    275 belonging to a sample entry. The first column is Overhead, which shows the
    276 percentage of events inside current sample entry in total events. As the
    277 perf event is cpu-cycles, the overhead can be seen as the percentage of cpu
    278 time used in each function.
    279 
    280     # Reports perf.data, using only records sampled in libsudo-game-jni.so,
    281     # grouping records using thread name(comm), process id(pid), thread id(tid),
    282     # function name(symbol), and showing sample count for each row.
    283     $simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so --sort comm,pid,tid,symbol -n
    284     Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
    285     Arch: arm64
    286     Event: cpu-cycles (type 0, config 0)
    287     Samples: 28235
    288     Event count: 546356211
    289 
    290     Overhead  Sample  Command    Pid   Tid   Symbol
    291     59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
    292     20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
    293     13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
    294     6.24%     1756    sudogame  7394  7394  @plt
    295 
    296 #### Set the path to read records
    297 By default, simpleperf reads perf.data in current directory. We can use -i
    298 option to select another file to read records.
    299 
    300     $simpleperf report -i data/perf2.data
    301 
    302 #### Set the path to find executable binaries
    303 If reporting function symbols, simpleperf needs to read executable binaries
    304 used by the monitored processes to get symbol table and debug information. By
    305 default, the paths are the executable binaries used by monitored processes while
    306 recording. However, these binaries may not exist when reporting or not contain
    307 symbol table and debug information. So we can use --symfs to redirect the paths.
    308 Below is an example.
    309 
    310     $simpleperf report
    311     # In this case, when simpleperf wants to read executable binary /A/b,
    312     # it reads file in /A/b.
    313 
    314     $simpleperf report --symfs /debug_dir
    315     # In this case, when simpleperf wants to read executable binary /A/b,
    316     # it prefers file in /debug_dir/A/b to file in /A/b.
    317 
    318 #### Filter records
    319 When reporting, it happens that not all records are of interest. Simpleperf
    320 supports five filters to select records of interest. Below are examples.
    321 
    322     # Report records in threads having name sudogame.
    323     $simpleperf report --comms sudogame
    324 
    325     # Report records in process 7394 or 7395
    326     $simpleperf report --pids 7394,7395
    327 
    328     # Report records in thread 7394 or 7395.
    329     $simpleperf report --tids 7394,7395
    330 
    331     # Report records in libsudo-game-jni.so.
    332     $simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so
    333 
    334     # Report records in function checkValid or canFindSolution_r.
    335     $simpleperf report --symbols "checkValid(Board const&, int, int);canFindSolution_r(Board&, int, int)"
    336 
    337 #### Decide how to group records into sample entries
    338 Simpleperf uses --sort option to decide how to group sample entries. Below are
    339 examples.
    340 
    341     # Group records based on their process id: records having the same process
    342     # id are in the same sample entry.
    343     $simpleperf report --sort pid
    344 
    345     # Group records based on their thread id and thread comm: records having
    346     # the same thread id and thread name are in the same sample entry.
    347     $simpleperf report --sort tid,comm
    348 
    349     # Group records based on their binary and function: records in the same
    350     # binary and function are in the same sample entry.
    351     $simpleperf report --sort dso,symbol
    352 
    353     # Default option: --sort comm,pid,tid,dso,symbol. Group records in the same
    354     # thread, and belong to the same function in the same binary.
    355     $simpleperf report
    356 
    357 ## Features of simpleperf
    358 Simpleperf works similar to linux-tools-perf, but it has following improvements:
    359 1. Aware of Android environment. Simpleperf handles some Android specific
    360 situations when profiling. For example, it can profile embedded shared libraries
    361 in apk, read symbol table and debug information from .gnu_debugdata section. If
    362 possible, it gives suggestions when facing errors, like how to disable
    363 perf_harden to enable profiling.
    364 2. Support unwinding while recording. If we want to use -g option to record and
    365 report call-graph of a program, we need to dump user stack and register set in
    366 each record, and then unwind the stack to find the call chain. Simpleperf
    367 supports unwinding while recording, so it doesnt need to store user stack in
    368 perf.data. So we can profile for a longer time with limited space on device.'
    369 3. Support scripts to make profiling on Android more convenient.
    370 4. Build in static binaries. Simpleperf is a static binary, so it doesnt need
    371 supporting shared libraries to run. It means there is no limitation of Android
    372 version that simpleperf can run on, although some devices dont support
    373 profiling.
    374 
    375 # Simpleperf tools in ndk
    376 Simpleperf tools in ndk contain three parts: simpleperf executable running on
    377 Android device, simpleperf executable running on host, and python scripts.
    378 
    379 ## Simpleperf on device
    380 Simpleperf running on device is located at bin/android directory. It contains
    381 static binaries running on Android on different architectures. They can be used
    382 to profile processes running device, and generate perf.data.
    383 
    384 ## Simpleperf on host
    385 Simpleperfs running on host are located at bin/darwin, bin/linux and
    386 bin/windows.They can be used to parse perf.data on host.
    387 
    388 ## Scripts
    389 Scripts are used to make it convenient to profile and parse profiling results.
    390 app_profiler.py is used to profile an android application. It prepares
    391 profiling environments, downloads simpleperf on device, generates and pulls
    392 perf.data on host. It is configured by app_profiler.config.
    393 binary_cache_builder.py is used to pull native binaries from device to host.
    394 It is used by app_profiler.py.
    395 annotate.py is used to annotate source files using perf.data. It is configured
    396 by annotate.config.
    397 report.py reports perf.data in a GUI window.
    398 simpleperf_report_lib.py is used to enumerate samples in perf.data. Internally
    399 it uses libsimpleperf_report.so to parse perf.data. It can be used to translate
    400 samples in perf.data to other forms. One example using simpleperf_report_lib.py
    401 is report_sample.py.
    402 
    403 # Examples of using simpleperf tools
    404 This section shows how to use simpleperf tools to profile an Android
    405 application.
    406 
    407 ## Prepare a debuggable Application
    408 The package name of the application is com.example.sudogame. It has both java
    409 code and c++ code. We need to run a copy of the app with
    410 android:debuggable=true in its AndroidManifest.xml <application> element,
    411 because we cant use run-as for non-debuggable apps. The application should
    412 has been installed on device, and we can connect device via adb.
    413 
    414 ## Profile using command line
    415 To record profiling data, we need to download simpleperf and native libraries
    416 with debug information on device, run simpleperf to generate profiling data
    417 file: perf.data, and run simpleperf to report perf.data. Below are the steps.
    418 
    419 ### 1. Enable profiling
    420 
    421     $adb shell setprop security.perf_harden 0
    422 
    423 ### 2. Find the process running the app
    424 Run `ps` in the apps context. On >=O devices, run `ps -e` instead.
    425 
    426     $adb shell
    427     angler:/ $ run-as com.example.sudogame
    428     angler:/data/data/com.example.sudogame $ ps
    429     u0_a93    10324 570   1030480 58104 SyS_epoll_ 00f41b7528 S com.example.sudogame
    430     u0_a93    10447 10441 7716   1588  sigsuspend 753c515d34 S sh
    431     u0_a93    10453 10447 9112   1644           0 7ba07ff664 R ps
    432 
    433 So process 10324 runs the app.
    434 
    435 ### 3. Download simpleperf to the apps data directory
    436 First we need to find out which architecture the app is using. There are many
    437 ways, here we just check the map of the process.
    438 
    439     angler:/data/data/com.example.sudogame $cat /proc/10324/maps | grep boot.art
    440     70f34000-7144e000 r--p 00000000 fd:00 1082  /system/framework/arm/boot.oat
    441 
    442 The file path shows it is arm. So we download simpleperf in arm directory on
    443 device.
    444 
    445     $adb push bin/android/arm/simpleperf /data/local/tmp
    446     $adb shell
    447     angler:/ $ run-as com.example.sudogame
    448     angler:/data/data/com.example.sudogame $ cp /data/local/tmp/simpleperf .
    449 
    450 ### 4. Record perf.data
    451 
    452     angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 --duration 30
    453     simpleperf I 01-01 09:26:39 10598 10598 cmd_record.cpp:341] Samples recorded: 49471. Samples lost: 0.
    454     angler:/data/data/com.example.sudogame $ls -lh perf.data
    455     -rw-rw-rw- 1 u0_a93 u0_a93 2.6M 2017-01-01 09:26 perf.data
    456 
    457 Dont forget to run the app while recording. Otherwise, we may get no samples
    458 because the process is always sleeping.
    459 
    460 ### 5. Report perf.data
    461 There are different ways to report perf.data. Below shows some examples.
    462 
    463 Report samples in different threads.
    464 
    465     angler:/data/data/com.example.sudogame $./simpleperf report --sort pid,tid,comm
    466     Cmdline: /data/data/com.example.sudogame/simpleperf record -p 10324 --duration 30
    467     Arch: arm64
    468     Event: cpu-cycles (type 0, config 0)
    469     Samples: 49471
    470     Event count: 16700769019
    471 
    472     Overhead  Pid    Tid    Command
    473     66.31%    10324  10324  xample.sudogame
    474     30.97%    10324  10340  RenderThread
    475     ...
    476 
    477 Report samples in different binaries in the main thread.
    478 
    479     angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --sort dso -n
    480     ...
    481     Overhead  Sample  Shared Object
    482     37.71%    9970    /system/lib/libc.so
    483     35.45%    9786    [kernel.kallsyms]
    484     8.71%     3305    /system/lib/libart.so
    485     6.44%     2405    /system/framework/arm/boot-framework.oat
    486     5.64%     1480    /system/lib/libcutils.so
    487     1.55%     426     /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so
    488     ...
    489 
    490 Report samples in different functions in libsudo-game-jni.so in the main thread.
    491 
    492     angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --sort symbol -n
    493     ...
    494     Overhead  Sample  Symbol
    495     8.94%     35      libsudo-game-jni.so[+1d54]
    496     5.71%     25      libsudo-game-jni.so[+1dae]
    497     5.70%     23      @plt
    498     5.09%     22      libsudo-game-jni.so[+1d88]
    499     4.54%     19      libsudo-game-jni.so[+1d82]
    500     3.61%     14      libsudo-game-jni.so[+1f3c]
    501     ...
    502 
    503 In the above result, most symbols are binary name[+virual_addr]. It is because
    504 libsudo-game-jni.so used on device has stripped .symbol section. We can
    505 download libsudo-game-jni.so having debug information on device. In android
    506 studio project, it locates at
    507 app/build/intermediates/binaries/debug/arm/obj/armeabi-v7a/libsudo-game-jni.so.
    508 We have to download libsudo-game-jni.so to the same relative path as recorded
    509 in perf.data (otherwise, simpleperf cant find it). In this case, it is
    510 /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so.
    511 
    512 Report symbols using libraries with debug information.
    513 
    514     $adb push app/build/intermediates/binaries/debug/arm/obj/armeabi-v7a/libsudo-game-jni.so /data/local/tmp
    515     $adb shell
    516     angler:/ $ run-as com.example.sudogame
    517     angler:/data/data/com.example.sudogame $ mkdir -p data/app/com.example.sudogame-1/lib/arm
    518     angler:/data/data/com.example.sudogame $cp /data/local/tmp/libsudo-game-jni.so data/app/com.example.sudogame-1/lib/arm
    519     angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --sort symbol -n --symfs .
    520     ...
    521     Overhead  Sample  Symbol
    522     75.18%    317     checkValid(Board const&, int, int)
    523     14.43%    60      canFindSolution_r(Board&, int, int)
    524     5.70%     23      @plt
    525     3.51%     20      randomBlock_r(Board&, int, int, int, int, int)
    526     ...
    527 
    528 Report samples in one function
    529 
    530     angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --symbols checkValid(Board const&, int, int) --sort vaddr_in_file -n --symfs .
    531     ...
    532     Overhead  Sample  VaddrInFile
    533     11.89%    35      0x1d54
    534     7.59%     25      0x1dae
    535     6.77%     22      0x1d88
    536     6.03%     19      0x1d82
    537     ...
    538 
    539 ### 6. Record and report call graph
    540 A call graph is a tree showing function call relations. Below is an example.
    541 
    542     main() {
    543         FunctionOne();
    544         FunctionTwo();
    545     }
    546     FunctionOne() {
    547         FunctionTwo();
    548         FunctionThree();
    549     }
    550     callgraph:
    551         main-> FunctionOne
    552            |    |
    553            |    |-> FunctionTwo
    554            |    |-> FunctionThree
    555            |
    556            |-> FunctionTwo
    557 
    558 #### Record dwarf based call graph
    559 To generate call graph, simpleperf needs to generate call chain for each record.
    560 Simpleperf requests kernel to dump user stack and user register set for each
    561 record, then it backtraces the user stack to find the function call chain. To
    562 parse the call chain, it needs support of dwarf call frame information, which
    563 usually resides in .eh_frame or .debug_frame section of the binary.  So we need
    564 to use --symfs to point out where is libsudo-game-jni.so with debug information.
    565 
    566     angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 -g --symfs . --duration 30
    567     simpleperf I 01-01 09:59:42 11021 11021 cmd_record.cpp:341] Samples recorded: 60700. Samples lost: 1240.
    568 
    569 Note that kernel cant dump user stack >= 64K, so the dwarf based call graph
    570 doesnt contain call chains consuming >= 64K stack. Whats more, because we
    571 need to dump stack in each record, it is likely to lost records. Usually, it
    572 doesnt matter to lost some records.
    573 
    574 #### Record stack frame based call graph
    575 Another way to generate call graph is to rely on the kernel parsing the call
    576 chain for each record. To make it possible, kernel has to be able to identify
    577 the stack frame of each function call. This is not always possible, because
    578 compilers can optimize away stack frames, or use a stack frame style not
    579 recognized by the kernel. So how well it works depends (It works well on arm64,
    580 but not well on arm).
    581 
    582     angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 --call-graph fp --symfs . --duration 30
    583     simpleperf I 01-01 10:03:58 11267 11267 cmd_record.cpp:341] Samples recorded: 56736. Samples lost: 0.
    584 
    585 #### Report call graph
    586 Report accumulated period. In the table below, the first column is Children,
    587 it is the cpu cycle percentage of a function and functions called by that
    588 function. The second column is Self, it is the cpu cycle percentage of just a
    589 function. For example, checkValid() itself takes 1.28% cpus, but it takes
    590 29.43% by running itself and calling other functions.
    591 
    592     angler:/data/data/com.example.sudogame $./simpleperf report --children --symfs .
    593     ...
    594     Children  Self   Command          Pid    Tid    Shared Object                                                 Symbol
    595     31.94%    0.00%  xample.sudogame  10324  10324  [kernel.kallsyms]                                             [kernel.kallsyms][+ffffffc000204268]
    596     31.10%    0.92%  xample.sudogame  10324  10324  /system/lib/libc.so                                           writev
    597     29.43%    1.28%  xample.sudogame  10324  10324  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so  checkValid(Board const&, int, int)
    598     28.43%    0.34%  xample.sudogame  10324  10324  /system/lib/liblog.so                                         __android_log_print
    599     28.24%    0.00%  xample.sudogame  10324  10324  /system/lib/libcutils.so                                      libcutils.so[+107b7]
    600     28.10%    0.27%  xample.sudogame  10324  10324  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so  canFindSolution_r(Board&, int, int)
    601     ...
    602 
    603 Report call graph.
    604 
    605     angler:/data/data/com.example.sudogame $./simpleperf report -g --symfs . >report
    606     angler:/data/data/com.example.sudogame $exit
    607     angler:/ $cp /data/data/com.example.sudogame/report /data/local/tmp
    608     angler:/ $exit
    609     $adb pull /data/local/tmp/report .
    610     $cat report
    611     ...
    612     29.43%    1.28%  xample.sudogame  10324  10324  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so  checkValid(Board const&, int, int)
    613            |
    614            -- checkValid(Board const&, int, int)
    615               |
    616               |--95.50%-- __android_log_print
    617               |    |--0.68%-- [hit in function]
    618               |    |
    619               |    |--51.84%-- __android_log_buf_write
    620               |    |    |--2.07%-- [hit in function]
    621               |    |    |
    622               |    |    |--30.74%-- libcutils.so[+c69d]
    623     ...
    624 
    625 Report call graph in callee mode. We can also show how a function is called by
    626 other functions.
    627 
    628     angler:/data/data/com.example.sudogame $./simpleperf report -g callee --symfs . >report
    629     $adb shell run-as com.example.sudogame cat report >report
    630     $cat report
    631     
    632     28.43%    0.34%  xample.sudogame  10324  10324  /system/lib/liblog.so                                         __android_log_print
    633            |
    634            -- __android_log_print
    635               |
    636               |--97.82%-- checkValid(Board const&, int, int)
    637               |    |--0.13%-- [hit in function]
    638               |    |
    639               |    |--94.89%-- canFindSolution_r(Board&, int, int)
    640               |    |    |--0.01%-- [hit in function]
    641               |    |    |
    642     ...
    643 
    644 ## Profile java code
    645 Simpleperf only supports profiling native instructions in binaries in ELF
    646 format. If the java code is executed by interpreter, or with jit cache, it
    647 cant be profiled by simpleperf. As Android supports Ahead-of-time compilation,
    648 it can compile java bytecode into native instructions with debug information.
    649 On devices with Android version <= M, we need root privilege to compile java
    650 bytecode with debug information. However, on devices with Android version >= N,
    651 we don't need root privilege to do so.
    652 
    653 ### On Android N
    654 #### 1. Fully compile java code into native instructions.
    655 
    656     $adb shell setprop debug.generate-debug-info true
    657     $adb shell cmd package compile -f -m speed com.example.sudogame
    658     // restart the app to take effect
    659 
    660 #### 2. Record perf.data
    661 
    662     angler:/data/data/com.example.sudogame $./simpleperf record -p 11826 -g --symfs . --duration 30
    663     simpleperf I 01-01 10:31:40 11859 11859 cmd_record.cpp:341] Samples recorded: 50576. Samples lost: 2139.
    664 
    665 #### 3. Report perf.data
    666 
    667     angler:/data/data/com.example.sudogame $./simpleperf report -g --symfs . >report
    668     angler:/data/data/com.example.sudogame $exit
    669     angler:/ $cp /data/data/com.example.sudogame/report /data/local/tmp
    670     angler:/ $exit
    671     $adb pull /data/local/tmp/report .
    672     $cat report
    673     ...
    674     21.14%    0.00%  xample.sudogame  11826  11826  /data/app/com.example.sudogame-1/oat/arm/base.odex            boolean com.example.sudogame.MainActivity.onOptionsItemSelected(android.view.MenuItem)
    675            |
    676            -- boolean com.example.sudogame.MainActivity.onOptionsItemSelected(android.view.MenuItem)
    677               |
    678                --99.99%-- void com.example.sudogame.GameView.startNewGame()
    679                    |--0.01%-- [hit in function]
    680                    |
    681                    |--99.87%-- void com.example.sudogame.GameModel.reInit()
    682                    |    |--0.01%-- [hit in function]
    683                    |    |
    684                    |    |--89.65%-- boolean com.example.sudogame.GameModel.canFindSolution(int[][])
    685                    |    |    |
    686                    |    |    |--99.95%-- Java_com_example_sudogame_GameModel_canFindSolution
    687                    |    |    |    |
    688                    |    |    |    |--99.49%-- canFindSolution(Board&)
    689                    |    |    |    |    |--0.01%-- [hit in function]
    690                    |    |    |    |    |
    691                    |    |    |    |    |--99.97%-- canFindSolution_r(Board&, int, int)
    692                    |    |    |    |    |           canFindSolution_r(Board&, int, int)
    693     ...
    694 
    695 ### On Android M
    696 On M devices, We need root privilege to force Android fully compiling java code
    697 into native instructions in ELF binaries with debug information. We also need
    698 root privilege to read compiled native binaries (because installd writes them
    699 to a directory whose uid/gid is system:install). So profiling java code can
    700 only be done on rooted devices.
    701 
    702     $adb root
    703     $adb shell setprop dalvik.vm.dex2oat-flags -g
    704 
    705     # Reinstall the app.
    706     $adb install -r app-debug.apk
    707 
    708     # Change to the apps data directory.
    709     $ adb root && adb shell
    710     device# cd `run-as com.example.sudogame pwd`
    711 
    712     # Record as root as simpleperf needs to read the generated native binary.
    713     device#./simpleperf record -p 25636 -g --symfs . -f 1000 --duration 30
    714     simpleperf I 01-02 07:18:20 27182 27182 cmd_record.cpp:323] Samples recorded: 23552. Samples lost: 39.
    715 
    716 ### On Android L
    717 On L devices, we also need root privilege to compile the app with debug info
    718 and access the native binaries.
    719 
    720     $adb root
    721     $adb shell setprop dalvik.vm.dex2oat-flags --include-debug-symbols
    722 
    723     # Reinstall the app.
    724     $adb install -r app-debug.apk
    725 
    726 ## Profile using scripts
    727 Although using command line is flexible, it can be too complex. So we have
    728 python scripts to help running commands.
    729 
    730 ### Record using app_profiler.py
    731 app_profiler.py is used to profile an Android application. It sets up profiling
    732 environment, downloads simpleperf and native libraries with debug information,
    733 runs simpleperf to generate perf.data, and pulls perf.data and binaries from
    734 device to host.
    735 It is configured by app_profiler.config. Below is an example.
    736 
    737 app_profiler.config:
    738 
    739     app_package_name = com.example.sudogame
    740     android_studio_project_dir = /AndroidStudioProjects/SudoGame  # absolute path of the project
    741     ...
    742     record_options = "-e cpu-cycles:u -f 4000 -g --dump-symbols --duration 30"
    743     ...
    744 
    745 run app_profiler.py:
    746 
    747     $python app_profiler.py
    748     ...
    749     INFO:root:profiling is finished.
    750 
    751 It pulls generated perf.data on host, and collects binaries from device in
    752 binary_cache.
    753 
    754 ### Report using report.py
    755 
    756     $python report.py -g
    757 
    758 It generates a GUI interface to report data.
    759 
    760 ### Process samples using simpleperf_report_lib.py
    761 simpleperf_report_lib.py provides an interface reading samples from perf.data.
    762 An example is report_sample.py.
    763 
    764 ### Show flamegraph
    765 
    766     $python report_sample.py >out.perf
    767     $stackcollapse-perf.pl out.perf >out.folded
    768     $./flamegraph.pl out.folded >a.svg
    769 
    770 ### Visualize using pprof
    771 pprof is a tool for visualization and analysis of profiling data. It can
    772 be got from https://github.com/google/pprof. pprof_proto_generator.py can
    773 generate profiling data in a format acceptable by pprof.
    774 
    775     $python pprof_proto_generator.py
    776     $pprof -pdf pprof.profile
    777 
    778 ### Annotate source code
    779 annotate.py reads perf.data and binaries in binary_cache. Then it knows which
    780 source file:line each sample hits. So it can annotate source code. annotate.py
    781 is configured by annotate.config. Below is an example.
    782 
    783 annotate.config:
    784 
    785     ...
    786     source_dirs = [/AndroidStudio/SudoGame]  # It is a directory containing source code.
    787     ...
    788 
    789 run annotate.py:
    790 
    791     $python annotate.py
    792 
    793 It generates annotated_files directory.
    794 annotated_files/summary file contains summary information for each source file.
    795 An example is as below.
    796 
    797     /AndroidStudioProjects/SudoGame/app/src/main/jni/sudo-game-jni.cpp: accumulated_period: 25.587937%, period: 1.250961%
    798       function (checkValid(Board const&, int, int)): line 99, accumulated_period: 23.564356%, period: 0.908457%
    799       function (canFindSolution_r(Board&, int, int)): line 135, accumulated_period: 22.260125%, period: 0.142359%
    800       function (canFindSolution(Board&)): line 166, accumulated_period: 22.233101%, period: 0.000000%
    801       function (Java_com_example_sudogame_GameModel_canFindSolution): line 470, accumulated_period: 21.983184%, period: 0.000000%
    802       function (Java_com_example_sudogame_GameModel_initRandomBoard): line 430, accumulated_period: 2.226896%, period: 0.000000%
    803 
    804       line 27: accumulated_period: 0.011729%, period: 0.000000%
    805       line 32: accumulated_period: 0.004362%, period: 0.000000%
    806       line 33: accumulated_period: 0.004427%, period: 0.000000%
    807       line 36: accumulated_period: 0.003303%, period: 0.000000%
    808       line 39: accumulated_period: 0.010367%, period: 0.004123%
    809       line 41: accumulated_period: 0.162219%, period: 0.000000%
    810 
    811 annotated_files/ also contains annotated source files which are found by
    812 annotate.py. For example, part of checkValid() function in libsudo-game-jni.cpp
    813 is annotated as below.
    814 
    815     /* [func] acc_p: 23.564356%, p: 0.908457% */static bool checkValid(const Board& board, int curR, int curC) {
    816     /* acc_p: 0.037933%, p: 0.037933%         */    int digit = board.digits[curR][curC];
    817     /* acc_p: 0.162355%, p: 0.162355%         */    for (int r = 0; r < BOARD_ROWS; ++r) {
    818     /* acc_p: 0.020880%, p: 0.020880%         */        if (r == curR) {
    819     /* acc_p: 0.034691%, p: 0.034691%         */            continue;
    820                                                         }
    821     /* acc_p: 0.176490%, p: 0.176490%         */        if (board.digits[r][curC] == digit) {
    822     /* acc_p: 14.957673%, p: 0.059022%        */            LOGI("conflict (%d, %d) (%d, %d)", curR, curC, r, curC);
    823     /* acc_p: 0.016296%, p: 0.016296%         */            return false;
    824                                                         }
    825                                                     }
    826