## Inferno

![logo](./inferno_small.png)

### Description

Inferno is a flamegraph generator for native (C/C++) Android apps. It was
originally written to profile and improve SurfaceFlinger (the Android
compositor) performance, but it can be used for any native Android
application. You can see a sample report generated with Inferno
[here](./report.html). Reports are self-contained HTML files, so they can be
exchanged easily.

Notice there is no concept of time in a flame graph, since all callstacks are
merged together. As a result, the full width of a flamegraph represents 100%
of the samples, and the width of each frame is proportional to the number of
samples whose callstack contains it. The height corresponds to the number of
functions on the stack when the sample was taken.
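
To make the width/height relationship concrete, here is a minimal Python
sketch (not Inferno's actual code) that merges sampled callstacks, each listed
root-first, into a single tree. A node's width is the fraction of all samples
whose stack passes through it, and its depth is its distance from the root.
The `Node`, `merge_callstacks` and `widths` names and the sample stacks are
hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    samples: int = 0
    children: dict = field(default_factory=dict)


def merge_callstacks(stacks):
    """Merge root-first callstacks into one tree; each node counts the samples through it."""
    root = Node("all")
    for stack in stacks:
        root.samples += 1
        node = root
        for frame in stack:
            # Identical prefixes share one path, which is why timing order is lost.
            node = node.children.setdefault(frame, Node(frame))
            node.samples += 1
    return root


def widths(node, total=None, depth=0, out=None):
    """Return (depth, name, fraction of all samples) for every node, depth-first."""
    if total is None:
        total = node.samples
    if out is None:
        out = []
    out.append((depth, node.name, node.samples / total))
    for child in node.children.values():
        widths(child, total, depth + 1, out)
    return out


if __name__ == "__main__":
    # Hypothetical samples: four stacks captured at different times.
    samples = [
        ["_start", "main", "render"],
        ["_start", "main", "render"],
        ["_start", "main", "parse"],
        ["_start", "main", "render"],
    ]
    for depth, name, width in widths(merge_callstacks(samples)):
        print(f"{'  ' * depth}{name}: {width:.0%}")
```

With these samples, `render` is drawn over 75% of the width and `parse` over
25%, both sitting on top of `main` and `_start`, which span 100%.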

![flamegraph sample](./main_thread_flamegraph.png)

In the flamegraph featured above, you can see the main thread of SurfaceFlinger.
It is immediately apparent that most of the CPU time is spent processing
messages in `android::SurfaceFlinger::onMessageReceived`. The most expensive
task is asking the screen to be refreshed, as `android::DisplayDevice::prepare`
(shown in orange) indicates. This visual breakdown helps to see which parts of
the program are costly and where a developer's effort to improve performance
should go.

### Example of bottleneck

A flamegraph gives you an instant view of where CPU cycles are spent, but it
can also be used to find specific offenders. To find them, look for plateaus:
wide, flat tops where a single function is on the CPU in many samples without
calling anything else. It is easier to see with an example:

![flamegraph sample](./bottleneck.png)

In the flamegraph above, two plateaus (due to
`android::BufferQueueCore::validateConsistencyLocked`) are immediately
apparent.
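
A plateau corresponds to a function that sits at the top (leaf) of many
stacks, that is, one with a high self sample count. The Python sketch below is
a hypothetical helper, not part of Inferno. It counts how often each function
is the leaf frame and ranks the results, so the widest plateaus come first.
The `min_fraction` threshold and the sample stacks are illustrative
assumptions.

```python
from collections import Counter


def plateau_candidates(stacks, min_fraction=0.05):
    """Rank functions by self samples: how often each one is the leaf (top) frame.

    `stacks` are root-first lists of function names. `min_fraction` is a
    hypothetical cutoff for what counts as a wide plateau.
    """
    if not stacks:
        return []
    leaf_counts = Counter(stack[-1] for stack in stacks if stack)
    total = len(stacks)
    return [
        (name, count / total)
        for name, count in leaf_counts.most_common()
        if count / total >= min_fraction
    ]


if __name__ == "__main__":
    # Hypothetical stacks: one consistency check dominates the leaf frames.
    samples = [
        ["main", "queue", "check_consistency"],
        ["main", "queue", "check_consistency"],
        ["main", "dequeue", "check_consistency"],
        ["main", "compose"],
    ]
    for name, fraction in plateau_candidates(samples):
        print(f"{name}: {fraction:.0%} of samples")
```

Counting leaf frames across different callers captures both plateaus at once,
even when the same function is reached through separate paths, as in the
bottleneck example.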

### How it works

Inferno relies on simpleperf to record the callstack of a native application
thousands of times per second. Simpleperf takes care of unwinding the stack
using either frame pointers (recommended) or DWARF. At the end of the
recording, `simpleperf` also symbolizes all instruction pointers (IPs)
automatically. The records are aggregated and dumped to a file, `perf.data`.
This file is pulled from the Android device and processed on the host by
Inferno, which merges the callstacks together to visualize in which parts of
an app the CPU cycles are spent.
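
Symbolization maps each raw instruction pointer to the function that contains
it. simpleperf does this itself using the binaries' symbol tables. The Python
sketch below only illustrates the idea. It assumes a small, sorted,
hypothetical table of `(start address, size, name)` entries and looks up each
IP with a binary search.

```python
import bisect

# Hypothetical symbol table: (start address, size in bytes, name), sorted by start.
SYMBOLS = [
    (0x1000, 0x80, "_start"),
    (0x1080, 0x200, "main"),
    (0x1400, 0x100, "render"),
]
STARTS = [start for start, _, _ in SYMBOLS]


def symbolize(ip):
    """Return the name of the function containing `ip`, or a hex placeholder."""
    # Index of the last symbol whose start address is <= ip.
    i = bisect.bisect_right(STARTS, ip) - 1
    if i >= 0:
        start, size, name = SYMBOLS[i]
        if ip < start + size:
            return name
    return f"[unknown {ip:#x}]"


if __name__ == "__main__":
    raw_stack = [0x1010, 0x10f4, 0x1420]  # hypothetical root-first IPs from one sample
    print([symbolize(ip) for ip in raw_stack])  # ['_start', 'main', 'render']
```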

### How to use it

Open a terminal and, from the `simpleperf/scripts` directory, type the
following on Linux/Mac:
```
./inferno.sh
```
or the following on Windows:
```
inferno.bat
```

Inferno will collect data, process it, and automatically open your web browser
to display the HTML report.

### Parameters

You can select how long to sample for, the color of the nodes, and many other
things. Use `-h` to get a list of all supported parameters.

```
./inferno.sh -h
```

### Troubleshooting

#### Messy flame graph
A healthy flame graph features a single call site at its base (see
[here](./report.html)). If you don't see a unique call site like `_start` or
`_start_thread` at the base from which all flames originate, something went
wrong: stack unwinding may have failed to reach the root call site. These
incomplete callstacks are impossible to merge properly. By default, Inferno
asks `simpleperf` to unwind the stack via the kernel and frame pointers. Try
unwinding with DWARF instead (`-du`). This setting can be tuned further.
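
To gauge how much of a profile is affected, you can count the samples whose
outermost frame is not one of the expected roots. The Python sketch below is a
hypothetical helper, not an Inferno option. It assumes already-symbolized,
root-first stacks and uses the root names mentioned above as its default
set.

```python
EXPECTED_ROOTS = frozenset({"_start", "_start_thread"})


def truncated_fraction(stacks, roots=EXPECTED_ROOTS):
    """Fraction of root-first stacks whose outermost frame is not an expected root."""
    if not stacks:
        return 0.0
    bad = sum(1 for stack in stacks if not stack or stack[0] not in roots)
    return bad / len(stacks)


if __name__ == "__main__":
    # Hypothetical samples: two complete stacks and two truncated ones.
    samples = [
        ["_start", "main", "render"],
        ["_start_thread", "worker", "decode"],
        ["render", "memcpy"],  # unwinding stopped early: no root frame
        ["decode"],            # a one-level "flame"
    ]
    print(f"{truncated_fraction(samples):.0%} of stacks are incomplete")
```

A high fraction suggests trying the other unwinding method before reading too
much into the shape of the graph.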

#### No flames
If you see no flames at all, or a mess of one-level flames without a common
base, this may be because you compiled without frame pointers. Make sure there
is no `-fomit-frame-pointer` in your build config (compiling with
`-fno-omit-frame-pointer` keeps them explicitly). Alternatively, ask
simpleperf to collect data with DWARF unwinding (`-du`).
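
Frame-pointer unwinding works because each function that keeps a frame
pointer stores a frame record holding the caller's frame pointer and a return
address. These records form a linked list that the unwinder can walk. The
Python sketch below simulates this walk on a hypothetical memory map. It is a
simplified model, not simpleperf's or the kernel's unwinder. In the model, a
function built without frame pointers leaves a broken link (a saved frame
pointer that is not a valid record), so the walk stops early and produces the
short, rootless stacks described above.

```python
def unwind_frame_pointers(pc, fp, memory, max_depth=64):
    """Walk a chain of frame records and return the IPs found, leaf first.

    `memory` maps a frame-record address to (saved caller FP, return address).
    The walk stops when the frame pointer no longer points at a known record.
    """
    ips = [pc]
    while fp in memory and len(ips) < max_depth:
        saved_fp, return_address = memory[fp]
        ips.append(return_address)
        fp = saved_fp
    return ips


if __name__ == "__main__":
    # Hypothetical intact chain: render <- main <- _start (FP 0 ends the chain).
    intact = {0x7000: (0x7100, 0x10f4), 0x7100: (0, 0x1010)}
    print([hex(ip) for ip in unwind_frame_pointers(0x1420, 0x7000, intact)])
    # ['0x1420', '0x10f4', '0x1010']

    # Same stack, but a frame without a frame pointer left a garbage saved FP.
    broken = {0x7000: (0xdead, 0x10f4), 0x7100: (0, 0x1010)}
    print([hex(ip) for ip in unwind_frame_pointers(0x1420, 0x7000, broken)])
    # ['0x1420', '0x10f4']: the root frame is never reached
```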

#### High percentage of lost samples

If simpleperf reports a lot of lost samples, it is probably because you are
unwinding with DWARF. DWARF unwinding involves copying the stack before it is
processed. Try frame pointer unwinding instead, which can be done by the
kernel and is much faster.

The cost of frame pointers is negligible on arm64 but considerable on 32-bit
ARM (due to register pressure). Use a 64-bit build for better profiling.
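
The following back-of-the-envelope estimate shows why DWARF unwinding loses
more samples. In DWARF mode, a chunk of the raw user stack is copied into
every sample so that it can be unwound later. Frame-pointer unwinding records
only the resulting return addresses. The Python sketch below compares the two
data rates. The sampling rate, stack-copy size, and stack depth are
hypothetical values chosen for illustration, not simpleperf defaults.

```python
def sample_data_rates(rate, stack_copy_bytes, depth, ip_size=8):
    """Return (DWARF, frame-pointer) bytes per second recorded for callstacks.

    DWARF mode copies `stack_copy_bytes` of raw stack per sample. Frame-pointer
    mode stores only `depth` return addresses of `ip_size` bytes each.
    Register dumps and other per-sample fields are ignored for simplicity.
    """
    return rate * stack_copy_bytes, rate * depth * ip_size


if __name__ == "__main__":
    # Hypothetical values: 4000 samples/s, 8 KiB stack copy, 32 frames, 64-bit IPs.
    dwarf, fp = sample_data_rates(rate=4000, stack_copy_bytes=8192, depth=32)
    print(f"DWARF: {dwarf / 2**20:.2f} MiB/s, frame pointers: {fp / 2**20:.2f} MiB/s")
    # DWARF: 31.25 MiB/s, frame pointers: 0.98 MiB/s
```

Under these assumptions DWARF mode moves about 32 times more data per second,
which makes it more likely that samples are dropped when the recording buffer
cannot be drained fast enough.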

#### run-as: package not debuggable
If you cannot run as root, make sure the app is debuggable. Otherwise,
simpleperf will not be able to profile it.