Home | History | Annotate | Download | only in systrace_parser
      1 Contract between NNAPI systrace code and parser
      2 ===============================================
      3 
      4 This text files documents how tracing in the NNAPI manifests in systrace output
      5 and how that output is interpreted by the systrace parser.
      6 
      7 Please view in a 160 character window.
      8 
      9 Special cases
     10 -------------
     11 
     12 - Execution is exposed as an asynchronous event from the runtime. Time taken
     13   by the runtime is calculated as the time between start of
     14   ANeuralNetworksExecution_startCompute and end of
     15   ANeuralNetworksEvent_wait. This special case is not reflected in the cases
     16   presented below.
     17 
     18 Notation
     19 --------
     20 - ...: elided code
     21 - t_m_w: tracing_mark_write
     22 - tX: timestamps
     23 - T1, T2: thread ids
     24 
     25 
     26 Cases for the parser
     27 ====================
     28 
     29 Source code                                    Systrace                                                   Interpretation for timing statistics - all
     30                                                                                                           times are wallclock
     31 ------------------------------------------------------------------------------------------------------------------------------------------------------
     32 
     33 *Baseline case*
     34 
     35 ... funcP(...) {                               t0: t_m_w:B|T1|[NN_LR_PP]funcP                             Add (t1-t0) to total time spent in Layer
     36   NNTRACE_RT(NNTRACE_PHASE_PREPARATION,        t1: t_m_w:E|T1                                             Runtime, Phase Preparation
     37              "funcP);
     38   ...
     39 }
     40 
     41 ------------------------------------------------------------------------------------------------------------------------------------------------------
     42 
     43 *Local call to other layer*
     44 
     45 ... funcA1(...) {                              t0:   t_m_w:B|T1|[NN_LA_PP]funcE1                          Add (t3-t0) to total time spent in Layer
     46   NNTRACE_APP(NNTRACE_PHASE_PREPARATION,       t1:   t_m_w:B|T1|[NN_LR_PP]funcC1                          Application, Phase Preparation
     47               "funcA1);                        t2:   t_m_w:E|T1
     48   ... funcR1(...);                             t3:   t_m_w:E|T1                                           Add (t2-t1) to total time spent in Layer
     49   ...                                                                                                     Runtime, Phase Preparation
     50 }
     51 ... funcR1(...) {                                                                                         Note: Self-time of Layer Application,
     52   NNTRACE_RT(NNTRACE_PHASE_PREPARATION,                                                                   Phase Preparation will be calculated as
     53              "funcR1"); ...                                                                               total time in Layer Application - total time
     54 }                                                                                                         in Layer Runtime
     55 
     56                                                                                                           Note: These can be nested as per rows
     57                                                                                                           below ("Switch phase ...", "Subphases ...",
     58                                                                                                           "Additional detail...")
     59 
     60                                                                                                           Note: If the called function specifies a
     61                                                                                                           phase that is not supposed to be nested,
     62                                                                                                           the parser will emit a diagnostic.
     63 
     64 ------------------------------------------------------------------------------------------------------------------------------------------------------
     65 
     66 *Switch phase during function*
     67 
     68 ... funcC1(...) {                              t0:   t_m_w:B|T1|[NN_LC_PTR]funcC1                         Add (t1-t0) to total time spent in Layer
     69   NNTRACE_TRANS("funcC1");                     t1:   t_m_w:B|T1|[SW][NN_LC_PCO]funcC1                     CPU, Phase Transformation
     70   ...                                          t2:   t_m_w:E|T1
     71   NNTRACE_COMP_SWITCH("funcC1");               t3:   t_m_w:E|T1                                           Add (t2-t1) to total time spent in Layer
     72   ...                                                                                                     CPU, Phase Computation
     73 }
     74                                                                                                           (t3-t2 treated as negligible - only the
     75                                                                                                           destructors of objects created between
     76                                                                                                           the tracepoints)
     77 
     78 ------------------------------------------------------------------------------------------------------------------------------------------------------
     79 
     80 *Subphases of execution*
     81 
     82 ... funcR2(...) {                              t0:   t_m_w:B|T1|[NN_LR_PE]funcR2                          Add (t2-t1) to total time spent in Layer
     83   NNTRACE_RT(NNTRACE_PHASE_EXECUTION,          t1:   t_m_w:B|T1|[NN_LC_PCO]funcC2                         CPU, (sub)Phase Computation and to
     84              "funcR2);                         t2:   t_m_w:E|T1                                           total time in Phase Execution
     85   ... funcC2(...);                             t3:   t_m_w:E|T1
     86   ...                                                                                                     Add (t3-t0) to total time spent in Layer
     87 }                                                                                                         Runtime, Phase Execution
     88 ... funcC2(...) {
     89   NNTRACE_COMP("funcC2");
     90   ...
     91 }
     92 
     93 ------------------------------------------------------------------------------------------------------------------------------------------------------
     94 
     95 *Additional detail in the same layer*
     96 
     97 ... funcR3(...) {                              t0:   t_m_w:B|T1|[NN_LR_PE]funcR3                          Add (t3-t0) to total time spent in Layer
     98   NNTRACE_RT(NNTRACE_PHASE_EXECUTION,          t1:   t_m_w:B|T1|[NN_LR_PE]funcR4                          Runtime, Phase Execution
     99              "funcR3);                         t2:   t_m_w:E|T1
    100   ... funcR4(...);                             t3:   t_m_w:E|T1                                           Note: funcR4 will be visible in the systrace
    101   ...                                                                                                     visualization
    102 }
    103 ... funcR4(...) {
    104   NNTRACE_RT(NNTRACE_PHASE_EXECUTION,
    105              "funcR4");
    106   ...
    107 }
    108 
    109 
    110 ------------------------------------------------------------------------------------------------------------------------------------------------------
    111 
    112 *Synchronous IPC call*
    113 
    114 ... funcR5(...) {                              t0:   t_m_w:B|T1|[NN_LR_PC]funcR5                          Add (t5-t0) - (t4-r1) to total time spent in
    115   NNTRACE_RT(NNTRACE_PHASE_COMPILATION,        t1:   t_m_w:B|T1|[NN_LI_PI]getCapabilities                 Layer Runtime, Phase Compilation; see
    116              "funcR5");                        t2:   t_m_w:B|T1|HIDL::IDevice::getCapabilities::client    "Onetime initialization code".
    117   ... device->getCapabilities()                t3:   t_m_w:E|T1
    118   ...                                          t4:   t_m_w:E|T1                                           Add (t4-t1) to total time spent in Layer
    119 }                                              t5:   t_m_w:E|T1                                           IPC, Phase Initialization
    120 ... VersionedIDevice::getCapabilities(...) {
    121    NTRACE_FULL(NNTRACE_LAYER_IPC,                                                                         Note: Self-time of Layer Runtime, Phase
    122                NNTRACE_PHASE_COMPILATION,                                                                 Compilation will be calculated as total
    123               "getCapabilities");                                                                         time in Layer Runtime - total time in Layer
    124                                                                                                           IPC
    125 }
    126                                                                                                           Note: Tracepoints are needed for the
    127                                                                                                           client IPC calls. The HIDL tracing isn't
    128                                                                                                           guaranteed to wait for the server - it just
    129                                                                                                           sends the transaction even if the call is
    130                                                                                                           synchronous.
    131 
    132 ------------------------------------------------------------------------------------------------------------------------------------------------------
    133 
    134 *Asynchronous IPC call that is synchronously waited for by the runtime*
    135 
    136 // Runtime code                                t0: t_m_w:B|T1|[NN_LI_PC]prepareModel                      Add (t10-t0) to total time spent in Layer
    137 ... funcRC(...) {                              t1: t_m_w:B|T1|HIDL::IDevice::prepareModel_1_1::client     IPC, Phase Compilation
    138    ...                                         t2: t_m_w:B|T2|HIDL::IDevice::prepareModel_1_1::server
    139    NTRACE_FULL(NNTRACE_LAYER_IPC,              t3: t_m_w:B|T2|[NN_LD_PC]SampleDriver::prepareModel        Add (t6-t2) to total time spent in Layer
    140                NNTRACE_PHASE_COMPILATION,      t4: t_m_w:B|T2|HIDL::IPreparedModelCallback::notify::clie  Driver, Phase Compilation. This includes
    141               "prapareModel");                 t5: t_m_w:E|T2                                             the generated HIDL stub code, which is
    142    ...                                         t6: t_m_w:E|T2                                             <0.05ms.
    143    device->prepareModel(...);                  t7: t_m_w:E|T2
    144    ...                                         t8: t_m_w:B|T1|HIDL::IPreparedModelCallback::notify::serv  Note: the HIDL trace rows are added by
    145    cb->wait();                                 t9: t_m_w:E|T1                                             the automatically generated proxy and
    146    ...                                         t10: t_m_w:E|T1                                            stub code. For the driver side, the
    147 }                                              t11: t_m_w:E|T1                                            mapping of the HIDL functions to layers
    148                                                                                                           and phases is done in the systrace
    149 // Driver code                                                                                            parser.
    150 ... SampleDriver::prepareModel(...) {
    151   NNTRACE_FULL(NNTRACE_LAYER_DRIVER,                                                                      Note: the SampleDriver::prepareModel is
    152                NNTRACE_PHASE_COMPILATION,                                                                 treated as additional detail for Layer
    153                "SampleDriver::prepareModel");                                                             Driver, Phase Compilation.
    154 }
    155                                                                                                           Note: the "t_m_w" output of
    156                                                                                                           systrace uses thread ids, so that the call
    157                                                                                                           stack can be reconstructed. The systrace
    158                                                                                                           rows are also annotated with process ids.
    159                                                                                                           The parser uses the process ids to
    160                                                                                                           distinguish between the application
    161                                                                                                           process from the driver process (used for
    162                                                                                                           diagnostics and for distinguishing CPU
    163                                                                                                           fallback from sample driver).
    164 
    165                                                                                                           Note: the next row in this table gives more
    166                                                                                                           detail for prepareModel specifically
    167 
    168                                                                                                           Note: the driver-side HIDL traces get us
    169                                                                                                           the time spent in sample and hvx drivers.
    170                                                                                                           With a different driver threading model
    171                                                                                                           this may not be the case - future drivers
    172                                                                                                           should add manual tracing.
    173 
    174                                                                                                           TODO: attribute driver process IPC call
    175                                                                                                           (callback) overhead to IPC layer.
    176 
    177 ------------------------------------------------------------------------------------------------------------------------------------------------------
    178 
    179 *Subtracting time when nesting is violated*
    180 
    181 // Runtime code                                t0:   t_m_w:B|T1|[NN_LI_PC]prepareModel                    Add (t3 - t0) - (t2 - t1) to total time spent
    182 ... funcRC(...) {                              t1:   t_m_w:B|T1|[SUB][NN_LR_PC]VersionedIDevice::prepareM in Layer IPC, Phase compilation
    183    ...                                         t2:   t_m_w:E|T1
    184    NTRACE_FULL(NNTRACE_LAYER_IPC,              t3:   t_m_w:E|T1                                           Add (t2-t1) to total time spent in Layer
    185                NNTRACE_PHASE_COMPILATION,                                                                 Runtime, Phase compilation
    186               "prapareModel");
    187    ...
    188    device->prepareModel(...);
    189    ...
    190    cb->wait();
    191    ...
    192 }
    193 
    194 ... VersionedIDevice::prepareModel(...) {
    195     // IPC work
    196     {
    197         NNTRACE_FULL_SUBTRACT(
    198             NNTRACE_LAYER_RUNTIME,
    199             NNTRACE_PHASE_COMPILATION,
    200             "VersionedIDevice::prepareModel");
    201         // Runtime work
    202     }
    203     // IPC work
    204 }
    205 
    206 ------------------------------------------------------------------------------------------------------------------------------------------------------
    207 
    208 *Onetime initialization code*
    209 
    210 ... funcR5(...) {                              t0:   t_m_w:B|T1|[NN_LR_PP]funcR5                          Add (t2-t1) to total time spent in Layer
    211   NNTRACE_RT(NNTRACE_PHASE_PREPARATION,        t1:   t_m_w:B|T1|[NN_LR_PI]funcI                           Runtime, Phase Initialization
    212              "runcR5);                         t2:   t_m_w:E|T1
    213   ... funcI(...);                              t3:   t_m_w:E|T1                                           Add (t3 - t0) - (t2 - t1) to total time spent
    214   ...                                                                                                     in Layer Runtime, Phase Preparation.
    215 }
    216 ... funcI(...) {
    217   NNTRACE_RT(NNTRACE_PHASE_INITIALIZATION,
    218              "funcI")
    219   ...
    220 }
    221 
    222 ------------------------------------------------------------------------------------------------------------------------------------------------------
    223 
    224 *Utility code*
    225 
    226 ... funcR6(...) {                              t0:   t_m_w:B|T1|[NN_LR_PP]funcR6                          Add (t3-t0) to total time spent in Layer
    227   NNTRACE_RT(NNTRACE_PHASE_PREPARATION,        t1:   t_m_w:B|T1|[NN_LU_PU]funcU                           Runtime, Phase Preparation
    228              "funcR6");                        t2:   t_m_w:E|T1
    229   ... funcU(...);                              t3:   t_m_w:E|T1                                           Note: the funcU is treated as additional
    230   ...                                                                                                     detail.
    231 }
    232 ... funcU(...) {
    233   NNTRACE_FULL(NNTRACE_LAYER_UTILITY,
    234                NNTRACE_PHASE_UNSPECIFIED,
    235                "funcU")
    236   ...
    237 }
    238