<html devsite>
  <head>
    <title>Implementing VSYNC</title>
    <meta name="project_path" value="/_project.yaml" />
    <meta name="book_path" value="/_book.yaml" />
  </head>
  <body>
  <!--
      Copyright 2017 The Android Open Source Project

      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
  -->

<p>VSYNC synchronizes certain events to the refresh cycle of the display.
Applications always start drawing on a VSYNC boundary, and SurfaceFlinger
always composites on a VSYNC boundary. This eliminates stutters and improves
visual performance of graphics.</p>

<p>The Hardware Composer (HWC) has a function pointer indicating the function
to implement for VSYNC:</p>

<pre class="prettyprint">
int (*waitForVsync)(int64_t *timestamp)
</pre>

<p>This function blocks until a VSYNC occurs and returns the timestamp of the
actual VSYNC. A message must be sent every time VSYNC occurs. A client can
receive a VSYNC timestamp once at specified intervals or continuously at
intervals of 1. You must implement VSYNC with a maximum 1 ms lag (0.5 ms or less
is recommended); timestamps returned must be extremely accurate.</p>
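
<p>The blocking behavior described above can be illustrated with a minimal
software stand-in. This is only a sketch under stated assumptions: the names
<code>next_vsync_ns</code> and <code>fake_wait_for_vsync</code> and the 60 Hz
period are hypothetical, and a real implementation would take its timestamp
from the display hardware rather than from a modeled boundary.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Assumption: a 60 Hz panel. A real driver gets the period from the display. */
#define VSYNC_PERIOD_NS 16666667LL

/* Returns the first modeled VSYNC boundary strictly after now_ns, given any
 * earlier boundary ref_ns and the refresh period. */
static int64_t next_vsync_ns(int64_t now_ns, int64_t ref_ns, int64_t period_ns)
{
    assert(period_ns > 0 && now_ns >= ref_ns);
    int64_t elapsed = now_ns - ref_ns;
    return ref_ns + (elapsed / period_ns + 1) * period_ns;
}

/* Hypothetical stand-in with the same shape as waitForVsync: sleeps until the
 * next modeled boundary and reports that boundary (not the wake-up time). */
static int fake_wait_for_vsync(int64_t *timestamp)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    int64_t now = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    int64_t next = next_vsync_ns(now, 0, VSYNC_PERIOD_NS);

    /* The delay is always shorter than one period, so tv_sec stays 0. */
    struct timespec delay = { 0, (long)(next - now) };
    if (nanosleep(&delay, NULL) != 0)
        return -1;

    *timestamp = next;
    return 0;
}
```

<p>A caller would invoke <code>fake_wait_for_vsync(&amp;ts)</code> in a loop and
forward each timestamp to its clients. Sleeping in software like this cannot
guarantee the sub-millisecond accuracy required above, which is why the
timestamp should come from hardware in practice.</p>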

<h2 id=explicit_synchronization>Explicit synchronization</h2>

<p>Explicit synchronization is required and provides a mechanism for Gralloc
buffers to be acquired and released in a synchronized way. Explicit
synchronization allows producers and consumers of graphics buffers to signal
when they are done with a buffer. This allows Android to asynchronously queue
buffers to be read or written with the certainty that another consumer or
producer does not currently need them. For details, see
<a href="/devices/graphics/index.html#synchronization_framework">Synchronization
framework</a>.</p>

<p>The benefits of explicit synchronization include less behavior variation
between devices, better debugging support, and improved testing metrics. For
instance, the sync framework output readily identifies problem areas and root
causes, and centralized SurfaceFlinger presentation timestamps show when events
occur in the normal flow of the system.</p>

<p>This communication is facilitated by the use of synchronization fences,
which are required when requesting a buffer for consuming or producing. The
synchronization framework consists of three main building blocks:
<code>sync_timeline</code>, <code>sync_pt</code>, and <code>sync_fence</code>.</p>

<h3 id=sync_timeline>sync_timeline</h3>

<p>A <code>sync_timeline</code> is a monotonically increasing timeline that
should be implemented for each driver instance, such as a GL context, display
controller, or 2D blitter. This is essentially a counter of jobs submitted to
the kernel for a particular piece of hardware. It provides guarantees about the
order of operations and allows hardware-specific implementations.</p>

<p>The <code>sync_timeline</code> is offered as a CPU-only reference
implementation called <code>sw_sync</code> (software sync). If possible, use
this instead of implementing your own <code>sync_timeline</code> to save
resources and avoid complexity. If you're not employing a hardware resource,
<code>sw_sync</code> should be sufficient.</p>

<p>If you must implement a <code>sync_timeline</code>, use the
<code>sw_sync</code> driver as a starting point. Follow these guidelines:</p>

<ul>
<li>Provide useful names for all drivers, timelines, and fences. This simplifies
debugging.</li>
<li>Implement <code>timeline_value_str</code> and <code>pt_value_str</code>
operators in your timelines to make debugging output more readable.</li>
<li>If you want your userspace libraries (such as the GL library) to have access
to the private data of your timelines, implement the
<code>fill_driver_data</code> operator. This lets you get information about the
immutable <code>sync_fence</code> and <code>sync_pts</code> so you can build
command lines based upon them.</li>
</ul>

<p>When implementing a <code>sync_timeline</code>, <strong>do not</strong>:</p>

<ul>
<li>Base it on any real view of time, such as when a wall clock or other piece
of work might finish. It is better to create an abstract timeline that you can
control.</li>
<li>Allow userspace to explicitly create or signal a fence. This can result in
one piece of the user pipeline creating a denial-of-service attack that halts
all functionality. This is because the userspace cannot make promises on behalf
of the kernel.</li>
<li>Access <code>sync_timeline</code>, <code>sync_pt</code>, or
<code>sync_fence</code> elements explicitly, as the API should provide all
required functions.</li>
</ul>

<h3 id=sync_pt>sync_pt</h3>

<p>A <code>sync_pt</code> is a single value or point on a
<code>sync_timeline</code>. A point has three states: active, signaled, and
error. Points start in the active state and transition to the signaled or error
states. For instance, when a buffer is no longer needed by an image consumer,
this <code>sync_pt</code> is signaled so image producers know it is okay to
write into the buffer again.</p>

<h3 id=sync_fence>sync_fence</h3>

<p>A <code>sync_fence</code> is a collection of <code>sync_pts</code> that often
have different <code>sync_timeline</code> parents (such as for the display
controller and GPU). These are the main primitives over which drivers and
userspace communicate their dependencies. A fence is a promise from the kernel
given upon accepting work that has been queued and assures completion in a
finite amount of time.</p>

<p>This allows multiple consumers or producers to signal that they are using a
buffer and allows this information to be communicated with one function
parameter. Fences are backed by a file descriptor and can be passed from
kernel-space to user-space. For instance, a fence can contain two
<code>sync_pts</code> that signify when two separate image consumers are done
reading a buffer. When the fence is signaled, the image producers know both
consumers are done consuming.</p>

<p>Fences, like <code>sync_pts</code>, start active and then change state based
upon the state of their points. If all <code>sync_pts</code> become signaled,
the <code>sync_fence</code> becomes signaled. If one <code>sync_pt</code> falls
into an error state, the entire <code>sync_fence</code> has an error state.</p>

<p>Membership in the <code>sync_fence</code> is immutable after the fence is
created. As a <code>sync_pt</code> can be in only one fence, it is included as a
copy. Even if two points have the same value, there will be two copies of the
<code>sync_pt</code> in the fence. To get more than one point in a fence, a
merge operation is conducted where points from two distinct fences are added to
a third fence. If one of those points was signaled in the originating fence and
the other was not, the third fence will also not be in a signaled state.</p>
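
<p>The state rules and the merge operation can be modeled in a few lines of
plain C. This is a toy model, not the kernel API: every <code>toy_*</code> name
and the fixed-size point array are assumptions made for illustration only.</p>

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_ACTIVE, TOY_SIGNALED, TOY_ERROR };

/* A timeline is a monotonically increasing counter of completed jobs. */
struct toy_timeline { unsigned value; };

/* A point is signaled once its timeline counter reaches the point's value. */
struct toy_pt {
    const struct toy_timeline *tl;
    unsigned value;
    int error;
};

#define TOY_MAX_PTS 8
struct toy_fence {
    struct toy_pt pts[TOY_MAX_PTS];
    size_t count;
};

static enum toy_state toy_pt_state(const struct toy_pt *pt)
{
    if (pt->error)
        return TOY_ERROR;
    return pt->tl->value >= pt->value ? TOY_SIGNALED : TOY_ACTIVE;
}

/* Error if any point is in error, signaled only if every point is signaled. */
static enum toy_state toy_fence_state(const struct toy_fence *f)
{
    enum toy_state result = TOY_SIGNALED;
    for (size_t i = 0; i < f->count; i++) {
        enum toy_state s = toy_pt_state(&f->pts[i]);
        if (s == TOY_ERROR)
            return TOY_ERROR;
        if (s == TOY_ACTIVE)
            result = TOY_ACTIVE;
    }
    return result;
}

/* Merge copies the points of both inputs into a third fence. The inputs are
 * left untouched, which keeps their membership immutable. */
static struct toy_fence toy_fence_merge(const struct toy_fence *a,
                                        const struct toy_fence *b)
{
    struct toy_fence out = { .count = 0 };
    assert(a->count + b->count <= TOY_MAX_PTS);
    for (size_t i = 0; i < a->count; i++)
        out.pts[out.count++] = a->pts[i];
    for (size_t i = 0; i < b->count; i++)
        out.pts[out.count++] = b->pts[i];
    return out;
}
```

<p>In this model, merging a fence whose only point is already signaled with a
fence whose only point is still active yields a merged fence that stays active
until the second timeline advances.</p>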

<p>To implement explicit synchronization, provide the following:</p>

<ul>
<li>A kernel-space driver that implements a synchronization timeline for a
particular piece of hardware. Drivers that need to be fence-aware are generally
anything that accesses or communicates with the Hardware Composer. Key files
include:
<ul>
<li>Core implementation:
<ul>
 <li><code>kernel/common/include/linux/sync.h</code></li>
 <li><code>kernel/common/drivers/base/sync.c</code></li>
</ul></li>
<li><code>sw_sync</code>:
<ul>
 <li><code>kernel/common/include/linux/sw_sync.h</code></li>
 <li><code>kernel/common/drivers/base/sw_sync.c</code></li>
</ul></li>
<li>Documentation at <code>kernel/common/Documentation/sync.txt</code>.</li>
<li>Library to communicate with the kernel-space in
 <code>platform/system/core/libsync</code>.</li>
</ul></li>
<li>A Hardware Composer HAL module (v1.3 or higher) that supports the new
synchronization functionality. You must provide the appropriate synchronization
fences as parameters to the <code>set()</code> and <code>prepare()</code>
functions in the HAL.</li>
<li>Two fence-related EGL extensions (<code>EGL_ANDROID_native_fence_sync</code>
and <code>EGL_ANDROID_wait_sync</code>) and fence support in your graphics
drivers.</li>
</ul>

<p>For example, to use the API supporting the synchronization function, you
might develop a display driver that has a display buffer function. Before the
synchronization framework existed, this function would receive dma-bufs, put
those buffers on the display, and block while the buffer was visible. For
example:</p>

<pre class="prettyprint">
/*
 * assumes buf is ready to be displayed.  returns when buffer is no longer on
 * screen.
 */
void display_buffer(struct dma_buf *buf);
</pre>

<p>With the synchronization framework, the API call is slightly more complex.
While putting a buffer on display, you associate it with a fence that says when
the buffer will be ready. You can queue up the work and initiate it after the
fence clears.</p>

<p>In this manner, you are not blocking anything. You immediately return your
own fence, which is a guarantee of when the buffer will be off of the display.
As you queue up buffers, the kernel will list dependencies with the
synchronization framework:</p>

<pre class="prettyprint">
/*
 * will display buf when fence is signaled.  returns immediately with a fence
 * that will signal when buf is no longer displayed.
 */
struct sync_fence* display_buffer(struct dma_buf *buf, struct sync_fence *fence);
</pre>


<h2 id=sync_integration>Sync integration</h2>
<p>This section explains how to integrate the low-level sync framework with
different parts of the Android framework and the drivers that must communicate
with one another.</p>

<h3 id=integration_conventions>Integration conventions</h3>

<p>The Android HAL interfaces for graphics follow consistent conventions so
when file descriptors are passed across a HAL interface, ownership of the file
descriptor is always transferred. This means:</p>

<ul>
<li>If you receive a fence file descriptor from the sync framework, you must
close it.</li>
<li>If you return a fence file descriptor to the sync framework, the framework
will close it.</li>
<li>To continue using the fence file descriptor, you must duplicate the
descriptor.</li>
</ul>
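
<p>The ownership rules above can be sketched with ordinary POSIX file
descriptors. The function names below are hypothetical, and any descriptor can
stand in for a fence for the purpose of showing who closes what.</p>

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical receiver: it now owns fence_fd, so it must close it. */
static void consume_fence(int fence_fd)
{
    /* ... wait on the fence or hand it to hardware here ... */
    if (fence_fd >= 0)
        close(fence_fd);
}

/* A sender that still needs the fence duplicates it before handing it over.
 * The returned copy belongs to the caller, who must close it later. */
static int pass_fence_and_keep_copy(int fence_fd)
{
    int kept = dup(fence_fd);
    consume_fence(fence_fd);   /* ownership of fence_fd is transferred */
    return kept;
}

/* Helper for checking the result: nonzero if fd refers to an open file. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}
```

<p>After <code>pass_fence_and_keep_copy()</code> returns, the original
descriptor is closed by the receiver while the duplicate remains valid until
the caller closes it.</p>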

<p>Every time a fence passes through BufferQueue (such as for a window that
passes a fence to BufferQueue saying when its new contents will be ready) the
fence object is renamed. Since kernel fence support allows fences to have
strings for names, the sync framework uses the window name and buffer index
that is being queued to name the fence (e.g., <code>SurfaceView:0</code>). This
is helpful in debugging to identify the source of a deadlock as the names appear
in the output of <code>/d/sync</code> and bug reports.</p>

<h3 id=anativewindow_integration>ANativeWindow integration</h3>

<p>ANativeWindow is fence aware and <code>dequeueBuffer</code>,
<code>queueBuffer</code>, and <code>cancelBuffer</code> have fence parameters.
</p>

<h3 id=opengl_es_integration>OpenGL ES integration</h3>

<p>OpenGL ES sync integration relies upon two EGL extensions:</p>

<ul>
<li><code>EGL_ANDROID_native_fence_sync</code>. Provides a way to either
wrap or create native Android fence file descriptors in EGLSyncKHR objects.</li>
<li><code>EGL_ANDROID_wait_sync</code>. Allows GPU-side stalls rather than in
CPU, making the GPU wait for an EGLSyncKHR. This is essentially the same as the
<code>EGL_KHR_wait_sync</code> extension (refer to that specification for
details).</li>
</ul>

<p>These extensions can be used independently and are controlled by a compile
flag in libgui. To use them, first implement the
<code>EGL_ANDROID_native_fence_sync</code> extension along with the associated
kernel support. Next, add ANativeWindow support for fences to your driver, then
turn on support in libgui to make use of the
<code>EGL_ANDROID_native_fence_sync</code> extension.</p>

<p>In a second pass, enable the <code>EGL_ANDROID_wait_sync</code>
extension in your driver and turn it on separately. The
<code>EGL_ANDROID_native_fence_sync</code> extension defines a distinct
native fence EGLSync object type, so extensions that apply to existing EGLSync
object types don't necessarily apply to <code>EGL_ANDROID_native_fence</code>
objects, which avoids unwanted interactions.</p>

<p>The <code>EGL_ANDROID_native_fence_sync</code> extension employs a
corresponding native fence file descriptor attribute that can be set only at
creation time and cannot be directly queried afterward from an existing sync
object. This attribute can be set to one of two modes:</p>

<ul>
<li><em>A valid fence file descriptor</em>. Wraps an existing native Android
fence file descriptor in an EGLSyncKHR object.</li>
<li><em>-1</em>. Creates a native Android fence file descriptor from an
EGLSyncKHR object.</li>
</ul>

<p>The <code>DupNativeFenceFD</code> function call is used to extract the native
Android fence file descriptor from the EGLSyncKHR object. This has the same
result as querying the attribute that was set but adheres to the convention that
the recipient closes the fence (hence the duplicate operation). Finally,
destroying the EGLSync object should close the internal fence attribute.</p>

<h3 id=hardware_composer_integration>Hardware Composer integration</h3>

<p>The Hardware Composer handles three types of sync fences:</p>

<ul>
<li><em>Acquire fence</em>. One per layer, set before calling
<code>HWC::set</code>. It signals when Hardware Composer may read the buffer.</li>
<li><em>Release fence</em>. One per layer, filled in by the driver in
<code>HWC::set</code>. It signals when Hardware Composer is done reading the
buffer so the framework can start using that buffer again for that particular
layer.</li>
<li><em>Retire fence</em>. One per the entire frame, filled in by the driver
each time <code>HWC::set</code> is called. This covers all layers for the set
operation and signals to the framework when all effects of this set operation
have completed. The retire fence signals when the next set operation takes place
on the screen.</li>
</ul>

<p>The retire fence can be used to determine how long each frame appears on the
screen. This is useful in identifying the location and source of delays, such
as a stuttering animation.</p>
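
<p>As a rough illustration of that measurement, the difference between
consecutive retire fence signal timestamps gives the time a frame stayed on
screen. The sketch below assumes the timestamps have already been collected
into an array. The function name and the 1.5-period threshold are hypothetical
choices, not part of any Android API.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts frames that stayed on screen noticeably longer than one refresh
 * period. retire_ns holds retire fence signal timestamps in ascending order. */
static size_t count_long_frames(const int64_t *retire_ns, size_t n,
                                int64_t period_ns)
{
    size_t late = 0;
    assert(period_ns > 0);
    for (size_t i = 1; i < n; i++) {
        /* Frame i-1 was visible from its own retire time to the next one. */
        int64_t on_screen_ns = retire_ns[i] - retire_ns[i - 1];
        /* Assumed threshold: more than 1.5 periods counts as a long frame. */
        if (on_screen_ns > period_ns + period_ns / 2)
            late++;
    }
    return late;
}
```

<p>A nonzero count over an animation suggests that some frames were shown for
more than one refresh, which is one way a stutter shows up in the data.</p>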

<h2 id=vsync_offset>VSYNC offset</h2>

<p>Application and SurfaceFlinger render loops should be synchronized to the
hardware VSYNC. On a VSYNC event, the display begins showing frame N while
SurfaceFlinger begins compositing windows for frame N+1. The app handles
pending input and generates frame N+2.</p>

<p>Synchronizing with VSYNC delivers consistent latency. It reduces errors in
apps and SurfaceFlinger and the drifting of displays in and out of phase with
each other. This, however, does assume application and SurfaceFlinger per-frame
times don't vary widely. Nevertheless, the latency is at least two frames.</p>

<p>To remedy this, you can employ VSYNC offsets to reduce the input-to-display
latency by making the application and composition signals relative to hardware
VSYNC. This is possible because application plus composition usually takes less
than 33 ms.</p>

<p>The result of VSYNC offset is three signals with the same period and offset
phase:</p>

<ul>
<li><code>HW_VSYNC_0</code>. Display begins showing next frame.</li>
<li><code>VSYNC</code>. App reads input and generates next frame.</li>
<li><code>SF VSYNC</code>. SurfaceFlinger begins compositing for next frame.</li>
</ul>

<p>With VSYNC offset, SurfaceFlinger receives the buffer and composites the
frame, while the application processes the input and renders the frame, all
within a single frame of time.</p>

<p class="note"><strong>Note:</strong> VSYNC offsets reduce the time available
for app and composition and therefore provide a greater chance for error.</p>

<h3 id=dispsync>DispSync</h3>

<p>DispSync maintains a model of the periodic hardware-based VSYNC events of a
display and uses that model to execute periodic callbacks at specific phase
offsets from the hardware VSYNC events.</p>

<p>DispSync is essentially a software phase lock loop (PLL) that generates the
VSYNC and SF VSYNC signals used by Choreographer and SurfaceFlinger, even if
not offset from hardware VSYNC.</p>

<img src="images/dispsync.png" alt="DispSync flow">

<p class="img-caption"><strong>Figure 1.</strong> DispSync flow</p>

<p>DispSync has the following qualities:</p>

<ul>
<li><em>Reference</em>. HW_VSYNC_0.</li>
<li><em>Output</em>. VSYNC and SF VSYNC.</li>
<li><em>Feedback</em>. Retire fence signal timestamps from Hardware Composer.
</li>
</ul>
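
<p>The idea of fitting a model to the reference signal and then scheduling
callbacks from it can be sketched as follows. This is a deliberately simplified
model with hypothetical <code>toy_dispsync</code> names. It estimates the
period from the average spacing of hardware VSYNC samples and omits the retire
fence feedback path, so it should not be read as the actual DispSync
algorithm.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: an estimated period and the most recent reference edge. */
struct toy_dispsync {
    int64_t period_ns;
    int64_t last_vsync_ns;
};

/* Fits the model to n >= 2 hardware VSYNC timestamps in ascending order by
 * averaging the spacing between the first and last sample. */
static void toy_dispsync_fit(struct toy_dispsync *m,
                             const int64_t *samples, size_t n)
{
    assert(n >= 2);
    m->period_ns = (samples[n - 1] - samples[0]) / (int64_t)(n - 1);
    m->last_vsync_ns = samples[n - 1];
    assert(m->period_ns > 0);
}

/* Predicts the first callback time at or after now_ns for a listener with the
 * given phase offset. Assumes now_ns >= last_vsync_ns and
 * 0 <= phase_offset_ns < period_ns. */
static int64_t toy_dispsync_next(const struct toy_dispsync *m,
                                 int64_t now_ns, int64_t phase_offset_ns)
{
    int64_t base = m->last_vsync_ns + phase_offset_ns;
    if (now_ns <= base)
        return base;
    /* Round the elapsed time up to a whole number of periods. */
    int64_t k = (now_ns - base + m->period_ns - 1) / m->period_ns;
    return base + k * m->period_ns;
}
```

<p>Two listeners with different phase offsets queried at the same time receive
callback times that share the modeled period but differ in phase, which is the
relationship between the VSYNC and SF VSYNC outputs.</p>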

<h3 id=vsync_retire_offset>VSYNC/Retire offset</h3>

<p>The signal timestamp of retire fences must match HW VSYNC even on devices
that don't use the offset phase. Otherwise, errors appear to be more severe
than they really are. Smart panels often have a delta: Retire fence is the end
of direct memory access (DMA) to display memory, but the actual display switch
and HW VSYNC is some time later.</p>

<p><code>PRESENT_TIME_OFFSET_FROM_VSYNC_NS</code> is set in the device's
BoardConfig.mk make file. It is based upon the display controller and panel
characteristics. It specifies the time from the retire fence timestamp to the
HW VSYNC signal, measured in nanoseconds.</p>

<h3 id=vsync_and_sf_vsync_offsets>VSYNC and SF_VSYNC offsets</h3>

<p>The <code>VSYNC_EVENT_PHASE_OFFSET_NS</code> and
<code>SF_VSYNC_EVENT_PHASE_OFFSET_NS</code> are set conservatively based on
high-load use cases, such as partial GPU composition during window transition
or Chrome scrolling through a webpage containing animations. These offsets
allow for long application render time and long GPU composition time.</p>

<p>More than a millisecond or two of latency is noticeable. We recommend
integrating thorough automated error testing to minimize latency without
significantly increasing error counts.</p>

<p class="note"><strong>Note:</strong> These offsets are also configured in the
device's BoardConfig.mk file. Both settings are offset in nanoseconds after
HW_VSYNC_0, default to zero (if not set), and can be negative.</p>
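
<p>To make the effect of a phase offset concrete, the sketch below computes
when an offset signal fires relative to a hardware VSYNC edge. The function
name is hypothetical, and folding a negative offset into the preceding period
is an assumption about how such a value could be interpreted, not a description
of the platform code.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Returns the time of the offset signal that follows the hardware VSYNC edge
 * at hw_vsync_ns. The offset is normalized into [0, period_ns), so a negative
 * offset of -x behaves like period_ns - x. */
static int64_t offset_event_ns(int64_t hw_vsync_ns, int64_t period_ns,
                               int64_t phase_offset_ns)
{
    assert(period_ns > 0);
    int64_t phase = phase_offset_ns % period_ns;  /* truncates toward zero */
    if (phase < 0)
        phase += period_ns;
    return hw_vsync_ns + phase;
}
```

<p>With both offsets left at zero, the app and SurfaceFlinger signals coincide
with HW_VSYNC_0. Giving each its own offset spreads the two signals across the
refresh period.</p>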

  </body>
</html>