page.title=Implementing VSYNC
@jd:body

<!--
    Copyright 2016 The Android Open Source Project

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-->

<div id="qv-wrapper">
  <div id="qv">
    <h2>In this document</h2>
    <ol id="auto-toc">
    </ol>
  </div>
</div>


<p>VSYNC synchronizes certain events to the refresh cycle of the display.
Applications always start drawing on a VSYNC boundary, and SurfaceFlinger
always composites on a VSYNC boundary. This eliminates stutters and improves
visual performance of graphics.</p>

<p>The Hardware Composer (HWC) has a function pointer indicating the function
to implement for VSYNC:</p>

<pre class=prettyprint>
int (*waitForVsync)(int64_t *timestamp);
</pre>

<p>This function blocks until a VSYNC occurs and returns the timestamp of the
actual VSYNC. A message must be sent every time VSYNC occurs. A client can
receive a VSYNC timestamp once at specified intervals or continuously at
intervals of 1. You must implement VSYNC with a maximum 1 ms lag (0.5 ms or less
is recommended); timestamps returned must be extremely accurate.</p>

<h2 id=explicit_synchronization>Explicit synchronization</h2>

<p>Explicit synchronization is required and provides a mechanism for Gralloc
buffers to be acquired and released in a synchronized way. Explicit
synchronization allows producers and consumers of graphics buffers to signal
when they are done with a buffer.
This allows Android to asynchronously queue
buffers to be read or written with the certainty that another consumer or
producer does not currently need them. For details, see
<a href="{@docRoot}devices/graphics/index.html#synchronization_framework">Synchronization
framework</a>.</p>

<p>The benefits of explicit synchronization include less behavior variation
between devices, better debugging support, and improved testing metrics. For
instance, the sync framework output readily identifies problem areas and root
causes, and centralized SurfaceFlinger presentation timestamps show when events
occur in the normal flow of the system.</p>

<p>This communication is facilitated by the use of synchronization fences,
which are required when requesting a buffer for consuming or producing. The
synchronization framework consists of three main building blocks:
<code>sync_timeline</code>, <code>sync_pt</code>, and <code>sync_fence</code>.</p>

<h3 id=sync_timeline>sync_timeline</h3>

<p>A <code>sync_timeline</code> is a monotonically increasing timeline that
should be implemented for each driver instance, such as a GL context, display
controller, or 2D blitter. This is essentially a counter of jobs submitted to
the kernel for a particular piece of hardware. It provides guarantees about the
order of operations and allows hardware-specific implementations.</p>

<p>A CPU-only reference implementation of <code>sync_timeline</code> is
available, called <code>sw_sync</code> (software sync). If possible, use it
instead of implementing your own <code>sync_timeline</code> to save resources
and avoid complexity. If you're not employing a hardware resource,
<code>sw_sync</code> should be sufficient.</p>

<p>If you must implement a <code>sync_timeline</code>, use the
<code>sw_sync</code> driver as a starting point. Follow these guidelines:</p>

<ul>
<li>Provide useful names for all drivers, timelines, and fences.
This simplifies debugging.</li>
<li>Implement <code>timeline_value_str</code> and <code>pt_value_str</code>
operators in your timelines to make debugging output more readable.</li>
<li>If you want your userspace libraries (such as the GL library) to have access
to the private data of your timelines, implement the
<code>fill_driver_data</code> operator. This lets you get information about the
immutable <code>sync_fence</code> and <code>sync_pts</code> so you can build
command lines based upon them.</li>
</ul>

<p>When implementing a <code>sync_timeline</code>, <strong>do not</strong>:</p>

<ul>
<li>Base it on any real view of time, such as when a wall clock or other piece
of work might finish. It is better to create an abstract timeline that you can
control.</li>
<li>Allow userspace to explicitly create or signal a fence. Userspace cannot
make promises on behalf of the kernel, so this can result in one piece of the
user pipeline creating a denial-of-service attack that halts all
functionality.</li>
<li>Access <code>sync_timeline</code>, <code>sync_pt</code>, or
<code>sync_fence</code> elements explicitly, as the API should provide all
required functions.</li>
</ul>

<h3 id=sync_pt>sync_pt</h3>

<p>A <code>sync_pt</code> is a single value or point on a
<code>sync_timeline</code>. A point has three states: active, signaled, and
error. Points start in the active state and transition to the signaled or error
states. For instance, when a buffer is no longer needed by an image consumer,
its <code>sync_pt</code> is signaled so image producers know it is okay to
write into the buffer again.</p>

<h3 id=sync_fence>sync_fence</h3>

<p>A <code>sync_fence</code> is a collection of <code>sync_pts</code> that often
have different <code>sync_timeline</code> parents (such as for the display
controller and GPU).
These are the main primitives over which drivers and
userspace communicate their dependencies. A fence is a promise that the kernel
gives when it accepts queued work, assuring that the work completes in a
finite amount of time.</p>

<p>A fence lets multiple consumers or producers signal that they are using a
buffer and communicates this information with one function parameter. Fences
are backed by a file descriptor and can be passed from kernel-space to
user-space. For instance, a fence can contain two <code>sync_pts</code> that
signify when two separate image consumers are done reading a buffer. When the
fence is signaled, the image producers know both consumers are done
consuming.</p>

<p>Fences, like <code>sync_pts</code>, start active and then change state based
upon the state of their points. If all <code>sync_pts</code> become signaled,
the <code>sync_fence</code> becomes signaled. If one <code>sync_pt</code> falls
into an error state, the entire <code>sync_fence</code> has an error state.</p>

<p>Membership in the <code>sync_fence</code> is immutable after the fence is
created. Because a <code>sync_pt</code> can be in only one fence, it is included
as a copy. Even if two points have the same value, there will be two copies of
the <code>sync_pt</code> in the fence. To get more than one point in a fence, a
merge operation is conducted where points from two distinct fences are added to
a third fence. If one of those points was signaled in the originating fence and
the other was not, the third fence will also not be in a signaled state.</p>

<p>To implement explicit synchronization, provide the following:</p>

<ul>
<li>A kernel-space driver that implements a synchronization timeline for a
particular piece of hardware. Drivers that need to be fence-aware are generally
anything that accesses or communicates with the Hardware Composer.
Key files include:
<ul>
<li>Core implementation:
<ul>
<li><code>kernel/common/include/linux/sync.h</code></li>
<li><code>kernel/common/drivers/base/sync.c</code></li>
</ul></li>
<li><code>sw_sync</code>:
<ul>
<li><code>kernel/common/include/linux/sw_sync.h</code></li>
<li><code>kernel/common/drivers/base/sw_sync.c</code></li>
</ul></li>
<li>Documentation at <code>kernel/common/Documentation/sync.txt</code>.</li>
<li>Library to communicate with the kernel-space in
<code>platform/system/core/libsync</code>.</li>
</ul></li>
<li>A Hardware Composer HAL module (v1.3 or higher) that supports the new
synchronization functionality. You must provide the appropriate synchronization
fences as parameters to the <code>set()</code> and <code>prepare()</code>
functions in the HAL.</li>
<li>Two fence-related GL extensions (<code>EGL_ANDROID_native_fence_sync</code>
and <code>EGL_ANDROID_wait_sync</code>) and fence support in your graphics
drivers.</li>
</ul>

<p>To see how the synchronization API is used, consider a display driver that
has a display buffer function. Before the synchronization framework existed,
this function would receive dma-bufs, put those buffers on the display, and
block while the buffer was visible. For example:</p>

<pre class=prettyprint>/*
 * Assumes buf is ready to be displayed. Returns when buffer is no longer on
 * screen.
 */
void display_buffer(struct dma_buf *buf);
</pre>

<p>With the synchronization framework, the API call is slightly more complex.
While putting a buffer on display, you associate it with a fence that says when
the buffer will be ready. You can queue up the work and initiate it after the
fence clears.</p>

<p>In this manner, you are not blocking anything.
You immediately return your
own fence, which is a guarantee of when the buffer will be off of the display.
As you queue up buffers, the kernel will list dependencies with the
synchronization framework:</p>

<pre class=prettyprint>/*
 * Will display buf when fence is signaled. Returns immediately with a fence
 * that will signal when buf is no longer displayed.
 */
struct sync_fence *display_buffer(struct dma_buf *buf,
                                  struct sync_fence *fence);
</pre>


<h2 id=sync_integration>Sync integration</h2>
<p>This section explains how to integrate the low-level sync framework with
different parts of the Android framework and the drivers that must communicate
with one another.</p>

<h3 id=integration_conventions>Integration conventions</h3>

<p>The Android HAL interfaces for graphics follow consistent conventions so
when file descriptors are passed across a HAL interface, ownership of the file
descriptor is always transferred. This means:</p>

<ul>
<li>If you receive a fence file descriptor from the sync framework, you must
close it.</li>
<li>If you return a fence file descriptor to the sync framework, the framework
will close it.</li>
<li>To continue using the fence file descriptor, you must duplicate the
descriptor.</li>
</ul>

<p>Every time a fence passes through BufferQueue (such as for a window that
passes a fence to BufferQueue saying when its new contents will be ready) the
fence object is renamed. Since kernel fence support allows fences to have
strings for names, the sync framework uses the window name and buffer index
that is being queued to name the fence (e.g., <code>SurfaceView:0</code>).
This is helpful in debugging to identify the source of a deadlock as the names
appear in the output of <code>/d/sync</code> and bug reports.</p>

<h3 id=anativewindow_integration>ANativeWindow integration</h3>

<p>ANativeWindow is fence aware and <code>dequeueBuffer</code>,
<code>queueBuffer</code>, and <code>cancelBuffer</code> have fence parameters.
</p>

<h3 id=opengl_es_integration>OpenGL ES integration</h3>

<p>OpenGL ES sync integration relies upon two EGL extensions:</p>

<ul>
<li><code>EGL_ANDROID_native_fence_sync</code>. Provides a way to either
wrap or create native Android fence file descriptors in EGLSyncKHR objects.</li>
<li><code>EGL_ANDROID_wait_sync</code>. Allows GPU-side stalls rather than
CPU-side stalls, making the GPU wait for an EGLSyncKHR. This is essentially the
same as the <code>EGL_KHR_wait_sync</code> extension (refer to that
specification for details).</li>
</ul>

<p>These extensions can be used independently and are controlled by a compile
flag in libgui. To use them, first implement the
<code>EGL_ANDROID_native_fence_sync</code> extension along with the associated
kernel support. Next, add ANativeWindow support for fences to your driver, then
turn on support in libgui to make use of the
<code>EGL_ANDROID_native_fence_sync</code> extension.</p>

<p>In a second pass, enable the <code>EGL_ANDROID_wait_sync</code>
extension in your driver and turn it on separately.
The <code>EGL_ANDROID_native_fence_sync</code> extension consists of a distinct
native fence EGLSync object type so extensions that apply to existing EGLSync
object types don't necessarily apply to <code>EGL_ANDROID_native_fence</code>
objects to avoid unwanted interactions.</p>

<p>The <code>EGL_ANDROID_native_fence_sync</code> extension employs a
corresponding native fence file descriptor attribute that can be set only at
creation time and cannot be directly queried onward from an existing sync
object. This attribute can be set to one of two modes:</p>

<ul>
<li><em>A valid fence file descriptor</em>. Wraps an existing native Android
fence file descriptor in an EGLSyncKHR object.</li>
<li><em>-1</em>. Creates a native Android fence file descriptor from an
EGLSyncKHR object.</li>
</ul>

<p>The DupNativeFenceFD function call is used to extract the native Android
fence file descriptor from the EGLSyncKHR object. This has the same result as
querying the attribute that was set but adheres to the convention that the
recipient closes the fence (hence the duplicate operation). Finally, destroying
the EGLSync object should close the internal fence attribute.</p>

<h3 id=hardware_composer_integration>Hardware Composer integration</h3>

<p>The Hardware Composer handles three types of sync fences:</p>

<ul>
<li><em>Acquire fence</em>. One per layer, set before calling
<code>HWC::set</code>. It signals when Hardware Composer may read the buffer.</li>
<li><em>Release fence</em>. One per layer, filled in by the driver in
<code>HWC::set</code>. It signals when Hardware Composer is done reading the
buffer so the framework can start using that buffer again for that particular
layer.</li>
<li><em>Retire fence</em>. One per frame, filled in by the driver
each time <code>HWC::set</code> is called.
This covers all layers for the set
operation and signals to the framework when all effects of this set operation
have completed. The retire fence signals when the next set operation takes place
on the screen.</li>
</ul>

<p>The retire fence can be used to determine how long each frame appears on the
screen. This is useful in identifying the location and source of delays, such
as a stuttering animation.</p>

<h2 id=vsync_offset>VSYNC offset</h2>

<p>Application and SurfaceFlinger render loops should be synchronized to the
hardware VSYNC. On a VSYNC event, the display begins showing frame N while
SurfaceFlinger begins compositing windows for frame N+1. The app handles
pending input and generates frame N+2.</p>

<p>Synchronizing with VSYNC delivers consistent latency. It reduces errors in
apps and SurfaceFlinger and the drifting of displays in and out of phase with
each other. This, however, does assume application and SurfaceFlinger per-frame
times don't vary widely. Nevertheless, the latency is at least two frames.</p>

<p>To remedy this, you can employ VSYNC offsets to reduce the input-to-display
latency by making application and composition signal relative to hardware
VSYNC. This is possible because application plus composition usually takes less
than 33 ms.</p>

<p>The result of VSYNC offset is three signals with the same period and offset
phases:</p>

<ul>
<li><code>HW_VSYNC_0</code>. Display begins showing next frame.</li>
<li><code>VSYNC</code>. App reads input and generates next frame.</li>
<li><code>SF VSYNC</code>.
SurfaceFlinger begins compositing for next frame.</li>
</ul>

<p>With VSYNC offset, SurfaceFlinger receives the buffer and composites the
frame, while the application processes the input and renders the frame, all
within a single frame of time.</p>

<p class="note"><strong>Note:</strong> VSYNC offsets reduce the time available
for app and composition and therefore provide a greater chance for error.</p>

<h3 id=dispsync>DispSync</h3>

<p>DispSync maintains a model of the periodic hardware-based VSYNC events of a
display and uses that model to execute periodic callbacks at specific phase
offsets from the hardware VSYNC events.</p>

<p>DispSync is essentially a software phase-locked loop (PLL) that generates the
VSYNC and SF VSYNC signals used by Choreographer and SurfaceFlinger, even if
not offset from hardware VSYNC.</p>

<img src="images/dispsync.png" alt="DispSync flow">

<p class="img-caption"><strong>Figure 1.</strong> DispSync flow</p>

<p>DispSync has the following qualities:</p>

<ul>
<li><em>Reference</em>. HW_VSYNC_0.</li>
<li><em>Output</em>. VSYNC and SF VSYNC.</li>
<li><em>Feedback</em>. Retire fence signal timestamps from Hardware Composer.
</li>
</ul>

<h3 id=vsync_retire_offset>VSYNC/Retire offset</h3>

<p>The signal timestamp of retire fences must match HW VSYNC even on devices
that don't use the offset phase. Otherwise, errors appear to have greater
severity than reality. Smart panels often have a delta: Retire fence is the end
of direct memory access (DMA) to display memory, but the actual display switch
and HW VSYNC is some time later.</p>

<p><code>PRESENT_TIME_OFFSET_FROM_VSYNC_NS</code> is set in the device's
BoardConfig.mk make file. It is based upon the display controller and panel
characteristics.
Time from retire fence timestamp to HW VSYNC signal is
measured in nanoseconds.</p>

<h3 id=vsync_and_sf_vsync_offsets>VSYNC and SF_VSYNC offsets</h3>

<p>The <code>VSYNC_EVENT_PHASE_OFFSET_NS</code> and
<code>SF_VSYNC_EVENT_PHASE_OFFSET_NS</code> are set conservatively based on
high-load use cases, such as partial GPU composition during window transition
or Chrome scrolling through a webpage containing animations. These offsets
allow for long application render time and long GPU composition time.</p>

<p>More than a millisecond or two of latency is noticeable. We recommend
integrating thorough automated error testing to minimize latency without
significantly increasing error counts.</p>

<p class="note"><strong>Note:</strong> These offsets are also configured in the
device's BoardConfig.mk file. Both settings are offset in nanoseconds after
HW_VSYNC_0, default to zero (if not set), and can be negative.</p>
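<p>To make the arithmetic behind a phase offset concrete, the following is a
minimal sketch of how an offset signal relates to HW_VSYNC_0. It is not the
DispSync implementation: the helper name <code>next_offset_event_ns</code> and
its parameters are hypothetical, and it assumes a fixed period and a known
reference timestamp, whereas DispSync estimates both from hardware feedback.
It shows that a negative offset wraps to a point late in the previous
period.</p>

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of DispSync or any Android API).
 * A signal with a phase offset fires at ref_ns + phase_ns + k * period_ns
 * for every integer k. This returns the first such time strictly after
 * now_ns. period_ns must be positive; phase_ns may be negative.
 */
int64_t next_offset_event_ns(int64_t ref_ns, int64_t period_ns,
                             int64_t phase_ns, int64_t now_ns)
{
    int64_t elapsed = now_ns - (ref_ns + phase_ns);
    int64_t cycles = elapsed / period_ns;   /* C division truncates toward zero */

    /* Turn truncation into floor so that negative elapsed values work. */
    if (elapsed % period_ns < 0)
        cycles -= 1;

    return ref_ns + phase_ns + (cycles + 1) * period_ns;
}
```

<p>For example, with HW_VSYNC_0 at time 0, an assumed period of 16,666,667 ns
(roughly 60 Hz), and an illustrative offset of 7,500,000 ns, the first event
after time 0 lands at 7,500,000 ns. With an offset of -1,000,000 ns, it lands
at 15,666,667 ns, which is 1 ms before the next hardware VSYNC.</p>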