page.title=Implementing graphics
@jd:body

<!--
    Copyright 2014 The Android Open Source Project

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-->

<div id="qv-wrapper">
  <div id="qv">
    <h2>In this document</h2>
    <ol id="auto-toc">
    </ol>
  </div>
</div>


<p>Follow the instructions here to implement the Android graphics HAL.</p>

<h2 id=requirements>Requirements</h2>

<p>The following list and sections describe what you need to provide to support
graphics in your product:</p>

<ul>
  <li>OpenGL ES 1.x Driver
  <li>OpenGL ES 2.0 Driver
  <li>OpenGL ES 3.0 Driver (optional)
  <li>EGL Driver
  <li>Gralloc HAL implementation
  <li>Hardware Composer HAL implementation
  <li>Framebuffer HAL implementation
</ul>

<h2 id=implementation>Implementation</h2>

<h3 id=opengl_and_egl_drivers>OpenGL and EGL drivers</h3>

<p>You must provide drivers for OpenGL ES 1.x, OpenGL ES 2.0, and EGL. Here are
some key considerations:</p>

<ul>
  <li>The GL driver must be robust and conformant to the OpenGL ES standards.
  <li>Do not limit the number of GL contexts. Because Android allows apps in
  the background and tries to keep GL contexts alive, you should not limit the
  number of contexts in your driver.
  <li>It is not uncommon to have 20-30 active GL contexts at once, so be
  careful with the amount of memory allocated for each context.
  <li>Support the YV12 image format and any other YUV image formats that come
  from other components in the system, such as media codecs or the camera.
  <li>Support the mandatory extensions <code>GL_OES_texture_external</code>,
  <code>EGL_ANDROID_image_native_buffer</code>, and
  <code>EGL_ANDROID_recordable</code>. The
  <code>EGL_ANDROID_framebuffer_target</code> extension is also required for
  Hardware Composer 1.1 and higher. (A runtime check for the EGL extensions is
  sketched after this list.)
  <li>We highly recommend also supporting
  <code>EGL_ANDROID_blob_cache</code>, <code>EGL_KHR_fence_sync</code>,
  <code>EGL_KHR_wait_sync</code>, and
  <code>EGL_ANDROID_native_fence_sync</code>.
</ul>
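
<p>The EGL extensions can be sanity-checked at runtime with standard EGL
calls, as in the following minimal sketch (not AOSP code).
<code>GL_OES_texture_external</code> is a GL extension and would instead be
checked against <code>glGetString(GL_EXTENSIONS)</code>.</p>

<pre class=prettyprint>
/*
 * Minimal sketch (not from AOSP): verify at runtime that the EGL driver
 * reports two of the required Android extensions.
 */
#include &lt;string.h&gt;
#include &lt;EGL/egl.h&gt;

int has_required_egl_extensions(EGLDisplay dpy) {
    const char *ext = eglQueryString(dpy, EGL_EXTENSIONS);
    return ext != NULL &amp;&amp;
           strstr(ext, "EGL_ANDROID_image_native_buffer") != NULL &amp;&amp;
           strstr(ext, "EGL_ANDROID_recordable") != NULL;
}
</pre>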

<p>Note that the OpenGL API exposed to app developers differs from the OpenGL
ES interface you implement. Apps do not have access to the GL driver layer and
must go through the interface provided by the APIs.</p>

<h3 id=pre-rotation>Pre-rotation</h3>

<p>Many hardware overlays do not support rotation, and even when they do, it
costs processing power. The solution is to pre-transform the buffer before it
reaches SurfaceFlinger. A query hint in <code>ANativeWindow</code>
(<code>NATIVE_WINDOW_TRANSFORM_HINT</code>) represents the transform most
likely to be applied to the buffer by SurfaceFlinger. Your GL driver can use
this hint to pre-transform the buffer so that when the buffer arrives at
SurfaceFlinger, it is already correctly transformed.</p>

<p>For example, when you receive a hint to rotate 90 degrees, generate a
matrix and apply it to the buffer so the content does not render off the edge
of the display. To save power, this should be done in pre-rotation. A sketch
of querying the hint follows. See the <code>ANativeWindow</code> interface
defined in <code>system/core/include/system/window.h</code> for details.</p>
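
<p>The following minimal sketch (not AOSP code) shows how a driver might query
the hint; <code>pre_rotate_for_hint</code> is a hypothetical name, and the
actual rotation is hardware-specific.</p>

<pre class=prettyprint>
/*
 * Minimal sketch (not from AOSP): read the transform hint from an
 * ANativeWindow.
 */
#include &lt;system/window.h&gt;

static void pre_rotate_for_hint(struct ANativeWindow *win) {
    int hint = 0;

    /* The transform SurfaceFlinger is most likely to apply to the buffer. */
    win->query(win, NATIVE_WINDOW_TRANSFORM_HINT, &amp;hint);

    if (hint == NATIVE_WINDOW_TRANSFORM_ROT_90) {
        /* Render content pre-rotated by 90 degrees so the buffer arrives at
         * SurfaceFlinger already in the correct orientation. */
    }
    /* Handle ROT_180, ROT_270, and the flip transforms similarly. */
}
</pre>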

<h3 id=gralloc_hal>Gralloc HAL</h3>

<p>The graphics memory allocator is needed to allocate memory requested by
image producers. You can find the interface definition of the HAL at
<code>hardware/libhardware/include/hardware/gralloc.h</code>.</p>
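
<p>As a usage illustration, the following minimal sketch (not AOSP code)
allocates a buffer through the gralloc HAL; the dimensions, format, and usage
flags are arbitrary examples.</p>

<pre class=prettyprint>
/*
 * Minimal sketch (not from AOSP): allocate a buffer through the gralloc HAL.
 */
#include &lt;hardware/gralloc.h&gt;
#include &lt;system/graphics.h&gt;

int alloc_example(alloc_device_t *dev) {
    buffer_handle_t handle;
    int stride;

    /* A 1280x720 RGBA buffer the GPU can render into and the Hardware
     * Composer can read. */
    return dev->alloc(dev, 1280, 720, HAL_PIXEL_FORMAT_RGBA_8888,
                      GRALLOC_USAGE_HW_RENDER | GRALLOC_USAGE_HW_COMPOSER,
                      &amp;handle, &amp;stride);
}
</pre>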

<h3 id=protected_buffers>Protected buffers</h3>

<p>The gralloc usage flag <code>GRALLOC_USAGE_PROTECTED</code> allows the
graphics buffer to be displayed only through a hardware-protected path. These
overlay planes are the only way to display DRM content; DRM-protected buffers
cannot be accessed by SurfaceFlinger or the OpenGL ES driver.</p>

<p>DRM-protected video can be presented only on an overlay plane. Video
players that support protected content must be implemented with SurfaceView.
Software running on unprotected hardware cannot read or write the buffer, and
hardware-protected paths must appear on the Hardware Composer overlay. For
instance, protected videos will disappear from the display if the Hardware
Composer switches to OpenGL ES composition.</p>

<p>See the <a href="{@docRoot}devices/drm.html">DRM</a> page for a description
of protected content.</p>

<h3 id=hardware_composer_hal>Hardware Composer HAL</h3>

<p>The Hardware Composer HAL is used by SurfaceFlinger to composite surfaces
to the screen. It abstracts objects such as overlays and 2D blitters and helps
offload work that would normally be done with OpenGL.</p>

<p>We recommend starting with version 1.3 of the Hardware Composer HAL, as it
provides support for the newest features (explicit synchronization, external
displays, and more). Because the physical display hardware behind the Hardware
Composer abstraction layer can vary from device to device, it is difficult to
define recommended features, but here is some guidance:</p>

<ul>
  <li>The Hardware Composer should support at least four overlays (status bar,
  system bar, application, and wallpaper/background).
  <li>Layers can be bigger than the screen, so the Hardware Composer should be
  able to handle layers that are larger than the display (for example, a
  wallpaper).
  <li>Pre-multiplied per-pixel alpha blending and per-plane alpha blending
  should be supported at the same time.
  <li>The Hardware Composer should be able to consume the same buffers that
  the GPU, camera, video decoder, and Skia are producing, so supporting some
  of the following properties is helpful:
  <ul>
    <li>RGBA packing order
    <li>YUV formats
    <li>Tiling, swizzling, and stride properties
  </ul>
  <li>A hardware path for protected video playback must be present if you want
  to support protected content.
</ul>

<p>The general recommendation is to implement a non-operational Hardware
Composer first. Once you have the structure done, implement a simple algorithm
that delegates composition to the Hardware Composer, for example delegating
only the first three or four surfaces to its overlay hardware.</p>

<p>After that, focus on optimization, such as intelligently selecting the
surfaces to send to the overlay hardware so as to maximize the load taken off
the GPU. Another optimization is to detect whether the screen is updating; if
not, delegate composition to OpenGL instead of the Hardware Composer to save
power. When the screen updates again, continue to offload composition to the
Hardware Composer.</p>

<p>Devices must report the display mode (or resolution). Android uses the
first mode reported by the device. To support televisions, have the TV device
report the mode selected for it by the manufacturer to the Hardware Composer.
See <code>hwcomposer.h</code> for more details.</p>

<p>Prepare for common use cases, such as:</p>

<ul>
  <li>Full-screen games in portrait and landscape mode
  <li>Full-screen video with closed captioning and playback control
  <li>The home screen (compositing the status bar, system bar, application
  window, and live wallpapers)
  <li>Protected video playback
  <li>Multiple display support
</ul>

<p>These use cases should address regular, predictable uses rather than edge
cases that are rarely encountered; otherwise, any optimization will have
little benefit. Implementations must balance two competing goals: animation
smoothness and interaction latency.</p>

<p>Further, to make the best use of Android graphics, you must develop a
robust clocking strategy. Performance matters little if clocks have been
turned down so far that every operation is slow. You need a clocking strategy
that raises the clocks when needed, such as to make animations seamless, and
lowers them whenever the increased speed is no longer needed.</p>

<p>Use the <code>adb shell dumpsys SurfaceFlinger</code> command to see
precisely what SurfaceFlinger is doing. See the <a
href="{@docRoot}devices/graphics/architecture.html#hwcomposer">Hardware
Composer</a> section of the Architecture page for example output and a
description of relevant fields.</p>

<p>You can find the Hardware Composer HAL and additional documentation in
<code>hardware/libhardware/include/hardware/hwcomposer.h</code> and
<code>hardware/libhardware/include/hardware/hwcomposer_defs.h</code>.</p>

<p>A stub implementation is available in the
<code>hardware/libhardware/modules/hwcomposer</code> directory.</p>

<h3 id=vsync>VSYNC</h3>

<p>VSYNC synchronizes certain events to the refresh cycle of the display.
Applications always start drawing on a VSYNC boundary, and SurfaceFlinger
always composites on a VSYNC boundary. This eliminates stutters and improves
the visual performance of graphics. The Hardware Composer has a function
pointer:</p>

<pre class=prettyprint>int (*waitForVsync)(int64_t *timestamp);</pre>

<p>This points to a function you must implement for VSYNC. This function
blocks until a VSYNC occurs and returns the timestamp of the actual VSYNC. A
message must be sent every time VSYNC occurs. A client can receive a VSYNC
timestamp once, at specified intervals, or continuously (interval of 1). You
must implement VSYNC with no more than 1 ms of lag (0.5 ms or less is
recommended), and the timestamps returned must be extremely accurate.</p>
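
<p>In Hardware Composer 1.x, VSYNC is reported to SurfaceFlinger through the
<code>vsync</code> callback in <code>hwc_procs_t</code>, registered via
<code>registerProcs()</code>. The following is a minimal sketch;
<code>my_hwc_device</code> and <code>on_hw_vsync_irq</code> are hypothetical
names.</p>

<pre class=prettyprint>
/*
 * Minimal sketch (not from AOSP): report a VSYNC event in HWC 1.x.
 * my_hwc_device is a hypothetical struct that saved the hwc_procs pointer
 * passed to registerProcs().
 */
#include &lt;hardware/hwcomposer.h&gt;

struct my_hwc_device {
    hwc_composer_device_1_t base;
    const hwc_procs_t *procs;    /* saved in registerProcs() */
};

static void on_hw_vsync_irq(struct my_hwc_device *dev, int64_t timestamp_ns) {
    /* timestamp_ns is the time of the actual hardware VSYNC (from
     * CLOCK_MONOTONIC) and must be accurate to well under 1 ms. */
    if (dev->procs &amp;&amp; dev->procs->vsync)
        dev->procs->vsync(dev->procs, 0 /* display */, timestamp_ns);
}
</pre>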

<h4 id=explicit_synchronization>Explicit synchronization</h4>

<p>Explicit synchronization is required and provides a mechanism for Gralloc
buffers to be acquired and released in a synchronized way. Explicit
synchronization allows producers and consumers of graphics buffers to signal
when they are done with a buffer. This allows the Android system to
asynchronously queue buffers to be read or written with the certainty that
another consumer or producer does not currently need them. See the <a
href="#synchronization_framework">Synchronization framework</a> section for an
overview of this mechanism.</p>

<p>The benefits of explicit synchronization include less behavior variation
between devices, better debugging support, and improved testing metrics. For
instance, the sync framework output readily identifies problem areas and root
causes, and centralized SurfaceFlinger presentation timestamps show when
events occur in the normal flow of the system.</p>

<p>This communication is facilitated by the use of synchronization fences,
which are now required when requesting a buffer for consuming or producing.
The synchronization framework consists of three main building blocks:
sync_timeline, sync_pt, and sync_fence.</p>

<h5 id=sync_timeline>sync_timeline</h5>

<p>A sync_timeline is a monotonically increasing timeline that should be
implemented for each driver instance, such as a GL context, display
controller, or 2D blitter. It is essentially a counter of jobs submitted to
the kernel for a particular piece of hardware. It provides guarantees about
the order of operations and allows hardware-specific implementations.</p>

<p>Note that a CPU-only reference implementation of sync_timeline, called
sw_sync (software sync), is provided. If possible, use sw_sync instead of
writing your own sync_timeline to save resources and avoid complexity. If you
are not employing a hardware resource, sw_sync should be sufficient. A
userspace sketch using sw_sync follows.</p>
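
<p>The following minimal sketch drives a sw_sync timeline from userspace
using the helpers in <code>platform/system/core/libsync</code>. This is
appropriate for testing only; in a real pipeline, fences are signaled by
hardware drivers.</p>

<pre class=prettyprint>
/*
 * Minimal sketch (not from AOSP): exercise sw_sync through libsync.
 */
#include &lt;unistd.h&gt;
#include &lt;sync/sync.h&gt;
#include &lt;sw_sync.h&gt;

void sw_sync_example(void) {
    int timeline = sw_sync_timeline_create();      /* counter starts at 0 */

    /* A fence containing one sync_pt that signals when the timeline
     * reaches 1. */
    int fence = sw_sync_fence_create(timeline, "example_fence", 1);

    sw_sync_timeline_inc(timeline, 1);  /* advance; the sync_pt signals */
    sync_wait(fence, 1000 /* ms */);    /* returns at once: already signaled */

    close(fence);
    close(timeline);
}
</pre>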

<p>If you must implement a sync_timeline, use the sw_sync driver as a starting
point and follow these guidelines:</p>

<ul>
  <li>Provide useful names for all drivers, timelines, and fences. This
  simplifies debugging.
  <li>Implement the <code>timeline_value_str</code> and
  <code>pt_value_str</code> operators in your timelines, as they make
  debugging output much more readable.
  <li>If you want your userspace libraries (such as the GL library) to have
  access to the private data of your timelines, implement the
  <code>fill_driver_data</code> operator. This lets you get information about
  the immutable sync_fence and sync_pts so you can build command lines based
  upon them.
</ul>

<p>When implementing a sync_timeline, <strong>don't</strong>:</p>

<ul>
  <li>Base it on any real view of time, such as when a wall clock or other
  piece of work might finish. It is better to create an abstract timeline that
  you can control.
  <li>Allow userspace to explicitly create or signal a fence. This can result
  in one piece of the user pipeline mounting a denial-of-service attack that
  halts all functionality, because userspace cannot make promises on behalf of
  the kernel.
  <li>Access sync_timeline, sync_pt, or sync_fence elements explicitly, as the
  API should provide all required functions.
</ul>

<h5 id=sync_pt>sync_pt</h5>

<p>A sync_pt is a single value or point on a sync_timeline. A point has three
states: active, signaled, and error. Points start in the active state and
transition to the signaled or error states. For instance, when a buffer is no
longer needed by an image consumer, its sync_pt is signaled so that image
producers know it is okay to write into the buffer again.</p>

<h5 id=sync_fence>sync_fence</h5>

<p>A sync_fence is a collection of sync_pts that often have different
sync_timeline parents (such as for the display controller and GPU). These are
the main primitives through which drivers and userspace communicate their
dependencies. A fence is a promise the kernel gives upon accepting queued
work: the work will complete in a finite amount of time.</p>

<p>This allows multiple consumers or producers to signal that they are using a
buffer, and it allows this information to be communicated with one function
parameter. Fences are backed by a file descriptor and can be passed from
kernel space to userspace. For instance, a fence can contain two sync_pts that
signify when two separate image consumers are done reading a buffer. When the
fence is signaled, the image producer knows both consumers are done
consuming.</p>

<p>Fences, like sync_pts, start active and then change state based upon the
state of their points. If all sync_pts become signaled, the sync_fence becomes
signaled. If one sync_pt falls into an error state, the entire sync_fence has
an error state.</p>

<p>Membership in a sync_fence is immutable once the fence is created. Because
a sync_pt can be in only one fence, it is included as a copy; even if two
points have the same value, there will be two copies of the sync_pt in the
fence.</p>

<p>To get more than one point in a fence, a merge operation is conducted: the
points from two distinct fences are added to a third fence. If one of those
points was signaled in the originating fence and the other was not, the third
fence will also not be in a signaled state.</p>

<p>To implement explicit synchronization, you need to provide the
following:</p>

<ul>
  <li>A kernel-space driver that implements a synchronization timeline for a
  particular piece of hardware. Drivers that need to be fence-aware are
  generally anything that accesses or communicates with the Hardware Composer.
  Here are the key files (found in the android-3.4 kernel branch):
  <ul>
    <li>Core implementation:
    <ul>
      <li><code>kernel/common/include/linux/sync.h</code>
      <li><code>kernel/common/drivers/base/sync.c</code>
    </ul>
    <li>sw_sync:
    <ul>
      <li><code>kernel/common/include/linux/sw_sync.h</code>
      <li><code>kernel/common/drivers/base/sw_sync.c</code>
    </ul>
    <li>Documentation: <code>kernel/common/Documentation/sync.txt</code>
  </ul>
  Finally, the <code>platform/system/core/libsync</code> directory includes a
  library to communicate with the kernel space.
  <li>A Hardware Composer HAL module (version 1.3 or later) that supports the
  new synchronization functionality. You will need to provide the appropriate
  synchronization fences as parameters to the <code>set()</code> and
  <code>prepare()</code> functions in the HAL.
  <li>Two GL-specific extensions related to fences,
  <code>EGL_ANDROID_native_fence_sync</code> and
  <code>EGL_ANDROID_wait_sync</code>, along with fence support incorporated
  into your graphics drivers.
</ul>

<p>For example, to use the API supporting the synchronization function, you
might develop a display driver that has a display buffer function. Before the
synchronization framework existed, this function would receive dma-bufs, put
those buffers on the display, and block while the buffer was visible, like
so:</p>

<pre class=prettyprint>
/*
 * assumes buf is ready to be displayed.  returns when buffer is no longer on
 * screen.
 */
void display_buffer(struct dma_buf *buf);
</pre>

<p>With the synchronization framework, the API call is slightly more complex.
While putting a buffer on display, you associate it with a fence that says
when the buffer will be ready. You queue up the work, and initiate it once the
fence clears.</p>

<p>In this manner, you are not blocking anything. You immediately return your
own fence, which is a guarantee of when the buffer will be off of the display.
As you queue up buffers, the kernel will list the dependencies. With the
synchronization framework:</p>

<pre class=prettyprint>
/*
 * will display buf when fence is signaled.  returns immediately with a fence
 * that will signal when buf is no longer displayed.
 */
struct sync_fence* display_buffer(struct dma_buf *buf,
                                  struct sync_fence *fence);
</pre>
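
<p>A hypothetical caller of this fence-based function might look like the
following sketch; nothing blocks until the returned fence is actually waited
on.</p>

<pre class=prettyprint>
/*
 * Minimal sketch: queue a frame using the hypothetical fence-based
 * display_buffer() above.
 */
static void queue_frame(struct dma_buf *buf, struct sync_fence *acquire_fence)
{
    /* Returns immediately; the display shows buf once acquire_fence
     * signals. */
    struct sync_fence *release_fence = display_buffer(buf, acquire_fence);

    /* Queue more work here; wait on release_fence only when buf must be
     * reused or freed. */
    (void)release_fence;
}
</pre>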

<h4 id=sync_integration>Sync integration</h4>

<h5 id=integration_conventions>Integration conventions</h5>

<p>This section explains how to integrate the low-level sync framework with
different parts of the Android framework and the drivers that need to
communicate with one another.</p>

<p>The Android HAL interfaces for graphics follow consistent conventions:
when file descriptors are passed across a HAL interface, ownership of the file
descriptor is always transferred. This means:</p>

<ul>
  <li>If you receive a fence file descriptor from the sync framework, you must
  close it.
  <li>If you return a fence file descriptor to the sync framework, the
  framework will close it.
  <li>If you want to continue using the fence file descriptor, you must
  duplicate it first, as in the sketch after this list.
</ul>
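
<p>For example, the third rule might look like this minimal sketch:</p>

<pre class=prettyprint>
/*
 * Minimal sketch (not from AOSP): retain a fence fd that is about to be
 * handed across a HAL boundary.
 */
#include &lt;unistd.h&gt;

int retain_fence(int fence_fd) {
    int local_fd = dup(fence_fd);  /* our own reference */
    /* fence_fd itself crosses the HAL interface and will be closed by the
     * recipient. */
    return local_fd;               /* the caller must close(local_fd) later */
}
</pre>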

<p>Every time a fence is passed through BufferQueue (such as for a window
passing a fence to BufferQueue saying when its new contents will be ready),
the fence object is renamed. Since kernel fence support allows fences to have
string names, the sync framework names the fence after the window and buffer
index being queued, for example <code>SurfaceView:0</code>.</p>

<p>This is helpful in debugging to identify the source of a deadlock, as the
names appear in the output of <code>/d/sync</code> and in bug reports.</p>

<h5 id=anativewindow_integration>ANativeWindow integration</h5>

<p>ANativeWindow is fence-aware: <code>dequeueBuffer</code>,
<code>queueBuffer</code>, and <code>cancelBuffer</code> all have fence
parameters.</p>

<h5 id=opengl_es_integration>OpenGL ES integration</h5>

<p>OpenGL ES sync integration relies upon these two EGL extensions:</p>

<ul>
  <li><code>EGL_ANDROID_native_fence_sync</code> - provides a way to either
  wrap or create native Android fence file descriptors in EGLSyncKHR objects.
  <li><code>EGL_ANDROID_wait_sync</code> - allows the GPU to stall rather than
  the CPU, making the GPU wait for an EGLSyncKHR. This is essentially the same
  as the <code>EGL_KHR_wait_sync</code> extension; see the
  <code>EGL_KHR_wait_sync</code> specification for details.
</ul>

<p>These extensions can be used independently and are controlled by a compile
flag in libgui. To use them, first implement the
<code>EGL_ANDROID_native_fence_sync</code> extension along with the associated
kernel support. Next, add ANativeWindow support for fences to your driver and
then turn on support in libgui to make use of the
<code>EGL_ANDROID_native_fence_sync</code> extension.</p>

<p>Then, as a second pass, enable the <code>EGL_ANDROID_wait_sync</code>
extension in your driver and turn it on separately. The
<code>EGL_ANDROID_native_fence_sync</code> extension consists of a distinct
native fence EGLSync object type, so extensions that apply to existing EGLSync
object types don't necessarily apply to <code>EGL_ANDROID_native_fence</code>
objects; this avoids unwanted interactions.</p>

<p>The <code>EGL_ANDROID_native_fence_sync</code> extension employs a
corresponding native fence file descriptor attribute that can be set only at
creation time and cannot be queried directly from an existing sync object.
This attribute can be set to one of two modes:</p>

<ul>
  <li>A valid fence file descriptor - wraps an existing native Android fence
  file descriptor in an EGLSyncKHR object.
  <li>-1 - creates a native Android fence file descriptor from an EGLSyncKHR
  object.
</ul>

<p>The DupNativeFenceFD function call is used to extract the native Android
fence file descriptor from the EGLSyncKHR object. This has the same result as
querying the attribute that was set but adheres to the convention that the
recipient closes the fence (hence the duplicate operation). Finally,
destroying the EGLSync object should close the internal fence attribute.</p>
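
<p>Putting the two modes together, the following minimal sketch creates a
native fence EGLSync object and extracts its file descriptor. In real code the
extension entry points are obtained through <code>eglGetProcAddress()</code>;
this sketch assumes the prototypes are already declared.</p>

<pre class=prettyprint>
/*
 * Minimal sketch (not from AOSP): create a native fence EGLSync object and
 * extract its file descriptor.
 */
#include &lt;EGL/egl.h&gt;
#include &lt;EGL/eglext.h&gt;
#include &lt;GLES2/gl2.h&gt;

int create_native_fence_fd(EGLDisplay dpy) {
    /* With no attribute list, the fence fd attribute defaults to
     * EGL_NO_NATIVE_FENCE_FD_ANDROID (-1), so EGL creates the fence. */
    EGLSyncKHR sync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID,
                                       NULL);

    glFlush();  /* ensure the fence command reaches the GPU */

    /* Per the ownership convention, the returned fd is a duplicate owned by
     * the caller. */
    int fd = eglDupNativeFenceFDANDROID(dpy, sync);
    eglDestroySyncKHR(dpy, sync);  /* the duplicated fd remains valid */
    return fd;
}
</pre>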

<h5 id=hardware_composer_integration>Hardware Composer integration</h5>

<p>The Hardware Composer handles three types of sync fences:</p>

<ul>
  <li><em>Acquire fence</em> - one per layer, set before calling
  <code>HWC::set</code>. It signals when the Hardware Composer may read the
  buffer.
  <li><em>Release fence</em> - one per layer, filled in by the driver in
  <code>HWC::set</code>. It signals when the Hardware Composer is done reading
  the buffer, so the framework can start using that buffer again for that
  particular layer.
  <li><em>Retire fence</em> - one per frame, filled in by the driver each time
  <code>HWC::set</code> is called. It covers all of the layers for the set
  operation and signals to the framework when all of the effects of this set
  operation have completed; it signals when the next set operation takes place
  on the screen.
</ul>

<p>The retire fence can be used to determine how long each frame appears on
the screen. This is useful in identifying the location and source of delays,
such as a stuttering animation.</p>
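
<p>The following minimal sketch shows where each of these fences lives in the
HWC 1.x data structures. The <code>-1</code> values are placeholders; a real
driver returns file descriptors backed by its sync_timeline.</p>

<pre class=prettyprint>
/*
 * Minimal sketch (not from AOSP): fence plumbing in an HWC 1.x set()
 * implementation.
 */
#include &lt;hardware/hwcomposer.h&gt;

static int hwc_set(hwc_composer_device_1_t *dev, size_t numDisplays,
                   hwc_display_contents_1_t **displays) {
    hwc_display_contents_1_t *list = displays[0];
    size_t i;

    for (i = 0; i &lt; list->numHwLayers; i++) {
        hwc_layer_1_t *layer = &amp;list->hwLayers[i];

        /* Acquire fence: set by the framework; wait on it (or hand it to
         * the hardware) before reading layer->handle. */
        /* ... use layer->acquireFenceFd ... */

        /* Release fence: signals when this layer's buffer is no longer
         * being read. */
        layer->releaseFenceFd = -1;  /* real driver: timeline-backed fd */
    }

    /* Retire fence: one per frame; signals when this frame has been
     * replaced on screen by the next set() operation. */
    list->retireFenceFd = -1;        /* real driver: timeline-backed fd */
    return 0;
}
</pre>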

<h4 id=vsync_offset>VSYNC Offset</h4>

<p>Application and SurfaceFlinger render loops should be synchronized to the
hardware VSYNC. On a VSYNC event, the display begins showing frame N while
SurfaceFlinger begins compositing windows for frame N+1. The app handles
pending input and generates frame N+2.</p>

<p>Synchronizing with VSYNC delivers consistent latency. It reduces errors in
apps and SurfaceFlinger and keeps displays from drifting in and out of phase
with each other. This does, however, assume that application and
SurfaceFlinger per-frame times don't vary widely. Nevertheless, the latency is
at least two frames.</p>

<p>To remedy this, you may employ VSYNC offsets to reduce the input-to-display
latency by making the application and composition signals offset from the
hardware VSYNC. This is possible because application plus composition usually
takes less than 33 ms.</p>

<p>The result of VSYNC offset is three signals with the same period but offset
in phase:</p>

<ul>
  <li><em>HW_VSYNC_0</em> - the display begins showing the next frame.
  <li><em>VSYNC</em> - the app reads input and generates the next frame.
  <li><em>SF VSYNC</em> - SurfaceFlinger begins compositing for the next
  frame.
</ul>

<p>With VSYNC offset, SurfaceFlinger receives the buffer and composites the
frame while the application processes the input and renders the frame, all
within a single frame of time.</p>

<p>Note that VSYNC offsets reduce the time available for app and composition
and therefore provide a greater chance for error.</p>

<h5 id=dispsync>DispSync</h5>

<p>DispSync maintains a model of the periodic hardware-based VSYNC events of a
display and uses that model to execute periodic callbacks at specific phase
offsets from the hardware VSYNC events.</p>

<p>DispSync is essentially a software phase-locked loop (PLL) that generates
the VSYNC and SF VSYNC signals used by Choreographer and SurfaceFlinger, even
if not offset from hardware VSYNC.</p>

<img src="images/dispsync.png" alt="DispSync flow">

<p class="img-caption"><strong>Figure 4.</strong> DispSync flow</p>

<p>DispSync has these qualities:</p>

<ul>
  <li><em>Reference</em> - HW_VSYNC_0
  <li><em>Output</em> - VSYNC and SF VSYNC
  <li><em>Feedback</em> - retire fence signal timestamps from the Hardware
  Composer
</ul>

<h5 id=vsync_retire_offset>VSYNC/Retire Offset</h5>

<p>The signal timestamp of retire fences must match HW VSYNC even on devices
that don't use the offset phase. Otherwise, errors appear to have greater
severity than they really do.</p>

<p>Smart panels often have a delta: the retire fence marks the end of direct
memory access (DMA) to the display memory, but the actual display switch and
HW VSYNC happen some time later.</p>

<p><code>PRESENT_TIME_OFFSET_FROM_VSYNC_NS</code> is set in the device's
BoardConfig.mk make file. It is based upon the display controller and panel
characteristics and is the time from the retire fence timestamp to the HW
VSYNC signal, measured in nanoseconds.</p>

<h5 id=vsync_and_sf_vsync_offsets>VSYNC and SF_VSYNC Offsets</h5>

<p>The <code>VSYNC_EVENT_PHASE_OFFSET_NS</code> and
<code>SF_VSYNC_EVENT_PHASE_OFFSET_NS</code> offsets are set conservatively
based on high-load use cases, such as partial GPU composition during window
transition or Chrome scrolling through a webpage containing animations. These
offsets allow for long application render time and long GPU composition
time.</p>

<p>More than a millisecond or two of latency is noticeable. We recommend
integrating thorough automated error testing to minimize latency without
significantly increasing error counts.</p>

<p>Note that these offsets are also set in the device's BoardConfig.mk make
file. The default, if not set, is zero offset. Both settings are offsets in
nanoseconds after HW_VSYNC_0, and either can be negative.</p>
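
<p>For example, a device might set these values in its BoardConfig.mk as in
the following sketch; the numbers are illustrative only and must be derived
from measurements of the actual display controller and panel.</p>

<pre class=prettyprint>
# Illustrative values only; derive real ones by measurement.
VSYNC_EVENT_PHASE_OFFSET_NS := 7500000      # app VSYNC 7.5 ms after HW_VSYNC_0
SF_VSYNC_EVENT_PHASE_OFFSET_NS := 5000000   # SF VSYNC 5 ms after HW_VSYNC_0
PRESENT_TIME_OFFSET_FROM_VSYNC_NS := 0      # retire fence to HW VSYNC delta
</pre>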

<h3 id=virtual_displays>Virtual displays</h3>

<p>Android added support for virtual displays to Hardware Composer in version
1.3. This support was implemented in the Android platform and can be used by
Miracast.</p>

<p>Virtual display composition is similar to physical display composition:
input layers are described in prepare(), SurfaceFlinger conducts GPU
composition, and the layers and GPU framebuffer are provided to the Hardware
Composer in set().</p>

<p>Instead of the output going to the screen, it is sent to a gralloc buffer.
The Hardware Composer writes output to the buffer and provides the completion
fence. The buffer is sent to an arbitrary consumer: video encoder, GPU, CPU,
etc. Virtual displays can use 2D/blitter hardware or overlays if the display
pipeline can write to memory.</p>

<h4 id=modes>Modes</h4>

<p>Each frame is in one of three modes after prepare():</p>

<ul>
  <li><em>GLES</em> - all layers are composited by the GPU, which writes
  directly to the output buffer while the Hardware Composer does nothing. This
  is equivalent to virtual display composition with Hardware Composer versions
  older than 1.3.
  <li><em>MIXED</em> - the GPU composites some layers to the framebuffer, and
  the Hardware Composer composites the framebuffer and the remaining layers.
  The GPU writes to a scratch buffer (the framebuffer); the Hardware Composer
  reads the scratch buffer and writes to the output buffer. The buffers may
  have different formats, e.g. RGBA and YCbCr.
  <li><em>HWC</em> - all layers are composited by the Hardware Composer, which
  writes directly to the output buffer.
</ul>

<h4 id=output_format>Output format</h4>

<p><em>MIXED and HWC modes</em>: If the consumer needs CPU access, the
consumer chooses the format. Otherwise, the format is IMPLEMENTATION_DEFINED,
and gralloc can choose the best format based on the usage flags; for example,
it can choose a YCbCr format if the consumer is a video encoder and the
Hardware Composer can write that format efficiently.</p>

<p><em>GLES mode</em>: The EGL driver chooses the output buffer format in
dequeueBuffer(), typically RGBA8888. The consumer must be able to accept this
format.</p>

<h4 id=egl_requirement>EGL requirement</h4>

<p>Hardware Composer 1.3 virtual displays require that eglSwapBuffers() does
not dequeue the next buffer immediately. Instead, it should defer dequeueing
the buffer until rendering begins; otherwise, EGL always owns the next output
buffer, and SurfaceFlinger can't get the output buffer for the Hardware
Composer in MIXED/HWC mode.</p>

<p>If the Hardware Composer always sends all virtual display layers to the
GPU, all frames will be in GLES mode. Although it is not recommended, you may
use this method if you need to support Hardware Composer 1.3 for some other
reason but can't conduct virtual display composition.</p>

<h2 id=testing>Testing</h2>

<p>For benchmarking, we suggest following this flow by phase:</p>

<ul>
  <li><em>Specification</em> - When initially specifying the device, such as
  when using immature drivers, use predefined (fixed) clocks and workloads to
  measure the frames per second rendered. This gives a clear view of what the
  hardware is capable of doing.
  <li><em>Development</em> - As drivers mature, use a fixed set of user
  actions to measure the number of visible stutters (janks) in animations.
  <li><em>Production</em> - Once the device is ready for production and you
  want to compare against competitors, increase the workload until stutters
  increase, and determine whether the current clock settings can keep up with
  the load. This can help you identify where you might be able to slow the
  clocks and reduce power use.
</ul>

<p>For the specification phase, Android offers the Flatland tool to help
derive device capabilities. It can be found at
<code>platform/frameworks/native/cmds/flatland/</code>.</p>

<p>Flatland relies upon fixed clocks and shows the throughput that can be
achieved with composition-based workloads. It uses gralloc buffers to simulate
multiple window scenarios, filling in the windows with GL and then measuring
the compositing. Note that Flatland uses the synchronization framework to
measure time, so you must support the synchronization framework to use
Flatland readily.</p>