      1 page.title=RenderScript
      2 parent.title=Computation
      3 parent.link=index.html
      4 
      5 @jd:body
      6 
      7 <div id="qv-wrapper">
      8   <div id="qv">
      9     <h2>In this document</h2>
     10 
     11     <ol>
     12       <li><a href="#writing-an-rs-kernel">Writing a RenderScript Kernel</a></li>
     13       <li><a href="#access-rs-apis">Accessing RenderScript APIs from Java</a>
     14         <ol>
     15           <li><a href="#ide-setup">Setting Up Your Development Environment</a></li>
     16         </ol>
     17       </li>
     18       <li><a href="#using-rs-from-java">Using RenderScript from Java Code</a></li>
     19       <li><a href="#single-source-rs">Single-Source RenderScript</a></li>
     20       <li><a href="#reduction-in-depth">Reduction Kernels in Depth</a>
     21         <ol>
     22           <li><a href="#writing-reduction-kernel">Writing a reduction kernel</a></li>
     23           <li><a href="#calling-reduction-kernel">Calling a reduction kernel from Java code</a></li>
     24           <li><a href="#more-example">More example reduction kernels</a></li>
     25         </ol>
     26       </li>
     27     </ol>
     28 
     29     <h2>Related Samples</h2>
     30 
     31     <ol>
      <li><a class="external-link" href="https://github.com/android/platform_development/tree/master/samples/RenderScript/HelloCompute">Hello
      Compute</a></li>
     34     </ol>
     35   </div>
     36 </div>
     37 
     38 <p>RenderScript is a framework for running computationally intensive tasks at high performance on
     39 Android. RenderScript is primarily oriented for use with data-parallel computation, although serial
     40 workloads can benefit as well. The RenderScript runtime parallelizes
     41 work across processors available on a device, such as multi-core CPUs and GPUs. This allows
     42 you to focus on expressing algorithms rather than scheduling work. RenderScript is
     43 especially useful for applications performing image processing, computational photography, or
     44 computer vision.</p>
     45 
     46 <p>To begin with RenderScript, there are two main concepts you should understand:</p>
     47 <ul>
     48 
     49 <li>The <em>language</em> itself is a C99-derived language for writing high-performance compute
     50 code. <a href="#writing-an-rs-kernel">Writing a RenderScript Kernel</a> describes
     51 how to use it to write compute kernels.</li>
     52 
<li>The <em>control API</em> is used for managing the lifetime of RenderScript resources and
controlling kernel execution. It is available in three different languages: Java, C++ in the Android
NDK, and the C99-derived kernel language itself.
<a href="#using-rs-from-java">Using RenderScript from Java Code</a> and
<a href="#single-source-rs">Single-Source RenderScript</a> describe the first and the third
options, respectively.</li>
     59 </ul>
     60 
     61 <h2 id="writing-an-rs-kernel">Writing a RenderScript Kernel</h2>
     62 
     63 <p>A RenderScript kernel typically resides in a <code>.rs</code> file in the
     64 <code>&lt;project_root&gt;/src/</code> directory; each <code>.rs</code> file is called a
     65 <i>script</i>. Every script contains its own set of kernels, functions, and variables. A script can
     66 contain:</p>
     67 
     68 <ul>
     69 <li>A pragma declaration (<code>#pragma version(1)</code>) that declares the version of the
     70 RenderScript kernel language used in this script. Currently, 1 is the only valid value.</li>
     71 
     72 <li>A pragma declaration (<code>#pragma rs java_package_name(com.example.app)</code>) that
     73 declares the package name of the Java classes reflected from this script.
     74 Note that your <code>.rs</code> file must be part of your application package, and not in a
     75 library project.</li>
     76 
     77 <li>Zero or more <strong><i>invokable functions</i></strong>. An invokable function is a single-threaded RenderScript
     78 function that you can call from your Java code with arbitrary arguments. These are often useful for
     79 initial setup or serial computations within a larger processing pipeline.</li>
     80 
     81 <li><p>Zero or more <strong><i>script globals</i></strong>. A script global is equivalent to a global variable in C. You can
     82 access script globals from Java code, and these are often used for parameter passing to RenderScript
     83 kernels.</p></li>
     84 
     85 <li><p>Zero or more <strong><i>compute kernels</i></strong>. A compute kernel is a function
     86 or collection of functions that you can direct the RenderScript runtime to execute in parallel
     87 across a collection of data. There are two kinds of compute
     88 kernels: <i>mapping</i> kernels (also called <i>foreach</i> kernels)
     89 and <i>reduction</i> kernels.</p>
     90 
     91 <p>A <em>mapping kernel</em> is a parallel function that operates on a collection of {@link
     92   android.renderscript.Allocation Allocations} of the same dimensions. By default, it executes
     93   once for every coordinate in those dimensions. It is typically (but not exclusively) used to
     94   transform a collection of input {@link android.renderscript.Allocation Allocations} to an
     95   output {@link android.renderscript.Allocation} one {@link android.renderscript.Element} at a
     96   time.</p>
     97 
     98 <ul>
     99 <li><p>Here is an example of a simple <strong>mapping kernel</strong>:</p>
    100 
<pre>uchar4 RS_KERNEL invert(uchar4 in, uint32_t x, uint32_t y) {
  uchar4 out = in;       // start from the input so alpha is preserved
  out.r = 255 - in.r;    // invert each color channel
  out.g = 255 - in.g;
  out.b = 255 - in.b;
  return out;
}</pre>
    108 
    109 <p>In most respects, this is identical to a standard C
    110   function. The <a href="#RS_KERNEL"><code>RS_KERNEL</code></a> property applied to the
    111   function prototype specifies that the function is a RenderScript mapping kernel instead of an
    112   invokable function. The <code>in</code> argument is automatically filled in based on the
    113   input {@link android.renderscript.Allocation} passed to the kernel launch. The
    114   arguments <code>x</code> and <code>y</code> are
    115   discussed <a href="#special-arguments">below</a>. The value returned from the kernel is
    116   automatically written to the appropriate location in the output {@link
    117   android.renderscript.Allocation}. By default, this kernel is run across its entire input
    118   {@link android.renderscript.Allocation}, with one execution of the kernel function per {@link
    119   android.renderscript.Element} in the {@link android.renderscript.Allocation}.</p>
    120 
    121 <p>A mapping kernel may have one or more input {@link android.renderscript.Allocation
    122   Allocations}, a single output {@link android.renderscript.Allocation}, or both. The
    123   RenderScript runtime checks to ensure that all input and output Allocations have the same
    124   dimensions, and that the {@link android.renderscript.Element} types of the input and output
    125   Allocations match the kernel's prototype; if either of these checks fails, RenderScript
    126   throws an exception.</p>
    127 
    128 <p class="note"><strong>NOTE:</strong> Before Android 6.0 (API level 23), a mapping kernel may
    129   not have more than one input {@link android.renderscript.Allocation}.</p>
    130 
    131 <p>If you need more input or output {@link android.renderscript.Allocation Allocations} than
    132   the kernel has, those objects should be bound to <code>rs_allocation</code> script globals
    133   and accessed from a kernel or invokable function
    134   via <code>rsGetElementAt_<i>type</i>()</code> or <code>rsSetElementAt_<i>type</i>()</code>.</p>
    135 
    136 <p><strong>NOTE:</strong> <a id="RS_KERNEL"><code>RS_KERNEL</code></a> is a macro
    137   defined automatically by RenderScript for your convenience:</p>
    138 <pre>
#define RS_KERNEL __attribute__((kernel))
    140 </pre>
    141 </li>
    142 </ul>
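<p>When testing a kernel such as <code>invert</code>, it can help to have a plain-Java reference of
the same per-element computation to compare results against. The sketch below is one such
reference and is not part of the RenderScript API: the class name <code>InvertReference</code> and
the byte layout (four bytes per element, in R, G, B, A order) are assumptions made for
illustration.</p>

```java
/** Hypothetical plain-Java reference for the invert kernel (not a RenderScript API). */
final class InvertReference {
    /**
     * Inverts R, G, and B of every 4-byte RGBA element and leaves alpha unchanged,
     * mirroring what the invert kernel does for one uchar4 at a time.
     */
    static byte[] invert(byte[] rgba) {
        if (rgba.length % 4 != 0) {
            throw new IllegalArgumentException("expected 4 bytes per element");
        }
        byte[] out = rgba.clone();
        for (int i = 0; i < rgba.length; i += 4) {
            out[i]     = (byte) (255 - (rgba[i]     & 0xFF)); // r
            out[i + 1] = (byte) (255 - (rgba[i + 1] & 0xFF)); // g
            out[i + 2] = (byte) (255 - (rgba[i + 2] & 0xFF)); // b
            // out[i + 3] (alpha) keeps the value copied from the input
        }
        return out;
    }
}
```

<p>A test could copy the output Allocation into a byte array and compare it with the array this
helper returns for the same input, assuming the bytes are laid out as described above.</p>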
    143 
    144 <p>A <em>reduction kernel</em> is a family of functions that operates on a collection of input
    145   {@link android.renderscript.Allocation Allocations} of the same dimensions. By default,
    146   its <a href="#accumulator-function">accumulator function</a> executes once for every
    147   coordinate in those dimensions.  It is typically (but not exclusively) used to "reduce" a
    148   collection of input {@link android.renderscript.Allocation Allocations} to a single
    149   value.</p>
    150 
    151 <ul>
    152 <li><p>Here is an <a id="example-addint">example</a> of a simple <strong>reduction
    153 kernel</strong> that adds up the {@link android.renderscript.Element Elements} of its
    154 input:</p>
    155 
<pre>#pragma rs reduce(addint) accumulator(addintAccum)

// Accumulator function: called once per input Element.
static void addintAccum(int *accum, int val) {
  *accum += val;
}</pre>
    161 
    162 <p>A reduction kernel consists of one or more user-written functions.
    163 <code>#pragma rs reduce</code> is used to define the kernel by specifying its name
    164 (<code>addint</code>, in this example) and the names and roles of the functions that make
    165 up the kernel (an <code>accumulator</code> function <code>addintAccum</code>, in this
    166 example). All such functions must be <code>static</code>. A reduction kernel always
    167 requires an <code>accumulator</code> function; it may also have other functions, depending
    168 on what you want the kernel to do.</p>
    169 
    170 <p>A reduction kernel accumulator function must return <code>void</code> and must have at least
    171 two arguments. The first argument (<code>accum</code>, in this example) is a pointer to
    172 an <i>accumulator data item</i> and the second (<code>val</code>, in this example) is
    173 automatically filled in based on the input {@link android.renderscript.Allocation} passed to
    174 the kernel launch. The accumulator data item is created by the RenderScript runtime; by
    175 default, it is initialized to zero. By default, this kernel is run across its entire input
    176 {@link android.renderscript.Allocation}, with one execution of the accumulator function per
    177 {@link android.renderscript.Element} in the {@link android.renderscript.Allocation}. By
    178 default, the final value of the accumulator data item is treated as the result of the
    179 reduction, and is returned to Java.  The RenderScript runtime checks to ensure that the {@link
    180 android.renderscript.Element} type of the input Allocation matches the accumulator function's
    181 prototype; if it does not match, RenderScript throws an exception.</p>
    182 
<p>A reduction kernel has one or more input {@link android.renderscript.Allocation
Allocations} but no output {@link android.renderscript.Allocation Allocations}.</p>
    185 
    186 <p>Reduction kernels are explained in more detail <a href="#reduction-in-depth">here</a>.</p>
    187 
    188 <p>Reduction kernels are supported in Android 7.0 (API level 24) and later.</p>
    189 </li>
    190 </ul>
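<p>The default behavior of the <code>addint</code> example can be modeled in a few lines of plain
Java. This is only an illustrative sketch with hypothetical names (<code>AddintModel</code>,
<code>reduceAddint</code>). It runs serially, whereas the RenderScript runtime may execute a kernel
in parallel, and it uses a one-element array to stand in for the accumulator pointer.</p>

```java
/** Hypothetical serial model of the addint reduction kernel (not a RenderScript API). */
final class AddintModel {
    /** Mirrors addintAccum(int *accum, int val); accum[0] plays the role of *accum. */
    static void addintAccum(int[] accum, int val) {
        accum[0] += val;
    }

    /** Models a launch of the reduction over a 1D input. */
    static int reduceAddint(int[] input) {
        int[] accum = {0};           // accumulator data item, zero-initialized by default
        for (int val : input) {      // one accumulator call per input element
            addintAccum(accum, val);
        }
        return accum[0];             // the final accumulator value is the result
    }
}
```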
    191 
    192 <p>A mapping kernel function or a reduction kernel accumulator function may access the coordinates
    193 of the current execution using the <a id="special-arguments">special arguments</a> <code>x</code>,
    194 <code>y</code>, and <code>z</code>, which must be of type <code>int</code> or <code>uint32_t</code>.
    195 These arguments are optional.</p>
    196 
    197 <p>A mapping kernel function or a reduction kernel accumulator
    198 function may also take the optional special argument
    199 <code>context</code> of type <a
    200 href='reference/rs_for_each.html#android_rs:rs_kernel_context'>rs_kernel_context</a>.
    201 It is needed by a family of runtime APIs that are used to query
    202 certain properties of the current execution -- for example, <a
    203 href='reference/rs_for_each.html#android_rs:rsGetDimX'>rsGetDimX</a>.
    204 (The <code>context</code> argument is available in Android 6.0 (API level 23) and later.)</p>
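<p>To make the role of the special arguments concrete, the following plain-Java sketch models a 2D
launch: the kernel body runs once per coordinate and receives that coordinate as <code>x</code> and
<code>y</code>. The names <code>LaunchModel</code> and <code>Kernel2D</code> are hypothetical, the
row-by-row storage is an assumption, and the loops are serial only to keep the model simple.</p>

```java
/** Hypothetical model of a 2D mapping kernel launch (not a RenderScript API). */
final class LaunchModel {
    /** Stand-in for a mapping kernel that takes the special arguments x and y. */
    interface Kernel2D {
        int apply(int in, int x, int y);
    }

    /** Runs the kernel once for every (x, y) in a width-by-height grid stored row by row. */
    static int[] forEach(Kernel2D kernel, int[] in, int width, int height) {
        if (in.length != width * height) {
            throw new IllegalArgumentException("input size must equal width * height");
        }
        int[] out = new int[in.length];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int i = y * width + x;              // element at coordinate (x, y)
                out[i] = kernel.apply(in[i], x, y); // return value goes to the same coordinate
            }
        }
        return out;
    }
}
```

<p>For example, <code>LaunchModel.forEach((in, x, y) -> in + 10 * y + x, new int[6], 3, 2)</code>
writes a value derived from its own coordinate into each of the six elements.</p>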
    205 </li>
    206 
    207 <li>An optional <code>init()</code> function. An <code>init()</code> function is a special type of
    208 invokable function that RenderScript runs when the script is first instantiated. This allows for some
    209 computation to occur automatically at script creation.</li>
    210 
    211 <li>Zero or more <strong><i>static script globals and functions</i></strong>. A static script global is equivalent to a
    212 script global except that it cannot be accessed from Java code. A static function is a standard C
    213 function that can be called from any kernel or invokable function in the script but is not exposed
    214 to the Java API. If a script global or function does not need to be called from Java code, it is
    215 highly recommended that it be declared <code>static</code>.</li> </ul>
    216 
    217 <h4>Setting floating point precision</h4>
    218 
<p>You can control the required level of floating point precision in a script. This is useful if
full compliance with the IEEE 754-2008 standard (the default) is not required. The following pragmas
set the level of floating point precision:</p>
    222 
    223 <ul>
    224 
    225 <li><code>#pragma rs_fp_full</code> (default if nothing is specified): For apps that require
    226   floating point precision as outlined by the IEEE 754-2008 standard.
    227 
    228 </li>
    229 
  <li><code>#pragma rs_fp_relaxed</code>: For apps that don't require strict IEEE 754-2008
    compliance and can tolerate less precision. This mode enables flush-to-zero for denorms and
    round-towards-zero.
    233 
    234 </li>
    235 
  <li><code>#pragma rs_fp_imprecise</code>: For apps that don't have stringent precision
    requirements. This mode enables everything in <code>rs_fp_relaxed</code> along with the
    following:
    239 
    240 <ul>
    241 
    242   <li>Operations resulting in -0.0 can return +0.0 instead.</li>
    243   <li>Operations on INF and NAN are undefined.</li>
    244 </ul>
    245 </li>
    246 </ul>
    247 
    248 <p>Most applications can use <code>rs_fp_relaxed</code> without any side effects. This may be very
    249 beneficial on some architectures due to additional optimizations only available with relaxed
    250 precision (such as SIMD CPU instructions).</p>
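<p>Two of the terms used above, flush-to-zero and the sign of zero, can be illustrated in plain
Java. The sketch below is only a model for building intuition and does not show how RenderScript
implements these modes: <code>flushToZero</code> is a hypothetical helper that mimics replacing
denormal values with zero, and <code>sameIgnoringZeroSign</code> shows a comparison that tolerates
+0.0 being returned in place of -0.0.</p>

```java
/** Hypothetical helpers illustrating relaxed floating point behavior (not a RenderScript API). */
final class FpModel {
    /** Mimics flush-to-zero: a denormal (subnormal) float is replaced by a zero of the same sign. */
    static float flushToZero(float v) {
        boolean denormal = v != 0.0f && Math.abs(v) < Float.MIN_NORMAL;
        return denormal ? Math.copySign(0.0f, v) : v;
    }

    /** Numeric equality, under which -0.0f and +0.0f compare as equal. */
    static boolean sameIgnoringZeroSign(float a, float b) {
        return a == b;
    }
}
```

<p>Code that checks results with <code>Float.compare()</code> or <code>Float.equals()</code>
distinguishes -0.0 from +0.0, so a numeric comparison like the one above is the safer choice when a
script may run with <code>rs_fp_imprecise</code>.</p>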
    251 
    252 
    253 <h2 id="access-rs-apis">Accessing RenderScript APIs from Java</h2>
    254 
    255 <p>When developing an Android application that uses RenderScript, you can access its API from Java in
    256   one of two ways:</p>
    257 
    258 <ul>
  <li><strong>{@link android.renderscript}</strong> - The APIs in this package are
    available on devices running Android 3.0 (API level 11) and higher. </li>
    261   <li><strong>{@link android.support.v8.renderscript}</strong> - The APIs in this package are
    262     available through a <a href="{@docRoot}tools/support-library/features.html#v8">Support
    263     Library</a>, which allows you to use them on devices running Android 2.3 (API level 9) and
    264     higher.</li>
    265 </ul>
    266 
    267 <p>Here are the tradeoffs:</p>
    268 
    269 <ul>
    270 <li>If you use the Support Library APIs, the RenderScript portion of your application will be
    271   compatible with devices running Android 2.3 (API level 9) and higher, regardless of which RenderScript
    272   features you use. This allows your application to work on more devices than if you use the
    273   native (<strong>{@link android.renderscript}</strong>) APIs.</li>
    274 <li>Certain RenderScript features are not available through the Support Library APIs.</li>
    275 <li>If you use the Support Library APIs, you will get (possibly significantly) larger APKs than
    276 if you use the native (<strong>{@link android.renderscript}</strong>) APIs.</li>
    277 </ul>
    278 
    279 <h3 id="ide-setup">Using the RenderScript Support Library APIs</h3>
    280 
    281 <p>In order to use the Support Library RenderScript APIs, you must configure your development
    282   environment to be able to access them. The following Android SDK tools are required for using
    283   these APIs:</p>
    284 
    285 <ul>
    286   <li>Android SDK Tools revision 22.2 or higher</li>
    287   <li>Android SDK Build-tools revision 18.1.0 or higher</li>
    288 </ul>
    289 
    290 <p>You can check and update the installed version of these tools in the
    291   <a href="{@docRoot}tools/help/sdk-manager.html">Android SDK Manager</a>.</p>
    292 
    293 
    294 <p>To use the Support Library RenderScript APIs:</p>
    295 
    296 <ol>
    297   <li>Make sure you have the required Android SDK version and Build Tools version installed.</li>
    298   <li> Update the settings for the Android build process to include the RenderScript settings:
    299 
    300     <ul>
    301       <li>Open the {@code build.gradle} file in the app folder of your application module. </li>
    302       <li>Add the following RenderScript settings to the file:
    303 
    304 <pre>
android {
    compileSdkVersion 23
    buildToolsVersion "23.0.3"

    defaultConfig {
        minSdkVersion 9
        targetSdkVersion 19
<strong>
        renderscriptTargetApi 18
        renderscriptSupportModeEnabled true
</strong>
    }
}
    318 </pre>
    319 
    320 
    321     <p>The settings listed above control specific behavior in the Android build process:</p>
    322 
    323     <ul>
    324       <li>{@code renderscriptTargetApi} - Specifies the bytecode version to be generated. We
    325       recommend you set this value to the lowest API level able to provide all the functionality
    326       you are using and set {@code renderscriptSupportModeEnabled} to {@code true}.
    327       Valid values for this setting are any integer value
    328       from 11 to the most recently released API level. If your minimum SDK version specified in your
    329       application manifest is set to a different value, that value is ignored and the target value
    330       in the build file is used to set the minimum SDK version.</li>
    331       <li>{@code renderscriptSupportModeEnabled} - Specifies that the generated bytecode should fall
    332       back to a compatible version if the device it is running on does not support the target
    333       version.
    334       </li>
    335       <li>{@code buildToolsVersion} - The version of the Android SDK build tools to use. This value
    336       should be set to {@code 18.1.0} or higher. If this option is not specified, the highest
    337       installed build tools version is used. You should always set this value to ensure the
    338       consistency of builds across development machines with different configurations.</li>
    339     </ul>
    340     </li>
   </ul>
  </li>
    342 
    343   <li>In your application classes that use RenderScript, add an import for the Support Library
    344     classes:
    345 
    346 <pre>
import android.support.v8.renderscript.*;
    348 </pre>
    349 
    350   </li>
    351 
    352 </ol>
    353 
    354 <h2 id="using-rs-from-java">Using RenderScript from Java Code</h2>
    355 
    356 <p>Using RenderScript from Java code relies on the API classes located in the
    357 {@link android.renderscript} or the {@link android.support.v8.renderscript} package. Most
    358 applications follow the same basic usage pattern:</p>
    359 
    360 <ol>
    361 
    362 <li><strong>Initialize a RenderScript context.</strong> The {@link
    363 android.renderscript.RenderScript} context, created with {@link
    364 android.renderscript.RenderScript#create}, ensures that RenderScript can be used and provides an
    365 object to control the lifetime of all subsequent RenderScript objects. You should consider context
    366 creation to be a potentially long-running operation, since it may create resources on different
    367 pieces of hardware; it should not be in an application's critical path if at all
    368 possible. Typically, an application will have only a single RenderScript context at a time.</li>
    369 
    370 <li><strong>Create at least one {@link android.renderscript.Allocation} to be passed to a
    371 script.</strong> An {@link android.renderscript.Allocation} is a RenderScript object that provides
    372 storage for a fixed amount of data. Kernels in scripts take {@link android.renderscript.Allocation}
    373 objects as their input and output, and {@link android.renderscript.Allocation} objects can be
    374 accessed in kernels using <code>rsGetElementAt_<i>type</i>()</code> and
    375 <code>rsSetElementAt_<i>type</i>()</code> when bound as script globals. {@link
    376 android.renderscript.Allocation} objects allow arrays to be passed from Java code to RenderScript
    377 code and vice-versa. {@link android.renderscript.Allocation} objects are typically created using
    378 {@link android.renderscript.Allocation#createTyped createTyped()} or {@link
    379 android.renderscript.Allocation#createFromBitmap createFromBitmap()}.</li>
    380 
    381 <li><strong>Create whatever scripts are necessary.</strong> There are two types of scripts available
    382 to you when using RenderScript:
    383 
    384 <ul>
    385 
    386 <li><strong>ScriptC</strong>: These are the user-defined scripts as described in <a
    387 href="#writing-an-rs-kernel"><i>Writing a RenderScript Kernel</i></a> above. Every script has a Java class
    388 reflected by the RenderScript compiler in order to make it easy to access the script from Java code;
this class has the name <code>ScriptC_<i>filename</i></code>. For example, if the mapping kernel
above were located in <code>invert.rs</code> and a RenderScript context were already stored in
<code>mRenderScript</code>, the Java code to instantiate the script would be:

<pre>ScriptC_invert invert = new ScriptC_invert(mRenderScript);</pre></li>
    394 
    395 <li><strong>ScriptIntrinsic</strong>: These are built-in RenderScript kernels for common operations,
    396 such as Gaussian blur, convolution, and image blending. For more information, see the subclasses of
    397 {@link android.renderscript.ScriptIntrinsic}.</li>
    398 
    399 </ul></li>
    400 
    401 <li><strong>Populate Allocations with data.</strong> Except for Allocations created with {@link
    402 android.renderscript.Allocation#createFromBitmap createFromBitmap()}, an Allocation is populated with empty data when it is
    403 first created. To populate an Allocation, use one of the "copy" methods in {@link
    404 android.renderscript.Allocation}. The "copy" methods are <a href="#asynchronous-model">synchronous</a>.</li>
    405 
    406 <li><strong>Set any necessary script globals.</strong> You may set globals using methods in the
    407   same <code>ScriptC_<i>filename</i></code> class named <code>set_<i>globalname</i></code>. For
    408   example, in order to set an <code>int</code> variable named <code>threshold</code>, use the
    409   Java method <code>set_threshold(int)</code>; and in order to set
    410   an <code>rs_allocation</code> variable named <code>lookup</code>, use the Java
    411   method <code>set_lookup(Allocation)</code>. The <code>set</code> methods
    412   are <a href="#asynchronous-model">asynchronous</a>.</li>
    413 
<li id="launching_kernels"><strong>Launch the appropriate kernels and invokable functions.</strong>
    415 <p>Methods to launch a given kernel are
    416 reflected in the same <code>ScriptC_<i>filename</i></code> class with methods named
    417 <code>forEach_<i>mappingKernelName</i>()</code>
    418 or <code>reduce_<i>reductionKernelName</i>()</code>.
    419 These launches are <a href="#asynchronous-model">asynchronous</a>.
    420 Depending on the arguments to the kernel, the
    421 method takes one or more Allocations, all of which must have the same dimensions. By default, a
    422 kernel executes over every coordinate in those dimensions; to execute a kernel over a subset of those coordinates,
    423 pass an appropriate {@link
    424 android.renderscript.Script.LaunchOptions} as the last argument to the <code>forEach</code> or <code>reduce</code> method.</p>
    425 
    426 <p>Launch invokable functions using the <code>invoke_<i>functionName</i></code> methods
    427 reflected in the same <code>ScriptC_<i>filename</i></code> class.
    428 These launches are <a href="#asynchronous-model">asynchronous</a>.</p></li>
    429 
    430 <li><strong>Retrieve data from {@link android.renderscript.Allocation} objects
    431 and <i><a href="#javaFutureType">javaFutureType</a></i> objects.</strong>
    432 In order to
    433 access data from an {@link android.renderscript.Allocation} from Java code, you must copy that data
    434 back to Java using one of the "copy" methods in {@link
    435 android.renderscript.Allocation}.
    436 In order to obtain the result of a reduction kernel, you must use the <code><i>javaFutureType</i>.get()</code> method.
    437 The "copy" and <code>get()</code> methods are <a href="#asynchronous-model">synchronous</a>.</li>
    438 
    439 <li><strong>Tear down the RenderScript context.</strong> You can destroy the RenderScript context
    440 with {@link android.renderscript.RenderScript#destroy} or by allowing the RenderScript context
    441 object to be garbage collected. This causes any further use of any object belonging to that
    442 context to throw an exception.</li> </ol>
    443 
    444 <h3 id="asynchronous-model">Asynchronous execution model</h3>
    445 
    446 <p>The reflected <code>forEach</code>, <code>invoke</code>, <code>reduce</code>,
    447   and <code>set</code> methods are asynchronous -- each may return to Java before completing the
    448   requested action.  However, the individual actions are serialized in the order in which they are launched.</p>
    449 
    450 <p>The {@link android.renderscript.Allocation} class provides "copy" methods to copy data to
    451   and from Allocations.  A "copy" method is synchronous, and is serialized with respect to any
    452   of the asynchronous actions above that touch the same Allocation.</p>
    453 
    454 <p>The reflected <i><a href="#javaFutureType">javaFutureType</a></i> classes provide
    455   a <code>get()</code> method to obtain the result of a reduction. <code>get()</code> is
    456   synchronous, and is serialized with respect to the reduction (which is asynchronous).</p>
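<p>As an analogy only, these ordering rules resemble submitting work to a single-threaded
executor: submission returns immediately, the queued actions run in submission order, and a
blocking call waits for the result. The plain-Java sketch below models that behavior with
<code>java.util.concurrent</code>. It does not call any RenderScript API or describe RenderScript
internals, and the class name <code>AsyncModel</code> and the logged action names are
hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model of the ordering rules; it does not call any RenderScript API. */
final class AsyncModel {
    /** Queues three actions without waiting, then blocks once to read the result. */
    static List<String> run() {
        // A single worker thread runs queued actions one at a time, in submission order.
        ExecutorService queue = Executors.newSingleThreadExecutor();
        List<String> log = new ArrayList<>(); // only the worker thread touches this list
        try {
            queue.execute(() -> log.add("set_threshold"));  // returns without waiting
            queue.execute(() -> log.add("forEach_invert")); // runs after the set
            Future<List<String>> result = queue.submit(() -> {
                log.add("reduce_addint");
                return new ArrayList<String>(log); // snapshot taken on the worker thread
            });
            return result.get(); // blocks until the reduction and everything before it finish
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            queue.shutdown();
        }
    }
}
```

<p>In this model, <code>result.get()</code> plays the role of the synchronous <code>get()</code>
method: by the time it returns, the earlier queued actions have completed in the order they were
launched.</p>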
    457 
    458 <h2 id="single-source-rs">Single-Source RenderScript</h2>
    459 
    460 <p>Android 7.0 (API level 24) introduces a new programming feature called <em>Single-Source
    461 RenderScript</em>, in which kernels are launched from the script where they are defined, rather than
    462 from Java. This approach is currently limited to mapping kernels, which are simply referred to as "kernels"
    463 in this section for conciseness. This new feature also supports creating allocations of type
<a href="{@docRoot}guide/topics/renderscript/reference/rs_object_types.html#android_rs:rs_allocation">
    465 <code>rs_allocation</code></a> from inside the script. It is now possible to
    466 implement a whole algorithm solely within a script, even if multiple kernel launches are required.
    467 The benefit is twofold: more readable code, because it keeps the implementation of an algorithm in
    468 one language; and potentially faster code, because of fewer transitions between Java and
    469 RenderScript across multiple kernel launches.</p>
    470 
    471 <p>In Single-Source RenderScript, you write kernels as described in <a href="#writing-an-rs-kernel">
    472 Writing a RenderScript Kernel</a>. You then write an invokable function that calls
    473 <a href="{@docRoot}guide/topics/renderscript/reference/rs_for_each.html#android_rs:rsForEach">
    474 <code>rsForEach()</code></a> to launch them. That API takes a kernel function as the first
    475 parameter, followed by input and output allocations. A similar API
    476 <a href="{@docRoot}guide/topics/renderscript/reference/rs_for_each.html#android_rs:rsForEachWithOptions">
    477 <code>rsForEachWithOptions()</code></a> takes an extra argument of type
    478 <a href="{@docRoot}guide/topics/renderscript/reference/rs_for_each.html#android_rs:rs_script_call_t">
    479 <code>rs_script_call_t</code></a>, which specifies a subset of the elements from the input and
    480 output allocations for the kernel function to process.</p>
    481 
    482 <p>To start RenderScript computation, you call the invokable function from Java.
    483 Follow the steps in <a href="#using-rs-from-java">Using RenderScript from Java Code</a>.
    484 In the step <a href="#launching_kernels">launch the appropriate kernels</a>, call
    485 the invokable function using <code>invoke_<i>function_name</i>()</code>, which will start the
    486 whole computation, including launching kernels.</p>
    487 
    488 <p>Allocations are often needed to save and pass
    489 intermediate results from one kernel launch to another. You can create them using
    490 <a href="{@docRoot}guide/topics/renderscript/reference/rs_allocation_create.html#android_rs:rsCreateAllocation">
rsCreateAllocation()</a>. One easy-to-use form of that API is <code>
rsCreateAllocation_&lt;T&gt;&lt;W&gt;(&hellip;)</code>, where <i>T</i> is the data type for an
    493 element, and <i>W</i> is the vector width for the element. The API takes the sizes in
    494 dimensions X, Y, and Z as arguments. For 1D or 2D allocations, the size for dimension Y or Z can
    495 be omitted. For example, <code>rsCreateAllocation_uchar4(16384)</code> creates a 1D allocation of
    496 16384 elements, each of which is of type <code>uchar4</code>.</p>
    497 
    498 <p>Allocations are managed by the system automatically. You
    499 do not have to explicitly release or free them. However, you can call
    500 <a href="{@docRoot}guide/topics/renderscript/reference/rs_object_info.html#android_rs:rsClearObject">
    501 <code>rsClearObject(rs_allocation* alloc)</code></a> to indicate you no longer need the handle
    502 <code>alloc</code> to the underlying allocation,
    503 so that the system can free up resources as early as possible.</p>
    504 
    505 <p>The <a href="#writing-an-rs-kernel">Writing a RenderScript Kernel</a> section contains an example
    506 kernel that inverts an image. The example below expands that to apply more than one effect to an image,
    507 using Single-Source RenderScript. It includes another kernel, <code>greyscale</code>, which turns a
    508 color image into black-and-white. An invokable function <code>process()</code> then applies those two kernels
    509 consecutively to an input image, and produces an output image. Allocations for both the input and
    510 the output are passed in as arguments of type
<a href="{@docRoot}guide/topics/renderscript/reference/rs_object_types.html#android_rs:rs_allocation">
    512 <code>rs_allocation</code></a>.</p>
    513 
    514 <pre>
// File: singlesource.rs

#pragma version(1)
#pragma rs java_package_name(com.android.rssample)

static const float4 weight = {0.299f, 0.587f, 0.114f, 0.0f};

uchar4 RS_KERNEL invert(uchar4 in, uint32_t x, uint32_t y) {
  uchar4 out = in;
  out.r = 255 - in.r;
  out.g = 255 - in.g;
  out.b = 255 - in.b;
  return out;
}

uchar4 RS_KERNEL greyscale(uchar4 in) {
  const float4 inF = rsUnpackColor8888(in);
  const float4 outF = (float4){ dot(inF, weight) };
  return rsPackColorTo8888(outF);
}

void process(rs_allocation inputImage, rs_allocation outputImage) {
  const uint32_t imageWidth = rsAllocationGetDimX(inputImage);
  const uint32_t imageHeight = rsAllocationGetDimY(inputImage);
  rs_allocation tmp = rsCreateAllocation_uchar4(imageWidth, imageHeight);
  rsForEach(invert, inputImage, tmp);
  rsForEach(greyscale, tmp, outputImage);
}
    543 </pre>
    544 
    545 <p>You can call the <code>process()</code> function from Java as follows:</p>
    546 
    547 <pre>
// File: SingleSource.java
    549 
    550 RenderScript RS = RenderScript.create(context);
    551 ScriptC_singlesource script = new ScriptC_singlesource(RS);
    552 Allocation inputAllocation = Allocation.createFromBitmapResource(
    553     RS, getResources(), R.drawable.image);
    554 Allocation outputAllocation = Allocation.createTyped(
    555     RS, inputAllocation.getType(),
    556     Allocation.USAGE_SCRIPT | Allocation.USAGE_IO_OUTPUT);
    557 script.invoke_process(inputAllocation, outputAllocation);
    558 </pre>
    559 
    560 <p>This example shows how an algorithm that involves two kernel launches can be implemented completely
    561 in the RenderScript language itself. Without Single-Source
    562 RenderScript, you would have to launch both kernels from the Java code, separating kernel launches
    563 from kernel definitions and making it harder to understand the whole algorithm. Not only is the
Single-Source RenderScript code easier to read, it also eliminates the transitions
between Java and the script across kernel launches. Some iterative algorithms may launch kernels
hundreds of times, making the overhead of such transitions considerable.</p>
    567 
    568 <h2 id="reduction-in-depth">Reduction Kernels in Depth</h2>
    569 
    570 <p><i>Reduction</i> is the process of combining a collection of data into a single
    571 value. This is a useful primitive in parallel programming, with applications such as the
    572 following:</p>
    573 <ul>
    574   <li>computing the sum or product over all the data</li>
    575   <li>computing logical operations (<code>and</code>, <code>or</code>, <code>xor</code>)
    576   over all the data</li>
    577   <li>finding the minimum or maximum value within the data</li>
    578   <li>searching for a specific value or for the coordinate of a specific value within the data</li>
    579 </ul>
    580 
    581 <p>In Android 7.0 (API level 24) and later, RenderScript supports <i>reduction kernels</i> to allow
    582 efficient user-written reduction algorithms. You may launch reduction kernels on inputs with
1, 2, or 3 dimensions.</p>
    584 
    585 <p>An example above shows a simple <a href="#example-addint">addint</a> reduction kernel.
    586 Here is a more complicated <a id="example-findMinAndMax">findMinAndMax</a> reduction kernel
    587 that finds the locations of the minimum and maximum <code>long</code> values in a
    588 1-dimensional {@link android.renderscript.Allocation}:</p>
    589 
    590 <pre>
    591 #define LONG_MAX (long)((1UL << 63) - 1)
    592 #define LONG_MIN (long)(1UL << 63)
    593 
    594 #pragma rs reduce(findMinAndMax) \
    595   initializer(fMMInit) accumulator(fMMAccumulator) \
    596   combiner(fMMCombiner) outconverter(fMMOutConverter)
    597 
    598 // Either a value and the location where it was found, or <a href="#INITVAL">INITVAL</a>.
    599 typedef struct {
    600   long val;
    601   int idx;     // -1 indicates <a href="#INITVAL">INITVAL</a>
    602 } IndexedVal;
    603 
    604 typedef struct {
    605   IndexedVal min, max;
    606 } MinAndMax;
    607 
    608 // In discussion below, this initial value { { LONG_MAX, -1 }, { LONG_MIN, -1 } }
    609 // is called <a id="INITVAL">INITVAL</a>.
    610 static void fMMInit(MinAndMax *accum) {
    611   accum->min.val = LONG_MAX;
    612   accum->min.idx = -1;
    613   accum->max.val = LONG_MIN;
    614   accum->max.idx = -1;
    615 }
    616 
    617 //----------------------------------------------------------------------
    618 // In describing the behavior of the accumulator and combiner functions,
    619 // it is helpful to describe hypothetical functions
    620 //   IndexedVal min(IndexedVal a, IndexedVal b)
    621 //   IndexedVal max(IndexedVal a, IndexedVal b)
    622 //   MinAndMax  minmax(MinAndMax a, MinAndMax b)
    623 //   MinAndMax  minmax(MinAndMax accum, IndexedVal val)
    624 //
    625 // The effect of
    626 //   IndexedVal min(IndexedVal a, IndexedVal b)
    627 // is to return the IndexedVal from among the two arguments
    628 // whose val is lesser, except that when an IndexedVal
    629 // has a negative index, that IndexedVal is never less than
    630 // any other IndexedVal; therefore, if exactly one of the
    631 // two arguments has a negative index, the min is the other
    632 // argument. Like ordinary arithmetic min and max, this function
    633 // is commutative and associative; that is,
    634 //
//   min(A, B) == min(B, A)                  // commutative
//   min(A, min(B, C)) == min(min(A, B), C)  // associative
    637 //
    638 // The effect of
    639 //   IndexedVal max(IndexedVal a, IndexedVal b)
    640 // is analogous (greater . . . never greater than).
    641 //
    642 // Then there is
    643 //
    644 //   MinAndMax minmax(MinAndMax a, MinAndMax b) {
    645 //     return MinAndMax(min(a.min, b.min), max(a.max, b.max));
    646 //   }
    647 //
    648 // Like ordinary arithmetic min and max, the above function
    649 // is commutative and associative; that is:
    650 //
//   minmax(A, B) == minmax(B, A)                        // commutative
//   minmax(A, minmax(B, C)) == minmax(minmax(A, B), C)  // associative
    653 //
    654 // Finally define
    655 //
    656 //   MinAndMax minmax(MinAndMax accum, IndexedVal val) {
    657 //     return minmax(accum, MinAndMax(val, val));
    658 //   }
    659 //----------------------------------------------------------------------
    660 
    661 // This function can be explained as doing:
    662 //   *accum = minmax(*accum, IndexedVal(in, x))
    663 //
    664 // This function simply computes minimum and maximum values as if
    665 // INITVAL.min were greater than any other minimum value and
    666 // INITVAL.max were less than any other maximum value.  Note that if
    667 // *accum is INITVAL, then this function sets
    668 //   *accum = IndexedVal(in, x)
    669 //
    670 // After this function is called, both accum->min.idx and accum->max.idx
    671 // will have nonnegative values:
    672 // - x is always nonnegative, so if this function ever sets one of the
    673 //   idx fields, it will set it to a nonnegative value
    674 // - if one of the idx fields is negative, then the corresponding
    675 //   val field must be LONG_MAX or LONG_MIN, so the function will always
    676 //   set both the val and idx fields
    677 static void fMMAccumulator(MinAndMax *accum, long in, int x) {
    678   IndexedVal me;
    679   me.val = in;
    680   me.idx = x;
    681 
    682   if (me.val <= accum->min.val)
    683     accum->min = me;
    684   if (me.val >= accum->max.val)
    685     accum->max = me;
    686 }
    687 
    688 // This function can be explained as doing:
    689 //   *accum = minmax(*accum, *val)
    690 //
    691 // This function simply computes minimum and maximum values as if
    692 // INITVAL.min were greater than any other minimum value and
    693 // INITVAL.max were less than any other maximum value.  Note that if
    694 // one of the two accumulator data items is INITVAL, then this
    695 // function sets *accum to the other one.
    696 static void fMMCombiner(MinAndMax *accum,
    697                         const MinAndMax *val) {
    698   if ((accum->min.idx < 0) || (val->min.val < accum->min.val))
    699     accum->min = val->min;
    700   if ((accum->max.idx < 0) || (val->max.val > accum->max.val))
    701     accum->max = val->max;
    702 }
    703 
    704 static void fMMOutConverter(int2 *result,
    705                             const MinAndMax *val) {
    706   result->x = val->min.idx;
    707   result->y = val->max.idx;
    708 }
    709 </pre>
    710 
    711 <p class="note"><strong>NOTE:</strong> There are more example reduction
    712   kernels <a href="#more-example">here</a>.</p>
    713 
    714 <p>In order to run a reduction kernel, the RenderScript runtime creates <em>one or more</em>
    715 variables called <a id="accumulator-data-items"><strong><i>accumulator data
    716 items</i></strong></a> to hold the state of the reduction process. The RenderScript runtime
    717 picks the number of accumulator data items in such a way as to maximize performance. The type
    718 of the accumulator data items (<i>accumType</i>) is determined by the kernel's <i>accumulator
    719 function</i> -- the first argument to that function is a pointer to an accumulator data
    720 item. By default, every accumulator data item is initialized to zero (as if
    721 by <code>memset</code>); however, you may write an <i>initializer function</i> to do something
    722 different.</p>
    723 
    724 <p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
    725 kernel, the accumulator data items (of type <code>int</code>) are used to add up input
    726 values. There is no initializer function, so each accumulator data item is initialized to
    727 zero.</p>
    728 
    729 <p class="note"><strong>Example:</strong> In
    730 the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the accumulator data items
    731 (of type <code>MinAndMax</code>) are used to keep track of the minimum and maximum values
    732 found so far. There is an initializer function to set these to <code>LONG_MAX</code> and
    733 <code>LONG_MIN</code>, respectively; and to set the locations of these values to -1, indicating that
    734 the values are not actually present in the (empty) portion of the input that has been
    735 processed.</p>
    736 
    737 <p>RenderScript calls your accumulator function once for every coordinate in the
    738 input(s). Typically, your function should update the accumulator data item in some way
    739 according to the input.</p>
    740 
    741 <p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
    742 kernel, the accumulator function adds the value of an input Element to the accumulator
    743 data item.</p>
    744 
    745 <p class="note"><strong>Example:</strong> In
    746 the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the accumulator function
    747 checks to see whether the value of an input Element is less than or equal to the minimum
    748 value recorded in the accumulator data item and/or greater than or equal to the maximum
    749 value recorded in the accumulator data item, and updates the accumulator data item
    750 accordingly.</p>
    751 
    752 <p>After the accumulator function has been called once for every coordinate in the input(s),
    753 RenderScript must <strong>combine</strong> the <a href="#accumulator-data-items">accumulator
    754 data items</a> together into a single accumulator data item. You may write a <i>combiner
    755 function</i> to do this. If the accumulator function has a single input and
    756 no <a href="#special-arguments">special arguments</a>, then you do not need to write a combiner
    757 function; RenderScript will use the accumulator function to combine the accumulator data
    758 items. (You may still write a combiner function if this default behavior is not what you
    759 want.)</p>
    760 
    761 <p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
    762 kernel, there is no combiner function, so the accumulator function will be used. This is
    763 the correct behavior, because if we split a collection of values into two pieces, and we
    764 add up the values in those two pieces separately, adding up those two sums is the same as
    765 adding up the entire collection.</p>
    766 
    767 <p class="note"><strong>Example:</strong> In
    768 the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the combiner function
    769 checks to see whether the minimum value recorded in the "source" accumulator data
item <code>*val</code> is less than the minimum value recorded in the "destination"
    771 accumulator data item <code>*accum</code>, and updates <code>*accum</code>
    772 accordingly. It does similar work for the maximum value. This updates <code>*accum</code>
    773 to the state it would have had if all of the input values had been accumulated into
    774 <code>*accum</code> rather than some into <code>*accum</code> and some into
    775 <code>*val</code>.</p>
    776 
    777 <p>After all of the accumulator data items have been combined, RenderScript determines
    778 the result of the reduction to return to Java. You may write an <i>outconverter
    779 function</i> to do this. You do not need to write an outconverter function if you want
    780 the final value of the combined accumulator data items to be the result of the reduction.</p>
    781 
    782 <p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel,
    783 there is no outconverter function.  The final value of the combined data items is the sum of
    784 all Elements of the input, which is the value we want to return.</p>
    785 
    786 <p class="note"><strong>Example:</strong> In
    787 the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the outconverter function
    788 initializes an <code>int2</code> result value to hold the locations of the minimum and
    789 maximum values resulting from the combination of all of the accumulator data items.</p>
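<p>The stages described above can be modeled in plain C. The sketch below is not RenderScript
code and does not claim to reflect how the runtime actually schedules work: the function
<code>simulateAddint</code>, the <code>MAX_ACCUM</code> bound, the <code>numAccum</code> parameter,
and the round-robin distribution of inputs are all assumptions made for illustration. It
zero-initializes several accumulator data items, accumulates the inputs into them, and then
combines them by reusing the accumulator function, as happens for <code>addint</code>, which
has no combiner function.</p>

```c
#include <stddef.h>
#include <string.h>

#define MAX_ACCUM 8  /* hypothetical upper bound on accumulator data items */

/* Accumulator function of the addint kernel. */
static void addintAccum(int *accum, int val) {
  *accum += val;
}

/* Hypothetical host-side model of one kernel launch (not a RenderScript API). */
static int simulateAddint(const int *in, size_t n, size_t numAccum) {
  int accums[MAX_ACCUM];
  if (numAccum < 1) numAccum = 1;
  if (numAccum > MAX_ACCUM) numAccum = MAX_ACCUM;

  /* Default initialization: zero every accumulator data item. */
  memset(accums, 0, sizeof(accums));

  /* Accumulate: element i goes to accumulator i % numAccum (an assumption). */
  for (size_t i = 0; i < n; ++i)
    addintAccum(&accums[i % numAccum], in[i]);

  /* Combine: no combiner is given, so the accumulator function is reused. */
  for (size_t k = 1; k < numAccum; ++k)
    addintAccum(&accums[0], accums[k]);

  /* No outconverter: the combined accumulator data item is the result. */
  return accums[0];
}
```

<p>In this model the result does not depend on <code>numAccum</code>, because adding partial
sums gives the same total as adding all of the inputs directly.</p>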
    790 
    791 <h3 id="writing-reduction-kernel">Writing a reduction kernel</h3>
    792 
    793 <p><code>#pragma rs reduce</code> defines a reduction kernel by
    794 specifying its name and the names and roles of the functions that make
    795 up the kernel.  All such functions must be
    796 <code>static</code>. A reduction kernel always requires an <code>accumulator</code>
    797 function; you can omit some or all of the other functions, depending on what you want the
    798 kernel to do.</p>
    799 
    800 <pre>#pragma rs reduce(<i>kernelName</i>) \
    801   initializer(<i>initializerName</i>) \
    802   accumulator(<i>accumulatorName</i>) \
    803   combiner(<i>combinerName</i>) \
    804   outconverter(<i>outconverterName</i>)
    805 </pre>
    806 
    807 <p>The meaning of the items in the <code>#pragma</code> is as follows:</p>
    808 <ul>
    809 
    810 <li><code>reduce(<i>kernelName</i>)</code> (mandatory): Specifies that a reduction kernel is
    811 being defined. A reflected Java method <code>reduce_<i>kernelName</i></code> will launch the
    812 kernel.</li>
    813 
    814 <li><p><code>initializer(<i>initializerName</i>)</code> (optional): Specifies the name of the
    815 initializer function for this reduction kernel. When you launch the kernel, RenderScript calls
    816 this function once for each <a href="#accumulator-data-items">accumulator data item</a>. The
    817 function must be defined like this:</p>
    818 
<pre>static void <i>initializerName</i>(<i>accumType</i> *accum) { &hellip; }</pre>
    820 
    821 <p><code>accum</code> is a pointer to an accumulator data item for this function to
    822 initialize.</p>
    823 
    824 <p>If you do not provide an initializer function, RenderScript initializes every accumulator
    825 data item to zero (as if by <code>memset</code>), behaving as if there were an initializer
    826 function that looks like this:</p>
    827 <pre>static void <i>initializerName</i>(<i>accumType</i> *accum) {
    828   memset(accum, 0, sizeof(*accum));
    829 }</pre>
    830 </li>
    831 
    832 <li><p><code><a id="accumulator-function">accumulator(<i>accumulatorName</i>)</a></code>
    833 (mandatory): Specifies the name of the accumulator function for this
    834 reduction kernel. When you launch the kernel, RenderScript calls
    835 this function once for every coordinate in the input(s), to update an
    836 accumulator data item in some way according to the input(s). The function
    837 must be defined like this:</p>
    838 
    839 <pre>
    840 static void <i>accumulatorName</i>(<i>accumType</i> *accum,
    841                             <i>in1Type</i> in1, <i>&hellip;,</i> <i>inNType</i> in<i>N</i>
    842                             <i>[, specialArguments]</i>) { &hellip; }
    843 </pre>
    844 
    845 <p><code>accum</code> is a pointer to an accumulator data item for this function to
    846 modify. <code>in1</code> through <code>in<i>N</i></code> are one <em>or more</em> arguments that
    847 are automatically filled in based on the inputs passed to the kernel launch, one argument
    848 per input. The accumulator function may optionally take any of the <a
    849 href="#special-arguments">special arguments</a>.</p>
    850 
    851 <p>An example kernel with multiple inputs is <a href="#dot-product"><code>dotProduct</code></a>.</p>
    852 </li>
    853 
<li><p><code><a id="combiner-function">combiner(<i>combinerName</i>)</a></code>
    855 (optional): Specifies the name of the combiner function for this
    856 reduction kernel. After RenderScript calls the accumulator function
    857 once for every coordinate in the input(s), it calls this function as many
    858 times as necessary to combine all accumulator data items into a single
    859 accumulator data item. The function must be defined like this:</p>
    860 
<pre>static void <i>combinerName</i>(<i>accumType</i> *accum, const <i>accumType</i> *other) { &hellip; }</pre>
    862 
    863 <p><code>accum</code> is a pointer to a "destination" accumulator data item for this
    864 function to modify. <code>other</code> is a pointer to a "source" accumulator data item
    865 for this function to "combine" into <code>*accum</code>.</p>
    866 
    867 <p class="note"><strong>NOTE:</strong> It is possible
    868   that <code>*accum</code>, <code>*other</code>, or both have been initialized but have never
    869   been passed to the accumulator function; that is, one or both have never been updated
    870   according to any input data. For example, in
    871   the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the combiner
    872   function <code>fMMCombiner</code> explicitly checks for <code>idx &lt; 0</code> because that
    873   indicates such an accumulator data item, whose value is <a href="#INITVAL">INITVAL</a>.</p>
    874 
    875 <p>If you do not provide a combiner function, RenderScript uses the accumulator function in its
    876 place, behaving as if there were a combiner function that looks like this:</p>
    877 
    878 <pre>static void <i>combinerName</i>(<i>accumType</i> *accum, const <i>accumType</i> *other) {
    879   <i>accumulatorName</i>(accum, *other);
    880 }</pre>
    881 
    882 <p>A combiner function is mandatory if the kernel has more than one input, if the input data
    883   type is not the same as the accumulator data type, or if the accumulator function takes one
    884   or more <a href="#special-arguments">special arguments</a>.</p>
    885 </li>
    886 
    887 <li><p><code><a id="outconverter-function">outconverter(<i>outconverterName</i>)</a></code>
    888 (optional): Specifies the name of the outconverter function for this
    889 reduction kernel. After RenderScript combines all of the accumulator
    890 data items, it calls this function to determine the result of the
    891 reduction to return to Java. The function must be defined like
    892 this:</p>
    893 
<pre>static void <i>outconverterName</i>(<i>resultType</i> *result, const <i>accumType</i> *accum) { &hellip; }</pre>
    895 
    896 <p><code>result</code> is a pointer to a result data item (allocated but not initialized
    897 by the RenderScript runtime) for this function to initialize with the result of the
    898 reduction. <i>resultType</i> is the type of that data item, which need not be the same
    899 as <i>accumType</i>. <code>accum</code> is a pointer to the final accumulator data item
    900 computed by the <a href="#combiner-function">combiner function</a>.</p>
    901 
    902 <p>If you do not provide an outconverter function, RenderScript copies the final accumulator
    903 data item to the result data item, behaving as if there were an outconverter function that
    904 looks like this:</p>
    905 
    906 <pre>static void <i>outconverterName</i>(<i>accumType</i> *result, const <i>accumType</i> *accum) {
    907   *result = *accum;
    908 }</pre>
    909 
    910 <p>If you want a different result type than the accumulator data type, then the outconverter function is mandatory.</p>
    911 </li>
    912 
    913 </ul>
    914 
    915 <p>Note that a kernel has input types, an accumulator data item type, and a result type,
    916   none of which need to be the same. For example, in
    917   the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the input
    918   type <code>long</code>, accumulator data item type <code>MinAndMax</code>, and result
    919   type <code>int2</code> are all different.</p>
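<p>As a further illustration of three distinct types, here is a sketch of the functions for a
hypothetical kernel that computes the mean of <code>int</code> inputs. The names
<code>MeanAccum</code>, <code>meanAccumulator</code>, <code>meanCombiner</code>, and
<code>meanOutConverter</code> are invented for this example, and the code is written as plain C
so that it can be compiled and checked outside RenderScript. The input type is <code>int</code>,
the accumulator data item is a struct, and the result is a <code>double</code>, so by the rules
above both a combiner function and an outconverter function are needed.</p>

```c
typedef struct {
  long long sum;    /* running sum of the inputs seen so far */
  long long count;  /* number of inputs seen so far */
} MeanAccum;

/* In a .rs file, these functions would be named in a pragma such as:
 *   #pragma rs reduce(mean) accumulator(meanAccumulator) \
 *     combiner(meanCombiner) outconverter(meanOutConverter)
 * (hypothetical kernel name). No initializer is listed because the
 * default zero initialization {0, 0} is already an identity value here. */

/* Accumulator: the input type (int) differs from the accumulator type. */
static void meanAccumulator(MeanAccum *accum, int val) {
  accum->sum += val;
  accum->count += 1;
}

/* Combiner: needed because the input and accumulator types differ. */
static void meanCombiner(MeanAccum *accum, const MeanAccum *other) {
  accum->sum += other->sum;
  accum->count += other->count;
}

/* Outconverter: needed because the result type (double) differs. */
static void meanOutConverter(double *result, const MeanAccum *accum) {
  *result = accum->count ? (double)accum->sum / (double)accum->count : 0.0;
}
```

<p>Returning 0.0 for an empty input is a choice made for this sketch, not behavior specified by
RenderScript.</p>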
    920 
    921 <h4 id="assume">What can't you assume?</h4>
    922 
    923 <p>You must not rely on the number of accumulator data items created by RenderScript for a
    924   given kernel launch.  There is no guarantee that two launches of the same kernel with the
    925   same input(s) will create the same number of accumulator data items.</p>
    926 
    927 <p>You must not rely on the order in which RenderScript calls the initializer, accumulator, and
    928   combiner functions; it may even call some of them in parallel.  There is no guarantee that
    929   two launches of the same kernel with the same input will follow the same order.  The only
    930   guarantee is that only the initializer function will ever see an uninitialized accumulator
    931   data item. For example:</p>
    932 <ul>
    933 <li>There is no guarantee that all accumulator data items will be initialized before the
    934   accumulator function is called, although it will only be called on an initialized accumulator
    935   data item.</li>
    936 <li>There is no guarantee on the order in which input Elements are passed to the accumulator
    937   function.</li>
    938 <li>There is no guarantee that the accumulator function has been called for all input Elements
    939   before the combiner function is called.</li>
    940 </ul>
    941 
    942 <p>One consequence of this is that the <a href="#example-findMinAndMax">findMinAndMax</a>
    943   kernel is not deterministic: If the input contains more than one occurrence of the same
    944   minimum or maximum value, you have no way of knowing which occurrence the kernel will
    945   find.</p>
    946 
    947 <h4 id="guarantee">What must you guarantee?</h4>
    948 
    949 <p>Because the RenderScript system can choose to execute a kernel <a href="#assume">in many
    950     different ways</a>, you must follow certain rules to ensure that your kernel behaves the
    951     way you want. If you do not follow these rules, you may get incorrect results,
    952     nondeterministic behavior, or runtime errors.</p>
    953 
    954 <p>The rules below often say that two accumulator data items must have "<a id="the-same">the
    955   same value"</a>.  What does this mean?  That depends on what you want the kernel to do.  For
    956   a mathematical reduction such as <a href="#example-addint">addint</a>, it usually makes sense
    957   for "the same" to mean mathematical equality.  For a "pick any" search such
    958   as <a href="#example-findMinAndMax">findMinAndMax</a> ("find the location of minimum and
    959   maximum input values") where there might be more than one occurrence of identical input
    960   values, all locations of a given input value must be considered "the same".  You could write
    961   a similar kernel to "find the location of <em>leftmost</em> minimum and maximum input values"
    962   where (say) a minimum value at location 100 is preferred over an identical minimum value at location
    963   200; for this kernel, "the same" would mean identical <em>location</em>, not merely
    964   identical <em>value</em>, and the accumulator and combiner functions would have to be
    965   different than those for <a href="#example-findMinAndMax">findMinAndMax</a>.</p>
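<p>A minimal sketch of the "leftmost" variant, restricted to the minimum for brevity (the maximum
would be analogous), might look like the following. It is plain C rather than RenderScript, and
the names <code>preferLeftmostMin</code>, <code>lmInit</code>, <code>lmAccumulator</code>, and
<code>lmCombiner</code> are hypothetical. The difference from
<a href="#example-findMinAndMax">findMinAndMax</a> is the tie-break: when two values are equal,
the one with the smaller location wins, regardless of the order in which they are seen.</p>

```c
typedef struct {
  long long val;
  int idx;  /* -1 means "no value recorded yet" */
} IndexedVal;

/* Returns nonzero if candidate should replace current. */
static int preferLeftmostMin(const IndexedVal *candidate,
                             const IndexedVal *current) {
  if (candidate->idx < 0) return 0;  /* candidate holds no value */
  if (current->idx < 0) return 1;    /* anything beats the initial value */
  if (candidate->val != current->val)
    return candidate->val < current->val;
  return candidate->idx < current->idx;  /* tie: smaller location wins */
}

static void lmInit(IndexedVal *accum) {
  accum->val = 0;
  accum->idx = -1;
}

static void lmAccumulator(IndexedVal *accum, long long in, int x) {
  IndexedVal me = { in, x };
  if (preferLeftmostMin(&me, accum)) *accum = me;
}

static void lmCombiner(IndexedVal *accum, const IndexedVal *other) {
  if (preferLeftmostMin(other, accum)) *accum = *other;
}
```

<p>Because the comparison depends on both the value and the location, every order of
accumulation and combination selects the same occurrence, which is what "the same" means for
this kernel.</p>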
    966 
<p><strong>The initializer function must create an <i>identity value</i>.</strong>  That is,
  if <code><i>I</i></code> and <code><i>A</i></code> are accumulator data items initialized
  by the initializer function, and <code><i>I</i></code> has never been passed to the
  accumulator function (but <code><i>A</i></code> may have been), then:</p>
    971 <ul>
    972 <li><code><i>combinerName</i>(&<i>A</i>, &<i>I</i>)</code> must
    973   leave <code><i>A</i></code> <a href="#the-same">the same</a></li>
    974 <li><code><i>combinerName</i>(&<i>I</i>, &<i>A</i>)</code> must
    975   leave <code><i>I</i></code> <a href="#the-same">the same</a> as <code><i>A</i></code></li>
    976 </ul>
    977 <p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
    978   kernel, an accumulator data item is initialized to zero. The combiner function for this
    979   kernel performs addition; zero is the identity value for addition.</p>
    980 <div class="note">
    981 <p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a>
    982   kernel, an accumulator data item is initialized
    983   to <a href="#INITVAL"><code>INITVAL</code></a>.
    984 <ul>
    985 <li><code>fMMCombiner(&<i>A</i>, &<i>I</i>)</code> leaves <code><i>A</i></code> the same,
    986   because <code><i>I</i></code> is <code>INITVAL</code>.</li>
    987 <li><code>fMMCombiner(&<i>I</i>, &<i>A</i>)</code> sets <code><i>I</i></code>
    988   to <code><i>A</i></code>, because <code><i>I</i></code> is <code>INITVAL</code>.</li>
    989 </ul>
    990 Therefore, <code>INITVAL</code> is indeed an identity value.
    991 </p></div>
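<p>The identity requirement is easy to violate by relying on the default zero initialization.
The plain C sketch below models a hypothetical kernel that multiplies its inputs; the names
<code>mulAccum</code>, <code>mulInit</code>, and <code>leavesUnchanged</code> are invented for
this illustration. Zero is not an identity value for multiplication, so such a kernel would need
an initializer function that sets each accumulator data item to 1.</p>

```c
/* Accumulator of a hypothetical product kernel; with no combiner given,
 * it would also be used to combine accumulator data items. */
static void mulAccum(long long *accum, long long val) {
  *accum *= val;
}

/* Initializer: 1 is the identity value for multiplication. */
static void mulInit(long long *accum) {
  *accum = 1;
}

/* Combines an untouched accumulator data item i into a, and reports
 * whether a kept its value, as the identity rule requires. */
static int leavesUnchanged(long long a, long long i) {
  long long before = a;
  mulAccum(&a, i);  /* default combiner: accumulatorName(accum, *other) */
  return a == before;
}
```

<p>With the value produced by <code>mulInit</code>, <code>leavesUnchanged</code> reports that the
rule holds; with a zero-initialized item, combining wipes out the accumulated product.</p>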
    992 
    993 <p><strong>The combiner function must be <i>commutative</i>.</strong>  That is,
    994   if <code><i>A</i></code> and <code><i>B</i></code> are accumulator data items initialized
    995   by the initializer function, and that may have been passed to the accumulator function zero
    996   or more times, then <code><i>combinerName</i>(&<i>A</i>, &<i>B</i>)</code> must
    997   set <code><i>A</i></code> to <a href="#the-same">the same value</a>
    998   that <code><i>combinerName</i>(&<i>B</i>, &<i>A</i>)</code>
    999   sets <code><i>B</i></code>.</p>
   1000 <p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a>
   1001   kernel, the combiner function adds the two accumulator data item values; addition is
   1002   commutative.</p>
   1003 <div class="note">
   1004 <p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel,
   1005 <pre>
   1006 fMMCombiner(&<i>A</i>, &<i>B</i>)
   1007 </pre>
   1008 is the same as
   1009 <pre>
   1010 <i>A</i> = minmax(<i>A</i>, <i>B</i>)
   1011 </pre>
   1012 and <code>minmax</code> is commutative, so <code>fMMCombiner</code> is also.
   1013 </p>
   1014 </div>
   1015 
   1016 <p><strong>The combiner function must be <i>associative</i>.</strong>  That is,
   1017   if <code><i>A</i></code>, <code><i>B</i></code>, and <code><i>C</i></code> are
   1018   accumulator data items initialized by the initializer function, and that may have been passed
   1019   to the accumulator function zero or more times, then the following two code sequences must
   1020   set <code><i>A</i></code> to <a href="#the-same">the same value</a>:</p>
   1021 <ul>
   1022 <li><pre>
   1023 <i>combinerName</i>(&<i>A</i>, &<i>B</i>);
   1024 <i>combinerName</i>(&<i>A</i>, &<i>C</i>);
   1025 </pre></li>
   1026 <li><pre>
   1027 <i>combinerName</i>(&<i>B</i>, &<i>C</i>);
   1028 <i>combinerName</i>(&<i>A</i>, &<i>B</i>);
   1029 </pre></li>
   1030 </ul>
   1031 <div class="note">
   1032 <p><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel, the
   1033   combiner function adds the two accumulator data item values:
   1034 <ul>
   1035 <li><pre>
   1036 <i>A</i> = <i>A</i> + <i>B</i>
   1037 <i>A</i> = <i>A</i> + <i>C</i>
   1038 // Same as
   1039 //   <i>A</i> = (<i>A</i> + <i>B</i>) + <i>C</i>
   1040 </pre></li>
   1041 <li><pre>
   1042 <i>B</i> = <i>B</i> + <i>C</i>
   1043 <i>A</i> = <i>A</i> + <i>B</i>
   1044 // Same as
   1045 //   <i>A</i> = <i>A</i> + (<i>B</i> + <i>C</i>)
   1046 //   <i>B</i> = <i>B</i> + <i>C</i>
</pre></li>
   1048 </ul>
   1049 Addition is associative, and so the combiner function is also.
   1050 </p>
   1051 </div>
   1052 <div class="note">
   1053 <p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel,
   1054 <pre>
   1055 fMMCombiner(&<i>A</i>, &<i>B</i>)
   1056 </pre>
   1057 is the same as
   1058 <pre>
   1059 <i>A</i> = minmax(<i>A</i>, <i>B</i>)
   1060 </pre>
   1061 So the two sequences are
   1062 <ul>
   1063 <li><pre>
   1064 <i>A</i> = minmax(<i>A</i>, <i>B</i>)
   1065 <i>A</i> = minmax(<i>A</i>, <i>C</i>)
   1066 // Same as
   1067 //   <i>A</i> = minmax(minmax(<i>A</i>, <i>B</i>), <i>C</i>)
   1068 </pre></li>
   1069 <li><pre>
   1070 <i>B</i> = minmax(<i>B</i>, <i>C</i>)
   1071 <i>A</i> = minmax(<i>A</i>, <i>B</i>)
   1072 // Same as
   1073 //   <i>A</i> = minmax(<i>A</i>, minmax(<i>B</i>, <i>C</i>))
   1074 //   <i>B</i> = minmax(<i>B</i>, <i>C</i>)
   1075 </pre></li>
</ul>
<code>minmax</code> is associative, and so <code>fMMCombiner</code> is also.
   1077 </p>
   1078 </div>
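<p>For contrast, here is a plain C sketch of a combiner that is commutative but <em>not</em>
associative. The function <code>avgCombiner</code> and the two helper functions are hypothetical
and exist only to show how the two code sequences from the rule can disagree. A kernel built on
such a combiner could return different results depending on how RenderScript happens to group
the accumulator data items.</p>

```c
/* A hypothetical combiner that averages: commutative, but NOT associative. */
static void avgCombiner(int *accum, const int *other) {
  *accum = (*accum + *other) / 2;
}

/* First sequence from the rule: combine B into A, then C into A. */
static int combineLeftFirst(int a, int b, int c) {
  avgCombiner(&a, &b);
  avgCombiner(&a, &c);
  return a;
}

/* Second sequence from the rule: combine C into B, then B into A. */
static int combineRightFirst(int a, int b, int c) {
  avgCombiner(&b, &c);
  avgCombiner(&a, &b);
  return a;
}
```

<p>For <i>A</i> = 8, <i>B</i> = 4, and <i>C</i> = 0, the first sequence leaves 3 in
<i>A</i> and the second leaves 5, so this combiner breaks the associativity rule.</p>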
   1079 
   1080 <p><strong>The accumulator function and combiner function together must obey the <i>basic
   1081   folding rule</i>.</strong>  That is, if <code><i>A</i></code>
   1082   and <code><i>B</i></code> are accumulator data items, <code><i>A</i></code> has been
   1083   initialized by the initializer function and may have been passed to the accumulator function
   1084   zero or more times, <code><i>B</i></code> has not been initialized, and <i>args</i> is
   1085   the list of input arguments and special arguments for a particular call to the accumulator
   1086   function, then the following two code sequences must set <code><i>A</i></code>
   1087   to <a href="#the-same">the same value</a>:</p>
   1088 <ul>
   1089 <li><pre>
   1090 <i>accumulatorName</i>(&<i>A</i>, <i>args</i>);  // statement 1
   1091 </pre></li>
   1092 <li><pre>
   1093 <i>initializerName</i>(&<i>B</i>);        // statement 2
   1094 <i>accumulatorName</i>(&<i>B</i>, <i>args</i>);  // statement 3
   1095 <i>combinerName</i>(&<i>A</i>, &<i>B</i>);       // statement 4
   1096 </pre></li>
   1097 </ul>
   1098 <div class="note">
   1099 <p><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel, for an input value <i>V</i>:
   1100 <ul>
   1101 <li>Statement 1 is the same as <code>A += <i>V</i></code></li>
   1102 <li>Statement 2 is the same as <code>B = 0</code></li>
   1103 <li>Statement 3 is the same as <code>B += <i>V</i></code>, which is the same as <code>B = <i>V</i></code></li>
   1104 <li>Statement 4 is the same as <code>A += B</code>, which is the same as <code>A += <i>V</i></code></li>
   1105 </ul>
   1106 Statements 1 and 4 set <code><i>A</i></code> to the same value, and so this kernel obeys the
   1107 basic folding rule.
   1108 </p>
   1109 </div>
   1110 <div class="note">
   1111 <p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, for an input
   1112   value <i>V</i> at coordinate <i>X</i>:
   1113 <ul>
   1114 <li>Statement 1 is the same as <code>A = minmax(A, IndexedVal(<i>V</i>, <i>X</i>))</code></li>
   1115 <li>Statement 2 is the same as <code>B = <a href="#INITVAL">INITVAL</a></code></li>
   1116 <li>Statement 3 is the same as
   1117 <pre>
   1118 B = minmax(B, IndexedVal(<i>V</i>, <i>X</i>))
   1119 </pre>
   1120 which, because <i>B</i> is the initial value, is the same as
   1121 <pre>
   1122 B = IndexedVal(<i>V</i>, <i>X</i>)
   1123 </pre>
   1124 </li>
   1125 <li>Statement 4 is the same as
   1126 <pre>
   1127 A = minmax(A, B)
   1128 </pre>
   1129 which is the same as
   1130 <pre>
   1131 A = minmax(A, IndexedVal(<i>V</i>, <i>X</i>))
</pre>
</li>
</ul>
   1134 Statements 1 and 4 set <code><i>A</i></code> to the same value, and so this kernel obeys the
   1135 basic folding rule.
   1136 </p>
   1137 </div>
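<p>The basic folding rule can also be exercised outside of RenderScript. The following is a
  minimal plain-Java sketch of an addint-style kernel. The class <code>FoldingRuleModel</code>
  and the use of one-element arrays to stand in for accumulator pointers are assumptions made
  for illustration. It runs both code sequences from the same starting value of <i>A</i> so
  that the results can be compared.</p>

```java
// Plain-Java model of the basic folding rule for an addint-style kernel.
// Accumulator data items are one-element int arrays so that the functions can
// update them in place, much as the script functions do through pointers.
final class FoldingRuleModel {
    static void initializer(int[] accum) { accum[0] = 0; }
    static void accumulator(int[] accum, int val) { accum[0] += val; }
    static void combiner(int[] accum, int[] other) { accum[0] += other[0]; }

    // First sequence: accumulate v directly into A.
    static int direct(int a, int v) {
        int[] accA = {a};
        accumulator(accA, v);      // statement 1
        return accA[0];
    }

    // Second sequence: accumulate v into a fresh B, then combine B into A.
    static int viaCombiner(int a, int v) {
        int[] accA = {a};
        int[] accB = new int[1];
        initializer(accB);         // statement 2
        accumulator(accB, v);      // statement 3
        combiner(accA, accB);      // statement 4
        return accA[0];
    }

    public static void main(String[] args) {
        System.out.println(direct(10, 5) == viaCombiner(10, 5));  // true
    }
}
```

<p>A check like this only samples a few values. It can expose a kernel that breaks the rule,
  but it does not prove that a kernel obeys it.</p>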
   1138 
   1139 <h3 id="calling-reduction-kernel">Calling a reduction kernel from Java code</h3>
   1140 
   1141 <p>For a reduction kernel named <i>kernelName</i> defined in the
   1142 file <code><i>filename</i>.rs</code>, there are three methods reflected in the
   1143 class <code>ScriptC_<i>filename</i></code>:</p>
   1144 
   1145 <pre>
   1146 // Method 1
   1147 public <i>javaFutureType</i> reduce_<i>kernelName</i>(Allocation ain1, <i>&hellip;,</i>
   1148                                         Allocation ain<i>N</i>);
   1149 
   1150 // Method 2
   1151 public <i>javaFutureType</i> reduce_<i>kernelName</i>(Allocation ain1, <i>&hellip;,</i>
   1152                                         Allocation ain<i>N</i>,
   1153                                         Script.LaunchOptions sc);
   1154 
   1155 // Method 3
   1156 public <i>javaFutureType</i> reduce_<i>kernelName</i>(<i><a href="#devec">devecSiIn1Type</a></i>[] in1, &hellip;,
   1157                                         <i><a href="#devec">devecSiInNType</a></i>[] in<i>N</i>);
   1158 </pre>
   1159 
   1160 <p>Here are some examples of calling the <a href="#example-addint">addint</a> kernel:</p>
   1161 <pre>
ScriptC_example script = new ScriptC_example(RS);  // RS is the RenderScript context

// 1D array
//   and obtain answer immediately
int[] input1 = <i>&hellip;</i>;
int sum1 = script.reduce_addint(input1).get();  // Method 3

// 2D allocation
//   and do some additional work before obtaining answer
Type.Builder typeBuilder =
  new Type.Builder(RS, Element.I32(RS));
typeBuilder.setX(<i>&hellip;</i>);
typeBuilder.setY(<i>&hellip;</i>);
Allocation input2 = Allocation.createTyped(RS, typeBuilder.create());
<i>populateSomehow</i>(input2);  // fill in input Allocation with data
ScriptC_example.result_int result2 = script.reduce_addint(input2);  // Method 1
<i>doSomeAdditionalWork</i>(); // might run at same time as reduction
int sum2 = result2.get();
   1180 </pre>
   1181 
   1182 <p><strong>Method 1</strong> has one input {@link android.renderscript.Allocation} argument for
   1183   every input argument in the kernel's <a href="#accumulator-function">accumulator
   1184     function</a>. The RenderScript runtime checks to ensure that all of the input Allocations
   1185   have the same dimensions and that the {@link android.renderscript.Element} type of each of
   1186   the input Allocations matches that of the corresponding input argument of the accumulator
  function's prototype. If any of these checks fails, RenderScript throws an exception. The
   1188   kernel executes over every coordinate in those dimensions.</p>
   1189 
   1190 <p><strong>Method 2</strong> is the same as Method 1 except that Method 2 takes an additional
   1191   argument <code>sc</code> that can be used to limit the kernel execution to a subset of the
   1192   coordinates.</p>
   1193 
   1194 <p><strong><a id="reduce-method-3">Method 3</a></strong> is the same as Method 1 except that
   1195   instead of taking Allocation inputs it takes Java array inputs. This is a convenience that
   1196   saves you from having to write code to explicitly create an Allocation and copy data to it
   1197   from a Java array. <em>However, using Method 3 instead of Method 1 does not increase the
   1198   performance of the code</em>. For each input array, Method 3 creates a temporary
   1199   1-dimensional Allocation with the appropriate {@link android.renderscript.Element} type and
   1200   {@link android.renderscript.Allocation#setAutoPadding} enabled, and copies the array to the
   1201   Allocation as if by the appropriate <code>copyFrom()</code> method of {@link
   1202   android.renderscript.Allocation}. It then calls Method 1, passing those temporary
   1203   Allocations.</p>
   1204 <p class="note"><strong>NOTE:</strong> If your application will make multiple kernel calls with
   1205   the same array, or with different arrays of the same dimensions and Element type, you may improve
   1206   performance by explicitly creating, populating, and reusing Allocations yourself, instead of
   1207   by using Method 3.</p>
   1208 <p><strong><i><a id="javaFutureType">javaFutureType</a></i></strong>,
   1209   the return type of the reflected reduction methods, is a reflected
   1210   static nested class within the <code>ScriptC_<i>filename</i></code>
   1211   class. It represents the future result of a reduction
   1212   kernel run. To obtain the actual result of the run, call
   1213   the <code>get()</code> method of that class, which returns a value
   1214   of type <i>javaResultType</i>. <code>get()</code> is <a href="#asynchronous-model">synchronous</a>.</p>
   1215 
   1216 <pre>
   1217 public class ScriptC_<i>filename</i> extends ScriptC {
   1218   public static class <i>javaFutureType</i> {
   1219     public <i>javaResultType</i> get() { &hellip; }
   1220   }
   1221 }
   1222 </pre>
   1223 
   1224 <p><strong><i>javaResultType</i></strong> is determined from the <i>resultType</i> of the
   1225   <a href="#outconverter-function">outconverter function</a>. Unless <i>resultType</i> is an
   1226   unsigned type (scalar, vector, or array), <i>javaResultType</i> is the directly corresponding
   1227   Java type. If <i>resultType</i> is an unsigned type and there is a larger Java signed type,
   1228   then <i>javaResultType</i> is that larger Java signed type; otherwise, it is the directly
   1229   corresponding Java type. For example:</p>
   1230 <ul>
<li>If <i>resultType</i> is <code>int</code>, <code>int2</code>, or <code>int[15]</code>,
  then <i>javaResultType</i> is <code>int</code>, <code>Int2</code>,
  or <code>int[]</code>, respectively. All values of <i>resultType</i> can be represented
  by <i>javaResultType</i>.</li>
<li>If <i>resultType</i> is <code>uint</code>, <code>uint2</code>, or <code>uint[15]</code>,
  then <i>javaResultType</i> is <code>long</code>, <code>Long2</code>,
  or <code>long[]</code>, respectively. All values of <i>resultType</i> can be represented
  by <i>javaResultType</i>.</li>
<li>If <i>resultType</i> is <code>ulong</code>, <code>ulong2</code>,
  or <code>ulong[15]</code>, then <i>javaResultType</i>
  is <code>long</code>, <code>Long2</code>, or <code>long[]</code>, respectively. Values
  of <i>resultType</i> that exceed the largest <code>long</code> value cannot be represented
  by <i>javaResultType</i>.</li>
   1243 </ul>
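<p>The widening described above can be illustrated with standard Java library calls. This is a
  plain-Java sketch, not reflected code. The class <code>UnsignedResultModel</code> and its
  method names are hypothetical. It shows why every <code>uint</code> value fits in a
  <code>long</code>, whereas a <code>ulong</code> value fits only when its top bit is clear.</p>

```java
// Plain-Java illustration of how unsigned script values relate to Java signed types.
final class UnsignedResultModel {
    // A 32-bit unsigned value always fits in a Java long.
    static long uintToLong(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    // A 64-bit unsigned value fits in a Java long only if its top bit is clear,
    // that is, if the same bit pattern read as a signed long is non-negative.
    static boolean ulongFitsInLong(long bits) {
        return bits >= 0;
    }

    public static void main(String[] args) {
        System.out.println(uintToLong(0xFFFFFFFF));           // 4294967295
        System.out.println(ulongFitsInLong(Long.MAX_VALUE));  // true
        System.out.println(ulongFitsInLong(Long.MIN_VALUE));  // false
    }
}
```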
   1244 
   1245 <p><strong><i>javaFutureType</i></strong> is the future result type corresponding
   1246   to the <i>resultType</i> of the <a href="#outconverter-function">outconverter
   1247   function</a>.</p>
   1248 <ul>
   1249 <li>If <i>resultType</i> is not an array type, then <i>javaFutureType</i>
   1250   is <code>result_<i>resultType</i></code>.</li>
   1251 <li>If <i>resultType</i> is an array of length <i>Count</i> with members of type <i>memberType</i>,
   1252   then <i>javaFutureType</i> is <code>resultArray<i>Count</i>_<i>memberType</i></code>.</li>
   1253 </ul>
   1254 
   1255 <p>For example:</p>
   1256 
   1257 <pre>
   1258 public class ScriptC_<i>filename</i> extends ScriptC {
   1259   // for kernels with int result
   1260   public static class result_int {
   1261     public int get() { &hellip; }
   1262   }
   1263 
   1264   // for kernels with int[10] result
   1265   public static class resultArray10_int {
   1266     public int[] get() { &hellip; }
   1267   }
   1268 
   1269   // for kernels with int2 result
   1270   //   note that the Java type name "Int2" is not the same as the script type name "int2"
   1271   public static class result_int2 {
   1272     public Int2 get() { &hellip; }
   1273   }
   1274 
   1275   // for kernels with int2[10] result
   1276   //   note that the Java type name "Int2" is not the same as the script type name "int2"
   1277   public static class resultArray10_int2 {
   1278     public Int2[] get() { &hellip; }
   1279   }
   1280 
   1281   // for kernels with uint result
   1282   //   note that the Java type "long" is a wider signed type than the unsigned script type "uint"
   1283   public static class result_uint {
   1284     public long get() { &hellip; }
   1285   }
   1286 
   1287   // for kernels with uint[10] result
   1288   //   note that the Java type "long" is a wider signed type than the unsigned script type "uint"
   1289   public static class resultArray10_uint {
   1290     public long[] get() { &hellip; }
   1291   }
   1292 
   1293   // for kernels with uint2 result
   1294   //   note that the Java type "Long2" is a wider signed type than the unsigned script type "uint2"
   1295   public static class result_uint2 {
   1296     public Long2 get() { &hellip; }
   1297   }
   1298 
   1299   // for kernels with uint2[10] result
   1300   //   note that the Java type "Long2" is a wider signed type than the unsigned script type "uint2"
   1301   public static class resultArray10_uint2 {
   1302     public Long2[] get() { &hellip; }
   1303   }
   1304 }
   1305 </pre>
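<p>The naming rule for <i>javaFutureType</i> amounts to simple string construction. The helper
  below is a hypothetical plain-Java sketch of that rule, useful for predicting the name of a
  reflected class. It is not part of the RenderScript API, and the class
  <code>FutureTypeNames</code> is an assumption made for illustration.</p>

```java
// Hypothetical helper that applies the javaFutureType naming rule.
final class FutureTypeNames {
    // Non-array result: result_<resultType>
    static String forNonArray(String resultType) {
        return "result_" + resultType;
    }

    // Array result: resultArray<Count>_<memberType>
    static String forArray(String memberType, int count) {
        return "resultArray" + count + "_" + memberType;
    }

    public static void main(String[] args) {
        System.out.println(forNonArray("int2"));     // result_int2
        System.out.println(forArray("uint", 10));    // resultArray10_uint
    }
}
```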
   1306 
   1307 <p>If <i>javaResultType</i> is an object type (including an array type), each call
   1308   to <code><i>javaFutureType</i>.get()</code> on the same instance will return the same
   1309   object.</p>
   1310 
   1311 <p>If <i>javaResultType</i> cannot represent all values of type <i>resultType</i>, and a
  reduction kernel produces an unrepresentable value,
   1313   then <code><i>javaFutureType</i>.get()</code> throws an exception.</p>
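<p>The two behaviors just described (returning the same object on every call, and throwing when
  a value is unrepresentable) can be modelled in plain Java. The class
  <code>ResultArrayModel</code> below is a hypothetical stand-in for a reflected future type
  with a <code>ulong</code> array result. The real reflected classes are generated by the build
  tools, and the exception type used here is an assumption, since the text above does not name
  one.</p>

```java
// Hypothetical stand-in for a reflected future type; not generated RenderScript code.
final class ResultArrayModel {
    private final long[] rawBits;  // unsigned 64-bit results, stored as bit patterns
    private long[] cached;         // built on the first get(), then reused

    ResultArrayModel(long[] rawBits) {
        this.rawBits = rawBits.clone();
    }

    long[] get() {
        if (cached == null) {
            for (long bits : rawBits) {
                // Top bit set: the unsigned value is too large for a signed long.
                if (bits < 0) {
                    throw new IllegalStateException(  // exception type is an assumption
                        "unrepresentable value: " + Long.toUnsignedString(bits));
                }
            }
            cached = rawBits.clone();
        }
        return cached;  // the same object on every call for this instance
    }

    public static void main(String[] args) {
        ResultArrayModel r = new ResultArrayModel(new long[] {1L, 2L});
        System.out.println(r.get() == r.get());  // true
    }
}
```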
   1314 
   1315 <h4 id="devec">Method 3 and <i>devecSiInXType</i></h4>
   1316 
   1317 <p><strong><i>devecSiInXType</i></strong> is the Java type corresponding to
   1318   the <i>inXType</i> of the corresponding argument of
   1319   the <a href="#accumulator-function">accumulator function</a>. Unless <i>inXType</i> is an
   1320   unsigned type or a vector type, <i>devecSiInXType</i> is the directly corresponding Java
   1321   type. If <i>inXType</i> is an unsigned scalar type, then <i>devecSiInXType</i> is the
   1322   Java type directly corresponding to the signed scalar type of the same
   1323   size. If <i>inXType</i> is a signed vector type, then <i>devecSiInXType</i> is the Java
   1324   type directly corresponding to the vector component type. If <i>inXType</i> is an unsigned
   1325   vector type, then <i>devecSiInXType</i> is the Java type directly corresponding to the
   1326   signed scalar type of the same size as the vector component type. For example:</p>
   1327 <ul>
   1328 <li>If <i>inXType</i> is <code>int</code>, then <i>devecSiInXType</i>
   1329   is <code>int</code>.</li>
   1330 <li>If <i>inXType</i> is <code>int2</code>, then <i>devecSiInXType</i>
   1331   is <code>int</code>. The array is a <em>flattened</em> representation: It has twice as
   1332   many <em>scalar</em> Elements as the Allocation has 2-component <em>vector</em>
   1333   Elements. This is the same way that the <code>copyFrom()</code> methods of {@link
   1334   android.renderscript.Allocation} work.</li>
<li>If <i>inXType</i> is <code>uint</code>, then <i>devecSiInXType</i>
  is <code>int</code>. A signed value in the Java array is interpreted as an unsigned value of
  the same bit pattern in the Allocation. This is the same way that the <code>copyFrom()</code>
   1338   methods of {@link android.renderscript.Allocation} work.</li>
<li>If <i>inXType</i> is <code>uint2</code>, then <i>devecSiInXType</i>
   1340   is <code>int</code>. This is a combination of the way <code>int2</code> and <code>uint</code>
   1341   are handled: The array is a flattened representation, and Java array signed values are
   1342   interpreted as RenderScript unsigned Element values.</li>
   1343 </ul>
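<p>The flattened, signed representation described in the list above can be sketched in plain
  Java. The class <code>DevecModel</code> and its method names are hypothetical, and the
  interleaved component order (x0, y0, x1, y1, and so on) is stated here as an assumption.
  The first method builds the kind of array you would pass for an <code>int2</code> or
  <code>uint2</code> input. The second shows which unsigned value a Java <code>int</code>
  denotes when its bit pattern is read as a <code>uint</code>.</p>

```java
// Plain-Java sketch of preparing Method 3 inputs; not part of the RenderScript API.
final class DevecModel {
    // Flattens N two-component vectors into a 2*N scalar array: x0, y0, x1, y1, ...
    static int[] flattenPairs(int[][] pairs) {
        int[] flat = new int[pairs.length * 2];
        for (int i = 0; i < pairs.length; i++) {
            flat[2 * i] = pairs[i][0];
            flat[2 * i + 1] = pairs[i][1];
        }
        return flat;
    }

    // The unsigned value denoted by a Java int's bit pattern when read as a uint.
    static long asUnsigned(int javaValue) {
        return Integer.toUnsignedLong(javaValue);
    }

    public static void main(String[] args) {
        int[] flat = flattenPairs(new int[][] { {1, 2}, {3, 4} });
        System.out.println(java.util.Arrays.toString(flat));  // [1, 2, 3, 4]
        System.out.println(asUnsigned(-1));                   // 4294967295
    }
}
```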
   1344 
   1345 <p>Note that for <a href="#reduce-method-3">Method 3</a>, input types are handled differently
   1346 than result types:</p>
   1347 
   1348 <ul>
   1349 <li>A script's vector input is flattened on the Java side, whereas a script's vector result is not.</li>
   1350 <li>A script's unsigned input is represented as a signed input of the same size on the Java
   1351   side, whereas a script's unsigned result is represented as a widened signed type on the Java
   1352   side (except in the case of <code>ulong</code>).</li>
   1353 </ul>
   1354 
   1355 <h3 id="more-example">More example reduction kernels</h3>
   1356 
   1357 <pre id="dot-product">
#pragma rs reduce(dotProduct) \
  accumulator(dotProductAccum) combiner(dotProductSum)

// Note: No initializer function -- therefore,
// each accumulator data item is implicitly initialized to 0.0f.

// accumulator function: adds the product of one pair of inputs
static void dotProductAccum(float *accum, float in1, float in2) {
  *accum += in1*in2;
}

// combiner function: adds one partial sum into another
static void dotProductSum(float *accum, const float *val) {
  *accum += *val;
}
   1372 </pre>
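<p>To see how the accumulator and combiner of the dot-product kernel cooperate, the following
  plain-Java sketch reduces two halves of the input separately and then combines the partial
  results. The class <code>DotProductModel</code>, the <code>split</code> parameter, and the
  fixed two-way partition are assumptions made for illustration. The RenderScript runtime
  decides for itself how to partition the work.</p>

```java
// Plain-Java model of the dotProduct kernel's functions; this is not RenderScript code.
final class DotProductModel {
    static void accumulate(float[] accum, float in1, float in2) { accum[0] += in1 * in2; }
    static void combine(float[] accum, float[] val) { accum[0] += val[0]; }

    // Reduces elements [from, to) into a fresh accumulator data item.
    static float[] reduceRange(float[] a, float[] b, int from, int to) {
        float[] accum = new float[1];  // no initializer: starts at 0.0f
        for (int i = from; i < to; i++) {
            accumulate(accum, a[i], b[i]);
        }
        return accum;
    }

    // Reduces each side of 'split' separately, then combines the two partial sums.
    static float dot(float[] a, float[] b, int split) {
        float[] left = reduceRange(a, b, 0, split);
        float[] right = reduceRange(a, b, split, a.length);
        combine(left, right);
        return left[0];
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {5f, 6f, 7f, 8f};
        System.out.println(dot(a, b, 2));  // 70.0
    }
}
```

<p>The inputs here are small integers, so every partition gives exactly the same sum. With
  arbitrary floating-point inputs, different partitions can round differently.</p>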
   1373 
   1374 <pre>
// Find a zero Element in a 2D allocation; return (-1, -1) if none
#pragma rs reduce(fz2) \
  initializer(fz2Init) \
  accumulator(fz2Accum) combiner(fz2Combine)

// initializer function: (-1, -1) means "no zero found yet"
static void fz2Init(int2 *accum) { accum->x = accum->y = -1; }

// accumulator function: records the coordinates of a zero Element
static void fz2Accum(int2 *accum,
                     int inVal,
                     int x /* special arg */,
                     int y /* special arg */) {
  if (inVal == 0) {
    accum->x = x;
    accum->y = y;
  }
}

// combiner function: adopts the other result if it found a zero
static void fz2Combine(int2 *accum, const int2 *accum2) {
  if (accum2->x >= 0) *accum = *accum2;
}
   1395 </pre>
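<p>The behavior of the <code>fz2</code> kernel can be modelled in plain Java as follows. The
  class <code>FindZeroModel</code>, the <code>{x, y}</code> array standing in for
  <code>int2</code>, and the split into two groups of rows are assumptions made for
  illustration. The sketch shows that the combined result is either the coordinates of some
  zero cell or (-1, -1).</p>

```java
// Plain-Java model of the fz2 kernel's functions; this is not RenderScript code.
// An accumulator data item is modelled as int[]{x, y}.
final class FindZeroModel {
    static int[] init() { return new int[] {-1, -1}; }

    static void accumulate(int[] accum, int inVal, int x, int y) {
        if (inVal == 0) {
            accum[0] = x;
            accum[1] = y;
        }
    }

    static void combine(int[] accum, int[] accum2) {
        if (accum2[0] >= 0) {
            accum[0] = accum2[0];
            accum[1] = accum2[1];
        }
    }

    // Reduces rows [0, splitRow) and [splitRow, height) separately, then combines.
    static int[] findZero(int[][] cells, int splitRow) {
        int[] top = init();
        int[] bottom = init();
        for (int y = 0; y < cells.length; y++) {
            int[] target = (y < splitRow) ? top : bottom;
            for (int x = 0; x < cells[y].length; x++) {
                accumulate(target, cells[y][x], x, y);
            }
        }
        combine(top, bottom);
        return top;
    }

    public static void main(String[] args) {
        int[][] cells = { {5, 0, 3}, {1, 2, 4}, {0, 9, 9} };
        int[] found = findZero(cells, 1);
        System.out.println(cells[found[1]][found[0]] == 0);  // true
    }
}
```

<p>When the input contains several zeros, which one is reported depends on the order in which
  items are accumulated and combined. The kernel's comment promises only <em>a</em> zero
  Element, not a particular one.</p>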
   1396 
   1397 <pre>
// Note that this kernel returns an array to Java
#pragma rs reduce(histogram) \
  accumulator(hsgAccum) combiner(hsgCombine)

#define BUCKETS 256
typedef uint32_t Histogram[BUCKETS];

// Note: No initializer function --
// therefore, each bucket is implicitly initialized to 0.

static void hsgAccum(Histogram *h, uchar in) { ++(*h)[in]; }

static void hsgCombine(Histogram *accum,
                       const Histogram *addend) {
  for (int i = 0; i < BUCKETS; ++i)
    (*accum)[i] += (*addend)[i];
}

// Determines the mode (most frequently occurring value), and returns
// the value and the frequency.
//
// If multiple values have the same highest frequency, returns the lowest
// of those values.
//
// Shares functions with the histogram reduction kernel.
#pragma rs reduce(mode) \
  accumulator(hsgAccum) combiner(hsgCombine) \
  outconverter(modeOutConvert)

static void modeOutConvert(int2 *result, const Histogram *h) {
  uint32_t mode = 0;
  for (int i = 1; i < BUCKETS; ++i)
    if ((*h)[i] > (*h)[mode]) mode = i;
  result->x = mode;
  result->y = (*h)[mode];
}
   1434 </pre>
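<p>The logic of the <code>histogram</code> and <code>mode</code> kernels can be sketched in
  plain Java as shown below. The class <code>ModeModel</code> is hypothetical. Buckets are held
  in a <code>long[]</code> because that is the Java result type for a script <code>uint</code>
  array. The mode is returned as a two-element <code>{value, frequency}</code> array rather
  than as an <code>Int2</code>, which is an assumption made to keep the sketch free of Android
  classes.</p>

```java
// Plain-Java model of the histogram and mode kernels; this is not RenderScript code.
final class ModeModel {
    static final int BUCKETS = 256;

    // Accumulates every input byte (read as an unsigned value 0..255) into a histogram.
    static long[] histogram(byte[] input) {
        long[] h = new long[BUCKETS];  // each bucket starts at 0
        for (byte b : input) {
            h[b & 0xFF]++;
        }
        return h;
    }

    // Adds the buckets of 'addend' into 'accum', as the combiner does.
    static void combine(long[] accum, long[] addend) {
        for (int i = 0; i < BUCKETS; i++) {
            accum[i] += addend[i];
        }
    }

    // Returns {value, frequency}; ties go to the lowest value, as in modeOutConvert.
    static long[] mode(long[] h) {
        int mode = 0;
        for (int i = 1; i < BUCKETS; i++) {
            if (h[i] > h[mode]) mode = i;
        }
        return new long[] {mode, h[mode]};
    }

    public static void main(String[] args) {
        byte[] input = {3, 7, 7, 3, (byte) 200};
        long[] m = mode(histogram(input));
        System.out.println(m[0] + " occurs " + m[1] + " times");  // 3 occurs 2 times
    }
}
```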
   1435