Home | History | Annotate | Download | only in app-design
      1 page.title=Designing for Performance
      2 @jd:body
      3 
      4 <div id="qv-wrapper">
      5 <div id="qv">
      6 
      7 <h2>In this document</h2>
      8 <ol>
      9   <li><a href="#intro">Introduction</a></li>
     10   <li><a href="#optimize_judiciously">Optimize Judiciously</a></li>
     11   <li><a href="#object_creation">Avoid Creating Unnecessary Objects</a></li>
     12   <li><a href="#myths">Performance Myths</a></li>
     13   <li><a href="#prefer_static">Prefer Static Over Virtual</a></li>
     14   <li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li>
     15   <li><a href="#use_final">Use Static Final For Constants</a></li>
     16   <li><a href="#foreach">Use Enhanced For Loop Syntax</a></li>
     17   <li><a href="#package_inner">Consider Package Instead of Private Access with Inner Classes</a></li>
     18   <li><a href="#avoidfloat">Use Floating-Point Judiciously</a> </li>
     19   <li><a href="#library">Know And Use The Libraries</a></li>
     20   <li><a href="#native_methods">Use Native Methods Judiciously</a></li>
     21   <li><a href="#closing_notes">Closing Notes</a></li>
     22 </ol>
     23 
     24 </div>
     25 </div>
     26 
     27 <p>An Android application will run on a mobile device with limited computing
     28 power and storage, and constrained battery life. Because of
     29 this, it should be <em>efficient</em>. Battery life is one reason you might
     30 want to optimize your app even if it already seems to run "fast enough".
     31 Battery life is important to users, and Android's battery usage breakdown
     32 means users will know if your app is responsible draining their battery.</p>
     33 
     34 <p>Note that although this document primarily covers micro-optimizations,
     35 these will almost never make or break your software. Choosing the right
     36 algorithms and data structures should always be your priority, but is
     37 outside the scope of this document.</p>
     38 
     39 <a name="intro" id="intro"></a>
     40 <h2>Introduction</h2>
     41 
     42 <p>There are two basic rules for writing efficient code:</p>
     43 <ul>
     44     <li>Don't do work that you don't need to do.</li>
     45     <li>Don't allocate memory if you can avoid it.</li>
     46 </ul>
     47 
     48 <h2 id="optimize_judiciously">Optimize Judiciously</h2>
     49 
     50 <p>This document is about Android-specific micro-optimization, so it assumes
     51 that you've already used profiling to work out exactly what code needs to be
     52 optimized, and that you already have a way to measure the effect (good or bad)
     53 of any changes you make. You only have so much engineering time to invest, so
     54 it's important to know you're spending it wisely.
     55 
     56 <p>(See <a href="#closing_notes">Closing Notes</a> for more on profiling and
     57 writing effective benchmarks.)
     58 
     59 <p>This document also assumes that you made the best decisions about data
     60 structures and algorithms, and that you've also considered the future
     61 performance consequences of your API decisions. Using the right data
     62 structures and algorithms will make more difference than any of the advice
     63 here, and considering the performance consequences of your API decisions will
     64 make it easier to switch to better implementations later (this is more
     65 important for library code than for application code).
     66 
     67 <p>(If you need that kind of advice, see Josh Bloch's <em>Effective Java</em>,
     68 item 47.)</p>
     69 
     70 <p>One of the trickiest problems you'll face when micro-optimizing an Android
     71 app is that your app is pretty much guaranteed to be running on multiple
     72 hardware platforms. Different versions of the VM running on different
     73 processors running at different speeds. It's not even generally the case
     74 that you can simply say "device X is a factor F faster/slower than device Y",
     75 and scale your results from one device to others. In particular, measurement
     76 on the emulator tells you very little about performance on any device. There
     77 are also huge differences between devices with and without a JIT: the "best"
     78 code for a device with a JIT is not always the best code for a device
     79 without.</p>
     80 
     81 <p>If you want to know how your app performs on a given device, you need to
     82 test on that device.</p>
     83 
     84 <a name="object_creation"></a>
     85 <h2>Avoid Creating Unnecessary Objects</h2>
     86 
     87 <p>Object creation is never free. A generational GC with per-thread allocation
     88 pools for temporary objects can make allocation cheaper, but allocating memory
     89 is always more expensive than not allocating memory.</p>
     90 
     91 <p>If you allocate objects in a user interface loop, you will force a periodic
     92 garbage collection, creating little "hiccups" in the user experience. The
     93 concurrent collector introduced in Gingerbread helps, but unnecessary work
     94 should always be avoided.</p>
     95 
     96 <p>Thus, you should avoid creating object instances you don't need to.  Some
     97 examples of things that can help:</p>
     98 
     99 <ul>
    100     <li>If you have a method returning a string, and you know that its result
    101     will always be appended to a StringBuffer anyway, change your signature
    102     and implementation so that the function does the append directly,
    103     instead of creating a short-lived temporary object.</li>
    104     <li>When extracting strings from a set of input data, try
    105     to return a substring of the original data, instead of creating a copy.
    106     You will create a new String object, but it will share the char[]
    107     with the data. (The trade-off being that if you're only using a small
    108     part of the original input, you'll be keeping it all around in memory
    109     anyway if you go this route.)</li>
    110 </ul>
    111 
    112 <p>A somewhat more radical idea is to slice up multidimensional arrays into
    113 parallel single one-dimension arrays:</p>
    114 
    115 <ul>
    116     <li>An array of ints is a much better than an array of Integers,
    117     but this also generalizes to the fact that two parallel arrays of ints
    118     are also a <strong>lot</strong> more efficient than an array of (int,int)
    119     objects.  The same goes for any combination of primitive types.</li>
    120     <li>If you need to implement a container that stores tuples of (Foo,Bar)
    121     objects, try to remember that two parallel Foo[] and Bar[] arrays are
    122     generally much better than a single array of custom (Foo,Bar) objects.
    123     (The exception to this, of course, is when you're designing an API for
    124     other code to access;  in those cases, it's usually better to trade
    125     good API design for a small hit in speed. But in your own internal
    126     code, you should try and be as efficient as possible.)</li>
    127 </ul>
    128 
    129 <p>Generally speaking, avoid creating short-term temporary objects if you
    130 can.  Fewer objects created mean less-frequent garbage collection, which has
    131 a direct impact on user experience.</p>
    132 
    133 <a name="avoid_enums" id="avoid_enums"></a>
    134 <a name="myths" id="myths"></a>
    135 <h2>Performance Myths</h2>
    136 
    137 <p>Previous versions of this document made various misleading claims. We
    138 address some of them here.</p>
    139 
    140 <p>On devices without a JIT, it is true that invoking methods via a
    141 variable with an exact type rather than an interface is slightly more
    142 efficient. (So, for example, it was cheaper to invoke methods on a
    143 <code>HashMap map</code> than a <code>Map map</code>, even though in both
    144 cases the map was a <code>HashMap</code>.) It was not the case that this
    145 was 2x slower; the actual difference was more like 6% slower. Furthermore,
    146 the JIT makes the two effectively indistinguishable.</p>
    147 
    148 <p>On devices without a JIT, caching field accesses is about 20% faster than
    149 repeatedly accesssing the field. With a JIT, field access costs about the same
    150 as local access, so this isn't a worthwhile optimization unless you feel it
    151 makes your code easier to read. (This is true of final, static, and static
    152 final fields too.)
    153 
    154 <a name="prefer_static" id="prefer_static"></a>
    155 <h2>Prefer Static Over Virtual</h2>
    156 
    157 <p>If you don't need to access an object's fields, make your method static.
    158 Invocations will be about 15%-20% faster.
    159 It's also good practice, because you can tell from the method
    160 signature that calling the method can't alter the object's state.</p>
    161 
    162 <a name="internal_get_set" id="internal_get_set"></a>
    163 <h2>Avoid Internal Getters/Setters</h2>
    164 
    165 <p>In native languages like C++ it's common practice to use getters (e.g.
    166 <code>i = getCount()</code>) instead of accessing the field directly (<code>i
    167 = mCount</code>). This is an excellent habit for C++, because the compiler can
    168 usually inline the access, and if you need to restrict or debug field access
    169 you can add the code at any time.</p>
    170 
    171 <p>On Android, this is a bad idea.  Virtual method calls are expensive,
    172 much more so than instance field lookups.  It's reasonable to follow
    173 common object-oriented programming practices and have getters and setters
    174 in the public interface, but within a class you should always access
    175 fields directly.</p>
    176 
    177 <p>Without a JIT, direct field access is about 3x faster than invoking a
    178 trivial getter. With the JIT (where direct field access is as cheap as
    179 accessing a local), direct field access is about 7x faster than invoking a
    180 trivial getter. This is true in Froyo, but will improve in the future when
    181 the JIT inlines getter methods.</p>
    182 
    183 <p>Note that if you're using ProGuard, you can have the best
    184 of both worlds because ProGuard can inline accessors for you.</p>
    185 
    186 <a name="use_final" id="use_final"></a>
    187 <h2>Use Static Final For Constants</h2>
    188 
    189 <p>Consider the following declaration at the top of a class:</p>
    190 
    191 <pre>static int intVal = 42;
    192 static String strVal = "Hello, world!";</pre>
    193 
    194 <p>The compiler generates a class initializer method, called
    195 <code>&lt;clinit&gt;</code>, that is executed when the class is first used.
    196 The method stores the value 42 into <code>intVal</code>, and extracts a
    197 reference from the classfile string constant table for <code>strVal</code>.
    198 When these values are referenced later on, they are accessed with field
    199 lookups.</p>
    200 
    201 <p>We can improve matters with the "final" keyword:</p>
    202 
    203 <pre>static final int intVal = 42;
    204 static final String strVal = "Hello, world!";</pre>
    205 
    206 <p>The class no longer requires a <code>&lt;clinit&gt;</code> method,
    207 because the constants go into static field initializers in the dex file.
    208 Code that refers to <code>intVal</code> will use
    209 the integer value 42 directly, and accesses to <code>strVal</code> will
    210 use a relatively inexpensive "string constant" instruction instead of a
    211 field lookup. (Note that this optimization only applies to primitive types and
    212 <code>String</code> constants, not arbitrary reference types. Still, it's good
    213 practice to declare constants <code>static final</code> whenever possible.)</p>
    214 
    215 <a name="foreach" id="foreach"></a>
    216 <h2>Use Enhanced For Loop Syntax</h2>
    217 
    218 <p>The enhanced for loop (also sometimes known as "for-each" loop) can be used
    219 for collections that implement the Iterable interface and for arrays.
    220 With collections, an iterator is allocated to make interface calls
    221 to hasNext() and next(). With an ArrayList, a hand-written counted loop is
    222 about 3x faster (with or without JIT), but for other collections the enhanced
    223 for loop syntax will be exactly equivalent to explicit iterator usage.</p>
    224 
    225 <p>There are several alternatives for iterating through an array:</p>
    226 
    227 <pre>    static class Foo {
    228         int mSplat;
    229     }
    230     Foo[] mArray = ...
    231 
    232     public void zero() {
    233         int sum = 0;
    234         for (int i = 0; i &lt; mArray.length; ++i) {
    235             sum += mArray[i].mSplat;
    236         }
    237     }
    238 
    239     public void one() {
    240         int sum = 0;
    241         Foo[] localArray = mArray;
    242         int len = localArray.length;
    243 
    244         for (int i = 0; i &lt; len; ++i) {
    245             sum += localArray[i].mSplat;
    246         }
    247     }
    248 
    249     public void two() {
    250         int sum = 0;
    251         for (Foo a : mArray) {
    252             sum += a.mSplat;
    253         }
    254     }
    255 </pre>
    256 
    257 <p><strong>zero()</strong> is slowest, because the JIT can't yet optimize away
    258 the cost of getting the array length once for every iteration through the
    259 loop.</p>
    260 
    261 <p><strong>one()</strong> is faster. It pulls everything out into local
    262 variables, avoiding the lookups. Only the array length offers a performance
    263 benefit.</p>
    264 
    265 <p><strong>two()</strong> is fastest for devices without a JIT, and
    266 indistinguishable from <strong>one()</strong> for devices with a JIT.
    267 It uses the enhanced for loop syntax introduced in version 1.5 of the Java
    268 programming language.</p>
    269 
    270 <p>To summarize: use the enhanced for loop by default, but consider a
    271 hand-written counted loop for performance-critical ArrayList iteration.</p>
    272 
    273 <p>(See also <em>Effective Java</em> item 46.)</p>
    274 
    275 <a name="package_inner" id="package_inner"></a>
    276 <h2>Consider Package Instead of Private Access with Private Inner Classes</h2>
    277 
    278 <p>Consider the following class definition:</p>
    279 
    280 <pre>public class Foo {
    281     private class Inner {
    282         void stuff() {
    283             Foo.this.doStuff(Foo.this.mValue);
    284         }
    285     }
    286 
    287     private int mValue;
    288 
    289     public void run() {
    290         Inner in = new Inner();
    291         mValue = 27;
    292         in.stuff();
    293     }
    294 
    295     private void doStuff(int value) {
    296         System.out.println("Value is " + value);
    297     }
    298 }</pre>
    299 
    300 <p>The key things to note here are that we define a private inner class
    301 (<code>Foo$Inner</code>) that directly accesses a private method and a private
    302 instance field in the outer class. This is legal, and the code prints "Value is
    303 27" as expected.</p>
    304 
    305 <p>The problem is that the VM considers direct access to <code>Foo</code>'s
    306 private members from <code>Foo$Inner</code> to be illegal because
    307 <code>Foo</code> and <code>Foo$Inner</code> are different classes, even though
    308 the Java language allows an inner class to access an outer class' private
    309 members. To bridge the gap, the compiler generates a couple of synthetic
    310 methods:</p>
    311 
    312 <pre>/*package*/ static int Foo.access$100(Foo foo) {
    313     return foo.mValue;
    314 }
    315 /*package*/ static void Foo.access$200(Foo foo, int value) {
    316     foo.doStuff(value);
    317 }</pre>
    318 
    319 <p>The inner class code calls these static methods whenever it needs to
    320 access the <code>mValue</code> field or invoke the <code>doStuff</code> method
    321 in the outer class. What this means is that the code above really boils down to
    322 a case where you're accessing member fields through accessor methods.
    323 Earlier we talked about how accessors are slower than direct field
    324 accesses, so this is an example of a certain language idiom resulting in an
    325 "invisible" performance hit.</p>
    326 
    327 <p>If you're using code like this in a performance hotspot, you can avoid the
    328 overhead by declaring fields and methods accessed by inner classes to have
    329 package access, rather than private access. Unfortunately this means the fields
    330 can be accessed directly by other classes in the same package, so you shouldn't
    331 use this in public API.</p>
    332 
    333 <a name="avoidfloat" id="avoidfloat"></a>
    334 <h2>Use Floating-Point Judiciously</h2>
    335 
    336 <p>As a rule of thumb, floating-point is about 2x slower than integer on
    337 Android devices. This is true on a FPU-less, JIT-less G1 and a Nexus One with
    338 an FPU and the JIT. (Of course, absolute speed difference between those two
    339 devices is about 10x for arithmetic operations.)</p>
    340 
    341 <p>In speed terms, there's no difference between <code>float</code> and
    342 <code>double</code> on the more modern hardware. Space-wise, <code>double</code>
    343 is 2x larger. As with desktop machines, assuming space isn't an issue, you
    344 should prefer <code>double</code> to <code>float</code>.</p>
    345 
    346 <p>Also, even for integers, some chips have hardware multiply but lack
    347 hardware divide. In such cases, integer division and modulus operations are
    348 performed in software &mdash; something to think about if you're designing a
    349 hash table or doing lots of math.</p>
    350 
    351 <a name="library" id="library"></a>
    352 <h2>Know And Use The Libraries</h2>
    353 
    354 <p>In addition to all the usual reasons to prefer library code over rolling
    355 your own, bear in mind that the system is at liberty to replace calls
    356 to library methods with hand-coded assembler, which may be better than the
    357 best code the JIT can produce for the equivalent Java. The typical example
    358 here is <code>String.indexOf</code> and friends, which Dalvik replaces with
    359 an inlined intrinsic. Similarly, the <code>System.arraycopy</code> method
    360 is about 9x faster than a hand-coded loop on a Nexus One with the JIT.</p>
    361 
    362 <p>(See also <em>Effective Java</em> item 47.)</p>
    363 
    364 <a name="native_methods" id="native_methods"></a>
    365 <h2>Use Native Methods Judiciously</h2>
    366 
    367 <p>Native code isn't necessarily more efficient than Java. For one thing,
    368 there's a cost associated with the Java-native transition, and the JIT can't
    369 optimize across these boundaries. If you're allocating native resources (memory
    370 on the native heap, file descriptors, or whatever), it can be significantly
    371 more difficult to arrange timely collection of these resources. You also
    372 need to compile your code for each architecture you wish to run on (rather
    373 than rely on it having a JIT). You may even have to compile multiple versions
    374 for what you consider the same architecture: native code compiled for the ARM
    375 processor in the G1 can't take full advantage of the ARM in the Nexus One, and
    376 code compiled for the ARM in the Nexus One won't run on the ARM in the G1.</p>
    377 
    378 <p>Native code is primarily useful when you have an existing native codebase
    379 that you want to port to Android, not for "speeding up" parts of a Java app.</p>
    380 
    381 <p>If you do need to use native code, you should read our
    382 <a href="{@docRoot}guide/practices/jni.html">JNI Tips</a>.</p>
    383 
    384 <p>(See also <em>Effective Java</em> item 54.)</p>
    385 
    386 <a name="closing_notes" id="closing_notes"></a>
    387 <h2>Closing Notes</h2>
    388 
    389 <p>One last thing: always measure. Before you start optimizing, make sure you
    390 have a problem. Make sure you can accurately measure your existing performance,
    391 or you won't be able to measure the benefit of the alternatives you try.</p>
    392 
    393 <p>Every claim made in this document is backed up by a benchmark. The source
    394 to these benchmarks can be found in the <a href="http://code.google.com/p/dalvik/source/browse/#svn/trunk/benchmarks">code.google.com "dalvik" project</a>.</p>
    395 
    396 <p>The benchmarks are built with the
    397 <a href="http://code.google.com/p/caliper/">Caliper</a> microbenchmarking
    398 framework for Java. Microbenchmarks are hard to get right, so Caliper goes out
    399 of its way to do the hard work for you, and even detect some cases where you're
    400 not measuring what you think you're measuring (because, say, the VM has
    401 managed to optimize all your code away). We highly recommend you use Caliper
    402 to run your own microbenchmarks.</p>
    403 
    404 <p>You may also find
    405 <a href="{@docRoot}tools/debugging/debugging-tracing.html">Traceview</a> useful
    406 for profiling, but it's important to realize that it currently disables the JIT,
    407 which may cause it to misattribute time to code that the JIT may be able to win
    408 back. It's especially important after making changes suggested by Traceview
    409 data to ensure that the resulting code actually runs faster when run without
    410 Traceview.
    411