1 page.title=Designing for Performance 2 @jd:body 3 4 <div id="qv-wrapper"> 5 <div id="qv"> 6 7 <h2>In this document</h2> 8 <ol> 9 <li><a href="#intro">Introduction</a></li> 10 <li><a href="#optimize_judiciously">Optimize Judiciously</a></li> 11 <li><a href="#object_creation">Avoid Creating Unnecessary Objects</a></li> 12 <li><a href="#myths">Performance Myths</a></li> 13 <li><a href="#prefer_static">Prefer Static Over Virtual</a></li> 14 <li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li> 15 <li><a href="#use_final">Use Static Final For Constants</a></li> 16 <li><a href="#foreach">Use Enhanced For Loop Syntax</a></li> 17 <li><a href="#package_inner">Consider Package Instead of Private Access with Inner Classes</a></li> 18 <li><a href="#avoidfloat">Use Floating-Point Judiciously</a> </li> 19 <li><a href="#library">Know And Use The Libraries</a></li> 20 <li><a href="#native_methods">Use Native Methods Judiciously</a></li> 21 <li><a href="#closing_notes">Closing Notes</a></li> 22 </ol> 23 24 </div> 25 </div> 26 27 <p>An Android application will run on a mobile device with limited computing 28 power and storage, and constrained battery life. Because of 29 this, it should be <em>efficient</em>. Battery life is one reason you might 30 want to optimize your app even if it already seems to run "fast enough". 31 Battery life is important to users, and Android's battery usage breakdown 32 means users will know if your app is responsible draining their battery.</p> 33 34 <p>Note that although this document primarily covers micro-optimizations, 35 these will almost never make or break your software. Choosing the right 36 algorithms and data structures should always be your priority, but is 37 outside the scope of this document.</p> 38 39 <a name="intro" id="intro"></a> 40 <h2>Introduction</h2> 41 42 <p>There are two basic rules for writing efficient code:</p> 43 <ul> 44 <li>Don't do work that you don't need to do.</li> 45 <li>Don't allocate memory if you can avoid it.</li> 46 </ul> 47 48 <h2 id="optimize_judiciously">Optimize Judiciously</h2> 49 50 <p>This document is about Android-specific micro-optimization, so it assumes 51 that you've already used profiling to work out exactly what code needs to be 52 optimized, and that you already have a way to measure the effect (good or bad) 53 of any changes you make. You only have so much engineering time to invest, so 54 it's important to know you're spending it wisely. 55 56 <p>(See <a href="#closing_notes">Closing Notes</a> for more on profiling and 57 writing effective benchmarks.) 58 59 <p>This document also assumes that you made the best decisions about data 60 structures and algorithms, and that you've also considered the future 61 performance consequences of your API decisions. Using the right data 62 structures and algorithms will make more difference than any of the advice 63 here, and considering the performance consequences of your API decisions will 64 make it easier to switch to better implementations later (this is more 65 important for library code than for application code). 66 67 <p>(If you need that kind of advice, see Josh Bloch's <em>Effective Java</em>, 68 item 47.)</p> 69 70 <p>One of the trickiest problems you'll face when micro-optimizing an Android 71 app is that your app is pretty much guaranteed to be running on multiple 72 hardware platforms. Different versions of the VM running on different 73 processors running at different speeds. It's not even generally the case 74 that you can simply say "device X is a factor F faster/slower than device Y", 75 and scale your results from one device to others. In particular, measurement 76 on the emulator tells you very little about performance on any device. There 77 are also huge differences between devices with and without a JIT: the "best" 78 code for a device with a JIT is not always the best code for a device 79 without.</p> 80 81 <p>If you want to know how your app performs on a given device, you need to 82 test on that device.</p> 83 84 <a name="object_creation"></a> 85 <h2>Avoid Creating Unnecessary Objects</h2> 86 87 <p>Object creation is never free. A generational GC with per-thread allocation 88 pools for temporary objects can make allocation cheaper, but allocating memory 89 is always more expensive than not allocating memory.</p> 90 91 <p>If you allocate objects in a user interface loop, you will force a periodic 92 garbage collection, creating little "hiccups" in the user experience. The 93 concurrent collector introduced in Gingerbread helps, but unnecessary work 94 should always be avoided.</p> 95 96 <p>Thus, you should avoid creating object instances you don't need to. Some 97 examples of things that can help:</p> 98 99 <ul> 100 <li>If you have a method returning a string, and you know that its result 101 will always be appended to a StringBuffer anyway, change your signature 102 and implementation so that the function does the append directly, 103 instead of creating a short-lived temporary object.</li> 104 <li>When extracting strings from a set of input data, try 105 to return a substring of the original data, instead of creating a copy. 106 You will create a new String object, but it will share the char[] 107 with the data. (The trade-off being that if you're only using a small 108 part of the original input, you'll be keeping it all around in memory 109 anyway if you go this route.)</li> 110 </ul> 111 112 <p>A somewhat more radical idea is to slice up multidimensional arrays into 113 parallel single one-dimension arrays:</p> 114 115 <ul> 116 <li>An array of ints is a much better than an array of Integers, 117 but this also generalizes to the fact that two parallel arrays of ints 118 are also a <strong>lot</strong> more efficient than an array of (int,int) 119 objects. The same goes for any combination of primitive types.</li> 120 <li>If you need to implement a container that stores tuples of (Foo,Bar) 121 objects, try to remember that two parallel Foo[] and Bar[] arrays are 122 generally much better than a single array of custom (Foo,Bar) objects. 123 (The exception to this, of course, is when you're designing an API for 124 other code to access; in those cases, it's usually better to trade 125 good API design for a small hit in speed. But in your own internal 126 code, you should try and be as efficient as possible.)</li> 127 </ul> 128 129 <p>Generally speaking, avoid creating short-term temporary objects if you 130 can. Fewer objects created mean less-frequent garbage collection, which has 131 a direct impact on user experience.</p> 132 133 <a name="avoid_enums" id="avoid_enums"></a> 134 <a name="myths" id="myths"></a> 135 <h2>Performance Myths</h2> 136 137 <p>Previous versions of this document made various misleading claims. We 138 address some of them here.</p> 139 140 <p>On devices without a JIT, it is true that invoking methods via a 141 variable with an exact type rather than an interface is slightly more 142 efficient. (So, for example, it was cheaper to invoke methods on a 143 <code>HashMap map</code> than a <code>Map map</code>, even though in both 144 cases the map was a <code>HashMap</code>.) It was not the case that this 145 was 2x slower; the actual difference was more like 6% slower. Furthermore, 146 the JIT makes the two effectively indistinguishable.</p> 147 148 <p>On devices without a JIT, caching field accesses is about 20% faster than 149 repeatedly accesssing the field. With a JIT, field access costs about the same 150 as local access, so this isn't a worthwhile optimization unless you feel it 151 makes your code easier to read. (This is true of final, static, and static 152 final fields too.) 153 154 <a name="prefer_static" id="prefer_static"></a> 155 <h2>Prefer Static Over Virtual</h2> 156 157 <p>If you don't need to access an object's fields, make your method static. 158 Invocations will be about 15%-20% faster. 159 It's also good practice, because you can tell from the method 160 signature that calling the method can't alter the object's state.</p> 161 162 <a name="internal_get_set" id="internal_get_set"></a> 163 <h2>Avoid Internal Getters/Setters</h2> 164 165 <p>In native languages like C++ it's common practice to use getters (e.g. 166 <code>i = getCount()</code>) instead of accessing the field directly (<code>i 167 = mCount</code>). This is an excellent habit for C++, because the compiler can 168 usually inline the access, and if you need to restrict or debug field access 169 you can add the code at any time.</p> 170 171 <p>On Android, this is a bad idea. Virtual method calls are expensive, 172 much more so than instance field lookups. It's reasonable to follow 173 common object-oriented programming practices and have getters and setters 174 in the public interface, but within a class you should always access 175 fields directly.</p> 176 177 <p>Without a JIT, direct field access is about 3x faster than invoking a 178 trivial getter. With the JIT (where direct field access is as cheap as 179 accessing a local), direct field access is about 7x faster than invoking a 180 trivial getter. This is true in Froyo, but will improve in the future when 181 the JIT inlines getter methods.</p> 182 183 <a name="use_final" id="use_final"></a> 184 <h2>Use Static Final For Constants</h2> 185 186 <p>Consider the following declaration at the top of a class:</p> 187 188 <pre>static int intVal = 42; 189 static String strVal = "Hello, world!";</pre> 190 191 <p>The compiler generates a class initializer method, called 192 <code><clinit></code>, that is executed when the class is first used. 193 The method stores the value 42 into <code>intVal</code>, and extracts a 194 reference from the classfile string constant table for <code>strVal</code>. 195 When these values are referenced later on, they are accessed with field 196 lookups.</p> 197 198 <p>We can improve matters with the "final" keyword:</p> 199 200 <pre>static final int intVal = 42; 201 static final String strVal = "Hello, world!";</pre> 202 203 <p>The class no longer requires a <code><clinit></code> method, 204 because the constants go into static field initializers in the dex file. 205 Code that refers to <code>intVal</code> will use 206 the integer value 42 directly, and accesses to <code>strVal</code> will 207 use a relatively inexpensive "string constant" instruction instead of a 208 field lookup. (Note that this optimization only applies to primitive types and 209 <code>String</code> constants, not arbitrary reference types. Still, it's good 210 practice to declare constants <code>static final</code> whenever possible.)</p> 211 212 <a name="foreach" id="foreach"></a> 213 <h2>Use Enhanced For Loop Syntax</h2> 214 215 <p>The enhanced for loop (also sometimes known as "for-each" loop) can be used 216 for collections that implement the Iterable interface and for arrays. 217 With collections, an iterator is allocated to make interface calls 218 to hasNext() and next(). With an ArrayList, a hand-written counted loop is 219 about 3x faster (with or without JIT), but for other collections the enhanced 220 for loop syntax will be exactly equivalent to explicit iterator usage.</p> 221 222 <p>There are several alternatives for iterating through an array:</p> 223 224 <pre> static class Foo { 225 int mSplat; 226 } 227 Foo[] mArray = ... 228 229 public void zero() { 230 int sum = 0; 231 for (int i = 0; i < mArray.length; ++i) { 232 sum += mArray[i].mSplat; 233 } 234 } 235 236 public void one() { 237 int sum = 0; 238 Foo[] localArray = mArray; 239 int len = localArray.length; 240 241 for (int i = 0; i < len; ++i) { 242 sum += localArray[i].mSplat; 243 } 244 } 245 246 public void two() { 247 int sum = 0; 248 for (Foo a : mArray) { 249 sum += a.mSplat; 250 } 251 } 252 </pre> 253 254 <p><strong>zero()</strong> is slowest, because the JIT can't yet optimize away 255 the cost of getting the array length once for every iteration through the 256 loop.</p> 257 258 <p><strong>one()</strong> is faster. It pulls everything out into local 259 variables, avoiding the lookups. Only the array length offers a performance 260 benefit.</p> 261 262 <p><strong>two()</strong> is fastest for devices without a JIT, and 263 indistinguishable from <strong>one()</strong> for devices with a JIT. 264 It uses the enhanced for loop syntax introduced in version 1.5 of the Java 265 programming language.</p> 266 267 <p>To summarize: use the enhanced for loop by default, but consider a 268 hand-written counted loop for performance-critical ArrayList iteration.</p> 269 270 <p>(See also <em>Effective Java</em> item 46.)</p> 271 272 <a name="package_inner" id="package_inner"></a> 273 <h2>Consider Package Instead of Private Access with Private Inner Classes</h2> 274 275 <p>Consider the following class definition:</p> 276 277 <pre>public class Foo { 278 private class Inner { 279 void stuff() { 280 Foo.this.doStuff(Foo.this.mValue); 281 } 282 } 283 284 private int mValue; 285 286 public void run() { 287 Inner in = new Inner(); 288 mValue = 27; 289 in.stuff(); 290 } 291 292 private void doStuff(int value) { 293 System.out.println("Value is " + value); 294 } 295 }</pre> 296 297 <p>The key things to note here are that we define a private inner class 298 (<code>Foo$Inner</code>) that directly accesses a private method and a private 299 instance field in the outer class. This is legal, and the code prints "Value is 300 27" as expected.</p> 301 302 <p>The problem is that the VM considers direct access to <code>Foo</code>'s 303 private members from <code>Foo$Inner</code> to be illegal because 304 <code>Foo</code> and <code>Foo$Inner</code> are different classes, even though 305 the Java language allows an inner class to access an outer class' private 306 members. To bridge the gap, the compiler generates a couple of synthetic 307 methods:</p> 308 309 <pre>/*package*/ static int Foo.access$100(Foo foo) { 310 return foo.mValue; 311 } 312 /*package*/ static void Foo.access$200(Foo foo, int value) { 313 foo.doStuff(value); 314 }</pre> 315 316 <p>The inner class code calls these static methods whenever it needs to 317 access the <code>mValue</code> field or invoke the <code>doStuff</code> method 318 in the outer class. What this means is that the code above really boils down to 319 a case where you're accessing member fields through accessor methods. 320 Earlier we talked about how accessors are slower than direct field 321 accesses, so this is an example of a certain language idiom resulting in an 322 "invisible" performance hit.</p> 323 324 <p>If you're using code like this in a performance hotspot, you can avoid the 325 overhead by declaring fields and methods accessed by inner classes to have 326 package access, rather than private access. Unfortunately this means the fields 327 can be accessed directly by other classes in the same package, so you shouldn't 328 use this in public API.</p> 329 330 <a name="avoidfloat" id="avoidfloat"></a> 331 <h2>Use Floating-Point Judiciously</h2> 332 333 <p>As a rule of thumb, floating-point is about 2x slower than integer on 334 Android devices. This is true on a FPU-less, JIT-less G1 and a Nexus One with 335 an FPU and the JIT. (Of course, absolute speed difference between those two 336 devices is about 10x for arithmetic operations.)</p> 337 338 <p>In speed terms, there's no difference between <code>float</code> and 339 <code>double</code> on the more modern hardware. Space-wise, <code>double</code> 340 is 2x larger. As with desktop machines, assuming space isn't an issue, you 341 should prefer <code>double</code> to <code>float</code>.</p> 342 343 <p>Also, even for integers, some chips have hardware multiply but lack 344 hardware divide. In such cases, integer division and modulus operations are 345 performed in software — something to think about if you're designing a 346 hash table or doing lots of math.</p> 347 348 <a name="library" id="library"></a> 349 <h2>Know And Use The Libraries</h2> 350 351 <p>In addition to all the usual reasons to prefer library code over rolling 352 your own, bear in mind that the system is at liberty to replace calls 353 to library methods with hand-coded assembler, which may be better than the 354 best code the JIT can produce for the equivalent Java. The typical example 355 here is <code>String.indexOf</code> and friends, which Dalvik replaces with 356 an inlined intrinsic. Similarly, the <code>System.arraycopy</code> method 357 is about 9x faster than a hand-coded loop on a Nexus One with the JIT.</p> 358 359 <p>(See also <em>Effective Java</em> item 47.)</p> 360 361 <a name="native_methods" id="native_methods"></a> 362 <h2>Use Native Methods Judiciously</h2> 363 364 <p>Native code isn't necessarily more efficient than Java. For one thing, 365 there's a cost associated with the Java-native transition, and the JIT can't 366 optimize across these boundaries. If you're allocating native resources (memory 367 on the native heap, file descriptors, or whatever), it can be significantly 368 more difficult to arrange timely collection of these resources. You also 369 need to compile your code for each architecture you wish to run on (rather 370 than rely on it having a JIT). You may even have to compile multiple versions 371 for what you consider the same architecture: native code compiled for the ARM 372 processor in the G1 can't take full advantage of the ARM in the Nexus One, and 373 code compiled for the ARM in the Nexus One won't run on the ARM in the G1.</p> 374 375 <p>Native code is primarily useful when you have an existing native codebase 376 that you want to port to Android, not for "speeding up" parts of a Java app.</p> 377 378 <p>If you do need to use native code, you should read our 379 <a href="{@docRoot}guide/practices/design/jni.html">JNI Tips</a>.</p> 380 381 <p>(See also <em>Effective Java</em> item 54.)</p> 382 383 <a name="closing_notes" id="closing_notes"></a> 384 <h2>Closing Notes</h2> 385 386 <p>One last thing: always measure. Before you start optimizing, make sure you 387 have a problem. Make sure you can accurately measure your existing performance, 388 or you won't be able to measure the benefit of the alternatives you try.</p> 389 390 <p>Every claim made in this document is backed up by a benchmark. The source 391 to these benchmarks can be found in the <a href="http://code.google.com/p/dalvik/source/browse/#svn/trunk/benchmarks">code.google.com "dalvik" project</a>.</p> 392 393 <p>The benchmarks are built with the 394 <a href="http://code.google.com/p/caliper/">Caliper</a> microbenchmarking 395 framework for Java. Microbenchmarks are hard to get right, so Caliper goes out 396 of its way to do the hard work for you, and even detect some cases where you're 397 not measuring what you think you're measuring (because, say, the VM has 398 managed to optimize all your code away). We highly recommend you use Caliper 399 to run your own microbenchmarks.</p> 400 401 <p>You may also find 402 <a href="{@docRoot}guide/developing/debugging/debugging-tracing.html">Traceview</a> useful 403 for profiling, but it's important to realize that it currently disables the JIT, 404 which may cause it to misattribute time to code that the JIT may be able to win 405 back. It's especially important after making changes suggested by Traceview 406 data to ensure that the resulting code actually runs faster when run without 407 Traceview. 408