practices/design/performance.jd

page.title=Designing for Performance
@jd:body

<p>An Android application will run on a mobile device with limited computing
power and storage, and constrained battery life. Because of
this, it should be <em>efficient</em>. Battery life is one reason you might
want to optimize your app even if it already seems to run "fast enough".
Battery life is important to users, and Android's battery usage breakdown
means users will know if your app is responsible draining their battery.</p>

<p>This document covers these topics: </p>
<ul>
    <li><a href="#intro">Introduction</a></li>
    <li><a href="#optimize_judiciously">Optimize Judiciously</a></li>
    <li><a href="#object_creation">Avoid Creating Objects</a></li>
    <li><a href="#myths">Performance Myths</a></li>
    <li><a href="#prefer_static">Prefer Static Over Virtual</a></li>
    <li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li>
    <li><a href="#use_final">Use Static Final For Constants</a></li>
    <li><a href="#foreach">Use Enhanced For Loop Syntax</a></li>
    <li><a href="#avoid_enums">Avoid Enums Where You Only Need Ints</a></li>
    <li><a href="#package_inner">Use Package Scope with Inner Classes</a></li>
    <li><a href="#avoidfloat">Use Floating-Point Judiciously</a> </li>
    <li><a href="#library">Know And Use The Libraries</a></li>
    <li><a href="#native_methods">Use Native Methods Judiciously</a></li>
    <li><a href="#closing_notes">Closing Notes</a></li>
</ul>

<p>Note that although this document primarily covers micro-optimizations,
these will almost never make or break your software. Choosing the right
algorithms and data structures should always be your priority, but is
outside the scope of this document.</p>

<a name="intro" id="intro"></a>
<h2>Introduction</h2>

<p>There are two basic rules for writing efficient code:</p>
<ul>
    <li>Don't do work that you don't need to do.</li>
    <li>Don't allocate memory if you can avoid it.</li>
</ul>

<h2 id="optimize_judiciously">Optimize Judiciously</h2>

<p>As you get started thinking about how to design your application, and as
you write it, consider
the cautionary points about optimization that Josh Bloch makes in his book
<em>Effective Java</em>. Here's "Item 47: Optimize Judiciously", excerpted from
the latest edition of the book with permission. Although Josh didn't have
Android application development in mind when writing this section &mdash; for
example, the <code style="color:black">java.awt.Component</code> class
referenced is not available in Android, and Android uses the
Dalvik VM, rather than a standard JVM &mdash; his points are still valid. </p>

<blockquote>

<p>There are three aphorisms concerning optimization that everyone should know.
They are perhaps beginning to suffer from overexposure, but in case you aren't
yet familiar with them, here they are:</p>

<div style="padding-left:3em;padding-right:4em;">

<p style="margin-bottom:.5em;">More computing sins are committed in the name of
efficiency (without necessarily achieving it) than for any other single
reason&mdash;including blind stupidity.</p>
<p>&mdash;William A. Wulf <span style="font-size:80%;"><sup>1</sup></span></p>

<p style="margin-bottom:.5em;">We should forget about small efficiencies, say
about 97% of the time: premature optimization is the root of all evil. </p>
<p>&mdash;Donald E. Knuth <span style="font-size:80%;"><sup>2</sup></span></p>


<p style="margin-bottom:.5em;">We follow two rules in the matter of optimization:</p>
<ul style="margin-bottom:0">
<li>Rule 1. Don't do it.</li>
<li>Rule 2 (for experts only). Don't do it yet &mdash; that is, not until you have a
perfectly clear and unoptimized solution. </li>
</ul>
<p>&mdash;M. A. Jackson <span style="font-size:80%;"><sup>3</sup></span></p>
</div>

<p>All of these aphorisms predate the Java programming language by two decades.
They tell a deep truth about optimization: it is easy to do more harm than good,
especially if you optimize prematurely. In the process, you may produce software
that is neither fast nor correct and cannot easily be fixed.</p>

<p>Don't sacrifice sound architectural principles for performance.
<strong>Strive to write good programs rather than fast ones.</strong> If a good
program is not fast enough, its architecture will allow it to be optimized. Good
programs embody the principle of <em>information hiding</em>: where possible,
they localize design decisions within individual modules, so individual
decisions can be changed without affecting the remainder of the system (Item
13).</p>

<p>This does <em>not</em> mean that you can ignore performance concerns until
your program is complete. Implementation problems can be fixed by later
optimization, but pervasive architectural flaws that limit performance can be
impossible to fix without rewriting the system. Changing a fundamental facet of
your design after the fact can result in an ill-structured system that is
difficult to maintain and evolve. Therefore you must think about performance
during the design process.</p>

<p><strong>Strive to avoid design decisions that limit performance.</strong> The
components of a design that are most difficult to change after the fact are
those specifying interactions between modules and with the outside world. Chief
among these design components are APIs, wire-level protocols, and persistent
data formats. Not only are these design components difficult or impossible to
change after the fact, but all of them can place significant limitations on the
performance that a system can ever achieve.</p>

<p><strong>Consider the performance consequences of your API design
decisions.</strong> Making a public type mutable may require a lot of needless
defensive copying (Item 39). Similarly, using inheritance in a public class
where composition would have been appropriate ties the class forever to its
superclass, which can place artificial limits on the performance of the subclass
(Item 16). As a final example, using an implementation type rather than an
interface in an API ties you to a specific implementation, even though faster
implementations may be written in the future (Item 52).</p>

<p>The effects of API design on performance are very real. Consider the <code
style="color:black">getSize</code> method in the <code
style="color:black">java.awt.Component</code> class. The decision that this
performance-critical method was to return a <code
style="color:black">Dimension</code> instance, coupled with the decision that
<code style="color:black">Dimension</code> instances are mutable, forces any
implementation of this method to allocate a new <code
style="color:black">Dimension</code> instance on every invocation. Even though
allocating small objects is inexpensive on a modern VM, allocating millions of
objects needlessly can do real harm to performance.</p>

<p>In this case, several alternatives existed. Ideally, <code
style="color:black">Dimension</code> should have been immutable (Item 15);
alternatively, the <code style="color:black">getSize</code> method could have
been replaced by two methods returning the individual primitive components of a
<code style="color:black">Dimension</code> object. In fact, two such methods
were added to the Component API in the 1.2 release for performance reasons.
Preexisting client code, however, still uses the <code
style="color:black">getSize</code> method and still suffers the performance
consequences of the original API design decisions.</p>

<p>Luckily, it is generally the case that good API design is consistent with
good performance. <strong>It is a very bad idea to warp an API to achieve good
performance.</strong> The performance issue that caused you to warp the API may
go away in a future release of the platform or other underlying software, but
the warped API and the support headaches that come with it will be with you for
life.</p>

<p>Once you've carefully designed your program and produced a clear, concise,
and well-structured implementation, <em>then</em> it may be time to consider
optimization, assuming you're not already satisfied with the performance of the
program.</p>

<p>Recall that Jackson's two rules of optimization were "Don't do it," and "(for
experts only). Don't do it yet." He could have added one more: <strong>measure
performance before and after each attempted optimization.</strong> You may be
surprised by what you find. Often, attempted optimizations have no measurable
effect on performance; sometimes, they make it worse. The main reason is that
it's difficult to guess where your program is spending its time. The part of the
program that you think is slow may not be at fault, in which case you'd be
wasting your time trying to optimize it. Common wisdom says that programs spend
80 percent of their time in 20 percent of their code.</p>

<p>Profiling tools can help you decide where to focus your optimization efforts.
Such tools give you runtime information, such as roughly how much time each
method is consuming and how many times it is invoked. In addition to focusing
your tuning efforts, this can alert you to the need for algorithmic changes. If
a quadratic (or worse) algorithm lurks inside your program, no amount of tuning
will fix the problem. You must replace the algorithm with one that is more
efficient. The more code in the system, the more important it is to use a
profiler. It's like looking for a needle in a haystack: the bigger the haystack,
the more useful it is to have a metal detector. The JDK comes with a simple
profiler and modern IDEs provide more sophisticated profiling tools.</p>

<p>The need to measure the effects of attempted optimization is even greater on
the Java platform than on more traditional platforms, because the Java
programming language does not have a strong <em>performance model</em>. The
relative costs of the various primitive operations are not well defined. The
"semantic gap" between what the programmer writes and what the CPU executes is
far greater than in traditional statically compiled languages, which makes it
very difficult to reliably predict the performance consequences of any
optimization. There are plenty of performance myths floating around that turn
out to be half-truths or outright lies.</p>

<p>Not only is Java's performance model ill-defined, but it varies from JVM
implementation to JVM implementation, from release to release, and from
processor to processor. If you will be running your program on multiple JVM
implementations or multiple hardware platforms, it is important that you measure
the effects of your optimization on each. Occasionally you may be forced to make
trade-offs between performance on different JVM implementations or hardware
platforms.</p>

<p>To summarize, do not strive to write fast programs &mdash; strive to write
good ones; speed will follow. Do think about performance issues while you're
designing systems and especially while you're designing APIs, wire-level
protocols, and persistent data formats. When you've finished building the
system, measure its performance. If it's fast enough, you're done. If not,
locate the source of the problems with the aid of a profiler, and go to work
optimizing the relevant parts of the system. The first step is to examine your
choice of algorithms: no amount of low-level optimization can make up for a poor
choice of algorithm. Repeat this process as necessary, measuring the performance
after every change, until you're satisfied.</p>

<p>&mdash;Excerpted from Josh Bloch's <em>Effective Java</em>, Second Ed.
(Addison-Wesley, 2008).</em></p>

<p style="font-size:80%;margin-bottom:0;"><sup>1</sup> Wulf, W. A Case Against
the GOTO. <em>Proceedings of the 25th ACM National
Conference</em> 2 (1972): 791797.</p>
<p style="font-size:80%;margin-bottom:0;"><sup>2</sup> Knuth, Donald. Structured
Programming with go to Statements. <em>Computing
Surveys 6</em> (1974): 261301.</p>
<p style="font-size:80%"><sup>3</sup> Jackson, M. A. <em>Principles of Program
Design</em>, Academic Press, London, 1975.
ISBN: 0123790506.</p>

</blockquote>

<p>One of the trickiest problems you'll face when micro-optimizing Android
apps is that the "if you will be running your program on ... multiple hardware
platforms" clause above is always true. And it's not even generally the case
that you can say "device X is a factor F faster/slower than device Y".
This is especially true if one of the devices is the emulator, or one of the
devices has a JIT. If you want to know how your app performs on a given device,
you need to test it on that device. Drawing conclusions from the emulator is
particularly dangerous, as is attempting to compare JIT versus non-JIT
performance: the performance <em>profiles</em> can differ wildly.</p>

<a name="object_creation"></a>
<h2>Avoid Creating Objects</h2>

<p>Object creation is never free. A generational GC with per-thread allocation
pools for temporary objects can make allocation cheaper, but allocating memory
is always more expensive than not allocating memory.</p>

<p>If you allocate objects in a user interface loop, you will force a periodic
garbage collection, creating little "hiccups" in the user experience.</p>

<p>Thus, you should avoid creating object instances you don't need to.  Some
examples of things that can help:</p>

<ul>
    <li>When extracting strings from a set of input data, try
    to return a substring of the original data, instead of creating a copy.
    You will create a new String object, but it will share the char[]
    with the data.</li>
    <li>If you have a method returning a string, and you know that its result
    will always be appended to a StringBuffer anyway, change your signature
    and implementation so that the function does the append directly,
    instead of creating a short-lived temporary object.</li>
</ul>

<p>A somewhat more radical idea is to slice up multidimensional arrays into
parallel single one-dimension arrays:</p>

<ul>
    <li>An array of ints is a much better than an array of Integers,
    but this also generalizes to the fact that two parallel arrays of ints
    are also a <strong>lot</strong> more efficient than an array of (int,int)
    objects.  The same goes for any combination of primitive types.</li>
    <li>If you need to implement a container that stores tuples of (Foo,Bar)
    objects, try to remember that two parallel Foo[] and Bar[] arrays are
    generally much better than a single array of custom (Foo,Bar) objects.
    (The exception to this, of course, is when you're designing an API for
    other code to access;  in those cases, it's usually better to trade
    correct API design for a small hit in speed. But in your own internal
    code, you should try and be as efficient as possible.)</li>
</ul>

<p>Generally speaking, avoid creating short-term temporary objects if you
can.  Fewer objects created mean less-frequent garbage collection, which has
a direct impact on user experience.</p>

<a name="myths" id="myths"></a>
<h2>Performance Myths</h2>

<p>Previous versions of this document made various misleading claims. We
address some of them here.</p>

<p>On devices without a JIT, it is true that invoking methods via a
variable with an exact type rather than an interface is slightly more
efficient. (So, for example, it was cheaper to invoke methods on a
<code>HashMap map</code> than a <code>Map map</code>, even though in both
cases the map was a <code>HashMap</code>.) It was not the case that this
was 2x slower; the actual difference was more like 6% slower. Furthermore,
the JIT makes the two effectively indistinguishable.</p>

<p>On devices without a JIT, caching field accesses is about 20% faster than
repeatedly accesssing the field. With a JIT, field access costs about the same
as local access, so this isn't a worthwhile optimization unless you feel it
makes your code easier to read. (This is true of final, static, and static
final fields too.)

<a name="prefer_static" id="prefer_static"></a>
<h2>Prefer Static Over Virtual</h2>

<p>If you don't need to access an object's fields, make your method static.
Invocations will be about 15%-20% faster.
It's also good practice, because you can tell from the method
signature that calling the method can't alter the object's state.</p>

<a name="internal_get_set" id="internal_get_set"></a>
<h2>Avoid Internal Getters/Setters</h2>

<p>In native languages like C++ it's common practice to use getters (e.g.
<code>i = getCount()</code>) instead of accessing the field directly (<code>i
= mCount</code>). This is an excellent habit for C++, because the compiler can
usually inline the access, and if you need to restrict or debug field access
you can add the code at any time.</p>

<p>On Android, this is a bad idea.  Virtual method calls are expensive,
much more so than instance field lookups.  It's reasonable to follow
common object-oriented programming practices and have getters and setters
in the public interface, but within a class you should always access
fields directly.</p>

<p>Without a JIT, direct field access is about 3x faster than invoking a
trivial getter. With the JIT (where direct field access is as cheap as
accessing a local), direct field access is about 7x faster than invoking a
trivial getter. This is true in Froyo, but will improve in the future when
the JIT inlines getter methods.</p>

<a name="use_final" id="use_final"></a>
<h2>Use Static Final For Constants</h2>

<p>Consider the following declaration at the top of a class:</p>

<pre>static int intVal = 42;
static String strVal = "Hello, world!";</pre>

<p>The compiler generates a class initializer method, called
<code>&lt;clinit&gt;</code>, that is executed when the class is first used.
The method stores the value 42 into <code>intVal</code>, and extracts a
reference from the classfile string constant table for <code>strVal</code>.
When these values are referenced later on, they are accessed with field
lookups.</p>

<p>We can improve matters with the "final" keyword:</p>

<pre>static final int intVal = 42;
static final String strVal = "Hello, world!";</pre>

<p>The class no longer requires a <code>&lt;clinit&gt;</code> method,
because the constants go into static field initializers in the dex file.
Code that refers to <code>intVal</code> will use
the integer value 42 directly, and accesses to <code>strVal</code> will
use a relatively inexpensive "string constant" instruction instead of a
field lookup. (Note that this optimization only applies to primitive types and
<code>String</code> constants, not arbitrary reference types. Still, it's good
practice to declare constants <code>static final</code> whenever possible.)</p>

<a name="foreach" id="foreach"></a>
<h2>Use Enhanced For Loop Syntax</h2>

<p>The enhanced for loop (also sometimes known as "for-each" loop) can be used
for collections that implement the Iterable interface and for arrays.
With collections, an iterator is allocated to make interface calls
to hasNext() and next(). With an ArrayList, a hand-written counted loop is
about 3x faster (with or without JIT), but for other collections the enhanced
for loop syntax will be exactly equivalent to explicit iterator usage.</p>

<p>There are several alternatives for iterating through an array:</p>

<pre>    static class Foo {
        int mSplat;
    }
    Foo[] mArray = ...

    public void zero() {
        int sum = 0;
        for (int i = 0; i &lt; mArray.length; ++i) {
            sum += mArray[i].mSplat;
        }
    }

    public void one() {
        int sum = 0;
        Foo[] localArray = mArray;
        int len = localArray.length;

        for (int i = 0; i &lt; len; ++i) {
            sum += localArray[i].mSplat;
        }
    }

    public void two() {
        int sum = 0;
        for (Foo a : mArray) {
            sum += a.mSplat;
        }
    }
</pre>

<p><strong>zero()</strong> is slowest, because the JIT can't yet optimize away
the cost of getting the array length once for every iteration through the
loop.</p>

<p><strong>one()</strong> is faster. It pulls everything out into local
variables, avoiding the lookups. Only the array length offers a performance
benefit.</p>

<p><strong>two()</strong> is fastest for devices without a JIT, and
indistinguishable from <strong>one()</strong> for devices with a JIT.
It uses the enhanced for loop syntax introduced in version 1.5 of the Java
programming language.</p>

<p>To summarize: use the enhanced for loop by default, but consider a
hand-written counted loop for performance-critical ArrayList iteration.</p>

<p>(See also <em>Effective Java</em> item 46.)</p>

<a name="avoid_enums" id="avoid_enums"></a>
<h2>Avoid Enums Where You Only Need Ints</h2>

<p>Enums are very convenient, but unfortunately can be painful when size
and speed matter.  For example, this:</p>

<pre>public enum Shrubbery { GROUND, CRAWLING, HANGING }</pre>

<p>adds 740 bytes to your .dex file compared to the equivalent class
with three public static final ints. On first use, the
class initializer invokes the &lt;init&gt; method on objects representing each
of the enumerated values. Each object gets its own static field, and the full
set is stored in an array (a static field called "$VALUES"). That's a lot of
code and data, just for three integers. Additionally, this:</p>

<pre>Shrubbery shrub = Shrubbery.GROUND;</pre>

<p>causes a static field lookup.  If "GROUND" were a static final int,
the compiler would treat it as a known constant and inline it.</p>

<p>The flip side, of course, is that with enums you get nicer APIs and
some compile-time value checking.  So, the usual trade-off applies: you should
by all means use enums for public APIs, but try to avoid them when performance
matters.</p>

<p>If you're using <code>Enum.ordinal</code>, that's usually a sign that you
should be using ints instead. As a rule of thumb, if an enum doesn't have a
constructor and doesn't define its own methods, and it's used in
performance-critical code, you should consider <code>static final int</code>
constants instead.</p>

<a name="package_inner" id="package_inner"></a>
<h2>Use Package Scope with Inner Classes</h2>

<p>Consider the following class definition:</p>

<pre>public class Foo {
    private int mValue;

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff();
    }

    private void doStuff(int value) {
        System.out.println("Value is " + value);
    }

    private class Inner {
        void stuff() {
            Foo.this.doStuff(Foo.this.mValue);
        }
    }
}</pre>

<p>The key things to note here are that we define an inner class (Foo$Inner)
that directly accesses a private method and a private instance field
in the outer class.  This is legal, and the code prints "Value is 27" as
expected.</p>

<p>The problem is that the VM considers direct access to Foo's private members
from Foo$Inner to be illegal because Foo and Foo$Inner are different classes,
even though the Java language allows an inner class to access an outer class'
private members. To bridge the gap, the compiler generates a couple of
synthetic methods:</p>

<pre>/*package*/ static int Foo.access$100(Foo foo) {
    return foo.mValue;
}
/*package*/ static void Foo.access$200(Foo foo, int value) {
    foo.doStuff(value);
}</pre>

<p>The inner-class code calls these static methods whenever it needs to
access the "mValue" field or invoke the "doStuff" method in the outer
class. What this means is that the code above really boils down to a case
where you're accessing member fields through accessor methods instead of
directly.  Earlier we talked about how accessors are slower than direct field
accesses, so this is an example of a certain language idiom resulting in an
"invisible" performance hit.</p>

<p>We can avoid this problem by declaring fields and methods accessed
by inner classes to have package scope, rather than private scope.
This runs faster and removes the overhead of the generated methods.
(Unfortunately it also means the fields could be accessed directly by other
classes in the same package, which runs counter to the standard
practice of making all fields private. Once again, if you're
designing a public API you might want to carefully consider using this
optimization.)</p>

<a name="avoidfloat" id="avoidfloat"></a>
<h2>Use Floating-Point Judiciously</h2>

<p>As a rule of thumb, floating-point is about 2x slower than integer on
Android devices. This is true on a FPU-less, JIT-less G1 and a Nexus One with
an FPU and the JIT. (Of course, absolute speed difference between those two
devices is about 10x for arithmetic operations.)</p>

<p>In speed terms, there's no difference between <code>float</code> and
<code>double</code> on the more modern hardware. Space-wise, <code>double</code>
is 2x larger. As with desktop machines, assuming space isn't an issue, you
should prefer <code>double</code> to <code>float</code>.</p>

<p>Also, even for integers, some chips have hardware multiply but lack
hardware divide. In such cases, integer division and modulus operations are
performed in software &mdash; something to think about if you're designing a
hash table or doing lots of math.</p>

<a name="library" id="library"></a>
<h2>Know And Use The Libraries</h2>

<p>In addition to all the usual reasons to prefer library code over rolling
your own, bear in mind that the system is at liberty to replace calls
to library methods with hand-coded assembler, which may be better than the
best code the JIT can produce for the equivalent Java. The typical example
here is <code>String.indexOf</code> and friends, which Dalvik replaces with
an inlined intrinsic. Similarly, the <code>System.arraycopy</code> method
is about 9x faster than a hand-coded loop on a Nexus One with the JIT.</p>

<p>(See also <em>Effective Java</em> item 47.)</p>

<a name="native_methods" id="native_methods"></a>
<h2>Use Native Methods Judiciously</h2>

<p>Native code isn't necessarily more efficient than Java. For one thing,
there's a cost associated with the Java-native transition, and the JIT can't
optimize across these boundaries. If you're allocating native resources (memory
on the native heap, file descriptors, or whatever), it can be significantly
more difficult to arrange timely collection of these resources. You also
need to compile your code for each architecture you wish to run on (rather
than rely on it having a JIT). You may even have to compile multiple versions
for what you consider the same architecture: native code compiled for the ARM
processor in the G1 can't take full advantage of the ARM in the Nexus One, and
code compiled for the ARM in the Nexus One won't run on the ARM in the G1.</p>

<p>Native code is primarily useful when you have an existing native codebase
that you want to port to Android, not for "speeding up" parts of a Java app.</p>

<p>(See also <em>Effective Java</em> item 54.)</p>

<a name="closing_notes" id="closing_notes"></a>
<h2>Closing Notes</h2>

<p>One last thing: always measure. Before you start optimizing, make sure you
have a problem. Make sure you can accurately measure your existing performance,
or you won't be able to measure the benefit of the alternatives you try.</p>

<p>Every claim made in this document is backed up by a benchmark. The source
to these benchmarks can be found in the <a href="http://code.google.com/p/dalvik/source/browse/#svn/trunk/benchmarks">code.google.com "dalvik" project</a>.</p>

<p>The benchmarks are built with the
<a href="http://code.google.com/p/caliper/">Caliper</a> microbenchmarking
framework for Java. Microbenchmarks are hard to get right, so Caliper goes out
of its way to do the hard work for you, and even detect some cases where you're
not measuring what you think you're measuring (because, say, the VM has
managed to optimize all your code away). We highly recommend you use Caliper
to run your own microbenchmarks.</p>