Home | History | Annotate | Download | only in rop
      1 /*
      2  * Copyright (C) 2008 The Android Open Source Project
      3  *
      4  * Licensed under the Apache License, Version 2.0 (the "License");
      5  * you may not use this file except in compliance with the License.
      6  * You may obtain a copy of the License at
      7  *
      8  *      http://www.apache.org/licenses/LICENSE-2.0
      9  *
     10  * Unless required by applicable law or agreed to in writing, software
     11  * distributed under the License is distributed on an "AS IS" BASIS,
     12  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     13  * See the License for the specific language governing permissions and
     14  * limitations under the License.
     15  */
     16 
     17 package com.android.dx.rop;
     18 
     19 /**
     20  * <h1>An Introduction to Rop Form</h1>
     21  *
     22  * This package contains classes associated with dx's {@code Rop}
     23  * intermediate form.<p>
     24  *
     25  * The Rop form is intended to represent the instructions and the control-flow
     26  * graph in a reasonably programmatically useful form while closely mirroring
     27  * the dex instruction set.<p>
     28  *
     29  * <h2>Key Classes</h2>
     30  *
     31  * <ul>
     32  * <li> {@link RopMethod}, the representation of an individual method
     33  * <li> {@link BasicBlock} and its per-method container, {@link BasicBlockList},
     34  * the representation of control flow elements.
     35  * <li> {@link Insn} and its subclasses along with its per-basic block
     36  * container {@link InsnList}. {@code Insn} instances represent
     37  * individual instructions in the abstract register machine.
     38  * <li> {@link RegisterSpec} and its container {@link RegisterSpecList}. A
     39  * register spec encodes register number, register width, type information,
     40  * and potentially local variable information as well for instruction sources
     41  * and results.
     42  * <li> {@link Rop} instances represent opcodes in the abstract machine. Many
     43  * {@code Rop} instances are singletons defined in static fields in
     44  * {@link Rops}. The rest are constructed dynamically using static methods
     45  * in {@code Rops}
     46  * <li> {@link RegOps} lists numeric constants for the opcodes
     47  * <li> {@link Constant} and its subclasses represent constant data values
     48  * that opcodes may refer to.
     49  * <li> {@link Type} instances represent the core data types that can be
     50  * handled by the abstract machine.
     51  * <li> The {@link TypeBearer} interface is implemented by classes that
     52  * represent a core data type, but may also have secondary information
     53  * (such as constant value) associated with them.
     54  * <ul>
     55  *
     56  * <h2>Control-Flow Graph</h2>
     57  *
     58  * Each method is separated into a list of basic blocks. For the most part,
     59  * basic blocks are referred to by a positive integer
     60  * {@link BasicBlock#getLabel label}, which is always unique per method. The
     61  * label value is typically derived from a bytecode address from the source
     62  * bytecode. Blocks that don't originate directly from source bytecode have
     63  * labels generated for them in a mostly arbitrary order.<p>
     64  *
     65  * Blocks are referred to by their label, for the most part, because
     66  * {@code BasicBlock} instances are immutable and thus any modification to
     67  * the control flow graph or the instruction list results in replacement
     68  * instances (with identical labels) being created.<p>
     69  *
     70  * A method has a single {@link RopMethod#getFirstLabel entry block} and 0
     71  * to N {@link RopMethod#getExitPredecessors exit predecessor blocks} which
     72  * will return. All blocks that are not the entry block will have at least
     73  * one predecessor (or are unreachable and should be removed from the block
     74  * list). All blocks that are not exit predecessors must have at least one
     75  * successor.<p>
     76  *
     77  * Since all blocks must branch, all blocks must have, as their final
     78  * instruction, an instruction whose opcode has a {@link Rop#getBranchingness
     79  * branchingness} other than {@link Rop.BRANCH_NONE}. Furthermore, branching
     80  * instructions may only be the final instruction in any basic block. If
     81  * no other terminating opcode is appropriate, use a {@link Rops#GOTO GOTO}.<p>
     82  *
     83  * Typically a block will have a {@link BasicBlock#getPrimarySuccessor
     84  * primary successor} which distinguishes a particular control flow path.
     85  * For {Rops#isCallLike}call or call-like} opcodes, this is the path taken
     86  * in the non-exceptional case, where all other successors represent
     87  * various exception paths. For comparison operators such as
     88  * {@link Rops#IF_EQZ_INT}, the primary successor represents the path taken
     89  * if the <b>condition evaluates to false</b>. For {@link SwitchInsn switch
     90  * instructions}, the primary successor is the default case.<p>
     91  *
     92  * A basic block's successor list is ordered and may refer to unique labels
     93  * multiple times. For example, if a switch statement contains multiple case
     94  * statements for the same code path, a single basic block will likely
     95  * appear in the successor list multiple times. In general, the
     96  * significance of the successor list's order (like the significance of
     97  * the primary successor) is a property of the final instruction of the basic
     98  * block. A basic block containing a {@link ThrowingInsn}, for example, has
     99  * its successor list in an order identical to the
    100  * {@link ThrowingInsn#getCatches} instruction's catches list, with the
    101  * primary successor (the no-exception case) listed at the end.
    102  *
    103  * It is legal for a basic block to have no primary successor. An obvious
    104  * example of this is a block that terminates in a {@link Rops#THROW throw}
    105  * instruction where a catch block exists inside the current method for that
    106  * exception class. Since the only possible path is the exception path, only
    107  * the exception path (which cannot be a primary successor) is a successor.
    108  * An example of this is shown in {@code dx/tests/092-ssa-cfg-edge-cases}.
    109  *
    110  * <h2>Rop Instructions</h2>
    111  *
    112  * <h3>move-result and move-result-pseudo</h3>
    113  *
    114  * An instruction that may throw an exception may not specify a result. This
    115  * is necessary because the result register will not be assigned to if an
    116  * exception occurs while processing said instruction and a result assignment
    117  * may not occur. Since result assignments only occur in the non-exceptional
    118  * case,  the result assignments for throwing instructions can be said to occur
    119  * at the beginning of the primary successor block rather than at the end of
    120  * the current block. The Rop form represents the result assignments this way.
    121  * Throwing instructions may not directly specify results. Instead, result
    122  * assignments are represented by {@link
    123  * Rops#MOVE_RESULT move-result} or {@link Rops#MOVE_RESULT_PSEUDO
    124  * move-result-pseudo} instructions at the top of the primary successor block.
    125  *
    126  * Only a single {@code move-result} or {@code move-result-pseudo}
    127  * may exist in any block and it must be exactly the first instruction in the
    128  * block.
    129  *
    130  * A {@code move-result} instruction is used for the results of call-like
    131  * instructions. If the value produced by a {@code move-result} is not
    132  * used by the method, it may be eliminated as dead code.
    133  *
    134  * A {@code move-result-pseudo} instruction is used for the results of
    135  * non-call-like throwing instructions. It may never be considered dead code
    136  * since the final dex instruction will always indicate a result register.
    137  * If a required {@code move-result-pseudo} instruction is not found
    138  * during conversion to dex bytecode, an exception will be thrown.
    139  *
    140  * <h3>move-exception</h3>
    141  *
    142  * A {@link RegOps.MOVE_EXCEPTION move-exception} instruction may appear at
    143  * the start of a catch block, and represents the obtaining of the thrown
    144  * exception instance. It may only occur as the first instruction in a
    145  * basic block, and any basic block in which it occurs must be reachable only
    146  * as an exception successor.
    147  *
    148  * <h3>move-param</h3>
    149  *
    150  * A {@link RegOps.MOVE_PARAM move-param} instruction represents a method
    151  * parameter. Every {@code move-param} instruction is a
    152  * {@link PlainCstInsn}. The index of the method parameter they refer to is
    153  * carried as the {@link CstInteger integer constant} associated with the
    154  * instruction.
    155  *
    156  * Any number of {@code move-param} instructions referring to the same
    157  * parameter index may be included in a method's instruction lists. They
    158  * have no restrictions on placement beyond those of any other
    159  * {@link Rop.BRANCH_NONE} instruction. Note that the SSA optimizer arranges the
    160  * parameter assignments to align with the dex bytecode calling conventions.
    161  * With parameter assignments so arranged, the
    162  * {@link com.android.dx.dex.code.RopTranslator} sees Rop {@code move-param}
    163  * instructions as unnecessary in dex form and eliminates them.
    164  *
    165  * <h3>mark-local</h3>
    166  *
    167  * A {@link RegOps.MARK_LOCAL mark-local} instruction indicates that a local
    168  * variable becomes live in a specified register specified register for the
    169  * purposes of debug information. A {@code mark-local} instruction has
    170  * a single source (the register which will now be considered a local variable)
    171  * and no results. The instruction has no side effect.<p>
    172  *
    173  * In a typical case, a local variable's lifetime begins with an
    174  * assignment. The local variable whose information is present in a result's
    175  * {@link RegisterSpec#getLocalItem LocalItem} is considered to begin (or move
    176  * between registers) when the instruction is executed.<p>
    177  *
    178  * However, sometimes a local variable can begin its life or move without
    179  * an assignment occurring. A common example of this is occurs in the Rop
    180  * representation of the following code:<p>
    181  *
    182  * <pre>
    183  * try {
    184  *     Object foo = null;
    185  *     foo = new Object();
    186  * } catch (Throwable ex) { }
    187  * </pre>
    188  *
    189  * An object's initialization occurs in two steps. First, a
    190  * {@code new-instance} instruction is executed, whose result is stored in a
    191  * register. However, that register can not yet be considered to contain
    192  * "foo". That's because the instance's constructor method must be called
    193  * via an {@code invoke} instruction. The constructor method, however, may
    194  * throw an exception. And if an exception occurs, then "foo" should remain
    195  * null. So "foo" becomes the value of the result of the {@code new-instance}
    196  * instruction after the (void) constructor method is invoked and
    197  * returns successfully. In such a case, a {@code mark-local} will
    198  * typically occur at the beginning of the primary successor block following
    199  * the invocation to the constructor.
    200  */
    201