1 /* 2 * Copyright (C) 2008 The Android Open Source Project 3 * 4 * Licensed under the Apache License, Version 2.0 (the "License"); 5 * you may not use this file except in compliance with the License. 6 * You may obtain a copy of the License at 7 * 8 * http://www.apache.org/licenses/LICENSE-2.0 9 * 10 * Unless required by applicable law or agreed to in writing, software 11 * distributed under the License is distributed on an "AS IS" BASIS, 12 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 * See the License for the specific language governing permissions and 14 * limitations under the License. 15 */ 16 17 package com.android.dx.rop; 18 19 /** 20 * <h1>An Introduction to Rop Form</h1> 21 * 22 * This package contains classes associated with dx's {@code Rop} 23 * intermediate form.<p> 24 * 25 * The Rop form is intended to represent the instructions and the control-flow 26 * graph in a reasonably programmatically useful form while closely mirroring 27 * the dex instruction set.<p> 28 * 29 * <h2>Key Classes</h2> 30 * 31 * <ul> 32 * <li> {@link RopMethod}, the representation of an individual method 33 * <li> {@link BasicBlock} and its per-method container, {@link BasicBlockList}, 34 * the representation of control flow elements. 35 * <li> {@link Insn} and its subclasses along with its per-basic block 36 * container {@link InsnList}. {@code Insn} instances represent 37 * individual instructions in the abstract register machine. 38 * <li> {@link RegisterSpec} and its container {@link RegisterSpecList}. A 39 * register spec encodes register number, register width, type information, 40 * and potentially local variable information as well for instruction sources 41 * and results. 42 * <li> {@link Rop} instances represent opcodes in the abstract machine. Many 43 * {@code Rop} instances are singletons defined in static fields in 44 * {@link Rops}. The rest are constructed dynamically using static methods 45 * in {@code Rops} 46 * <li> {@link RegOps} lists numeric constants for the opcodes 47 * <li> {@link Constant} and its subclasses represent constant data values 48 * that opcodes may refer to. 49 * <li> {@link Type} instances represent the core data types that can be 50 * handled by the abstract machine. 51 * <li> The {@link TypeBearer} interface is implemented by classes that 52 * represent a core data type, but may also have secondary information 53 * (such as constant value) associated with them. 54 * <ul> 55 * 56 * <h2>Control-Flow Graph</h2> 57 * 58 * Each method is separated into a list of basic blocks. For the most part, 59 * basic blocks are referred to by a positive integer 60 * {@link BasicBlock#getLabel label}, which is always unique per method. The 61 * label value is typically derived from a bytecode address from the source 62 * bytecode. Blocks that don't originate directly from source bytecode have 63 * labels generated for them in a mostly arbitrary order.<p> 64 * 65 * Blocks are referred to by their label, for the most part, because 66 * {@code BasicBlock} instances are immutable and thus any modification to 67 * the control flow graph or the instruction list results in replacement 68 * instances (with identical labels) being created.<p> 69 * 70 * A method has a single {@link RopMethod#getFirstLabel entry block} and 0 71 * to N {@link RopMethod#getExitPredecessors exit predecessor blocks} which 72 * will return. All blocks that are not the entry block will have at least 73 * one predecessor (or are unreachable and should be removed from the block 74 * list). All blocks that are not exit predecessors must have at least one 75 * successor.<p> 76 * 77 * Since all blocks must branch, all blocks must have, as their final 78 * instruction, an instruction whose opcode has a {@link Rop#getBranchingness 79 * branchingness} other than {@link Rop.BRANCH_NONE}. Furthermore, branching 80 * instructions may only be the final instruction in any basic block. If 81 * no other terminating opcode is appropriate, use a {@link Rops#GOTO GOTO}.<p> 82 * 83 * Typically a block will have a {@link BasicBlock#getPrimarySuccessor 84 * primary successor} which distinguishes a particular control flow path. 85 * For {Rops#isCallLike}call or call-like} opcodes, this is the path taken 86 * in the non-exceptional case, where all other successors represent 87 * various exception paths. For comparison operators such as 88 * {@link Rops#IF_EQZ_INT}, the primary successor represents the path taken 89 * if the <b>condition evaluates to false</b>. For {@link SwitchInsn switch 90 * instructions}, the primary successor is the default case.<p> 91 * 92 * A basic block's successor list is ordered and may refer to unique labels 93 * multiple times. For example, if a switch statement contains multiple case 94 * statements for the same code path, a single basic block will likely 95 * appear in the successor list multiple times. In general, the 96 * significance of the successor list's order (like the significance of 97 * the primary successor) is a property of the final instruction of the basic 98 * block. A basic block containing a {@link ThrowingInsn}, for example, has 99 * its successor list in an order identical to the 100 * {@link ThrowingInsn#getCatches} instruction's catches list, with the 101 * primary successor (the no-exception case) listed at the end. 102 * 103 * It is legal for a basic block to have no primary successor. An obvious 104 * example of this is a block that terminates in a {@link Rops#THROW throw} 105 * instruction where a catch block exists inside the current method for that 106 * exception class. Since the only possible path is the exception path, only 107 * the exception path (which cannot be a primary successor) is a successor. 108 * An example of this is shown in {@code dx/tests/092-ssa-cfg-edge-cases}. 109 * 110 * <h2>Rop Instructions</h2> 111 * 112 * <h3>move-result and move-result-pseudo</h3> 113 * 114 * An instruction that may throw an exception may not specify a result. This 115 * is necessary because the result register will not be assigned to if an 116 * exception occurs while processing said instruction and a result assignment 117 * may not occur. Since result assignments only occur in the non-exceptional 118 * case, the result assignments for throwing instructions can be said to occur 119 * at the beginning of the primary successor block rather than at the end of 120 * the current block. The Rop form represents the result assignments this way. 121 * Throwing instructions may not directly specify results. Instead, result 122 * assignments are represented by {@link 123 * Rops#MOVE_RESULT move-result} or {@link Rops#MOVE_RESULT_PSEUDO 124 * move-result-pseudo} instructions at the top of the primary successor block. 125 * 126 * Only a single {@code move-result} or {@code move-result-pseudo} 127 * may exist in any block and it must be exactly the first instruction in the 128 * block. 129 * 130 * A {@code move-result} instruction is used for the results of call-like 131 * instructions. If the value produced by a {@code move-result} is not 132 * used by the method, it may be eliminated as dead code. 133 * 134 * A {@code move-result-pseudo} instruction is used for the results of 135 * non-call-like throwing instructions. It may never be considered dead code 136 * since the final dex instruction will always indicate a result register. 137 * If a required {@code move-result-pseudo} instruction is not found 138 * during conversion to dex bytecode, an exception will be thrown. 139 * 140 * <h3>move-exception</h3> 141 * 142 * A {@link RegOps.MOVE_EXCEPTION move-exception} instruction may appear at 143 * the start of a catch block, and represents the obtaining of the thrown 144 * exception instance. It may only occur as the first instruction in a 145 * basic block, and any basic block in which it occurs must be reachable only 146 * as an exception successor. 147 * 148 * <h3>move-param</h3> 149 * 150 * A {@link RegOps.MOVE_PARAM move-param} instruction represents a method 151 * parameter. Every {@code move-param} instruction is a 152 * {@link PlainCstInsn}. The index of the method parameter they refer to is 153 * carried as the {@link CstInteger integer constant} associated with the 154 * instruction. 155 * 156 * Any number of {@code move-param} instructions referring to the same 157 * parameter index may be included in a method's instruction lists. They 158 * have no restrictions on placement beyond those of any other 159 * {@link Rop.BRANCH_NONE} instruction. Note that the SSA optimizer arranges the 160 * parameter assignments to align with the dex bytecode calling conventions. 161 * With parameter assignments so arranged, the 162 * {@link com.android.dx.dex.code.RopTranslator} sees Rop {@code move-param} 163 * instructions as unnecessary in dex form and eliminates them. 164 * 165 * <h3>mark-local</h3> 166 * 167 * A {@link RegOps.MARK_LOCAL mark-local} instruction indicates that a local 168 * variable becomes live in a specified register specified register for the 169 * purposes of debug information. A {@code mark-local} instruction has 170 * a single source (the register which will now be considered a local variable) 171 * and no results. The instruction has no side effect.<p> 172 * 173 * In a typical case, a local variable's lifetime begins with an 174 * assignment. The local variable whose information is present in a result's 175 * {@link RegisterSpec#getLocalItem LocalItem} is considered to begin (or move 176 * between registers) when the instruction is executed.<p> 177 * 178 * However, sometimes a local variable can begin its life or move without 179 * an assignment occurring. A common example of this is occurs in the Rop 180 * representation of the following code:<p> 181 * 182 * <pre> 183 * try { 184 * Object foo = null; 185 * foo = new Object(); 186 * } catch (Throwable ex) { } 187 * </pre> 188 * 189 * An object's initialization occurs in two steps. First, a 190 * {@code new-instance} instruction is executed, whose result is stored in a 191 * register. However, that register can not yet be considered to contain 192 * "foo". That's because the instance's constructor method must be called 193 * via an {@code invoke} instruction. The constructor method, however, may 194 * throw an exception. And if an exception occurs, then "foo" should remain 195 * null. So "foo" becomes the value of the result of the {@code new-instance} 196 * instruction after the (void) constructor method is invoked and 197 * returns successfully. In such a case, a {@code mark-local} will 198 * typically occur at the beginning of the primary successor block following 199 * the invocation to the constructor. 200 */ 201