Verification todo
~~~~~~~~~~~~~~~~~
check that illegal insns on all targets don't cause the _toIR.c's to
assert.  [DONE: amd64 x86 ppc32 ppc64 arm s390]

check also with --vex-guest-chase-cond=yes

check that all targets can run their insn set tests with
--vex-guest-max-insns=1.

all targets: run some tests using --profile-flags=... to exercise
function patchProfInc_<arch> [DONE: amd64 x86 ppc32 ppc64 arm s390]

figure out if there is a way to write a test program that checks
that event checks are actually getting triggered


Cleanups
~~~~~~~~
host_arm_isel.c and host_arm_defs.c: get rid of global var arm_hwcaps.

host_x86_defs.c, host_amd64_defs.c: return proper VexInvalRange
records from the patchers, instead of {0,0}, so that transparent
self-hosting works properly.

host_ppc_defs.h: is RdWrLR still needed?  If not, delete it.

ditto for ARM's Ld8S

Comments that used to be in m_scheduler.c:
   t-chaining tests:
   - extensive spinrounds
   - with sched quantum = 1  -- check that handle_noredir_jump
     doesn't return with INNER_COUNTERZERO
   other:
   - out-of-date comment w.r.t. bit 0 set in libvex_trc_values.h
   - can VG_TRC_BORING still happen?  if not, remove it
   - memory leaks in m_transtab (InEdgeArr/OutEdgeArr leaking?)
   - move do_cacheflush out of m_transtab
   - more economical unchaining when nuking an entire sector
   - ditto w.r.t. cache flushes
   - verify the case of 2 paths from A to B
   - check -- is IP_AT_SYSCALL still right?


Optimisations
~~~~~~~~~~~~~
ppc: chain_XDirect: generate short-form jumps when possible

ppc64: immediate generation is terrible -- should be able
       to do better

arm codegen: Generate ORRS for CmpwNEZ32(Or32(x,y))

all targets: when nuking an entire sector, don't bother to undo the
patching for any translations within the sector (nor with their
invalidations).

(somewhat implausible) for jumps to disp_cp_xindir, have multiple
copies of disp_cp_xindir, one for each of the possible registers that
could have held the target guest address before jumping to the stub.
Then disp_cp_xindir wouldn't have to reload it from memory each time.
Might also have the effect of spreading out the indirect-mispredict
burden somewhat (across the multiple copies).


Implementation notes
~~~~~~~~~~~~~~~~~~~~
T-chaining changes -- summary

* The code generators (host_blah_isel.c, host_blah_defs.[ch]) interact
  more closely with Valgrind than before.  In particular the
  instruction selectors must use one of 3 different kinds of
  control-transfer instructions: XDirect, XIndir and XAssisted.
  All archs must use these in the same way; no more ad-hoc
  control-transfer instructions.
  (more detail below)


* With T-chaining, translations can jump between each other without
  going through the dispatcher loop every time.  This means that the
  event check (counter decrement, and exit if negative) that the
  dispatcher loop previously did now needs to be compiled into each
  translation.


* The assembly dispatcher code (dispatch-arch-os.S) is still
  present.  It still provides table-lookup services for
  indirect branches, but it also provides a new feature:
  dispatch points, to which the generated code jumps.  There
  are 5:

  VG_(disp_cp_chain_me_to_slowEP):
  VG_(disp_cp_chain_me_to_fastEP):
    These are chain-me requests, used for Boring conditional and
    unconditional jumps to destinations known at JIT time.  The
    generated code calls these (doesn't jump to them) and the
    stub recovers the return address.  These calls never return;
    instead the call is done so that the stub knows where the
    calling point is.  It needs to know this so it can patch
    the calling point to the requested destination.  (See the
    sketch after this list.)
  VG_(disp_cp_xindir):
    Old-style table lookup and go; used for indirect jumps.
  VG_(disp_cp_xassisted):
    Most general and slowest kind.  Can transfer to anywhere, but
    first returns to the scheduler to do some other event (eg a
    syscall) before continuing.
  VG_(disp_cp_evcheck_fail):
    Code jumps here when the event check fails.
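
  Sketch of the chain-me mechanism: a minimal, illustrative C fragment
  showing how a chain-me stub can recover the patch site from the
  return address left on the stack by the call.  The 13-byte sequence
  length is derived from the amd64 form shown in the Patching section
  below (movabsq to %r11, then call *%r11); the function name and the
  constant are invented for this note, not taken from the real stubs.

     #include <stdint.h>

     /* amd64 call sequence used to reach the chain-me stubs:
        10 bytes for "movabsq $stub, %r11" plus 3 for "call *%r11". */
     #define CHAIN_ME_CALL_LEN (10 + 3)

     /* The stub finds 'retaddr' on the stack; the sequence that needs
        to be patched starts CHAIN_ME_CALL_LEN bytes before it.  The
        actual rewriting is the job of chainXDirect_<arch>. */
     static uint8_t* find_patch_site ( uint8_t* retaddr )
     {
        return retaddr - CHAIN_ME_CALL_LEN;
     }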


* New instructions in the backends: XDirect, XIndir and XAssisted.
  XDirect is used for chainable jumps.  It is compiled into a
  call to VG_(disp_cp_chain_me_to_slowEP) or
  VG_(disp_cp_chain_me_to_fastEP).

  XIndir is used for indirect jumps.  It is compiled into a jump
  to VG_(disp_cp_xindir).

  XAssisted is used for "assisted" (do something first, then jump)
  transfers.  It is compiled into a jump to VG_(disp_cp_xassisted).

  All 3 of these may be conditional.

  More complexity: in some circumstances (no-redir translations)
  all transfers must be done with XAssisted.  In such cases the
  instruction selector will be told this.
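
  To make the split concrete, here is a rough C sketch of the decision
  the instruction selectors now make at every block exit.  It is
  illustrative only: the enum and function names are invented for this
  note and do not appear in the real isel code.

     /* Which of the three transfer instructions to emit for an exit. */
     typedef enum { XferDirect, XferIndir, XferAssisted } XferKind;

     static XferKind choose_xfer ( int dst_known_at_jit_time,
                                   int jumpkind_is_boring,
                                   int is_noredir_translation )
     {
        /* No-redir translations: the isel is told that every transfer
           must be assisted. */
        if (is_noredir_translation) return XferAssisted;
        /* Transfers that need the scheduler to do something first
           (eg a syscall) are assisted. */
        if (!jumpkind_is_boring)    return XferAssisted;
        /* Boring transfer to a destination known at JIT time:
           chainable direct jump. */
        if (dst_known_at_jit_time)  return XferDirect;
        /* Boring transfer to a computed destination: table lookup. */
        return XferIndir;
     }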


* Patching: XDirect is compiled basically into
     %r11 = &VG_(disp_cp_chain_me_to_{slow,fast}EP)
     call *%r11
  Backends must provide a function (eg) chainXDirect_AMD64
  which converts it into a jump to a specified destination
     jmp $delta-of-PCs
  or
     %r11 = 64-bit immediate
     jmpq *%r11
  depending on branch distance.

  Backends must provide a function (eg) unchainXDirect_AMD64
  which restores the original call-to-the-stub version.
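
  For concreteness, a hedged C sketch of the kind of byte rewriting a
  chainXDirect_* function does on amd64.  The instruction encodings are
  standard amd64, but the 13-byte patch-site layout and NOP padding are
  assumptions made for this note, not a description of the real
  chainXDirect_AMD64 (which must also return a VexInvalRange and deal
  with cache flushing).

     #include <stdint.h>
     #include <string.h>

     /* Rewrite the 13-byte chain-me call at 'site' into a jump to
        'dst': short form (jmp rel32) if the displacement fits in 32
        bits, otherwise load a 64-bit immediate and jump through %r11. */
     static void chain_to ( uint8_t* site, uint8_t* dst )
     {
        int64_t delta = (int64_t)(dst - (site + 5));   /* rel32 is from
                                                          end of the jmp */
        if (delta == (int64_t)(int32_t)delta) {
           int32_t d32 = (int32_t)delta;
           site[0] = 0xE9;                             /* jmp rel32 */
           memcpy(&site[1], &d32, 4);
           memset(&site[5], 0x90, 8);                  /* pad with NOPs */
        } else {
           site[0]  = 0x49; site[1]  = 0xBB;           /* movabsq $dst, %r11 */
           memcpy(&site[2], &dst, 8);
           site[10] = 0x41; site[11] = 0xFF;
           site[12] = 0xE3;                            /* jmpq *%r11 */
        }
     }

  Unchaining is the reverse: put the original load-stub-address-and-call
  bytes back.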


* Event checks.  Each translation now has two entry points,
  the slow one (slowEP) and the fast one (fastEP).  Like this:

     slowEP:
        counter--
        if (counter < 0) goto VG_(disp_cp_evcheck_fail)
     fastEP:
        (rest of the translation)

  slowEP is used for control flow transfers that are or might be
  a back edge in the control flow graph.  Insn selectors are
  given the address of the highest guest byte in the block so
  they can determine which edges are definitely not back edges.

  The counter is placed in the first 8 bytes of the guest state,
  and the address of VG_(disp_cp_evcheck_fail) is placed in
  the next 8 bytes.  This allows very compact checks on all
  targets, since no immediates need to be synthesised, eg:

    decq 0(%baseblock-pointer)
    jns  fastEP
    jmpq *8(%baseblock-pointer)
    fastEP:

  On amd64 a non-failing check is therefore 2 insns; all 3 occupy
  just 8 bytes.

  On amd64 the event check is created by a special single
  pseudo-instruction, AMD64_EvCheck.
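
  The layout assumption behind that sequence, written out as C.  This
  is a sketch for this note only: the field names are invented here,
  and the real guest state structs (in the libvex_guest_*.h headers)
  contain far more than these two words.

     #include <stdint.h>

     /* How the event check views the start of the guest state, per the
        description above: counter in the first 8 bytes, failure
        address in the next 8. */
     typedef struct {
        int64_t  evc_counter;    /* offset 0 */
        uint64_t evc_failaddr;   /* offset 8: &VG_(disp_cp_evcheck_fail) */
        /* ... rest of the guest state ... */
     } EvCheckView;

     /* What the decq/jns/jmpq sequence computes: the address at which
        execution continues. */
     static uint64_t ev_check ( EvCheckView* gst, uint64_t fastEP )
     {
        gst->evc_counter--;
        return (gst->evc_counter >= 0) ? fastEP : gst->evc_failaddr;
     }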


* BB profiling (for --profile-flags=).  The dispatch assembly
  dispatch-arch-os.S no longer deals with this and so is much
  simplified.  Instead the profile inc is compiled into each
  translation, as the insn immediately following the event
  check.  Again, on amd64 a pseudo-insn AMD64_ProfInc is used.
  Counters are now 64 bit even on 32 bit hosts, to avoid overflow.

  One complexity is that at JIT time the address of the counter
  is not known.  To solve this, VexTranslateResult now returns
  the offset of the profile inc in the generated code.  When the
  counter address is known, VEX can be called again to patch it
  in.  Backends must supply eg patchProfInc_AMD64 to make this
  happen.
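
  A sketch of the patching step, assuming (purely for illustration --
  the real AMD64_ProfInc encoding may differ) that the inc is emitted
  with an 8-byte placeholder address which is later overwritten in
  place:

     #include <stdint.h>
     #include <string.h>

     /* Assumed emitted form of the profile inc:
           movabsq $0, %r11      ; 49 BB <8-byte imm, placeholder>
           incq    (%r11)        ; 49 FF 03
        'profinc_offset' is the offset VexTranslateResult reported for
        this translation; 'counter' is the 64-bit counter's address,
        known only once the translation has been stored. */
     static void patch_prof_inc ( uint8_t* code, uint32_t profinc_offset,
                                  uint64_t* counter )
     {
        uint8_t* imm_field = code + profinc_offset + 2;   /* skip 49 BB */
        memcpy(imm_field, &counter, 8);
     }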


* Front end changes (guest_blah_toIR.c)

  The way the guest program counter is handled has changed
  significantly.  Previously, the guest PC was updated (in IR)
  at the start of each instruction, except for the first insn
  in an IRSB.  This was inconsistent and doesn't work with the
  new framework.

  Now, each instruction must update the guest PC as its last
  IR statement -- not its first -- and there is no special
  exemption for the first insn in the block.  As before, most of
  these updates are optimised away by ir_opt, so there are no
  efficiency concerns.

  As a logical side effect of this, exits (IRStmt_Exit) and the
  block-end transfer are both considered to write to the guest state
  (the guest PC) and so need to be told its offset.

  IR generators (eg disInstr_AMD64) are no longer allowed to set
  IRSB::next to specify the block-end transfer address.  Instead they
  now indicate, to the generic steering logic that drives them (iow,
  guest_generic_bb_to_IR.c), that the block has ended.  This then
  generates effectively "goto GET(PC)" (which, again, is optimised
  away).  What this does mean is that if the IR generator function
  ends the IR of the last instruction in the block with an incorrect
  assignment to the guest PC, execution will transfer to an incorrect
  destination -- making the error obvious quickly.
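
  Concretely, "update the guest PC as its last IR statement" means
  ending each instruction's IR with a Put to the PC's guest-state
  offset.  A minimal sketch using the public IR constructors
  (IRStmt_Put, IRExpr_Const, IRConst_U64 and addStmtToIRSB are the
  real IR-building API; the helper and its offB_PC parameter are
  invented for this note):

     #include "libvex_ir.h"

     /* Called after disassembling one guest instruction.  'pc_next' is
        the guest address of the next instruction; 'offB_PC' is the
        guest-state offset of the PC, which the front ends are now
        told explicitly. */
     static void put_PC_last ( IRSB* irsb, Int offB_PC, ULong pc_next )
     {
        addStmtToIRSB( irsb,
                       IRStmt_Put( offB_PC,
                                   IRExpr_Const( IRConst_U64(pc_next) ) ) );
     }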
    216   destination -- making the error obvious quickly.
    217