Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules

                 Reinhold P. Weicker
                 Siemens AG, E STE 35
                 Postfach 3240
                 D-8520 Erlangen
                 Germany (West)
 
 
The Dhrystone benchmark program [1] has become a popular benchmark for
CPU/compiler performance measurement, in particular in the area of
minicomputers, workstations, PCs and microprocessors. It apparently
satisfies a need for an easy-to-use integer benchmark; it gives a first
performance indication which is more meaningful than MIPS numbers
which, in their literal meaning (million instructions per second),
cannot be used across different instruction sets (e.g. RISC vs. CISC).
With the increasing use of the benchmark, it seems necessary to
reconsider the benchmark and to check whether it can still fulfill this
function. Version 2 of Dhrystone is the result of such a re-evaluation;
it has been made for two reasons:
 
o Dhrystone has been published in Ada [1], and versions in Ada, Pascal
  and C have been distributed by Reinhold Weicker via floppy disk.
  However, the version that was used most often for benchmarking has
  been the version made by Rick Richardson by another translation from
  the Ada version into the C programming language; this has been the
  version distributed via the UNIX network Usenet [2].
 
  There is an obvious need for a common C version of Dhrystone, since C
  is at present the most popular system programming language for the
  class of systems (microcomputers, minicomputers, workstations) where
  Dhrystone is used most. There should be, as far as possible, only
  one C version of Dhrystone such that results can be compared without
  restrictions. In the past, the C versions distributed by Rick
  Richardson (Version 1.1) and by Reinhold Weicker had small (though
  not significant) differences.
 
  Together with the new C version, the Ada and Pascal versions have
  been updated as well.
 
o As far as it is possible without changes to the Dhrystone statistics,
  optimizing compilers should be prevented from removing significant
  statements. It has turned out in the past that optimizing compilers
  suppressed code generation for too many statements (by "dead code
  removal" or "dead variable elimination"). This has led to the
  danger that benchmarking results obtained by a naive application of
  Dhrystone - without inspection of the code that was generated - could
  become meaningless.
 
The overall policy for version 2 has been that the distribution of
statements, operand types and operand locality described in [1] should
remain unchanged as much as possible. (Very few changes were
necessary; their impact should be negligible.) Also, the order of
statements should remain unchanged. Although I am aware of some
critical remarks on the benchmark - I agree with several of them - and
know some suggestions for improvement, I didn't want to change the
benchmark into something different from what has become known as
"Dhrystone"; the confusion generated by such a change would probably
outweigh the benefits. If I were to write a new benchmark program, I
wouldn't give it the name "Dhrystone" since this denotes the program
published in [1]. However, I do recognize the need for a larger number
of representative programs that can be used as benchmarks; users should
always be encouraged to use more than just one benchmark.
 
The new versions (version 2.1 for C, Pascal and Ada) will be
distributed as widely as possible. (Version 2.1 differs from version
2.0 distributed via the UNIX Network Usenet in March 1988 only in a few
corrections for minor deficiencies found by users of version 2.0.)
Readers who want to use the benchmark for their own measurements can
obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX
format) from the author.
 
 
In general, version 2 follows - in the parts that are significant for
performance measurement, i.e. within the measurement loop - the
published (Ada) version and the C versions previously distributed.
Where the versions distributed by Rick Richardson [2] and Reinhold
Weicker have been different, it follows the version distributed by
Reinhold Weicker. (However, the differences have been so small that
their impact on execution time in all likelihood has been negligible.)
The initialization and UNIX instrumentation part - which had been
omitted in [1] - follows mostly the ideas of Rick Richardson [2].
However, any changes in the initialization part and in the printing of
the result have no impact on performance measurement since they are
outside the measurement loop. As a concession to older compilers,
names have been made unique within the first 8 characters for the C
version.
 
The original publication of Dhrystone did not contain any statements
for time measurement since they are necessarily system-dependent.
However, it turned out that it is not enough just to enclose the main
procedure of Dhrystone in a loop and to measure the execution time. If
the variables that are computed are not used somehow, there is the
danger that the compiler treats them as "dead variables" and
suppresses code generation for a part of the statements. Therefore in
version 2 all variables of "main" are printed at the end of the
program. This also permits some plausibility control for correct
execution of the benchmark.
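
The dead-variable problem can be illustrated with a minimal sketch. It
is not taken from the Dhrystone sources: the names work_step, time_runs
and report are hypothetical, and the standard C clock() function is
assumed here as one possible (system-dependent) way to measure time.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the benchmark body: a small integer step. */
static int work_step(int value, long run_index)
{
    return (int)((value * 3 + run_index) % 1000);
}

/* Runs the measurement loop and returns the elapsed CPU time in seconds.
   The final value is handed back through *result so that the caller can
   use it; a value that is never used afterwards could be treated as a
   dead variable by an optimizing compiler. */
double time_runs(long number_of_runs, int *result)
{
    clock_t begin, end;
    int value = 1;
    long run_index;

    assert(number_of_runs > 0 && result != NULL);

    begin = clock();
    for (run_index = 1; run_index <= number_of_runs; ++run_index)
        value = work_step(value, run_index);
    end = clock();

    *result = value;
    return (double)(end - begin) / CLOCKS_PER_SEC;
}

/* Prints the computed value after the loop. This keeps the value in use
   and gives a simple plausibility check that the loop really ran. */
void report(long number_of_runs)
{
    int result;
    double seconds = time_runs(number_of_runs, &result);

    printf("result after %ld runs: %d\n", number_of_runs, result);
    printf("elapsed CPU time: %.3f s\n", seconds);
}
```

Calling report with a run count large enough to give a measurable time
prints both the elapsed time and the final value. Printing alone does
not guarantee that every statement is compiled, so the generated code
should still be inspected.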
 
At several places in the benchmark, code has been added, but only in
branches that are not executed. The intention is that optimizing
compilers should be prevented from moving code out of the measurement
loop, or from removing code altogether. Statements that are executed
have been changed in very few places only. In these cases, only the
role of some operands has been changed, and it was made sure that the
numbers defining the "Dhrystone distribution" (distribution of
statements, operand types and locality) still hold as much as possible.
Except for sophisticated optimizing compilers, execution times for
version 2.1 should be the same as for previous versions.
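
The effect of code in a non-executed branch can be sketched as follows.
This is an illustration of the general idea only, not benchmark code;
guarded_loop, global_sink and the selector parameter are hypothetical
names chosen for this example.

```c
#include <assert.h>

/* Hypothetical global; in a real program it would be visible to other
   compilation units, so assignments to it cannot be discarded. */
int global_sink = 0;

/* Sums a local value over number_of_runs iterations. The "if" branch is
   not executed when selector lies outside 1..number_of_runs, but the
   compiler generally cannot know this when compiling the function. */
long guarded_loop(int number_of_runs, int selector)
{
    long sum = 0;
    int run_index;
    int int_loc;

    assert(number_of_runs >= 0);

    for (run_index = 1; run_index <= number_of_runs; ++run_index) {
        int_loc = 3;                  /* same value in every iteration */
        if (run_index == selector) {
            /* Possible second assignment: the constant 3 can no longer
               simply be propagated into the addition below. */
            int_loc = run_index;
            global_sink = run_index;
        }
        sum += int_loc;
    }
    return sum;
}
```

With selector set to -1 the branch is never taken and
guarded_loop(10, -1) returns 30, yet a compiler that cannot prove this
has to keep the assignment and the test inside the loop. How a given
compiler actually treats such code can only be seen in the code listing.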
 
Because of the self-imposed limitation that the order and distribution
of the executed statements should not be changed, there are still cases
where optimizing compilers may not generate code for some statements.
To a certain degree, this is unavoidable for small synthetic
benchmarks. Users of the benchmark are advised to check code listings
to see whether code is generated for all statements of Dhrystone.
 
Contrary to the suggestion in the published paper and its realization
in the versions previously distributed, no attempt has been made to
subtract the time for the measurement loop overhead. (This calculation
has proven difficult to implement in a correct way, and its omission
makes the program simpler.) However, since the loop check is now part
of the benchmark, this does have an impact - though a very minor one -
on the distribution statistics which have been updated for this
version.
 
 
In this section, all changes are described that affect the measurement
loop and that are not just renamings of variables. All remarks refer to
the C version; the other language versions have been updated similarly.
 
In addition to adding the measurement loop and the printout statements,
changes have been made at the following places:
 
o In procedure "main", three statements have been added in the
  non-executed "then" part of the statement
    if (Enum_Loc == Func_1 (Ch_Index, 'C'))
  they are
    strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
    Int_2_Loc = Run_Index;
    Int_Glob = Run_Index;
  The string assignment prevents movement of the preceding assignment
  to Str_2_Loc (5th statement of "main") out of the measurement loop.
  (This probably will not happen for the C version, but it did happen
  with another language and compiler.) The assignment to Int_2_Loc
  prevents value propagation for Int_2_Loc, and the assignment to
  Int_Glob makes the value of Int_Glob possibly dependent on the
  value of Run_Index.
 
o In the three arithmetic computations at the end of the measurement
  loop in "main", the role of some variables has been exchanged, to
  prevent the division from just cancelling out the multiplication as
  it did in [1]. A very smart compiler might have recognized this and
  suppressed code generation for the division.
 
o For Proc_2, no code has been changed, but the values of the actual
  parameter have changed due to changes in "main".
 
o In Proc_4, the second assignment has been changed from
    Bool_Loc = Bool_Loc | Bool_Glob;
  to
    Bool_Glob = Bool_Loc | Bool_Glob;
  It now assigns a value to a global variable instead of a local
  variable (Bool_Loc); Bool_Loc would be a "dead variable" which is not
  used afterwards.
 
o In Func_1, the statement
    Ch_1_Glob = Ch_1_Loc;
  was added in the non-executed "else" part of the "if" statement, to
  prevent the suppression of code generation for the assignment to
  Ch_1_Loc.
 
o In Func_2, the second character comparison statement has been changed
  to
    if (Ch_Loc == 'R')
  ('R' instead of 'X') because a comparison with 'X' is implied in the
  preceding "if" statement.

  Also in Func_2, the statement
    Int_Glob = Int_Loc;
  has been added in the non-executed part of the last "if" statement,
  in order to prevent Int_Loc from becoming a dead variable.
 
o In Func_3, a non-executed "else" part has been added to the "if"
  statement. While the program would not be incorrect without this
  "else" part, it is considered bad programming practice if a function
  can be left without a return value.

  To compensate for this change, the (non-executed) "else" part in the
  "if" statement of Proc_3 was removed.
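
The cancellation addressed by the change to the arithmetic computations
can be sketched in isolation. The two functions below are hypothetical
and only show the pattern; they are not the statements of the
benchmark, and the operand values in the usage note are example values.

```c
#include <assert.h>

/* Division by the same operand that was used in the multiplication.
   For b != 0 and a product that does not overflow, (a * b) / b equals
   a, so a compiler may return a without generating a division. */
int cancelling(int a, int b)
{
    int product, quotient;

    assert(b != 0);
    product = a * b;
    quotient = product / b;
    return quotient;
}

/* Division by a third operand. The quotient is in general different
   from a and from b, so the division has to be carried out. */
int non_cancelling(int a, int b, int c)
{
    int product, quotient;

    assert(c != 0);
    product = a * b;
    quotient = product / c;
    return quotient;
}
```

For example, cancelling(3, 3) yields 3 whether or not a division is
executed, while non_cancelling(3, 3, 7) yields 1 and depends on the
division actually being performed.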
 
The distribution statistics have been changed only by the addition of
the measurement loop iteration (1 additional statement, 4 additional
local integer operands) and by the change in Proc_4 (one operand
changed from local to global). The distribution statistics in the
comment headers have been updated accordingly.
 
 
The string operations (string assignment and string comparison) have
not been changed, to keep the program consistent with the original
version.
 
There has been some concern that the string operations are
over-represented in the program, and that execution time is dominated
by these operations. This was true in particular when optimizing
compilers removed too much code in the main part of the program; this
should have been mitigated in version 2.
 
It should be noted that this is a language-dependent issue: Dhrystone
was first published in Ada, and with Ada or Pascal semantics, the time
spent in the string operations is, at least in all implementations
known to me, considerably smaller. In Ada and Pascal, assignment and
comparison of strings are operators defined in the language, and the
upper bounds of the strings occurring in Dhrystone are part of the type
information known at compilation time. The compilers can therefore
generate efficient inline code. In C, string assignment and comparison
are not part of the language, so the string operations must be
expressed in terms of the C library functions "strcpy" and "strcmp".
(ANSI C allows an implementation to use inline code for these
functions.) In addition to the overhead caused by additional function
calls, these functions are defined for null-terminated strings where
the length of the strings is not known at compilation time; the
function has to check every byte for the termination condition (the
null byte).
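
The difference between the two kinds of string assignment can be
sketched in C. The functions copy_until_null and copy_fixed and the
constant STR_CAPACITY are hypothetical names for this illustration; the
capacity of 31 bytes (30 characters plus the null byte) is an
assumption based on the 30-character string literal quoted above.

```c
#include <assert.h>
#include <string.h>

#define STR_CAPACITY 31   /* assumed: 30 characters plus the null byte */

/* Copy in the style of "strcpy": the length is not known in advance, so
   every byte is tested against the null byte before the loop can stop.
   Returns the number of bytes copied, including the null byte. */
size_t copy_until_null(char *dst, const char *src)
{
    size_t n = 0;

    assert(dst != NULL && src != NULL);
    while ((dst[n] = src[n]) != '\0')
        ++n;
    return n + 1;
}

/* Copy with a length fixed at compilation time, comparable to the
   assignment of a string type with known bounds in Ada or Pascal.
   No test for a terminating byte is needed. */
void copy_fixed(char dst[STR_CAPACITY], const char src[STR_CAPACITY])
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, STR_CAPACITY);
}
```

Both functions produce the same result for a 30-character string held
in a 31-byte array; they differ only in how the end of the copy is
determined. A C library is of course free to implement "strcpy" far
more efficiently than the byte-wise loop shown here.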
 
Obviously, a C library which includes efficiently coded "strcpy" and
"strcmp" functions helps to obtain good Dhrystone results. However, I
don't think that this is unfair since string functions do occur quite
frequently in real programs (editors, command interpreters, etc.). If
the string functions are implemented efficiently, this helps real
programs as well as benchmark programs.
 
I admit that the string comparison in Dhrystone terminates later (after
scanning 20 characters) than most string comparisons in real programs.
For consistency with the original benchmark, I didn't change the
program despite this weakness.
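
The figure of 20 characters can be checked with a short sketch. The
function chars_examined is hypothetical, and the two literals used in
the usage note are assumed to be the strings compared in the benchmark;
only the "3'RD" variant is quoted in this text, so the "1'ST" and
"2'ND" variants are an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many character positions a simple byte-wise comparison
   looks at before it can stop, including the position at which the
   strings differ (or the terminating null byte if they are equal). */
size_t chars_examined(const char *a, const char *b)
{
    size_t i = 0;

    assert(a != NULL && b != NULL);
    while (a[i] == b[i] && a[i] != '\0')
        ++i;
    return i + 1;
}
```

For "DHRYSTONE PROGRAM, 1'ST STRING" and "DHRYSTONE PROGRAM, 2'ND
STRING" the first 19 characters are identical, so the comparison stops
at the 20th character and chars_examined returns 20.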
 
 
When Dhrystone is used, the following "ground rules" apply:
 
o Separate compilation (Ada and C versions)

  As mentioned in [1], Dhrystone was written to reflect actual
  programming practice in systems programming. The division into
  several compilation units (5 in the Ada version, 2 in the C version)
  is intended, as is the distribution of inter-module and intra-module
  subprogram calls. Although on many systems there will be no
  difference in execution time to a Dhrystone version where all
  compilation units are merged into one file, the rule is that separate
  compilation should be used. The intention is that real programming
  practice, where programs consist of several independently compiled
  units, should be reflected. This also implies that the compiler,
  while compiling one unit, has no information about the use of
  variables, register allocation etc. occurring in other compilation
  units. Although in real life compilation units will probably be
  larger, the intention is that these effects of separate compilation
  are modeled in Dhrystone.
 
  A few language systems have post-linkage optimization available
  (e.g., final register allocation is performed after linkage). This
  is a borderline case: Post-linkage optimization involves additional
  program preparation time (although not as much as compilation in one
  unit) which may prevent its general use in practical programming. I
  think that since it defeats the intentions given above, it should not
  be used for Dhrystone.
 
  Unfortunately, ISO/ANSI Pascal does not contain language features for
  separate compilation. Although most commercial Pascal compilers
  provide separate compilation in some way, we cannot use it for
  Dhrystone since such a version would not be portable. Therefore, no
  attempt has been made to provide a Pascal version with several
  compilation units.
 
o No procedure merging

  Although Dhrystone contains some very short procedures where
  execution would benefit from procedure merging (inlining, macro
  expansion of procedures), procedure merging is not to be used. The
  reason is that the percentage of procedure and function calls is part
  of the "Dhrystone distribution" of statements contained in [1]. This
  restriction does not hold for the string functions of the C version
  since ANSI C allows an implementation to use inline code for these
  functions.
 
o Other optimizations are allowed, but they should be indicated

  It is often hard to draw an exact line between "normal code
  generation" and "optimization" in compilers: Some compilers perform
  operations by default that are invoked in other compilers only when
  optimization is explicitly requested. Also, we cannot prevent people
  from trying to achieve benchmark results that look as good as
  possible. Therefore, optimizations performed by compilers - other
  than those listed above - are not forbidden when Dhrystone execution
  times are measured. Dhrystone is not intended to be non-optimizable
  but is intended to be optimizable to a similar degree as normal
  programs. For example, there are several places in Dhrystone where
  performance benefits from optimizations like common subexpression
  elimination, value propagation etc., but normal programs usually also
  benefit from these optimizations. Therefore, no effort was made to
  artificially prevent such optimizations. However, measurement reports
  should indicate which compiler optimization levels have been used,
  and reporting results with different levels of compiler optimization
  for the same hardware is encouraged.
 
o Default results are those without "register" declarations (C version)

  When Dhrystone results are quoted without additional qualification,
  they should be understood as results obtained without use of the
  "register" attribute. Good compilers should be able to make good use
  of registers even without explicit register declarations ([3], p.
  193).
 
Of course, for experimental purposes, post-linkage optimization,
procedure merging and/or compilation in one unit can be done to
determine their effects. However, Dhrystone numbers obtained under
these conditions should be explicitly marked as such; "normal"
Dhrystone results should be understood as results obtained following
the ground rules listed above.
 
In any case, for serious performance evaluation, users are advised to
ask for code listings and to check them carefully. In this way, when
results for different systems are compared, the reader can get a
feeling for how much performance difference is due to compiler
optimization and how much is due to hardware speed.
 
 
The C version 2.1 of Dhrystone has been developed in cooperation with
Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from the
"Version 1.1" distributed previously by him over the UNIX network
Usenet. Through his activity with Usenet, Rick Richardson has made a
very valuable contribution to the dissemination of the benchmark. I
also thank Chaim Benedelac (National Semiconductor), David Ditzel
(SUN), Earl Killian and John Mashey (MIPS), Alan Smith and Rafael
Saavedra-Barrera (UC at Berkeley) for their help with comments on
earlier versions of the benchmark.
 
 
[1]
   Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming
   Benchmark.
   Communications of the ACM 27, 10 (Oct. 1984), 1013-1030

[2]
   Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text)
   Informal Distribution via "Usenet", Last Version Known to me: Sept.
   21, 1987

[3]
   Brian W. Kernighan and Dennis M. Ritchie: The C Programming
   Language.
   Prentice-Hall, Englewood Cliffs (NJ) 1978
    361