Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules

Reinhold P. Weicker
Siemens AG, E STE 35
Postfach 3240
D-8520 Erlangen
Germany (West)


The Dhrystone benchmark program [1] has become a popular benchmark for
CPU/compiler performance measurement, in particular in the area of
minicomputers, workstations, PCs and microprocessors. It apparently
satisfies a need for an easy-to-use integer benchmark; it gives a first
performance indication which is more meaningful than MIPS numbers
which, in their literal meaning (million instructions per second),
cannot be used across different instruction sets (e.g. RISC vs. CISC).
With the increasing use of the benchmark, it seems necessary to
reconsider the benchmark and to check whether it can still fulfill this
function. Version 2 of Dhrystone is the result of such a
re-evaluation; it has been made for two reasons:

o Dhrystone has been published in Ada [1], and versions in Ada, Pascal
  and C have been distributed by Reinhold Weicker via floppy disk.
  However, the version that was used most often for benchmarking has
  been the version made by Rick Richardson by another translation from
  the Ada version into the C programming language; this has been the
  version distributed via the UNIX network Usenet [2].

  There is an obvious need for a common C version of Dhrystone, since C
  is at present the most popular system programming language for the
  class of systems (microcomputers, minicomputers, workstations) where
  Dhrystone is used most. There should be, as far as possible, only
  one C version of Dhrystone such that results can be compared without
  restrictions. In the past, the C versions distributed by Rick
  Richardson (Version 1.1) and by Reinhold Weicker had small (though
  not significant) differences.

  Together with the new C version, the Ada and Pascal versions have
  been updated as well.

o As far as it is possible without changes to the Dhrystone statistics,
  optimizing compilers should be prevented from removing significant
  statements. It has turned out in the past that optimizing compilers
  suppressed code generation for too many statements (by "dead code
  removal" or "dead variable elimination"). This has led to the
  danger that benchmarking results obtained by a naive application of
  Dhrystone - without inspection of the code that was generated - could
  become meaningless.

The overall policy for version 2 has been that the distribution of
statements, operand types and operand locality described in [1] should
remain unchanged as much as possible. (Very few changes were
necessary; their impact should be negligible.) Also, the order of
statements should remain unchanged. Although I am aware of some
critical remarks on the benchmark - I agree with several of them - and
know some suggestions for improvement, I didn't want to change the
benchmark into something different from what has become known as
"Dhrystone"; the confusion generated by such a change would probably
outweigh the benefits. If I were to write a new benchmark program, I
wouldn't give it the name "Dhrystone" since this denotes the program
published in [1]. However, I do recognize the need for a larger number
of representative programs that can be used as benchmarks; users should
always be encouraged to use more than just one benchmark.

The new versions (version 2.1 for C, Pascal and Ada) will be
distributed as widely as possible. (Version 2.1 differs from version
2.0 distributed via the UNIX network Usenet in March 1988 only in a few
corrections for minor deficiencies found by users of version 2.0.)
Readers who want to use the benchmark for their own measurements can
obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX
format) from the author.


In general, version 2 follows - in the parts that are significant for
performance measurement, i.e. within the measurement loop - the
published (Ada) version and the C versions previously distributed.
Where the versions distributed by Rick Richardson [2] and Reinhold
Weicker have been different, it follows the version distributed by
Reinhold Weicker. (However, the differences have been so small that
their impact on execution time in all likelihood has been negligible.)
The initialization and UNIX instrumentation part - which had been
omitted in [1] - follows mostly the ideas of Rick Richardson [2].
However, any changes in the initialization part and in the printing of
the result have no impact on performance measurement since they are
outside the measurement loop. As a concession to older compilers,
names have been made unique within the first 8 characters for the C
version.

The original publication of Dhrystone did not contain any statements
for time measurement since they are necessarily system-dependent.
However, it turned out that it is not enough just to enclose the main
procedure of Dhrystone in a loop and to measure the execution time. If
the variables that are computed are not used somehow, there is the
danger that the compiler considers them as "dead variables" and
suppresses code generation for a part of the statements. Therefore, in
version 2 all variables of "main" are printed at the end of the
program. This also permits some plausibility control for correct
execution of the benchmark.

At several places in the benchmark, code has been added, but only in
branches that are not executed. The intention is that optimizing
compilers should be prevented from moving code out of the measurement
loop, or from removing code altogether. Statements that are executed
have been changed in very few places only.
In these cases, only the role of some operands has been changed, and
care was taken that the numbers defining the "Dhrystone distribution"
(distribution of statements, operand types and locality) still hold as
much as possible. Except for sophisticated optimizing compilers,
execution times for version 2.1 should be the same as for previous
versions.

Because of the self-imposed limitation that the order and distribution
of the executed statements should not be changed, there are still cases
where optimizing compilers may not generate code for some statements.
To a certain degree, this is unavoidable for small synthetic
benchmarks. Users of the benchmark are advised to check code listings
to verify that code is generated for all statements of Dhrystone.

Contrary to the suggestion in the published paper and its realization
in the versions previously distributed, no attempt has been made to
subtract the time for the measurement loop overhead. (This calculation
has proven difficult to implement in a correct way, and its omission
makes the program simpler.) However, since the loop check is now part
of the benchmark, this does have an impact - though a very minor one -
on the distribution statistics, which have been updated for this
version.


In this section, all changes are described that affect the measurement
loop and that are not just renamings of variables. All remarks refer to
the C version; the other language versions have been updated similarly.
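
As an illustration of the measurement approach described above (a timed
loop, results used after the loop, no subtraction of loop overhead), a
minimal C sketch might look as follows. It is not the Dhrystone source:
work_step, Work_Glob, measure and report are hypothetical names chosen
for this example, and the standard clock() function is assumed to be an
acceptable timer on the system at hand.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical global; it is printed after the loop, so the stores
   to it inside the loop are not dead. */
int Work_Glob = 0;

/* Hypothetical stand-in for the benchmark body (not Dhrystone code). */
int work_step(int run_index, int acc)
{
    Work_Glob = acc + run_index;
    return Work_Glob % 7;
}

/* Times the whole loop, loop control included: no attempt is made
   to subtract the loop overhead. Returns elapsed CPU seconds. */
double measure(long runs, int *result_out)
{
    int acc = 0;
    long run_index;
    clock_t begin, end;

    assert(result_out != NULL);
    begin = clock();
    for (run_index = 1; run_index <= runs; ++run_index)
        acc = work_step((int) run_index, acc);
    end = clock();

    *result_out = acc;   /* hand the computed value to the caller */
    return (double) (end - begin) / CLOCKS_PER_SEC;
}

/* Prints every computed value after the loop, so that the compiler
   cannot discard the computation and the output can be checked. */
void report(long runs)
{
    int result;
    double seconds = measure(runs, &result);

    printf("result    = %d\n", result);
    printf("Work_Glob = %d\n", Work_Glob);
    printf("seconds   = %f for %ld runs\n", seconds, runs);
}
```

Calling report(1000000L) from a main program prints the computed values
together with the elapsed time; the printed values double as a
plausibility check on correct execution.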

In addition to adding the measurement loop and the printout statements,
changes have been made at the following places:

o In procedure "main", three statements have been added in the
  non-executed "then" part of the statement
    if (Enum_Loc == Func_1 (Ch_Index, 'C'))
  they are
    strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
    Int_2_Loc = Run_Index;
    Int_Glob = Run_Index;
  The string assignment prevents movement of the preceding assignment
  to Str_2_Loc (5th statement of "main") out of the measurement loop.
  (This probably will not happen for the C version, but it did happen
  with another language and compiler.) The assignment to Int_2_Loc
  prevents value propagation for Int_2_Loc, and the assignment to
  Int_Glob makes the value of Int_Glob possibly dependent on the
  value of Run_Index.

o In the three arithmetic computations at the end of the measurement
  loop in "main", the role of some variables has been exchanged, to
  prevent the division from just cancelling out the multiplication as
  it did in [1]. A very smart compiler might have recognized this and
  suppressed code generation for the division.

o For Proc_2, no code has been changed, but the values of the actual
  parameter have changed due to changes in "main".

o In Proc_4, the second assignment has been changed from
    Bool_Loc = Bool_Loc | Bool_Glob;
  to
    Bool_Glob = Bool_Loc | Bool_Glob;
  It now assigns a value to a global variable instead of a local
  variable (Bool_Loc); Bool_Loc would be a "dead variable" which is not
  used afterwards.

o In Func_1, the statement
    Ch_1_Glob = Ch_1_Loc;
  was added in the non-executed "else" part of the "if" statement, to
  prevent the suppression of code generation for the assignment to
  Ch_1_Loc.

o In Func_2, the second character comparison statement has been changed
  to
    if (Ch_Loc == 'R')
  ('R' instead of 'X') because a comparison with 'X' is implied in the
  preceding "if" statement.

  Also in Func_2, the statement
    Int_Glob = Int_Loc;
  has been added in the non-executed part of the last "if" statement,
  in order to prevent Int_Loc from becoming a dead variable.

o In Func_3, a non-executed "else" part has been added to the "if"
  statement. While the program would not be incorrect without this
  "else" part, it is considered bad programming practice if a function
  can be left without a return value.

  To compensate for this change, the (non-executed) "else" part in the
  "if" statement of Proc_3 was removed.

The distribution statistics have been changed only by the addition of
the measurement loop iteration (1 additional statement, 4 additional
local integer operands) and by the change in Proc_4 (one operand
changed from local to global). The distribution statistics in the
comment headers have been updated accordingly.


The string operations (string assignment and string comparison) have
not been changed, to keep the program consistent with the original
version.

There has been some concern that the string operations are
over-represented in the program, and that execution time is dominated
by these operations. This was true in particular when optimizing
compilers removed too much code in the main part of the program; this
should have been mitigated in version 2.

It should be noted that this is a language-dependent issue: Dhrystone
was first published in Ada, and with Ada or Pascal semantics, the time
spent in the string operations is, at least in all implementations
known to me, considerably smaller.
In Ada and Pascal, assignment and comparison of strings are operators
defined in the language, and the upper bounds of the strings occurring
in Dhrystone are part of the type information known at compilation
time. The compilers can therefore generate efficient inline code. In
C, string assignment and comparison are not part of the language, so
the string operations must be expressed in terms of the C library
functions "strcpy" and "strcmp". (ANSI C allows an implementation to
use inline code for these functions.) In addition to the overhead
caused by additional function calls, these functions are defined for
null-terminated strings where the length of the strings is not known at
compilation time; the function has to check every byte for the
termination condition (the null byte).

Obviously, a C library which includes efficiently coded "strcpy" and
"strcmp" functions helps to obtain good Dhrystone results. However, I
don't think that this is unfair since string functions do occur quite
frequently in real programs (editors, command interpreters, etc.). If
the string functions are implemented efficiently, this helps real
programs as well as benchmark programs.

I admit that the string comparison in Dhrystone terminates later (after
scanning 20 characters) than most string comparisons in real programs.
For consistency with the original benchmark, I didn't change the
program despite this weakness.


When Dhrystone is used, the following "ground rules" apply:

o Separate compilation (Ada and C versions)

  As mentioned in [1], Dhrystone was written to reflect actual
  programming practice in systems programming. The division into
  several compilation units (5 in the Ada version, 2 in the C version)
  is intended, as is the distribution of inter-module and intra-module
  subprogram calls.
  Although on many systems there will be no difference in execution
  time to a Dhrystone version where all compilation units are merged
  into one file, the rule is that separate compilation should be used.
  The intention is that real programming practice, where programs
  consist of several independently compiled units, should be reflected.
  This also implies that the compiler, while compiling one unit, has no
  information about the use of variables, register allocation etc.
  occurring in other compilation units. Although in real life
  compilation units will probably be larger, the intention is that
  these effects of separate compilation are modeled in Dhrystone.

  A few language systems have post-linkage optimization available
  (e.g., final register allocation is performed after linkage). This
  is a borderline case: post-linkage optimization involves additional
  program preparation time (although not as much as compilation in one
  unit) which may prevent its general use in practical programming. I
  think that since it defeats the intentions given above, it should not
  be used for Dhrystone.

  Unfortunately, ISO/ANSI Pascal does not contain language features for
  separate compilation. Although most commercial Pascal compilers
  provide separate compilation in some way, we cannot use it for
  Dhrystone since such a version would not be portable. Therefore, no
  attempt has been made to provide a Pascal version with several
  compilation units.

o No procedure merging

  Although Dhrystone contains some very short procedures where
  execution would benefit from procedure merging (inlining, macro
  expansion of procedures), procedure merging is not to be used. The
  reason is that the percentage of procedure and function calls is part
  of the "Dhrystone distribution" of statements contained in [1].
  This restriction does not hold for the string functions of the C
  version since ANSI C allows an implementation to use inline code for
  these functions.

o Other optimizations are allowed, but they should be indicated

  It is often hard to draw an exact line between "normal code
  generation" and "optimization" in compilers: some compilers perform
  operations by default that are invoked in other compilers only when
  optimization is explicitly requested. Also, it cannot be avoided
  that people doing benchmarking try to achieve results that look as
  good as possible. Therefore, optimizations performed by compilers -
  other than those listed above - are not forbidden when Dhrystone
  execution times are measured. Dhrystone is not intended to be
  non-optimizable but to be optimizable to a similar degree as normal
  programs. For example, there are several places in Dhrystone where
  performance benefits from optimizations like common subexpression
  elimination, value propagation etc., but normal programs usually also
  benefit from these optimizations. Therefore, no effort was made to
  artificially prevent such optimizations. However, measurement
  reports should indicate which compiler optimization levels have been
  used, and reporting results with different levels of compiler
  optimization for the same hardware is encouraged.

o Default results are those without "register" declarations (C version)

  When Dhrystone results are quoted without additional qualification,
  they should be understood as results obtained without use of the
  "register" attribute. Good compilers should be able to make good use
  of registers even without explicit register declarations ([3], p.
  193).

Of course, for experimental purposes, post-linkage optimization,
procedure merging and/or compilation in one unit can be done to
determine their effects.
However, Dhrystone numbers obtained under these conditions should be
explicitly marked as such; "normal" Dhrystone results should be
understood as results obtained following the ground rules listed above.

In any case, for serious performance evaluation, users are advised to
ask for code listings and to check them carefully. In this way, when
results for different systems are compared, the reader can get a
feeling for how much of the performance difference is due to compiler
optimization and how much is due to hardware speed.


The C version 2.1 of Dhrystone has been developed in cooperation with
Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from the
"Version 1.1" distributed previously by him over the UNIX network
Usenet. Through his activity with Usenet, Rick Richardson has made a
very valuable contribution to the dissemination of the benchmark. I
also thank Chaim Benedelac (National Semiconductor), David Ditzel
(SUN), Earl Killian and John Mashey (MIPS), Alan Smith and Rafael
Saavedra-Barrera (UC at Berkeley) for their help with comments on
earlier versions of the benchmark.


[1] Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming
    Benchmark.
    Communications of the ACM 27, 10 (Oct. 1984), 1013-1030

[2] Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text)
    Informal Distribution via "Usenet", Last Version Known to me: Sept.
    21, 1987

[3] Brian W. Kernighan and Dennis M. Ritchie: The C Programming
    Language.
    Prentice-Hall, Englewood Cliffs (NJ) 1978