________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

     Extendable suite of low-level benchmarks for measuring
          the performance of the Python implementation
                 (interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, like other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen.  Run 'pybench.py' to run
the benchmark suite using default settings, and 'pybench.py -f <file>'
to have it store the results in a file as well.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.
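
For example, using a hypothetical result file name:

    python pybench.py -f reference.pybench     # save a reference run
    python pybench.py -c reference.pybench     # compare a fresh run against it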

If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing.  If you get random
differences of more than 10% or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web browsers, email clients, RSS
readers, music players, backup programs, etc.

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:

"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

Synopsis:
 pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version:
 2.0

The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Available timers:

   time.time
   time.clock
   systimes.processtime

Examples:

python2.1 pybench.py -f p21.pybench
python2.5 pybench.py -f p25.pybench
python pybench.py -s p25.pybench -c p21.pybench
"""

License
-------

See LICENSE file.


Sample output
-------------

"""
-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using Python 2.4.2
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:  Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
       Processor:    x86_64

    Python:
       Executable:   /usr/local/bin/python
       Version:      2.4.2
       Compiler:     GCC 3.3.4 (pre 3.3.5 20040809)
       Bits:         64bit
       Build:        Oct  1 2005 15:24:35 (#1)
       Unicode:      UCS2


Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:    126ms    145ms    0.28us    0.274ms
           BuiltinMethodLookup:    124ms    130ms    0.12us    0.316ms
                 CompareFloats:    109ms    110ms    0.09us    0.361ms
         CompareFloatsIntegers:    100ms    104ms    0.12us    0.271ms
               CompareIntegers:    137ms    138ms    0.08us    0.542ms
        CompareInternedStrings:    124ms    127ms    0.08us    1.367ms
                  CompareLongs:    100ms    104ms    0.10us    0.316ms
                CompareStrings:    111ms    115ms    0.12us    0.929ms
                CompareUnicode:    108ms    128ms    0.17us    0.693ms
                 ConcatStrings:    142ms    155ms    0.31us    0.562ms
                 ConcatUnicode:    119ms    127ms    0.42us    0.384ms
               CreateInstances:    123ms    128ms    1.14us    0.367ms
            CreateNewInstances:    121ms    126ms    1.49us    0.335ms
       CreateStringsWithConcat:    130ms    135ms    0.14us    0.916ms
       CreateUnicodeWithConcat:    130ms    135ms    0.34us    0.361ms
                  DictCreation:    108ms    109ms    0.27us    0.361ms
             DictWithFloatKeys:    149ms    153ms    0.17us    0.678ms
           DictWithIntegerKeys:    124ms    126ms    0.11us    0.915ms
            DictWithStringKeys:    114ms    117ms    0.10us    0.905ms
                      ForLoops:    110ms    111ms    4.46us    0.063ms
                    IfThenElse:    118ms    119ms    0.09us    0.685ms
                   ListSlicing:    116ms    120ms    8.59us    0.103ms
                NestedForLoops:    125ms    137ms    0.09us    0.019ms
          NormalClassAttribute:    124ms    136ms    0.11us    0.457ms
       NormalInstanceAttribute:    110ms    117ms    0.10us    0.454ms
           PythonFunctionCalls:    107ms    113ms    0.34us    0.271ms
             PythonMethodCalls:    140ms    149ms    0.66us    0.141ms
                     Recursion:    156ms    166ms    3.32us    0.452ms
                  SecondImport:    112ms    118ms    1.18us    0.180ms
           SecondPackageImport:    118ms    127ms    1.27us    0.180ms
         SecondSubmoduleImport:    140ms    151ms    1.51us    0.180ms
       SimpleComplexArithmetic:    128ms    139ms    0.16us    0.361ms
        SimpleDictManipulation:    134ms    136ms    0.11us    0.452ms
         SimpleFloatArithmetic:    110ms    113ms    0.09us    0.571ms
      SimpleIntFloatArithmetic:    106ms    111ms    0.08us    0.548ms
       SimpleIntegerArithmetic:    106ms    109ms    0.08us    0.544ms
        SimpleListManipulation:    103ms    113ms    0.10us    0.587ms
          SimpleLongArithmetic:    112ms    118ms    0.18us    0.271ms
                    SmallLists:    105ms    116ms    0.17us    0.366ms
                   SmallTuples:    108ms    128ms    0.24us    0.406ms
         SpecialClassAttribute:    119ms    136ms    0.11us    0.453ms
      SpecialInstanceAttribute:    143ms    155ms    0.13us    0.454ms
                StringMappings:    115ms    121ms    0.48us    0.405ms
              StringPredicates:    120ms    129ms    0.18us    2.064ms
                 StringSlicing:    111ms    127ms    0.23us    0.781ms
                     TryExcept:    125ms    126ms    0.06us    0.681ms
                TryRaiseExcept:    133ms    137ms    2.14us    0.361ms
                  TupleSlicing:    117ms    120ms    0.46us    0.066ms
               UnicodeMappings:    156ms    160ms    4.44us    0.429ms
             UnicodePredicates:    117ms    121ms    0.22us    2.487ms
             UnicodeProperties:    115ms    153ms    0.38us    2.070ms
                UnicodeSlicing:    126ms    129ms    0.26us    0.689ms
-------------------------------------------------------------------------------
Totals:                           6283ms   6673ms
"""
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.


Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds rounds, executing
            self.operations operations each.

        """
        # Init the test
        a = 1

        # Run test rounds
        #
        # NOTE: Use xrange() for all test loops unless you want to face
        # a 20MB process!
        #
        for i in xrange(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            set up and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.

        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in xrange(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass

Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
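
For example, assuming your new tests live in a (hypothetical) module
MyTests.py placed in the pybench directory, a star-import added to
Setup.py is enough to make the classes visible to the scan:

    # In Setup.py; the module name MyTests is hypothetical
    from MyTests import *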


Breaking Comparability
----------------------

If a change is made to an individual test that makes it no longer
strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will list as "n/a" to reflect the change.
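
For instance, for the IntegerCounting example above (the new number is
illustrative):

    class IntegerCounting(Test):

        # Bumped from 1.0 after changing what the test measures;
        # results will no longer be compared against 1.0 runs.
        version = 1.1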


Version History
---------------

  2.0: rewrote parts of pybench which resulted in more repeatable
       timings:
        - made timer a parameter
        - changed the platform default timer to use high-resolution
          timers rather than process timers (which have a much lower
          resolution)
        - added option to select timer
        - added process time timer (using systimes.py)
        - changed to use min() as timing estimator (average
          is still taken as well to provide an idea of the difference)
        - garbage collection is turned off by default
        - sys check interval is set to the highest possible value
        - calibration is now a separate step and done using
          a different strategy that allows measuring the test
          overhead more accurately
        - modified the tests to each give a run-time of between
          100-200ms using warp 10
        - changed default warp factor to 10 (from 20)
        - compared results with timeit.py and confirmed measurements
        - bumped all test versions to 2.0
        - updated platform.py to the latest version
        - changed the output format a bit to make it look
          nicer
        - refactored the APIs somewhat
  1.3+: Steve Holden added the NewInstances test and the filtering
       option during the NeedForSpeed sprint; this also triggered a long
       discussion on how to improve benchmark timing and finally
       resulted in the release of 2.0
  1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com
    369