Correctness Testing
===================

Skia correctness testing is primarily served by a tool named DM.
This is a quickstart to building and running DM.

<!--?prettify lang=sh?-->

    python tools/git-sync-deps
    bin/gn gen out/Debug
    ninja -C out/Debug dm
    out/Debug/dm -v -w dm_output

When you run this, you may notice your CPU peg to 100% for a while, then taper
off to 1 or 2 active cores as the run finishes.  This is intentional.  DM is
very multithreaded, but some of the work, particularly GPU-backed work, is
still forced to run on a single thread.  You can use `--threads N` to limit DM to
N threads if you like.  This can sometimes be helpful on machines that have
relatively more CPU available than RAM.

As DM runs, you ought to see a giant spew of output that looks something like this.
~~~
Skipping nonrendering: Don't understand 'nonrendering'.
Skipping angle: Don't understand 'angle'.
Skipping nvprmsaa4: Could not create a surface.
492 srcs * 3 sinks + 382 tests == 1858 tasks

(  25MB  1857) 1.36ms   8888 image mandrill_132x132_12x12.astc-5-subsets
(  25MB  1856) 1.41ms   8888 image mandrill_132x132_6x6.astc-5-subsets
(  25MB  1855) 1.35ms   8888 image mandrill_132x130_6x5.astc-5-subsets
(  25MB  1854) 1.41ms   8888 image mandrill_132x130_12x10.astc-5-subsets
(  25MB  1853) 1.51ms   8888 image mandrill_130x132_10x6.astc-5-subsets
(  25MB  1852) 1.54ms   8888 image mandrill_130x130_5x5.astc-5-subsets
                                  ...
( 748MB     5) 9.43ms   unit test GLInterfaceValidation
( 748MB     4) 30.3ms   unit test HalfFloatTextureTest
( 748MB     3) 31.2ms   unit test FloatingPointTextureTest
( 748MB     2) 32.9ms   unit test DeferredCanvas_GPU
( 748MB     1) 49.4ms   unit test ClipCache
( 748MB     0) 37.2ms   unit test Blur
~~~
Do not panic.

As you become more familiar with DM, this spew may be a bit annoying. If you
remove `-v` from the command line, DM will spin its progress on a single line
rather than print a new line for each status update.

Don't worry about the "Skipping something: Here's why." lines at startup.  DM
supports many test configurations, which are not all appropriate for all
machines.  These lines are a sort of FYI, mostly in case DM can't run some
configuration you might be expecting it to run.

Don't worry about the "skps: Couldn't read skps." messages either; you won't
have .skp files by default and can do without them. If you wish to test with
them too, you can download them separately.

The next line is an overview of the work DM is about to do.
~~~
492 srcs * 3 sinks + 382 tests == 1858 tasks
~~~
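
As a quick sanity check on that line, the arithmetic can be reproduced in the shell. This is only a sketch: `total_tasks` is a made-up helper name, not part of DM, and it simply multiplies sources by sinks and adds the unit tests.

```sh
# Hypothetical helper (not part of DM): total tasks = sources * sinks + unit tests.
total_tasks() {
    srcs=$1 sinks=$2 tests=$3
    echo $(( srcs * sinks + tests ))
}

# Numbers taken from the overview line above.
total_tasks 492 3 382   # prints 1858
```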

DM has found 382 unit tests (code linked in from tests/), and 492 other drawing
sources.  These drawing sources may be GM integration tests (code linked in
from gm/), image files (from `--images`, which defaults to "resources") or .skp
files (from `--skps`, which defaults to "skps").  You can control the types of
sources DM will use with `--src` (default, "tests gm image skp").

DM has found 3 usable ways to draw those 492 sources.  This is controlled by
`--config`. The defaults are operating system dependent. On Linux they are "8888 gl nonrendering".
DM has skipped nonrendering leaving two usable configs:
8888 and gl.  These two name different ways to draw using Skia:

  -    8888: draw using the software backend into a 32-bit RGBA bitmap
  -    gl:  draw using the OpenGL backend (Ganesh) into a 32-bit RGBA bitmap

Sometimes DM calls these configs, sometimes sinks.  Sorry.  There are many
possible configs but generally we pay most attention to 8888 and gl.

DM always tries to draw all sources into all sinks, which is why we multiply
492 by 3.  The unit tests don't really fit into this source-sink model, so they
stand alone.  A couple thousand tasks is pretty normal.  Let's look at the
status line for one of those tasks.
~~~
(  25MB  1857) 1.36ms   8888 image mandrill_132x132_12x12.astc-5-subsets
   [1]   [2]   [3]      [4]
~~~

This status line tells us several things.

  1. The maximum amount of memory DM had ever used was 25MB. Note this is a
  high water mark, not the current memory usage.  This is mostly useful for us
  to track on our buildbots, some of which run perilously close to the system
  memory limit.

  2. The number of unfinished tasks.  In this example there are 1857, either
  currently running or waiting to run.  We generally run one task per hardware
  thread available, so on a typical laptop there are probably 4 or 8 running at
  once.  Sometimes the counts appear to show up out of order, particularly at DM
  startup; it's harmless, and doesn't affect the correctness of the run.

  3. Next, we see this task took 1.36 milliseconds to run.  Generally, the
  precision of this timer is around 1 microsecond.  The time is purely there for
  informational purposes, to make it easier for us to find slow tests.

  4. The configuration and name of the test we ran.  We drew the test
  "mandrill_132x132_12x12.astc-5-subsets", which is an "image" source, into an
  "8888" sink.
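
If you want to pull those four pieces out of a saved log, something like the following may do. It is a rough sketch, not a DM feature: `parse_status_line` and its output labels are invented here, and it assumes the line has exactly the shape shown above, with no whitespace in the test name. Unit test lines print `unit test` in the two columns labelled `config` and `src`.

```sh
# Hypothetical helper (not part of DM): split one -v status line into fields.
# Assumes the "(  MEM  REMAINING) TIME  CONFIG SRC NAME" layout shown above.
parse_status_line() {
    printf '%s\n' "$1" |
        sed 's/^(\([^)]*\))/\1/' |   # drop the parentheses around the first two fields
        awk '{ printf "peak_mem=%s remaining=%s time=%s config=%s src=%s name=%s\n",
                      $1, $2, $3, $4, $5, $6 }'
}

parse_status_line "(  25MB  1857) 1.36ms   8888 image mandrill_132x132_12x12.astc-5-subsets"
```

Because awk splits on runs of whitespace, the column padding DM uses for alignment does not matter.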

When DM finishes running, you should find a directory containing a file named
`dm.json`, and some nested directories filled with lots of images.
~~~
$ ls dm_output
8888    dm.json gl

$ find dm_output -name '*.png'
dm_output/8888/gm/3x3bitmaprect.png
dm_output/8888/gm/aaclip.png
dm_output/8888/gm/aarectmodes.png
dm_output/8888/gm/alphagradients.png
dm_output/8888/gm/arcofzorro.png
dm_output/8888/gm/arithmode.png
dm_output/8888/gm/astcbitmap.png
dm_output/8888/gm/bezier_conic_effects.png
dm_output/8888/gm/bezier_cubic_effects.png
dm_output/8888/gm/bezier_quad_effects.png
                ...
~~~

The directories are nested first by sink type (`--config`), then by source type (`--src`).
The image from the task we just looked at, "8888 image mandrill_132x132_12x12.astc-5-subsets",
can be found at `dm_output/8888/image/mandrill_132x132_12x12.astc-5-subsets.png`.
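
That nesting rule is easy to script. Here is a small sketch; `dm_image_path` is a hypothetical helper name, and the layout it encodes is just the one described above (output directory, then config, then source type, then the name plus `.png`).

```sh
# Hypothetical helper (not part of DM): build the path of a task's image
# from the -w directory, the config (sink), the source type, and the name.
dm_image_path() {
    out_dir=$1 config=$2 src_type=$3 name=$4
    printf '%s/%s/%s/%s.png\n' "$out_dir" "$config" "$src_type" "$name"
}

dm_image_path dm_output 8888 image mandrill_132x132_12x12.astc-5-subsets
# prints dm_output/8888/image/mandrill_132x132_12x12.astc-5-subsets.png
```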

`dm.json` is used by our automated testing system, so you can ignore it if you
like.  It contains a listing of each test run and a checksum of the image
generated for that run.

### Detail <a name="digests"></a>
Boring technical detail: The checksum is not a checksum of the
.png file, but rather a checksum of the raw pixels used to create that .png.
That means it is possible for two different configurations to produce
the same exact .png, but have their checksums differ.
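
One way to picture this is to hash the same three grey levels stored in two different raw layouts. This is purely an illustration: the layouts are made up for the example, and `cksum` stands in for whatever hash DM really uses. Whether an encoder would write identical .png files for both is a separate question; the point is only that a checksum over raw bytes depends on how the pixels are stored.

```sh
# Three grey levels (0, 128, 255) as raw pixel bytes in two hypothetical layouts.
# Octal escapes: \000 = 0, \200 = 128, \377 = 255.
gray8()  { printf '\000\200\377'; }                                      # 1 byte per pixel
rgba32() { printf '\000\000\000\377\200\200\200\377\377\377\377\377'; }  # R, G, B, A per pixel

# cksum is only a stand-in for DM's real hash; it prints a CRC and a byte count.
gray8  | cksum
rgba32 | cksum
```

The two lines printed differ even though both byte streams describe the same picture.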

Unit tests don't generally output anything but a status update when they pass.
If a test fails, DM will print out its assertion failures, both at the time
they happen and then again all together after everything is done running.
These failures are also included in the `dm.json` file.

DM has a simple facility to compare against the results of a previous run:

<!--?prettify lang=sh?-->

    ninja -C out/Debug dm
    out/Debug/dm -w good

    # do some work

    ninja -C out/Debug dm
    out/Debug/dm -r good -w bad

When using `-r`, DM will display a failure for any test that didn't produce the
same image as the `good` run.
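
If you already have two output directories and only want a list of what changed, a rough approximation can be done with standard tools. This is a sketch, not how `-r` works internally: `changed_images` is a made-up helper, and it compares the .png files byte for byte rather than the pixel checksums DM records.

```sh
# Hypothetical helper (not part of DM): list images under $1 whose
# counterpart under $2 is missing or not byte-identical.
changed_images() {
    good=$1 bad=$2
    ( cd "$good" && find . -name '*.png' | sort ) |
        while IFS= read -r rel; do
            # cmp -s is silent and exits non-zero on a difference or a missing file.
            cmp -s "$good/$rel" "$bad/$rel" 2>/dev/null || printf '%s\n' "${rel#./}"
        done
}

# Usage, after the two runs above:
#   changed_images good bad
```

Each line of output is a path relative to the two directories, such as `8888/gm/aaclip.png`.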

For anything fancier, I suggest using skdiff:

<!--?prettify lang=sh?-->

    ninja -C out/Debug dm
    out/Debug/dm -w good

    # do some work

    ninja -C out/Debug dm
    out/Debug/dm -w bad

    ninja -C out/Debug skdiff
    mkdir diff
    out/Debug/skdiff good bad diff

    # open diff/index.html in your web browser

That's the basics of DM.  DM supports many other modes and flags.  Here are a
few examples you might find handy.

<!--?prettify lang=sh?-->

    out/Debug/dm --help        # Print all flags, their defaults, and a brief explanation of each.
    out/Debug/dm --src tests   # Run only unit tests.
    out/Debug/dm --nocpu       # Test only GPU-backed work.
    out/Debug/dm --nogpu       # Test only CPU-backed work.
    out/Debug/dm --match blur  # Run only work with "blur" in its name.
    out/Debug/dm --dryRun      # Don't really do anything, just print out what we'd do.