Correctness Testing
===================

Skia correctness testing is primarily served by a tool named DM.
This is a quickstart to building and running DM.

~~~
$ python bin/sync-and-gyp
$ ninja -C out/Debug dm
$ out/Debug/dm -v -w dm_output
~~~

When you run this, you may notice your CPU peg to 100% for a while, then taper
off to 1 or 2 active cores as the run finishes.  This is intentional.  DM is
very multithreaded, but some of the work, particularly GPU-backed work, is
still forced to run on a single thread.  You can use `--threads N` to limit DM to
N threads if you like.  This can be helpful on machines with many CPU cores
but comparatively little RAM.
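
For example, a run capped at four threads (the count here is just an
illustration; pick whatever suits your machine):
~~~
$ out/Debug/dm -v -w dm_output --threads 4
~~~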

As DM runs, you ought to see a giant spew of output that looks something like this.
~~~
Skipping nonrendering: Don't understand 'nonrendering'.
Skipping angle: Don't understand 'angle'.
Skipping nvprmsaa4: Could not create a surface.
492 srcs * 3 sinks + 382 tests == 1858 tasks

(  25MB  1857) 1.36ms   8888 image mandrill_132x132_12x12.astc-5-subsets
(  25MB  1856) 1.41ms   8888 image mandrill_132x132_6x6.astc-5-subsets
(  25MB  1855) 1.35ms   8888 image mandrill_132x130_6x5.astc-5-subsets
(  25MB  1854) 1.41ms   8888 image mandrill_132x130_12x10.astc-5-subsets
(  25MB  1853) 1.51ms   8888 image mandrill_130x132_10x6.astc-5-subsets
(  25MB  1852) 1.54ms   8888 image mandrill_130x130_5x5.astc-5-subsets
                                  ...
( 748MB     5) 9.43ms   unit test GLInterfaceValidation
( 748MB     4) 30.3ms   unit test HalfFloatTextureTest
( 748MB     3) 31.2ms   unit test FloatingPointTextureTest
( 748MB     2) 32.9ms   unit test DeferredCanvas_GPU
( 748MB     1) 49.4ms   unit test ClipCache
( 748MB     0) 37.2ms   unit test Blur
~~~
Do not panic.

As you become more familiar with DM, this spew may be a bit annoying. If you
remove `-v` from the command line, DM will spin its progress on a single line
rather than print a new line for each status update.
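
For example, the same quickstart run with `-v` removed:
~~~
$ out/Debug/dm -w dm_output
~~~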

Don't worry about the "Skipping something: Here's why." lines at startup.  DM
supports many test configurations, which are not all appropriate for all
machines.  These lines are a sort of FYI, mostly in case DM can't run some
configuration you might be expecting it to run.

The next line is an overview of the work DM is about to do.
~~~
492 srcs * 3 sinks + 382 tests == 1858 tasks
~~~

DM has found 382 unit tests (code linked in from tests/), and 492 other drawing
sources.  These drawing sources may be GM integration tests (code linked in
from gm/), image files (from `--images`, which defaults to "resources") or .skp
files (from `--skps`, which defaults to "skps").  You can control the types of
sources DM will use with `--src` (default, "tests gm image skp").
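
For example, to draw only the GM and image sources (a sketch; this assumes
`--src` accepts a space-separated list, as its default value suggests):
~~~
$ out/Debug/dm --src gm image -w dm_output
~~~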

DM has found 3 usable ways to draw those 492 sources.  This is controlled by
`--config`, which today defaults to "565 8888 gpu nonrendering angle nvprmsaa4".
DM has skipped nonrendering, angle, and nvprmsaa4, leaving three usable configs:
565, 8888, and gpu.  These three name different ways to draw using Skia:

  -    565:  draw using the software backend into a 16-bit RGB bitmap
  -    8888: draw using the software backend into a 32-bit RGBA bitmap
  -    gpu:  draw using the GPU backend (Ganesh) into a 32-bit RGBA bitmap

Sometimes DM calls these configs, sometimes sinks.  Sorry.  There are many
possible configs, but generally we pay most attention to 8888 and gpu.
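
For example, to test only those two configs (again assuming a space-separated
list, as the `--config` default suggests):
~~~
$ out/Debug/dm --config 8888 gpu -w dm_output
~~~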

DM always tries to draw all sources into all sinks, which is why we multiply
492 by 3.  The unit tests don't really fit into this source-sink model, so they
stand alone.  A couple thousand tasks is pretty normal.  Let's look at the
status line for one of those tasks.
~~~
(  25MB  1857) 1.36ms   8888 image mandrill_132x132_12x12.astc-5-subsets
~~~

This status line tells us several things.

First, it tells us that at the time we wrote the status line, the maximum
amount of memory DM had ever used was 25MB.  Note this is a high water mark,
not the current memory usage.  This is mostly useful for us to track on our
buildbots, some of which run perilously close to the system memory limit.

Next, the status line tells us that there are 1857 unfinished tasks, either
currently running or waiting to run.  We generally run one task per hardware
thread available, so on a typical laptop there are probably 4 or 8 running at
once.  Sometimes the counts appear to show up out of order, particularly at DM
startup; this is harmless and doesn't affect the correctness of the run.

Next, we see this task took 1.36 milliseconds to run.  Generally, the precision
of this timer is around 1 microsecond.  The time is purely there for
informational purposes, to make it easier for us to find slow tests.

Finally, we see the configuration and name of the test we ran.  We drew the test
"mandrill_132x132_12x12.astc-5-subsets", which is an "image" source, into an
"8888" sink.

When DM finishes running, you should find a directory with a file named dm.json,
and some nested directories filled with lots of images.
~~~
$ ls dm_output
565     8888    dm.json gpu

$ find dm_output -name '*.png'
dm_output/565/gm/3x3bitmaprect.png
dm_output/565/gm/aaclip.png
dm_output/565/gm/aarectmodes.png
dm_output/565/gm/alphagradients.png
dm_output/565/gm/arcofzorro.png
dm_output/565/gm/arithmode.png
dm_output/565/gm/astcbitmap.png
dm_output/565/gm/bezier_conic_effects.png
dm_output/565/gm/bezier_cubic_effects.png
dm_output/565/gm/bezier_quad_effects.png
                ...
~~~

The directories are nested first by sink type (`--config`), then by source type (`--src`).
The image from the task we just looked at, "8888 image mandrill_132x132_12x12.astc-5-subsets",
can be found at dm_output/8888/image/mandrill_132x132_12x12.astc-5-subsets.png.

dm.json is used by our automated testing system, so you can ignore it if you
like.  It contains a listing of each test run and a checksum of the image
generated for that run.
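
If you're curious anyway, dm.json is ordinary JSON, so any JSON pretty-printer
can make it readable, for example Python's built-in json.tool module:
~~~
$ python -m json.tool dm_output/dm.json
~~~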

### Detail <a name="digests"></a>
Boring technical detail: The checksum is not a checksum of the
.png file, but rather a checksum of the raw pixels used to create that .png.
That means it is possible for two different configurations to produce
exactly the same .png, but have their checksums differ.

Unit tests don't generally output anything but a status update when they pass.
If a test fails, DM will print out its assertion failures, both at the time
they happen and then again all together after everything is done running.
These failures are also included in the dm.json file.

DM has a simple facility to compare against the results of a previous run:
~~~
$ python bin/sync-and-gyp
$ ninja -C out/Debug dm
$ out/Debug/dm -w good

  # do some work

$ python bin/sync-and-gyp
$ ninja -C out/Debug dm
$ out/Debug/dm -r good -w bad
~~~
When using `-r`, DM will display a failure for any test that didn't produce the
same image as the `good` run.

For anything fancier, I suggest using skdiff:
~~~
$ python bin/sync-and-gyp
$ ninja -C out/Debug dm
$ out/Debug/dm -w good

  # do some work

$ python bin/sync-and-gyp
$ ninja -C out/Debug dm
$ out/Debug/dm -w bad

$ ninja -C out/Debug skdiff
$ mkdir diff
$ out/Debug/skdiff good bad diff

  # open diff/index.html in your web browser
~~~

That's the basics of DM.  DM supports many other modes and flags.  Here are a
few examples you might find handy.
~~~
$ out/Debug/dm --help        # Print all flags, their defaults, and a brief explanation of each.
$ out/Debug/dm --src tests   # Run only unit tests.
$ out/Debug/dm --nocpu       # Test only GPU-backed work.
$ out/Debug/dm --nogpu       # Test only CPU-backed work.
$ out/Debug/dm --match blur  # Run only work with "blur" in its name.
$ out/Debug/dm --dryRun      # Don't really do anything, just print out what we'd do.
~~~
    187