Cross Reference: /external/skia/site/dev/contrib/simd.md

Lines Matching full:backend
37 As a convenience, `SkNf<N>` has two default implementations: `SkNf<1>` performs all these operations on a single float, and the generic `SkNf<N>` simply recurses onto two `SkNf<N/2>`.  This allows our different backends to inject specializations where most natural: the portable backend does nothing, so all `SkNf<N>` recurse down to the default `SkNf<1>`; the NEON backend specializes `SkNf<2>` with `float32x2_t` and 64-bit SIMD methods, and `SkNf<4>` with `float32x4_t` and 128-bit SIMD methods; the SSE backend specializes both `SkNf<4>` and `SkNf<2>` to use the full or lower half of an `__m128` vector, respectively.  A future AVX backend could simply drop in an `SkNf<8>` specialization.
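The recursion described above can be sketched in a few lines of C++. This is a minimal illustration, not Skia's actual code: the type name `Nf`, its members `lo`/`hi`/`val`, and the helper `add4` are hypothetical, and only `+` and `*` are shown. It corresponds to what the portable backend would see, where nothing is specialized beyond the single-float base case.

```cpp
#include <cassert>

// Hypothetical stand-in for SkNf<N>; names and members are illustrative only.
// Generic case: an N-wide vector is just two N/2-wide halves.
template <int N>
struct Nf {
    static_assert(N > 1 && (N & (N - 1)) == 0, "N must be a power of 2");
    Nf<N / 2> lo, hi;

    Nf() = default;
    Nf(const Nf<N / 2>& l, const Nf<N / 2>& h) : lo(l), hi(h) {}

    static Nf Load(const float* p) {
        return Nf(Nf<N / 2>::Load(p), Nf<N / 2>::Load(p + N / 2));
    }
    void store(float* p) const {
        lo.store(p);
        hi.store(p + N / 2);
    }
    Nf operator+(const Nf& o) const { return Nf(lo + o.lo, hi + o.hi); }
    Nf operator*(const Nf& o) const { return Nf(lo * o.lo, hi * o.hi); }
};

// Base case: one float. A SIMD backend would add further specializations
// (for example a 4-wide one) so the recursion stops at a hardware vector.
template <>
struct Nf<1> {
    float val;

    Nf() = default;
    explicit Nf(float v) : val(v) {}

    static Nf Load(const float* p) { return Nf(*p); }
    void store(float* p) const { *p = val; }
    Nf operator+(const Nf& o) const { return Nf(val + o.val); }
    Nf operator*(const Nf& o) const { return Nf(val * o.val); }
};

// Hypothetical helper: adds two arrays of four floats through Nf<4>.
inline void add4(const float* a, const float* b, float* out) {
    (Nf<4>::Load(a) + Nf<4>::Load(b)).store(out);
}
```

With no specializations other than `Nf<1>`, `Nf<4>` expands to two `Nf<2>`, each of which expands to two `Nf<1>`, so `add4` performs four scalar additions.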
64   1. `SkPx` itself represents between 1 and `SkPx::N` 8888 ARGB pixels, where `SkPx::N` is a backend-specific compile-time power of 2.
99 We allow each `SkPx` backend to choose how it physically represents `SkPx`, `SkPx::Wide`, and `SkPx::Alpha` and to choose any power of two as its `SkPx::N` sweet spot.  Code working with `SkPx` typically runs a loop like this:
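The loop itself is not among the matched lines. As a rough sketch of the pattern, assuming a hypothetical pixel type `Px` with a compile-time `N`, a full-width `Load`/`store`, and partial variants that take a pixel count, the body handles `N` pixels per iteration and then finishes with one partial step. The names `Px`, `invertRGB`, and `invert_span` are invented for this example and are not Skia API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a pixel-vector type that prefers N pixels at a time.
struct Px {
    static constexpr int N = 4;
    uint32_t px[N];

    // Load n pixels (n <= N); unused lanes are zero-filled.
    static Px Load(const uint32_t* src, int n = N) {
        assert(n >= 0 && n <= N);
        Px p;
        for (int i = 0; i < N; i++) p.px[i] = i < n ? src[i] : 0;
        return p;
    }
    // Store only the first n pixels.
    void store(uint32_t* dst, int n = N) const {
        assert(n >= 0 && n <= N);
        for (int i = 0; i < n; i++) dst[i] = px[i];
    }
    // Example per-pixel operation: flip the low 24 bits, keep the top byte.
    Px invertRGB() const {
        Px p;
        for (int i = 0; i < N; i++) p.px[i] = px[i] ^ 0x00FFFFFFu;
        return p;
    }
};

// Process n pixels: full N-wide steps first, then one partial step for the tail.
inline void invert_span(uint32_t* dst, const uint32_t* src, int n) {
    while (n >= Px::N) {
        Px::Load(src).invertRGB().store(dst);
        src += Px::N;
        dst += Px::N;
        n -= Px::N;
    }
    if (n > 0) {
        Px::Load(src, n).invertRGB().store(dst, n);
    }
}
```

Because the tail uses the counted `Load` and `store`, the loop never reads or writes past the `n` pixels it was given, whatever `N` a backend picks.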
111 The portable code is of course the simplest place to start looking at implementation details: its `SkPx` is just `uint8_t[4]`, its `SkPx::Wide` `uint16_t[4]`, and its `SkPx::Alpha` just `uint8_t`.  Its preferred number of pixels to work with is `SkPx::N = 1`.  (Amusingly, GCC and Clang seem pretty good about autovectorizing this backend using 32-bit math, which typically ends up within ~2x of the best we can do ourselves.)
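To make the portable representation concrete, here is a small sketch using the three storage types named above: `uint8_t[4]` for a pixel, `uint16_t[4]` for the widened form, and a single `uint8_t` for alpha. The struct and method names (`PortablePx`, `widenMul`, `div255`) are assumptions made for illustration, and the rounding formula is one common choice rather than anything the source specifies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable pixel: one pixel, four 8-bit components.
struct PortablePx {
    static constexpr int N = 1;  // preferred number of pixels per step

    struct Alpha { uint8_t a; };       // one 8-bit alpha value
    struct Wide  { uint16_t c[4]; };   // four 16-bit components

    uint8_t c[4];

    // 8-bit x 8-bit -> 16-bit multiply of every component by alpha.
    // The largest product, 255 * 255 = 65025, fits in uint16_t.
    Wide widenMul(Alpha alpha) const {
        Wide w;
        for (int i = 0; i < 4; i++) {
            w.c[i] = static_cast<uint16_t>(c[i] * alpha.a);
        }
        return w;
    }

    // Narrow back to 8 bits by dividing by 255, adding 127 first so the
    // integer division does not always round down.
    static PortablePx div255(const Wide& w) {
        PortablePx p;
        for (int i = 0; i < 4; i++) {
            p.c[i] = static_cast<uint8_t>((w.c[i] + 127) / 255);
        }
        return p;
    }
};
```

For example, scaling the components `{255, 128, 0, 64}` by an alpha of 128 through `widenMul` and `div255` yields `{128, 64, 0, 32}`, and an alpha of 255 leaves every component unchanged.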
115 So `SkPx`'s SSE backend sets N to 4 pixels, stores them interlaced in an `__m128i`, representing `Wide` as two `__m128i` and `Alpha` as an `__m128i` with each pixel's alpha component replicated four times.  `SkPx`'s NEON backend works with 8 planar pixels, loading them with `vld4_u8` into a `uint8x8x4_t` struct of 4 8-component `uint8x8_t` planes.  `Alpha` is just a single `uint8x8_t` 8-component plane, and `Wide` is NEON's natural choice, `uint16x8x4_t`.
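The difference between the two layouts is easier to see in plain C++ than in intrinsics. The sketch below emulates, with scalar loops, the de-interleaving that a `vld4_u8` load performs (8 pixels of 4 bytes become 4 planes of 8 bytes), and the replicated-alpha layout used for 4 interlaced pixels. The names `Planar8`, `load_planar8`, `store_interlaced8`, and `replicate_alpha4` are hypothetical, and placing alpha at byte 3 of each pixel is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical planar layout: plane[c][i] is component c of pixel i.
struct Planar8 {
    uint8_t plane[4][8];
};

// Scalar emulation of a de-interleaving load: reads 8 interlaced pixels
// (32 bytes) and splits them into 4 planes of 8 components each.
inline Planar8 load_planar8(const uint8_t* src) {
    Planar8 p;
    for (int i = 0; i < 8; i++) {
        for (int c = 0; c < 4; c++) {
            p.plane[c][i] = src[4 * i + c];
        }
    }
    return p;
}

// Inverse operation: re-interleaves the 4 planes back into 8 pixels.
inline void store_interlaced8(const Planar8& p, uint8_t* dst) {
    for (int i = 0; i < 8; i++) {
        for (int c = 0; c < 4; c++) {
            dst[4 * i + c] = p.plane[c][i];
        }
    }
}

// Interlaced-style alpha for 4 pixels: each pixel's alpha byte is repeated
// 4 times so it lines up with that pixel's 4 components.
// Assumes (for illustration) that alpha is byte 3 of each pixel.
inline void replicate_alpha4(const uint8_t* src, uint8_t* out16) {
    for (int i = 0; i < 4; i++) {
        for (int c = 0; c < 4; c++) {
            out16[4 * i + c] = src[4 * i + 3];
        }
    }
}
```

In the planar form, the alpha of all 8 pixels is simply `plane[3]` (under the byte-order assumption above), 8 useful bytes with no repetition, whereas the replicated form spends 16 bytes to describe 4 alpha values.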
117 (It's fun to speculate what an AVX2 backend might look like.  Do we make `SkPx` declare it wants to work with 8 pixels at a time, or leave it at 4?  Does `SkPx` become `__m256i`, or maybe only `SkPx::Wide` does?  What's the best way to represent `Alpha`?  And of course, what about AVX-512?)
119 Keeping `Alpha` as a single dense `uint8x8_t` plane allows the NEON backend to be much more efficient with operations involving `Alpha`.  We'd love to do this in SSE too, where we store `Alpha` somewhat inefficiently with each alpha component replicated 4 times, but SSE simply doesn't expose efficient ways to transpose interlaced pixels into planar pixels and vice versa.  We could write them ourselves, but only as rather complex compound operations that slow things down more than they help.
121 These details will inevitably change over time.  The important takeaway is that working at peak throughput in SIMD fixed point means working with the idiom of each instruction set, and `SkPx` is a design that presents a consistent interface while abstracting those backend details away for you.