Cross Reference: /external/skia/site/dev/contrib/simd.md

Lines Matching full:alpha
60 `SkPx`, our latest approach (there have been alpha `Sk16b` and beta `Sk4px` predecessors) to 8- and 16-bit SIMD  tries to abstract over those idioms to again allow Skia developers to write one piece of clear graphics code that different backends can translate into their native intrinsics idiomatically.
66   3.  `SkPx::Alpha` represents the alpha channels of those same pixels.
68 `SkPx`, `Wide` and `Alpha` create a somewhat complicated algebra of operations entirely motivated by the graphical operations we need to perform.  Here are some examples:
80     SkPx.alpha() -> Alpha   // Extract alpha channels.
81     Alpha::LoadN(const uint8_t*)   -> Alpha  // Like SkPx loads, in 8-bit steps.
82     Alpha::Load(n, const uint8_t*) -> Alpha
93     SkPx * Alpha -> Wide    // 8 x 8 -> 16 bit
96     // A faster approximation of (SkPx * Alpha).div255().
97     SkPx.approxMulDiv255(Alpha) -> SkPx
99 We allow each `SkPx` backend to choose how it physically represents `SkPx`, `SkPx::Wide`, and `SkPx::Alpha` and to choose any power of two as its `SkPx::N` sweet spot.  Code working with SkPx typically runs a loop like this:
111 The portable code is of course the simplest place to start looking at implementation details: its `SkPx` is just `uint8_t[4]`, its `SkPx::Wide` `uint16_t[4]`, and its `SkPx::Alpha` just `uint8_t`.  Its preferred number of pixels to work with is `SkPx::N = 1`.  (Amusingly, GCC and Clang seem pretty good about autovectorizing this backend using 32-bit math, which typically ends up within ~2x of the best we can do ourselves.)
115 So `SkPx`'s SSE backend sets N to 4 pixels, stores them interlaced in an `__m128i`, representing `Wide` as two `__m128i` and `Alpha` as an `__m128i` with each pixel's alpha component replicated four times.  SkPx's NEON backend works with 8 planar pixels, loading them with `vld4_u8` into an `uint8x8x4_t` struct of 4 8-component `uint8x8_t` planes.  `Alpha` is just a single `uint8x8_t` 8-component plane, and `Wide` is NEON's natural choice, `uint16x8x4_t`.
117 (It's fun to speculate what an AVX2 backend might look like.  Do we make `SkPx` declare it wants to work with 8 pixels at a time, or leave it at 4?  Does `SkPx` become `__m256i`, or maybe only `SkPx::Wide` does?  What's the best way to represent `Alpha`?  And of course, what about AVX-512?)
119 Keeping `Alpha` as a single dense `uint8x8_t` plane allows the NEON backend to be much more efficient with operations involving `Alpha`.  We'd love to do this in SSE too, where we store `Alpha` somewhat inefficiently with each alpha component replicated 4 times, but SSE simply doesn't expose efficient ways to transpose interlaced pixels into planar pixels and vice versa.  We could write them ourselves, but only as rather complex compound operations that slow things down more than they help.
OpenGrok