      1 TGSI
      2 ====
      3 
      4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
      5 for describing shaders. Since Gallium is inherently shaderful, shaders are
      6 an important part of the API. TGSI is the only intermediate representation
      7 used by all drivers.
      8 
      9 Basics
     10 ------
     11 
     12 All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
     13 floating-point four-component vectors. An opcode may have up to one
     14 destination register, known as *dst*, and between zero and three source
     15 registers, called *src0* through *src2*, or simply *src* if there is only
     16 one.
     17 
Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
components as integers. Other instructions permit using registers as
two-component vectors with double precision; see :ref:`doubleopcodes`.
     21 
     22 When an instruction has a scalar result, the result is usually copied into
     23 each of the components of *dst*. When this happens, the result is said to be
     24 *replicated* to *dst*. :opcode:`RCP` is one such instruction.
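
As a rough illustration of replication, the following Python sketch models a
register as a four-element list and applies :opcode:`RCP` to it. The helper
names ``replicate`` and ``rcp`` are hypothetical and exist only for this
example; they are not part of any TGSI or Gallium API.

.. code-block:: python

   def replicate(scalar):
       """Copy one scalar result into all four components (x, y, z, w)."""
       return [scalar] * 4


   def rcp(src):
       """Model of RCP: only src.x is read and the reciprocal is replicated."""
       return replicate(1.0 / src[0])


   dst = rcp([4.0, 2.0, 8.0, 1.0])  # y, z and w of the source are ignored
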
     25 
     26 Instruction Set
     27 ---------------
     28 
     29 Core ISA
     30 ^^^^^^^^^^^^^^^^^^^^^^^^^
     31 
     32 These opcodes are guaranteed to be available regardless of the driver being
     33 used.
     34 
     35 .. opcode:: ARL - Address Register Load
     36 
     37 .. math::
     38 
     39   dst.x = \lfloor src.x\rfloor
     40 
     41   dst.y = \lfloor src.y\rfloor
     42 
     43   dst.z = \lfloor src.z\rfloor
     44 
     45   dst.w = \lfloor src.w\rfloor
     46 
     47 
     48 .. opcode:: MOV - Move
     49 
     50 .. math::
     51 
     52   dst.x = src.x
     53 
     54   dst.y = src.y
     55 
     56   dst.z = src.z
     57 
     58   dst.w = src.w
     59 
     60 
     61 .. opcode:: LIT - Light Coefficients
     62 
.. math::

  dst.x = 1

  dst.y = max(src.x, 0)

  dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0

  dst.w = 1
     72 
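The LIT formulas can be sketched in Python as follows. This is only an
illustrative model on four-element lists; the function names ``clamp`` and
``lit`` are assumptions made for this example. Python raises
``ZeroDivisionError`` for ``0.0`` raised to a negative power, a case this
sketch does not handle.

.. code-block:: python

   def clamp(value, lo, hi):
       """Restrict value to the closed interval [lo, hi]."""
       return max(lo, min(value, hi))


   def lit(src):
       """Model of LIT; src is [x, y, z, w] and src.z is not read."""
       x, y, _, w = src
       if x > 0:
           specular = max(y, 0.0) ** clamp(w, -128.0, 128.0)
       else:
           specular = 0.0
       return [1.0, max(x, 0.0), specular, 1.0]
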
     73 
     74 .. opcode:: RCP - Reciprocal
     75 
     76 This instruction replicates its result.
     77 
     78 .. math::
     79 
     80   dst = \frac{1}{src.x}
     81 
     82 
     83 .. opcode:: RSQ - Reciprocal Square Root
     84 
     85 This instruction replicates its result.
     86 
     87 .. math::
     88 
     89   dst = \frac{1}{\sqrt{|src.x|}}
     90 
     91 
     92 .. opcode:: EXP - Approximate Exponential Base 2
     93 
     94 .. math::
     95 
     96   dst.x = 2^{\lfloor src.x\rfloor}
     97 
     98   dst.y = src.x - \lfloor src.x\rfloor
     99 
    100   dst.z = 2^{src.x}
    101 
    102   dst.w = 1
    103 
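A minimal Python sketch of the EXP components, assuming a register is
modelled as a four-element list (the name ``exp_approx`` is hypothetical).
It also shows that the x and y results together reconstruct the z result,
since :math:`2^{\lfloor x\rfloor} \times 2^{x - \lfloor x\rfloor} = 2^x`.

.. code-block:: python

   import math


   def exp_approx(src):
       """Model of EXP: only src.x is read."""
       x = src[0]
       whole = math.floor(x)
       return [2.0 ** whole, x - whole, 2.0 ** x, 1.0]


   result = exp_approx([2.5, 0.0, 0.0, 0.0])
   # result[0] * 2 ** result[1] is approximately result[2]
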
    104 
    105 .. opcode:: LOG - Approximate Logarithm Base 2
    106 
    107 .. math::
    108 
    109   dst.x = \lfloor\log_2{|src.x|}\rfloor
    110 
    111   dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
    112 
    113   dst.z = \log_2{|src.x|}
    114 
    115   dst.w = 1
    116 
    117 
    118 .. opcode:: MUL - Multiply
    119 
    120 .. math::
    121 
    122   dst.x = src0.x \times src1.x
    123 
    124   dst.y = src0.y \times src1.y
    125 
    126   dst.z = src0.z \times src1.z
    127 
    128   dst.w = src0.w \times src1.w
    129 
    130 
    131 .. opcode:: ADD - Add
    132 
    133 .. math::
    134 
    135   dst.x = src0.x + src1.x
    136 
    137   dst.y = src0.y + src1.y
    138 
    139   dst.z = src0.z + src1.z
    140 
    141   dst.w = src0.w + src1.w
    142 
    143 
    144 .. opcode:: DP3 - 3-component Dot Product
    145 
    146 This instruction replicates its result.
    147 
    148 .. math::
    149 
    150   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
    151 
    152 
    153 .. opcode:: DP4 - 4-component Dot Product
    154 
    155 This instruction replicates its result.
    156 
    157 .. math::
    158 
    159   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
    160 
    161 
    162 .. opcode:: DST - Distance Vector
    163 
    164 .. math::
    165 
    166   dst.x = 1
    167 
    168   dst.y = src0.y \times src1.y
    169 
    170   dst.z = src0.z
    171 
    172   dst.w = src1.w
    173 
    174 
    175 .. opcode:: MIN - Minimum
    176 
    177 .. math::
    178 
    179   dst.x = min(src0.x, src1.x)
    180 
    181   dst.y = min(src0.y, src1.y)
    182 
    183   dst.z = min(src0.z, src1.z)
    184 
    185   dst.w = min(src0.w, src1.w)
    186 
    187 
    188 .. opcode:: MAX - Maximum
    189 
    190 .. math::
    191 
    192   dst.x = max(src0.x, src1.x)
    193 
    194   dst.y = max(src0.y, src1.y)
    195 
    196   dst.z = max(src0.z, src1.z)
    197 
    198   dst.w = max(src0.w, src1.w)
    199 
    200 
    201 .. opcode:: SLT - Set On Less Than
    202 
    203 .. math::
    204 
    205   dst.x = (src0.x < src1.x) ? 1 : 0
    206 
    207   dst.y = (src0.y < src1.y) ? 1 : 0
    208 
    209   dst.z = (src0.z < src1.z) ? 1 : 0
    210 
    211   dst.w = (src0.w < src1.w) ? 1 : 0
    212 
    213 
.. opcode:: SGE - Set On Greater Than Or Equal
    215 
    216 .. math::
    217 
    218   dst.x = (src0.x >= src1.x) ? 1 : 0
    219 
    220   dst.y = (src0.y >= src1.y) ? 1 : 0
    221 
    222   dst.z = (src0.z >= src1.z) ? 1 : 0
    223 
    224   dst.w = (src0.w >= src1.w) ? 1 : 0
    225 
    226 
    227 .. opcode:: MAD - Multiply And Add
    228 
    229 .. math::
    230 
    231   dst.x = src0.x \times src1.x + src2.x
    232 
    233   dst.y = src0.y \times src1.y + src2.y
    234 
    235   dst.z = src0.z \times src1.z + src2.z
    236 
    237   dst.w = src0.w \times src1.w + src2.w
    238 
    239 
    240 .. opcode:: SUB - Subtract
    241 
    242 .. math::
    243 
    244   dst.x = src0.x - src1.x
    245 
    246   dst.y = src0.y - src1.y
    247 
    248   dst.z = src0.z - src1.z
    249 
    250   dst.w = src0.w - src1.w
    251 
    252 
    253 .. opcode:: LRP - Linear Interpolate
    254 
    255 .. math::
    256 
    257   dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
    258 
    259   dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
    260 
    261   dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
    262 
    263   dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
    264 
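The interpolation can be sketched per component in Python. Note that *src0*
is the weight: a weight of 1 selects *src1* and a weight of 0 selects *src2*.
The function name ``lrp`` and the list-based register model are assumptions
for illustration only.

.. code-block:: python

   def lrp(src0, src1, src2):
       """Model of LRP: per-component blend of src1 and src2 weighted by src0."""
       return [a * b + (1.0 - a) * c for a, b, c in zip(src0, src1, src2)]


   weights = [0.0, 1.0, 0.5, 0.25]
   blended = lrp(weights, [10.0] * 4, [20.0] * 4)
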
    265 
    266 .. opcode:: CND - Condition
    267 
    268 .. math::
    269 
    270   dst.x = (src2.x > 0.5) ? src0.x : src1.x
    271 
    272   dst.y = (src2.y > 0.5) ? src0.y : src1.y
    273 
    274   dst.z = (src2.z > 0.5) ? src0.z : src1.z
    275 
    276   dst.w = (src2.w > 0.5) ? src0.w : src1.w
    277 
    278 
    279 .. opcode:: DP2A - 2-component Dot Product And Add
    280 
    281 .. math::
    282 
    283   dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
    284 
    285   dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
    286 
    287   dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
    288 
    289   dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
    290 
    291 
    292 .. opcode:: FRC - Fraction
    293 
    294 .. math::
    295 
    296   dst.x = src.x - \lfloor src.x\rfloor
    297 
    298   dst.y = src.y - \lfloor src.y\rfloor
    299 
    300   dst.z = src.z - \lfloor src.z\rfloor
    301 
    302   dst.w = src.w - \lfloor src.w\rfloor
    303 
    304 
    305 .. opcode:: CLAMP - Clamp
    306 
    307 .. math::
    308 
    309   dst.x = clamp(src0.x, src1.x, src2.x)
    310 
    311   dst.y = clamp(src0.y, src1.y, src2.y)
    312 
    313   dst.z = clamp(src0.z, src1.z, src2.z)
    314 
    315   dst.w = clamp(src0.w, src1.w, src2.w)
    316 
    317 
    318 .. opcode:: FLR - Floor
    319 
    320 This is identical to :opcode:`ARL`.
    321 
    322 .. math::
    323 
    324   dst.x = \lfloor src.x\rfloor
    325 
    326   dst.y = \lfloor src.y\rfloor
    327 
    328   dst.z = \lfloor src.z\rfloor
    329 
    330   dst.w = \lfloor src.w\rfloor
    331 
    332 
    333 .. opcode:: ROUND - Round
    334 
    335 .. math::
    336 
    337   dst.x = round(src.x)
    338 
    339   dst.y = round(src.y)
    340 
    341   dst.z = round(src.z)
    342 
    343   dst.w = round(src.w)
    344 
    345 
    346 .. opcode:: EX2 - Exponential Base 2
    347 
    348 This instruction replicates its result.
    349 
    350 .. math::
    351 
    352   dst = 2^{src.x}
    353 
    354 
    355 .. opcode:: LG2 - Logarithm Base 2
    356 
    357 This instruction replicates its result.
    358 
    359 .. math::
    360 
    361   dst = \log_2{src.x}
    362 
    363 
    364 .. opcode:: POW - Power
    365 
    366 This instruction replicates its result.
    367 
    368 .. math::
    369 
    370   dst = src0.x^{src1.x}
    371 
    372 .. opcode:: XPD - Cross Product
    373 
    374 .. math::
    375 
    376   dst.x = src0.y \times src1.z - src1.y \times src0.z
    377 
    378   dst.y = src0.z \times src1.x - src1.z \times src0.x
    379 
    380   dst.z = src0.x \times src1.y - src1.x \times src0.y
    381 
    382   dst.w = 1
    383 
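As a sketch, the cross product above maps directly onto a small Python
function. The name ``xpd`` and the list-based register model are assumptions
for this example. The w components of both sources are ignored and *dst.w*
is set to 1.

.. code-block:: python

   def xpd(src0, src1):
       """Model of XPD on [x, y, z, w] lists."""
       x0, y0, z0, _ = src0
       x1, y1, z1, _ = src1
       return [
           y0 * z1 - y1 * z0,
           z0 * x1 - z1 * x0,
           x0 * y1 - x1 * y0,
           1.0,
       ]


   # The cross product of the X and Y unit vectors is the Z unit vector.
   z_axis = xpd([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])
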
    384 
    385 .. opcode:: ABS - Absolute
    386 
    387 .. math::
    388 
    389   dst.x = |src.x|
    390 
    391   dst.y = |src.y|
    392 
    393   dst.z = |src.z|
    394 
    395   dst.w = |src.w|
    396 
    397 
    398 .. opcode:: RCC - Reciprocal Clamped
    399 
    400 This instruction replicates its result.
    401 
    402 XXX cleanup on aisle three
    403 
    404 .. math::
    405 
    406   dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
    407 
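The clamping in RCC can be sketched in Python as below. The function names
are hypothetical, and the sketch does not model *src.x* equal to zero, for
which Python raises ``ZeroDivisionError``.

.. code-block:: python

   def clamp(value, lo, hi):
       """Restrict value to the closed interval [lo, hi]."""
       return max(lo, min(value, hi))


   def rcc(src):
       """Model of RCC: clamped reciprocal of src.x, replicated."""
       r = 1.0 / src[0]
       if r > 0:
           r = clamp(r, 5.42101e-20, 1.884467e19)
       else:
           r = clamp(r, -1.884467e19, -5.42101e-20)
       return [r] * 4
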
    408 
    409 .. opcode:: DPH - Homogeneous Dot Product
    410 
    411 This instruction replicates its result.
    412 
    413 .. math::
    414 
    415   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
    416 
    417 
    418 .. opcode:: COS - Cosine
    419 
    420 This instruction replicates its result.
    421 
    422 .. math::
    423 
    424   dst = \cos{src.x}
    425 
    426 
    427 .. opcode:: DDX - Derivative Relative To X
    428 
    429 .. math::
    430 
    431   dst.x = partialx(src.x)
    432 
    433   dst.y = partialx(src.y)
    434 
    435   dst.z = partialx(src.z)
    436 
    437   dst.w = partialx(src.w)
    438 
    439 
    440 .. opcode:: DDY - Derivative Relative To Y
    441 
    442 .. math::
    443 
    444   dst.x = partialy(src.x)
    445 
    446   dst.y = partialy(src.y)
    447 
    448   dst.z = partialy(src.z)
    449 
    450   dst.w = partialy(src.w)
    451 
    452 
    453 .. opcode:: KILP - Predicated Discard
    454 
    455   discard
    456 
    457 
    458 .. opcode:: PK2H - Pack Two 16-bit Floats
    459 
    460   TBD
    461 
    462 
    463 .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
    464 
    465   TBD
    466 
    467 
    468 .. opcode:: PK4B - Pack Four Signed 8-bit Scalars
    469 
    470   TBD
    471 
    472 
    473 .. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
    474 
    475   TBD
    476 
    477 
    478 .. opcode:: RFL - Reflection Vector
    479 
    480 .. math::
    481 
    482   dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x
    483 
    484   dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y
    485 
    486   dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z
    487 
    488   dst.w = 1
    489 
    490 .. note::
    491 
    492    Considered for removal.
    493 
    494 
    495 .. opcode:: SEQ - Set On Equal
    496 
    497 .. math::
    498 
    499   dst.x = (src0.x == src1.x) ? 1 : 0
    500 
    501   dst.y = (src0.y == src1.y) ? 1 : 0
    502 
    503   dst.z = (src0.z == src1.z) ? 1 : 0
    504 
    505   dst.w = (src0.w == src1.w) ? 1 : 0
    506 
    507 
    508 .. opcode:: SFL - Set On False
    509 
    510 This instruction replicates its result.
    511 
    512 .. math::
    513 
    514   dst = 0
    515 
    516 .. note::
    517 
    518    Considered for removal.
    519 
    520 
    521 .. opcode:: SGT - Set On Greater Than
    522 
    523 .. math::
    524 
    525   dst.x = (src0.x > src1.x) ? 1 : 0
    526 
    527   dst.y = (src0.y > src1.y) ? 1 : 0
    528 
    529   dst.z = (src0.z > src1.z) ? 1 : 0
    530 
    531   dst.w = (src0.w > src1.w) ? 1 : 0
    532 
    533 
    534 .. opcode:: SIN - Sine
    535 
    536 This instruction replicates its result.
    537 
    538 .. math::
    539 
    540   dst = \sin{src.x}
    541 
    542 
.. opcode:: SLE - Set On Less Than Or Equal
    544 
    545 .. math::
    546 
    547   dst.x = (src0.x <= src1.x) ? 1 : 0
    548 
    549   dst.y = (src0.y <= src1.y) ? 1 : 0
    550 
    551   dst.z = (src0.z <= src1.z) ? 1 : 0
    552 
    553   dst.w = (src0.w <= src1.w) ? 1 : 0
    554 
    555 
    556 .. opcode:: SNE - Set On Not Equal
    557 
    558 .. math::
    559 
    560   dst.x = (src0.x != src1.x) ? 1 : 0
    561 
    562   dst.y = (src0.y != src1.y) ? 1 : 0
    563 
    564   dst.z = (src0.z != src1.z) ? 1 : 0
    565 
    566   dst.w = (src0.w != src1.w) ? 1 : 0
    567 
    568 
    569 .. opcode:: STR - Set On True
    570 
    571 This instruction replicates its result.
    572 
    573 .. math::
    574 
    575   dst = 1
    576 
    577 
    578 .. opcode:: TEX - Texture Lookup
    579 
.. math::

  coord = src0

  bias = 0.0

  dst = texture_sample(unit, coord, bias)

For array textures, src0.y contains the slice for 1D arrays and src0.z
contains the slice for 2D arrays.

For shadow textures without arrays, src0.z contains the reference value.

For shadow textures with arrays, src0.z contains the reference value for
1D arrays and src0.w contains the reference value for 2D arrays.

There is no way to pass a bias in the .w component for shadow arrays, and
GLSL does not allow this. GLSL does allow cube shadow maps to take a bias
value; how this will look in TGSI is still to be determined.
    599 
    600 .. opcode:: TXD - Texture Lookup with Derivatives
    601 
    602 .. math::
    603 
    604   coord = src0
    605 
    606   ddx = src1
    607 
    608   ddy = src2
    609 
    610   bias = 0.0
    611 
    612   dst = texture_sample_deriv(unit, coord, bias, ddx, ddy)
    613 
    614 
    615 .. opcode:: TXP - Projective Texture Lookup
    616 
.. math::

  coord.x = src0.x / src0.w

  coord.y = src0.y / src0.w

  coord.z = src0.z / src0.w

  coord.w = src0.w

  bias = 0.0

  dst = texture_sample(unit, coord, bias)
    630 
    631 
    632 .. opcode:: UP2H - Unpack Two 16-Bit Floats
    633 
    634   TBD
    635 
    636 .. note::
    637 
    638    Considered for removal.
    639 
    640 .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
    641 
    642   TBD
    643 
    644 .. note::
    645 
    646    Considered for removal.
    647 
    648 .. opcode:: UP4B - Unpack Four Signed 8-Bit Values
    649 
    650   TBD
    651 
    652 .. note::
    653 
    654    Considered for removal.
    655 
    656 .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
    657 
    658   TBD
    659 
    660 .. note::
    661 
    662    Considered for removal.
    663 
    664 .. opcode:: X2D - 2D Coordinate Transformation
    665 
    666 .. math::
    667 
    668   dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y
    669 
    670   dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w
    671 
    672   dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y
    673 
    674   dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w
    675 
    676 .. note::
    677 
    678    Considered for removal.
    679 
    680 
    681 .. opcode:: ARA - Address Register Add
    682 
    683   TBD
    684 
    685 .. note::
    686 
    687    Considered for removal.
    688 
    689 .. opcode:: ARR - Address Register Load With Round
    690 
    691 .. math::
    692 
    693   dst.x = round(src.x)
    694 
    695   dst.y = round(src.y)
    696 
    697   dst.z = round(src.z)
    698 
    699   dst.w = round(src.w)
    700 
    701 
    702 .. opcode:: BRA - Branch
    703 
    704   pc = target
    705 
    706 .. note::
    707 
    708    Considered for removal.
    709 
    710 .. opcode:: CAL - Subroutine Call
    711 
    712   push(pc)
    713   pc = target
    714 
    715 
    716 .. opcode:: RET - Subroutine Call Return
    717 
    718   pc = pop()
    719 
    720 
    721 .. opcode:: SSG - Set Sign
    722 
    723 .. math::
    724 
    725   dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
    726 
    727   dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
    728 
    729   dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
    730 
    731   dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
    732 
    733 
    734 .. opcode:: CMP - Compare
    735 
    736 .. math::
    737 
    738   dst.x = (src0.x < 0) ? src1.x : src2.x
    739 
    740   dst.y = (src0.y < 0) ? src1.y : src2.y
    741 
    742   dst.z = (src0.z < 0) ? src1.z : src2.z
    743 
    744   dst.w = (src0.w < 0) ? src1.w : src2.w
    745 
    746 
    747 .. opcode:: KIL - Conditional Discard
    748 
    749 .. math::
    750 
    751   if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
    752     discard
    753   endif
    754 
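The discard condition can be sketched as a predicate in Python. The name
``kil`` is hypothetical, and the sketch returns a boolean rather than
actually discarding a fragment.

.. code-block:: python

   def kil(src):
       """Model of KIL: True if any component of src is negative."""
       return any(component < 0 for component in src)


   keep = not kil([1.0, 0.0, 2.0, 3.0])      # no negative component
   discard = kil([1.0, -0.5, 2.0, 3.0])      # src.y is negative
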
    755 
    756 .. opcode:: SCS - Sine Cosine
    757 
    758 .. math::
    759 
    760   dst.x = \cos{src.x}
    761 
    762   dst.y = \sin{src.x}
    763 
    764   dst.z = 0
    765 
    766   dst.w = 1
    767 
    768 
    769 .. opcode:: TXB - Texture Lookup With Bias
    770 
.. math::

  coord.x = src.x

  coord.y = src.y

  coord.z = src.z

  coord.w = 1.0

  bias = src.w

  dst = texture_sample(unit, coord, bias)
    784 
    785 
    786 .. opcode:: NRM - 3-component Vector Normalise
    787 
.. math::

  dst.x = src.x / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}

  dst.y = src.y / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}

  dst.z = src.z / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}

  dst.w = 1
    797 
    798 
    799 .. opcode:: DIV - Divide
    800 
    801 .. math::
    802 
    803   dst.x = \frac{src0.x}{src1.x}
    804 
    805   dst.y = \frac{src0.y}{src1.y}
    806 
    807   dst.z = \frac{src0.z}{src1.z}
    808 
    809   dst.w = \frac{src0.w}{src1.w}
    810 
    811 
    812 .. opcode:: DP2 - 2-component Dot Product
    813 
    814 This instruction replicates its result.
    815 
    816 .. math::
    817 
    818   dst = src0.x \times src1.x + src0.y \times src1.y
    819 
    820 
    821 .. opcode:: TXL - Texture Lookup With explicit LOD
    822 
    823 .. math::
    824 
    825   coord.x = src0.x
    826 
    827   coord.y = src0.y
    828 
    829   coord.z = src0.z
    830 
    831   coord.w = 1.0
    832 
    833   lod = src0.w
    834 
    835   dst = texture_sample(unit, coord, lod)
    836 
    837 
    838 .. opcode:: BRK - Break
    839 
    840   TBD
    841 
    842 
    843 .. opcode:: IF - If
    844 
    845   TBD
    846 
    847 
    848 .. opcode:: ELSE - Else
    849 
    850   TBD
    851 
    852 
    853 .. opcode:: ENDIF - End If
    854 
    855   TBD
    856 
    857 
    858 .. opcode:: PUSHA - Push Address Register On Stack
    859 
    860   push(src.x)
    861   push(src.y)
    862   push(src.z)
    863   push(src.w)
    864 
    865 .. note::
    866 
    867    Considered for cleanup.
    868 
    869 .. note::
    870 
    871    Considered for removal.
    872 
    873 .. opcode:: POPA - Pop Address Register From Stack
    874 
    875   dst.w = pop()
    876   dst.z = pop()
    877   dst.y = pop()
    878   dst.x = pop()
    879 
    880 .. note::
    881 
    882    Considered for cleanup.
    883 
    884 .. note::
    885 
    886    Considered for removal.
    887 
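The push and pop orders of :opcode:`PUSHA` and :opcode:`POPA` mirror each
other, so a push followed by a pop restores the original register. A minimal
Python sketch, assuming the stack is a plain list and using hypothetical
function names:

.. code-block:: python

   def pusha(stack, src):
       """Model of PUSHA: push x, y, z, w in that order."""
       for component in src:
           stack.append(component)


   def popa(stack):
       """Model of POPA: pop w, z, y, x and return [x, y, z, w]."""
       w = stack.pop()
       z = stack.pop()
       y = stack.pop()
       x = stack.pop()
       return [x, y, z, w]


   stack = []
   pusha(stack, [1, 2, 3, 4])
   restored = popa(stack)
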
    888 
    889 Compute ISA
    890 ^^^^^^^^^^^^^^^^^^^^^^^^
    891 
These opcodes are primarily provided for special-use computational shaders.
Support for these opcodes is indicated by a special pipe capability bit (TBD).
    894 
    895 XXX so let's discuss it, yeah?
    896 
    897 .. opcode:: CEIL - Ceiling
    898 
    899 .. math::
    900 
    901   dst.x = \lceil src.x\rceil
    902 
    903   dst.y = \lceil src.y\rceil
    904 
    905   dst.z = \lceil src.z\rceil
    906 
    907   dst.w = \lceil src.w\rceil
    908 
    909 
    910 .. opcode:: I2F - Integer To Float
    911 
    912 .. math::
    913 
    914   dst.x = (float) src.x
    915 
    916   dst.y = (float) src.y
    917 
    918   dst.z = (float) src.z
    919 
    920   dst.w = (float) src.w
    921 
    922 
    923 .. opcode:: NOT - Bitwise Not
    924 
    925 .. math::
    926 
    927   dst.x = ~src.x
    928 
    929   dst.y = ~src.y
    930 
    931   dst.z = ~src.z
    932 
    933   dst.w = ~src.w
    934 
    935 
    936 .. opcode:: TRUNC - Truncate
    937 
    938 .. math::
    939 
    940   dst.x = trunc(src.x)
    941 
    942   dst.y = trunc(src.y)
    943 
    944   dst.z = trunc(src.z)
    945 
    946   dst.w = trunc(src.w)
    947 
    948 
    949 .. opcode:: SHL - Shift Left
    950 
    951 .. math::
    952 
    953   dst.x = src0.x << src1.x
    954 
    955   dst.y = src0.y << src1.x
    956 
    957   dst.z = src0.z << src1.x
    958 
    959   dst.w = src0.w << src1.x
    960 
    961 
    962 .. opcode:: SHR - Shift Right
    963 
    964 .. math::
    965 
    966   dst.x = src0.x >> src1.x
    967 
    968   dst.y = src0.y >> src1.x
    969 
    970   dst.z = src0.z >> src1.x
    971 
    972   dst.w = src0.w >> src1.x
    973 
    974 
    975 .. opcode:: AND - Bitwise And
    976 
    977 .. math::
    978 
    979   dst.x = src0.x & src1.x
    980 
    981   dst.y = src0.y & src1.y
    982 
    983   dst.z = src0.z & src1.z
    984 
    985   dst.w = src0.w & src1.w
    986 
    987 
    988 .. opcode:: OR - Bitwise Or
    989 
    990 .. math::
    991 
    992   dst.x = src0.x | src1.x
    993 
    994   dst.y = src0.y | src1.y
    995 
    996   dst.z = src0.z | src1.z
    997 
    998   dst.w = src0.w | src1.w
    999 
   1000 
   1001 .. opcode:: MOD - Modulus
   1002 
   1003 .. math::
   1004 
   1005   dst.x = src0.x \bmod src1.x
   1006 
   1007   dst.y = src0.y \bmod src1.y
   1008 
   1009   dst.z = src0.z \bmod src1.z
   1010 
   1011   dst.w = src0.w \bmod src1.w
   1012 
   1013 
   1014 .. opcode:: XOR - Bitwise Xor
   1015 
   1016 .. math::
   1017 
   1018   dst.x = src0.x \oplus src1.x
   1019 
   1020   dst.y = src0.y \oplus src1.y
   1021 
   1022   dst.z = src0.z \oplus src1.z
   1023 
   1024   dst.w = src0.w \oplus src1.w
   1025 
   1026 
   1027 .. opcode:: UCMP - Integer Conditional Move
   1028 
   1029 .. math::
   1030 
   1031   dst.x = src0.x ? src1.x : src2.x
   1032 
   1033   dst.y = src0.y ? src1.y : src2.y
   1034 
   1035   dst.z = src0.z ? src1.z : src2.z
   1036 
   1037   dst.w = src0.w ? src1.w : src2.w
   1038 
   1039 
   1040 .. opcode:: UARL - Integer Address Register Load
   1041 
   1042   Moves the contents of the source register, assumed to be an integer, into the
   1043   destination register, which is assumed to be an address (ADDR) register.
   1044 
   1045 
   1046 .. opcode:: IABS - Integer Absolute Value
   1047 
   1048 .. math::
   1049 
   1050   dst.x = |src.x|
   1051 
   1052   dst.y = |src.y|
   1053 
   1054   dst.z = |src.z|
   1055 
   1056   dst.w = |src.w|
   1057 
   1058 
   1059 .. opcode:: SAD - Sum Of Absolute Differences
   1060 
   1061 .. math::
   1062 
   1063   dst.x = |src0.x - src1.x| + src2.x
   1064 
   1065   dst.y = |src0.y - src1.y| + src2.y
   1066 
   1067   dst.z = |src0.z - src1.z| + src2.z
   1068 
   1069   dst.w = |src0.w - src1.w| + src2.w
   1070 
   1071 
.. opcode:: TXF - Texel Fetch

  As per NV_gpu_shader4, extract a single texel from a specified texture
  image. The source sampler may not be a CUBE or SHADOW.

  src 0 is a four-component signed integer vector used to identify the
  single texel accessed: three coordinate components plus the level.

  src 1 is a three-component constant signed integer vector, with each
  component only having a range of -8..+8 (hardware only seems to deal with
  this range; the interface allows for up to unsigned int).

  TXF(uint_vec coord, int_vec offset).
   1082 
   1083 
.. opcode:: TXQ - Texture Size Query

  As per NV_gpu_program4, retrieve the dimensions of the texture depending
  on the target: 1D (width), 2D/RECT/CUBE (width, height), 3D (width,
  height, depth), 1D array (width, layers), 2D array (width, height,
  layers).
   1089 
   1090 .. math::
   1091 
   1092   lod = src0
   1093 
   1094   dst.x = texture_width(unit, lod)
   1095 
   1096   dst.y = texture_height(unit, lod)
   1097 
   1098   dst.z = texture_depth(unit, lod)
   1099 
   1100 
   1101 .. opcode:: CONT - Continue
   1102 
   1103   TBD
   1104 
   1105 .. note::
   1106 
   1107    Support for CONT is determined by a special capability bit,
   1108    ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
   1109 
   1110 
   1111 Geometry ISA
   1112 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   1113 
   1114 These opcodes are only supported in geometry shaders; they have no meaning
   1115 in any other type of shader.
   1116 
   1117 .. opcode:: EMIT - Emit
   1118 
   1119   TBD
   1120 
   1121 
   1122 .. opcode:: ENDPRIM - End Primitive
   1123 
   1124   TBD
   1125 
   1126 
   1127 GLSL ISA
   1128 ^^^^^^^^^^
   1129 
   1130 These opcodes are part of :term:`GLSL`'s opcode set. Support for these
   1131 opcodes is determined by a special capability bit, ``GLSL``.
   1132 
   1133 .. opcode:: BGNLOOP - Begin a Loop
   1134 
   1135   TBD
   1136 
   1137 
   1138 .. opcode:: BGNSUB - Begin Subroutine
   1139 
   1140   TBD
   1141 
   1142 
   1143 .. opcode:: ENDLOOP - End a Loop
   1144 
   1145   TBD
   1146 
   1147 
   1148 .. opcode:: ENDSUB - End Subroutine
   1149 
   1150   TBD
   1151 
   1152 
   1153 .. opcode:: NOP - No Operation
   1154 
   1155   Do nothing.
   1156 
   1157 
   1158 .. opcode:: NRM4 - 4-component Vector Normalise
   1159 
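.. math::

  dst.x = src.x / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}

  dst.y = src.y / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}

  dst.z = src.z / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}

  dst.w = src.w / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}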
   1165 
   1166 
   1167 ps_2_x
   1168 ^^^^^^^^^^^^
   1169 
   1170 XXX wait what
   1171 
   1172 .. opcode:: CALLNZ - Subroutine Call If Not Zero
   1173 
   1174   TBD
   1175 
   1176 
   1177 .. opcode:: IFC - If
   1178 
   1179   TBD
   1180 
   1181 
   1182 .. opcode:: BREAKC - Break Conditional
   1183 
   1184   TBD
   1185 
   1186 .. _doubleopcodes:
   1187 
   1188 Double ISA
   1189 ^^^^^^^^^^^^^^^
   1190 
   1191 The double-precision opcodes reinterpret four-component vectors into
   1192 two-component vectors with doubled precision in each component.
   1193 
   1194 Support for these opcodes is XXX undecided. :T
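
To illustrate the reinterpretation, the Python sketch below stores two
doubles in four 32-bit words and implements :opcode:`DADD` on top of that.
The little-endian layout (x and z holding the low 32 bits of each double) is
an assumption made for this example, as are the helper names; the text above
does not specify the bit layout.

.. code-block:: python

   import struct


   def pack_doubles(xy, zw):
       """Return four 32-bit words [x, y, z, w] holding two doubles."""
       return list(struct.unpack("<4I", struct.pack("<2d", xy, zw)))


   def unpack_doubles(reg):
       """Reinterpret four 32-bit words as the doubles (xy, zw)."""
       return struct.unpack("<2d", struct.pack("<4I", *reg))


   def dadd(src0, src1):
       """Model of DADD on the assumed register layout."""
       a_xy, a_zw = unpack_doubles(src0)
       b_xy, b_zw = unpack_doubles(src1)
       return pack_doubles(a_xy + b_xy, a_zw + b_zw)


   total = unpack_doubles(dadd(pack_doubles(1.5, 2.0), pack_doubles(0.25, 3.0)))
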
   1195 
   1196 .. opcode:: DADD - Add
   1197 
   1198 .. math::
   1199 
   1200   dst.xy = src0.xy + src1.xy
   1201 
   1202   dst.zw = src0.zw + src1.zw
   1203 
   1204 
   1205 .. opcode:: DDIV - Divide
   1206 
   1207 .. math::
   1208 
   1209   dst.xy = src0.xy / src1.xy
   1210 
   1211   dst.zw = src0.zw / src1.zw
   1212 
   1213 .. opcode:: DSEQ - Set on Equal
   1214 
   1215 .. math::
   1216 
   1217   dst.xy = src0.xy == src1.xy ? 1.0F : 0.0F
   1218 
   1219   dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F
   1220 
   1221 .. opcode:: DSLT - Set on Less than
   1222 
   1223 .. math::
   1224 
   1225   dst.xy = src0.xy < src1.xy ? 1.0F : 0.0F
   1226 
   1227   dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F
   1228 
   1229 .. opcode:: DFRAC - Fraction
   1230 
   1231 .. math::
   1232 
   1233   dst.xy = src.xy - \lfloor src.xy\rfloor
   1234 
   1235   dst.zw = src.zw - \lfloor src.zw\rfloor
   1236 
   1237 
   1238 .. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
   1239 
Like the ``frexp()`` routine in many math libraries, this opcode stores the
exponent of its source to ``dst0``, and the significand to ``dst1``, such that
:math:`dst1 \times 2^{dst0} = src`.
   1243 
   1244 .. math::
   1245 
   1246   dst0.xy = exp(src.xy)
   1247 
   1248   dst1.xy = frac(src.xy)
   1249 
   1250   dst0.zw = exp(src.zw)
   1251 
   1252   dst1.zw = frac(src.zw)
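
The relationship between :opcode:`DFRACEXP` and :opcode:`DLDEXP` can be tried
out with Python's ``math.frexp`` and ``math.ldexp``. This is only a scalar
sketch with hypothetical wrapper names. Python normalises the significand to
a magnitude in [0.5, 1), which is an assumption here, since the range is not
stated above.

.. code-block:: python

   import math


   def dfracexp(value):
       """Return (dst0, dst1) = (exponent, significand) of one double."""
       significand, exponent = math.frexp(value)
       return exponent, significand


   def dldexp(significand, exponent):
       """Inverse operation: significand * 2 ** exponent."""
       return math.ldexp(significand, exponent)


   dst0, dst1 = dfracexp(8.0)
   round_trip = dldexp(dst1, dst0)  # dst1 * 2 ** dst0 == src
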
   1253 
   1254 .. opcode:: DLDEXP - Multiply Number by Integral Power of 2
   1255 
   1256 This opcode is the inverse of :opcode:`DFRACEXP`.
   1257 
   1258 .. math::
   1259 
   1260   dst.xy = src0.xy \times 2^{src1.xy}
   1261 
   1262   dst.zw = src0.zw \times 2^{src1.zw}
   1263 
   1264 .. opcode:: DMIN - Minimum
   1265 
   1266 .. math::
   1267 
   1268   dst.xy = min(src0.xy, src1.xy)
   1269 
   1270   dst.zw = min(src0.zw, src1.zw)
   1271 
   1272 .. opcode:: DMAX - Maximum
   1273 
   1274 .. math::
   1275 
   1276   dst.xy = max(src0.xy, src1.xy)
   1277 
   1278   dst.zw = max(src0.zw, src1.zw)
   1279 
   1280 .. opcode:: DMUL - Multiply
   1281 
   1282 .. math::
   1283 
   1284   dst.xy = src0.xy \times src1.xy
   1285 
   1286   dst.zw = src0.zw \times src1.zw
   1287 
   1288 
   1289 .. opcode:: DMAD - Multiply And Add
   1290 
   1291 .. math::
   1292 
   1293   dst.xy = src0.xy \times src1.xy + src2.xy
   1294 
   1295   dst.zw = src0.zw \times src1.zw + src2.zw
   1296 
   1297 
   1298 .. opcode:: DRCP - Reciprocal
   1299 
   1300 .. math::
   1301 
   1302    dst.xy = \frac{1}{src.xy}
   1303 
   1304    dst.zw = \frac{1}{src.zw}
   1305 
   1306 .. opcode:: DSQRT - Square Root
   1307 
   1308 .. math::
   1309 
   1310    dst.xy = \sqrt{src.xy}
   1311 
   1312    dst.zw = \sqrt{src.zw}
   1313 
   1314 
   1315 .. _samplingopcodes:
   1316 
   1317 Resource Sampling Opcodes
   1318 ^^^^^^^^^^^^^^^^^^^^^^^^^
   1319 
These opcodes closely follow the semantics of the corresponding Direct3D
instructions. If in doubt, check the Direct3D documentation.
   1322 
.. opcode:: SAMPLE - Sample

               Using the provided address, sample data from the
               specified texture using the filtering mode identified
               by the given sampler. The source data may come from
               any resource type other than buffers.

               Syntax: ``SAMPLE dst, address, sampler_view, sampler``, e.g.
               ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``.
   1330 
.. opcode:: SAMPLE_I - Simplified alternative to the SAMPLE instruction.

               Using the provided integer address, SAMPLE_I fetches data
               from the specified sampler view without any filtering.
               The source data may come from any resource type other
               than CUBE.

               Syntax: ``SAMPLE_I dst, address, sampler_view``, e.g.
               ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``.

               The 'address' is specified as unsigned integers. If the
               'address' is out of the range [0...(# texels - 1)], the
               result of the fetch is always 0 in all components.
               The instruction therefore does not honor address wrap
               modes; where that behavior is desirable, the 'SAMPLE'
               instruction should be used instead.

               address.w always provides an unsigned integer mipmap
               level. If the value is out of range, the
               instruction always returns 0 in all components.

               address.yz are ignored for buffers and 1D textures.
               address.z is ignored for 1D texture arrays and 2D
               textures.

               For 1D texture arrays, address.y provides the array
               index (also as an unsigned integer). If the value is
               out of the range of available array indices
               [0...(array size - 1)], the opcode always returns
               0 in all components.

               For 2D texture arrays, address.z provides the array
               index; otherwise it exhibits the same behavior as in
               the case of 1D texture arrays.

   1359                The exact semantics of the source address are presented
   1360                in the table below:
   1361                resource type         X     Y     Z       W
   1362                -------------         ------------------------
   1363                PIPE_BUFFER           x                ignored
   1364                PIPE_TEXTURE_1D       x                  mpl
   1365                PIPE_TEXTURE_2D       x     y            mpl
   1366                PIPE_TEXTURE_3D       x     y     z      mpl
   1367                PIPE_TEXTURE_RECT     x     y            mpl
   1368                PIPE_TEXTURE_CUBE     not allowed as source
   1369                PIPE_TEXTURE_1D_ARRAY x    idx           mpl
   1370                PIPE_TEXTURE_2D_ARRAY x     y    idx     mpl
   1371 
   1372                Where 'mpl' is a mipmap level and 'idx' is the
   1373                array index.
   1374 
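The out-of-range rules for SAMPLE_I can be sketched as a range check. The example below covers only the PIPE_TEXTURE_2D_ARRAY row of the table; the ``uvec4`` and ``view_2d_array`` types and the ``sample_i_in_range`` helper are hypothetical names introduced for illustration, not part of Gallium.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t x, y, z, w; } uvec4;

/* Hypothetical description of a PIPE_TEXTURE_2D_ARRAY sampler view. */
typedef struct {
    uint32_t num_levels;           /* number of mipmap levels */
    uint32_t array_size;           /* number of array layers */
    const uint32_t *level_width;   /* width in texels, one entry per level */
    const uint32_t *level_height;  /* height in texels, one entry per level */
} view_2d_array;

/* Returns true when SAMPLE_I would fetch a real texel. When this returns
 * false, the instruction yields 0 in all components. */
static bool sample_i_in_range(const view_2d_array *v, uvec4 addr)
{
    if (addr.w >= v->num_levels)   /* address.w: mipmap level */
        return false;
    if (addr.z >= v->array_size)   /* address.z: array index */
        return false;
    return addr.x < v->level_width[addr.w] &&
           addr.y < v->level_height[addr.w];
}
```

Because the address components are unsigned, there is no separate check for negative values.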
   1375 .. opcode:: SAMPLE_I_MS - Just like SAMPLE_I but allows fetching data from
   1376                multi-sampled surfaces.
   1377 
   1378 .. opcode:: SAMPLE_B - Just like the SAMPLE instruction with the
   1379                exception that an additional bias is applied to the
   1380                level of detail computed as part of the instruction
   1381                execution.
   1382                SAMPLE_B dst, address, sampler_view, sampler, lod_bias
   1383                e.g.
   1384                SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x
   1385 
   1386 .. opcode:: SAMPLE_C - Similar to the SAMPLE instruction but it
   1387                performs a comparison filter. The operands to SAMPLE_C
   1388                are identical to SAMPLE, except that there is an additional
   1389                float32 operand, the reference value, which must be a
   1390                single-component register or a scalar literal.
   1391                SAMPLE_C makes the hardware use the current sampler's
   1392                compare_func (in pipe_sampler_state) to compare the
   1393                reference value against the red component value for the
   1394                source resource at each texel that the currently configured
   1395                texture filter covers based on the provided coordinates.
   1396                SAMPLE_C dst, address, sampler_view.r, sampler, ref_value
   1397                e.g.
   1398                SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x
   1399 
   1400 .. opcode:: SAMPLE_C_LZ - Same as SAMPLE_C, but LOD is 0 and derivatives
   1401                are ignored. The LZ stands for level-zero.
   1402                SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value
   1403                e.g.
   1404                SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x
   1405 
   1406 
   1407 .. opcode:: SAMPLE_D - SAMPLE_D is identical to the SAMPLE opcode except
   1408                that the derivatives for the source address in the x
   1409                direction and the y direction are provided by extra
   1410                parameters.
   1411                SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y
   1412                e.g.
   1413                SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]
   1414 
   1415 .. opcode:: SAMPLE_L - SAMPLE_L is identical to the SAMPLE opcode except
   1416                that the LOD is provided directly as a scalar value,
   1417                representing no anisotropy. The source address's A channel
   1418                is used as the LOD.
   1419                SAMPLE_L dst, address, sampler_view, sampler
   1420                e.g.
   1421                SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0]
   1422 
   1423 .. opcode:: GATHER4 - Gathers the four texels to be used in a bi-linear
   1424                filtering operation and packs them into a single register.
   1425                Only works with 2D, 2D array, cubemap, and cubemap array textures.
   1426                For 2D textures, only the addressing modes of the sampler and
   1427                the top level of any mip pyramid are used. Set W to zero.
   1428                It behaves like the SAMPLE instruction, but a filtered
   1429                sample is not generated. The four samples that contribute
   1430                to filtering are placed into xyzw in counter-clockwise order,
   1431                starting with the (u,v) texture coordinate delta at the
   1432                following locations (-, +), (+, +), (+, -), (-, -), where
   1433                the magnitude of each delta is half a texel.
   1434 
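The component ordering described for GATHER4 can be written out as a small table. This is only a sketch: ``texel_coord``, ``gather4_delta`` and ``gather4_sample_pos`` are hypothetical names, and the coordinates are assumed to be expressed in texel units rather than normalized.

```c
/* A coordinate or delta in texel units (assumption: unnormalized). */
typedef struct { float u, v; } texel_coord;

/* Deltas applied to (u, v) for the samples written to dst.x, dst.y,
 * dst.z and dst.w, in that order: (-, +), (+, +), (+, -), (-, -). */
static const texel_coord gather4_delta[4] = {
    { -0.5f,  0.5f },   /* dst.x */
    {  0.5f,  0.5f },   /* dst.y */
    {  0.5f, -0.5f },   /* dst.z */
    { -0.5f, -0.5f },   /* dst.w */
};

/* Position of the sample stored in dst component comp (0 = x ... 3 = w). */
static texel_coord gather4_sample_pos(float u, float v, int comp)
{
    texel_coord p = { u + gather4_delta[comp].u, v + gather4_delta[comp].v };
    return p;
}
```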
   1435 
   1436 .. opcode:: SVIEWINFO - query the dimensions of a given sampler view.
   1437                dst receives width, height, depth or array size and
   1438                number of mipmap levels. The dst can have a writemask
   1439                that specifies which info the caller is interested
   1440                in.
   1441                SVIEWINFO dst, src_mip_level, sampler_view
   1442                e.g.
   1443                SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]
   1444                src_mip_level is an unsigned integer scalar. If it's
   1445                out of range then returns 0 for width, height and
   1446                depth/array size, but the total number of mipmap levels is
   1447                still returned correctly for the given sampler view.
   1448                The returned width, height and depth values are for
   1449                the mipmap level selected by the src_mip_level and
   1450                are in the number of texels.
   1451                For 1d texture array width is in dst.x, array size
   1452                is in dst.y and dst.zw are always 0.
   1453 
   1454 .. opcode:: SAMPLE_POS - query the position of a given sample.
   1455                dst receives float4 (x, y, 0, 0) indicating where the
   1456                sample is located. If the resource is not a multi-sample
   1457                resource and not a render target, the result is 0.
   1458 
   1459 .. opcode:: SAMPLE_INFO - dst receives number of samples in x.
   1460                If the resource is not a multi-sample resource and
   1461                not a render target, the result is 0.
   1462 
   1463 
   1464 .. _resourceopcodes:
   1465 
   1466 Resource Access Opcodes
   1467 ^^^^^^^^^^^^^^^^^^^^^^^
   1468 
   1469 .. opcode:: LOAD - Fetch data from a shader resource
   1470 
   1471                Syntax: ``LOAD dst, resource, address``
   1472 
   1473                Example: ``LOAD TEMP[0], RES[0], TEMP[1]``
   1474 
   1475                Using the provided integer address, LOAD fetches data
   1476                from the specified buffer or texture without any
   1477                filtering.
   1478 
   1479                The 'address' is specified as a vector of unsigned
   1480                integers.  If the 'address' is out of range the result
   1481                is unspecified.
   1482 
   1483                Only the first mipmap level of a resource can be read
   1484                from using this instruction.
   1485 
   1486                For 1D or 2D texture arrays, the array index is
   1487                provided as an unsigned integer in address.y or
   1488                address.z, respectively.  address.yz are ignored for
   1489                buffers and 1D textures.  address.z is ignored for 1D
   1490                texture arrays and 2D textures.  address.w is always
   1491                ignored.
   1492 
   1493 .. opcode:: STORE - Write data to a shader resource
   1494 
   1495                Syntax: ``STORE resource, address, src``
   1496 
   1497                Example: ``STORE RES[0], TEMP[0], TEMP[1]``
   1498 
   1499                Using the provided integer address, STORE writes data
   1500                to the specified buffer or texture.
   1501 
   1502                The 'address' is specified as a vector of unsigned
   1503                integers.  If the 'address' is out of range the result
   1504                is unspecified.
   1505 
   1506                Only the first mipmap level of a resource can be
   1507                written to using this instruction.
   1508 
   1509                For 1D or 2D texture arrays, the array index is
   1510                provided as an unsigned integer in address.y or
   1511                address.z, respectively.  address.yz are ignored for
   1512                buffers and 1D textures.  address.z is ignored for 1D
   1513                texture arrays and 2D textures.  address.w is always
   1514                ignored.
   1515 
   1516 
   1517 .. _threadsyncopcodes:
   1518 
   1519 Inter-thread synchronization opcodes
   1520 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   1521 
   1522 These opcodes are intended for communication between threads running
   1523 within the same compute grid.  For now they're only valid in compute
   1524 programs.
   1525 
   1526 .. opcode:: MFENCE - Memory fence
   1527 
   1528   Syntax: ``MFENCE resource``
   1529 
   1530   Example: ``MFENCE RES[0]``
   1531 
   1532   This opcode forces strong ordering between any memory access
   1533   operations that affect the specified resource.  This means that
   1534   previous loads and stores (and only those) will be performed and
   1535   visible to other threads before the program execution continues.
   1536 
   1537 
   1538 .. opcode:: LFENCE - Load memory fence
   1539 
   1540   Syntax: ``LFENCE resource``
   1541 
   1542   Example: ``LFENCE RES[0]``
   1543 
   1544   Similar to MFENCE, but it only affects the ordering of memory loads.
   1545 
   1546 
   1547 .. opcode:: SFENCE - Store memory fence
   1548 
   1549   Syntax: ``SFENCE resource``
   1550 
   1551   Example: ``SFENCE RES[0]``
   1552 
   1553   Similar to MFENCE, but it only affects the ordering of memory stores.
   1554 
   1555 
   1556 .. opcode:: BARRIER - Thread group barrier
   1557 
   1558   ``BARRIER``
   1559 
   1560   This opcode suspends the execution of the current thread until all
   1561   the remaining threads in the working group reach the same point of
   1562   the program.  Results are unspecified if any of the remaining
   1563   threads terminates or never reaches an executed BARRIER instruction.
   1564 
   1565 
   1566 .. _atomopcodes:
   1567 
   1568 Atomic opcodes
   1569 ^^^^^^^^^^^^^^
   1570 
   1571 These opcodes provide atomic variants of some common arithmetic and
   1572 logical operations.  In this context atomicity means that another
   1573 concurrent memory access operation that affects the same memory
   1574 location is guaranteed to be performed strictly before or after the
   1575 entire execution of the atomic operation.
   1576 
   1577 For the moment they're only valid in compute programs.
   1578 
   1579 .. opcode:: ATOMUADD - Atomic integer addition
   1580 
   1581   Syntax: ``ATOMUADD dst, resource, offset, src``
   1582 
   1583   Example: ``ATOMUADD TEMP[0], RES[0], TEMP[1], TEMP[2]``
   1584 
   1585   The following operation is performed atomically on each component:
   1586 
   1587 .. math::
   1588 
   1589   dst_i = resource[offset]_i
   1590 
   1591   resource[offset]_i = dst_i + src_i
   1592 
   1593 
   1594 .. opcode:: ATOMXCHG - Atomic exchange
   1595 
   1596   Syntax: ``ATOMXCHG dst, resource, offset, src``
   1597 
   1598   Example: ``ATOMXCHG TEMP[0], RES[0], TEMP[1], TEMP[2]``
   1599 
   1600   The following operation is performed atomically on each component:
   1601 
   1602 .. math::
   1603 
   1604   dst_i = resource[offset]_i
   1605 
   1606   resource[offset]_i = src_i
   1607 
   1608 
   1609 .. opcode:: ATOMCAS - Atomic compare-and-exchange
   1610 
   1611   Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
   1612 
   1613   Example: ``ATOMCAS TEMP[0], RES[0], TEMP[1], TEMP[2], TEMP[3]``
   1614 
   1615   The following operation is performed atomically on each component:
   1616 
   1617 .. math::
   1618 
   1619   dst_i = resource[offset]_i
   1620 
   1621   resource[offset]_i = (dst_i == cmp_i ? src_i : dst_i)
   1622 
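For readers familiar with C11, the three opcodes above behave much like the standard ``<stdatomic.h>`` read-modify-write functions applied to a single 32-bit component. The wrapper names below are hypothetical, and the mapping is an analogy for the per-component formulas only; it says nothing about how a driver implements them.

```c
#include <stdatomic.h>
#include <stdint.h>

/* ATOMUADD: dst = old value; memory = old + src. */
static uint32_t atomuadd(_Atomic uint32_t *loc, uint32_t src)
{
    return atomic_fetch_add(loc, src);
}

/* ATOMXCHG: dst = old value; memory = src. */
static uint32_t atomxchg(_Atomic uint32_t *loc, uint32_t src)
{
    return atomic_exchange(loc, src);
}

/* ATOMCAS: dst = old value; memory = (old == cmp ? src : old). */
static uint32_t atomcas(_Atomic uint32_t *loc, uint32_t cmp, uint32_t src)
{
    uint32_t old = cmp;
    /* On failure the call overwrites 'old' with the value found in memory;
     * on success 'old' already equals that value. */
    atomic_compare_exchange_strong(loc, &old, src);
    return old;
}
```

In every case the value written to *dst* is the value the memory location held before the operation.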
   1623 
   1624 .. opcode:: ATOMAND - Atomic bitwise And
   1625 
   1626   Syntax: ``ATOMAND dst, resource, offset, src``
   1627 
   1628   Example: ``ATOMAND TEMP[0], RES[0], TEMP[1], TEMP[2]``
   1629 
   1630   The following operation is performed atomically on each component:
   1631 
   1632 .. math::
   1633 
   1634   dst_i = resource[offset]_i
   1635 
   1636   resource[offset]_i = dst_i \& src_i
   1637 
   1638 
   1639 .. opcode:: ATOMOR - Atomic bitwise Or
   1640 
   1641   Syntax: ``ATOMOR dst, resource, offset, src``
   1642 
   1643   Example: ``ATOMOR TEMP[0], RES[0], TEMP[1], TEMP[2]``
   1644 
   1645   The following operation is performed atomically on each component:
   1646 
   1647 .. math::
   1648 
   1649   dst_i = resource[offset]_i
   1650 
   1651   resource[offset]_i = dst_i | src_i
   1652 
   1653 
   1654 .. opcode:: ATOMXOR - Atomic bitwise Xor
   1655 
   1656   Syntax: ``ATOMXOR dst, resource, offset, src``
   1657 
   1658   Example: ``ATOMXOR TEMP[0], RES[0], TEMP[1], TEMP[2]``
   1659 
   1660   The following operation is performed atomically on each component:
   1661 
   1662 .. math::
   1663 
   1664   dst_i = resource[offset]_i
   1665 
   1666   resource[offset]_i = dst_i \oplus src_i
   1667 
   1668 
   1669 .. opcode:: ATOMUMIN - Atomic unsigned minimum
   1670 
   1671   Syntax: ``ATOMUMIN dst, resource, offset, src``
   1672 
   1673   Example: ``ATOMUMIN TEMP[0], RES[0], TEMP[1], TEMP[2]``
   1674 
   1675   The following operation is performed atomically on each component:
   1676 
   1677 .. math::
   1678 
   1679   dst_i = resource[offset]_i
   1680 
   1681   resource[offset]_i = (dst_i < src_i ? dst_i : src_i)
   1682 
   1683 
   1684 .. opcode:: ATOMUMAX - Atomic unsigned maximum
   1685 
   1686   Syntax: ``ATOMUMAX dst, resource, offset, src``
   1687 
   1688   Example: ``ATOMUMAX TEMP[0], RES[0], TEMP[1], TEMP[2]``
   1689 
   1690   The following operation is performed atomically on each component:
   1691 
   1692 .. math::
   1693 
   1694   dst_i = resource[offset]_i
   1695 
   1696   resource[offset]_i = (dst_i > src_i ? dst_i : src_i)
   1697 
   1698 
   1699 .. opcode:: ATOMIMIN - Atomic signed minimum
   1700 
   1701   Syntax: ``ATOMIMIN dst, resource, offset, src``
   1702 
   1703   Example: ``ATOMIMIN TEMP[0], RES[0], TEMP[1], TEMP[2]``
   1704 
   1705   The following operation is performed atomically on each component:
   1706 
   1707 .. math::
   1708 
   1709   dst_i = resource[offset]_i
   1710 
   1711   resource[offset]_i = (dst_i < src_i ? dst_i : src_i)
   1712 
   1713 
   1714 .. opcode:: ATOMIMAX - Atomic signed maximum
   1715 
   1716   Syntax: ``ATOMIMAX dst, resource, offset, src``
   1717 
   1718   Example: ``ATOMIMAX TEMP[0], RES[0], TEMP[1], TEMP[2]``
   1719 
   1720   The following operation is performed atomically on each component:
   1721 
   1722 .. math::
   1723 
   1724   dst_i = resource[offset]_i
   1725 
   1726   resource[offset]_i = (dst_i > src_i ? dst_i : src_i)
   1727 
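The signed and unsigned minimum and maximum opcodes share the same formulas and differ only in how the 32-bit values are interpreted when compared. The sketch below shows the update rule alone, as plain sequential C with no atomicity; the function names are hypothetical, and the cast to ``int32_t`` assumes the usual two's complement conversion.

```c
#include <stdint.h>

/* Update rule of ATOMUMIN: compare as unsigned integers. */
static uint32_t umin_update(uint32_t old, uint32_t src)
{
    return old < src ? old : src;
}

/* Update rule of ATOMIMIN: same bits, compared as signed integers.
 * Assumption: uint32_t to int32_t conversion wraps (two's complement). */
static uint32_t imin_update(uint32_t old, uint32_t src)
{
    return (int32_t)old < (int32_t)src ? old : src;
}
```

For example, with ``old = 0xFFFFFFFF`` and ``src = 1`` the unsigned rule stores 1, while the signed rule treats ``old`` as -1 and keeps it.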
   1728 
   1729 
   1730 Explanation of symbols used
   1731 ------------------------------
   1732 
   1733 
   1734 Functions
   1735 ^^^^^^^^^^^^^^
   1736 
   1737 
   1738   :math:`|x|`       Absolute value of `x`.
   1739 
   1740   :math:`\lceil x \rceil` Ceiling of `x`.
   1741 
   1742   clamp(x,y,z)      Clamp x between y and z.
   1743                     (x < y) ? y : (x > z) ? z : x
   1744 
   1745   :math:`\lfloor x\rfloor` Floor of `x`.
   1746 
   1747   :math:`\log_2{x}` Logarithm of `x`, base 2.
   1748 
   1749   max(x,y)          Maximum of x and y.
   1750                     (x > y) ? x : y
   1751 
   1752   min(x,y)          Minimum of x and y.
   1753                     (x < y) ? x : y
   1754 
   1755   partialx(x)       Derivative of x relative to fragment's X.
   1756 
   1757   partialy(x)       Derivative of x relative to fragment's Y.
   1758 
   1759   pop()             Pop from stack.
   1760 
   1761   :math:`x^y`       `x` to the power `y`.
   1762 
   1763   push(x)           Push x on stack.
   1764 
   1765   round(x)          Round x.
   1766 
   1767   trunc(x)          Truncate x, i.e. drop the fraction bits.
   1768 
   1769 
   1770 Keywords
   1771 ^^^^^^^^^^^^^
   1772 
   1773 
   1774   discard           Discard fragment.
   1775 
   1776   pc                Program counter.
   1777 
   1778   target            Label of target instruction.
   1779 
   1780 
   1781 Other tokens
   1782 ---------------
   1783 
   1784 
   1785 Declaration
   1786 ^^^^^^^^^^^
   1787 
   1788 
   1789 Declares a register that will be referenced as an operand in Instruction
   1790 tokens.
   1791 
   1792 File field contains register file that is being declared and is one
   1793 of TGSI_FILE.
   1794 
   1795 UsageMask field specifies which of the register components can be accessed
   1796 and is one of TGSI_WRITEMASK.
   1797 
   1798 The Local flag specifies that a given value isn't intended for
   1799 subroutine parameter passing and, as a result, the implementation
   1800 isn't required to give any guarantees of it being preserved across
   1801 subroutine boundaries.  As it's merely a compiler hint, the
   1802 implementation is free to ignore it.
   1803 
   1804 If Dimension flag is set to 1, a Declaration Dimension token follows.
   1805 
   1806 If Semantic flag is set to 1, a Declaration Semantic token follows.
   1807 
   1808 If Interpolate flag is set to 1, a Declaration Interpolate token follows.
   1809 
   1810 If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
   1811 
   1812 
   1813 Declaration Semantic
   1814 ^^^^^^^^^^^^^^^^^^^^^^^^
   1815 
   1816   Vertex and fragment shader input and output registers may be labeled
   1817   with semantic information consisting of a name and index.
   1818 
   1819   Follows Declaration token if Semantic bit is set.
   1820 
   1821   Since its purpose is to link a shader with other stages of the pipeline,
   1822   it is valid to follow only those Declaration tokens that declare a register
   1823   either in INPUT or OUTPUT file.
   1824 
   1825   SemanticName field contains the semantic name of the register being declared.
   1826   There is no default value.
   1827 
   1828   SemanticIndex is an optional subscript that can be used to distinguish
   1829   different register declarations with the same semantic name. The default value
   1830   is 0.
   1831 
   1832   The meanings of the individual semantic names are explained in the following
   1833   sections.
   1834 
   1835 TGSI_SEMANTIC_POSITION
   1836 """"""""""""""""""""""
   1837 
   1838 For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
   1839 output register which contains the homogeneous vertex position in the clip
   1840 space coordinate system.  After clipping, the X, Y and Z components of the
   1841 vertex will be divided by the W value to get normalized device coordinates.
   1842 
   1843 For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
   1844 fragment shader input contains the fragment's window position.  The X
   1845 component starts at zero and always increases from left to right.
   1846 The Y component starts at zero and always increases but Y=0 may either
   1847 indicate the top of the window or the bottom depending on the fragment
   1848 coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
   1849 The Z coordinate ranges from 0 to 1 to represent depth from the front
   1850 to the back of the Z buffer.  The W component contains the reciprocal
   1851 of the interpolated vertex position W component.
   1852 
   1853 Fragment shaders may also declare an output register with
   1854 TGSI_SEMANTIC_POSITION.  Only the Z component is writable.  This allows
   1855 the fragment shader to change the fragment's Z position.
   1856 
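The arithmetic described for TGSI_SEMANTIC_POSITION can be sketched in a few lines of C. The type and function names are hypothetical, and the sketch leaves out clipping and the viewport transform, which happen between the two steps.

```c
typedef struct { float x, y, z, w; } clip_pos;
typedef struct { float x, y, z; } ndc_pos;

/* Vertex shader output: after clipping, X, Y and Z are divided by W to
 * obtain normalized device coordinates. Assumes w is non-zero. */
static ndc_pos clip_to_ndc(clip_pos p)
{
    ndc_pos ndc = { p.x / p.w, p.y / p.w, p.z / p.w };
    return ndc;
}

/* Fragment shader input: the W component holds the reciprocal of the
 * interpolated vertex position W. */
static float frag_position_w(float interpolated_w)
{
    return 1.0f / interpolated_w;
}
```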
   1857 
   1858 
   1859 TGSI_SEMANTIC_COLOR
   1860 """""""""""""""""""
   1861 
   1862 For vertex shader outputs or fragment shader inputs/outputs, this
   1863 label indicates that the register contains an R,G,B,A color.
   1864 
   1865 Several shader inputs/outputs may contain colors so the semantic index
   1866 is used to distinguish them.  For example, color[0] may be the diffuse
   1867 color while color[1] may be the specular color.
   1868 
   1869 This label is needed so that the flat/smooth shading can be applied
   1870 to the right interpolants during rasterization.
   1871 
   1872 
   1873 
   1874 TGSI_SEMANTIC_BCOLOR
   1875 """"""""""""""""""""
   1876 
   1877 Back-facing colors are only used for back-facing polygons, and are only valid
   1878 in vertex shader outputs. After rasterization, all polygons are front-facing
   1879 and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
   1880 so all BCOLORs effectively become regular COLORs in the fragment shader.
   1881 
   1882 
   1883 TGSI_SEMANTIC_FOG
   1884 """""""""""""""""
   1885 
   1886 Vertex shader inputs and outputs and fragment shader inputs may be
   1887 labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
   1888 a fog coordinate in the form (F, 0, 0, 1).  Typically, the fragment
   1889 shader will use the fog coordinate to compute a fog blend factor which
   1890 is used to blend the normal fragment color with a constant fog color.
   1891 
   1892 Only the first component matters when writing from the vertex shader;
   1893 the driver will ensure that the coordinate is in this format when used
   1894 as a fragment shader input.
   1895 
   1896 
   1897 TGSI_SEMANTIC_PSIZE
   1898 """""""""""""""""""
   1899 
   1900 Vertex shader input and output registers may be labeled with
   1901 TGSI_SEMANTIC_PSIZE to indicate that the register contains a point size
   1902 in the form (S, 0, 0, 1).  The point size controls the width or diameter
   1903 of points for rasterization.  This label cannot be used in fragment
   1904 shaders.
   1905 
   1906 When using this semantic, be sure to set the appropriate state in the
   1907 :ref:`rasterizer` first.
   1908 
   1909 
   1910 TGSI_SEMANTIC_GENERIC
   1911 """""""""""""""""""""
   1912 
   1913 All vertex/fragment shader inputs/outputs not labeled with any other
   1914 semantic label can be considered to be generic attributes.  Typical
   1915 uses of generic inputs/outputs are texcoords and user-defined values.
   1916 
   1917 
   1918 TGSI_SEMANTIC_NORMAL
   1919 """"""""""""""""""""
   1920 
   1921 Indicates that a vertex shader input is a normal vector.  This is
   1922 typically only used for legacy graphics APIs.
   1923 
   1924 
   1925 TGSI_SEMANTIC_FACE
   1926 """"""""""""""""""
   1927 
   1928 This label applies to fragment shader inputs only and indicates that
   1929 the register contains front/back-face information of the form (F, 0,
   1930 0, 1).  The first component will be positive when the fragment belongs
   1931 to a front-facing polygon, and negative when the fragment belongs to a
   1932 back-facing polygon.
   1933 
   1934 
   1935 TGSI_SEMANTIC_EDGEFLAG
   1936 """"""""""""""""""""""
   1937 
   1938 For vertex shaders, this semantic label indicates that an input or
   1939 output is a boolean edge flag.  The register layout is [F, x, x, x]
   1940 where F is 0.0 or 1.0 and x = don't care.  Normally, the vertex shader
   1941 simply copies the edge flag input to the edgeflag output.
   1942 
   1943 Edge flags are used to control which lines or points are actually
   1944 drawn when the polygon mode converts triangles/quads/polygons into
   1945 points or lines.
   1946 
   1947 TGSI_SEMANTIC_STENCIL
   1948 """"""""""""""""""""""
   1949 
   1950 For fragment shaders, this semantic label indicates that an output
   1951 is a writable stencil reference value. Only the Y component is writable.
   1952 This allows the fragment shader to change the fragment's stencilref value.
   1953 
   1954 
   1955 Declaration Interpolate
   1956 ^^^^^^^^^^^^^^^^^^^^^^^
   1957 
   1958 This token is only valid for fragment shader INPUT declarations.
   1959 
   1960 The Interpolate field specifies the way input is being interpolated by
   1961 the rasteriser and is one of TGSI_INTERPOLATE_*.
   1962 
   1963 The CylindricalWrap bitfield specifies which register components
   1964 should be subject to cylindrical wrapping when interpolating by the
   1965 rasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component
   1966 should be interpolated according to cylindrical wrapping rules.
   1967 
   1968 
   1969 Declaration Sampler View
   1970 ^^^^^^^^^^^^^^^^^^^^^^^^
   1971 
   1972    Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
   1973 
   1974    DCL SVIEW[#], resource, type(s)
   1975 
   1976    Declares a shader input sampler view and assigns it to a SVIEW[#]
   1977    register.
   1978 
   1979    resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.
   1980 
   1981    type must be 1 or 4 entries (if specifying on a per-component
   1982    level) out of UNORM, SNORM, SINT, UINT and FLOAT.
   1983 
   1984 
   1985 Declaration Resource
   1986 ^^^^^^^^^^^^^^^^^^^^
   1987 
   1988    Follows Declaration token if file is TGSI_FILE_RESOURCE.
   1989 
   1990    DCL RES[#], resource [, WR] [, RAW]
   1991 
   1992    Declares a shader input resource and assigns it to a RES[#]
   1993    register.
   1994 
   1995    resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
   1996    2DArray.
   1997 
   1998    If the RAW keyword is not specified, the texture data will be
   1999    subject to conversion, swizzling and scaling as required to yield
   2000    the specified data type from the physical data format of the bound
   2001    resource.
   2002 
   2003    If the RAW keyword is specified, no channel conversion will be
   2004    performed: the values read for each of the channels (X,Y,Z,W) will
   2005    correspond to consecutive words in the same order and format
   2006    they're found in memory.  No element-to-address conversion will be
   2007    performed either: the value of the provided X coordinate will be
   2008    interpreted in byte units instead of texel units.  The result of
   2009    accessing a misaligned address is undefined.
   2010 
   2011    Usage of the STORE opcode is only allowed if the WR (writable) flag
   2012    is set.
   2013 
   2014 
   2015 Properties
   2016 ^^^^^^^^^^^^^^^^^^^^^^^^
   2017 
   2018 
   2019   Properties are general directives that apply to the whole TGSI program.
   2020 
   2021 FS_COORD_ORIGIN
   2022 """""""""""""""
   2023 
   2024 Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
   2025 The default value is UPPER_LEFT.
   2026 
   2027 If UPPER_LEFT, the position will be (0,0) at the upper left corner and
   2028 increase downward and rightward.
   2029 If LOWER_LEFT, the position will be (0,0) at the lower left corner and
   2030 increase upward and rightward.
   2031 
   2032 OpenGL defaults to LOWER_LEFT, and is configurable with the
   2033 GL_ARB_fragment_coord_conventions extension.
   2034 
   2035 DirectX 9/10 use UPPER_LEFT.
   2036 
   2037 FS_COORD_PIXEL_CENTER
   2038 """""""""""""""""""""
   2039 
   2040 Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
   2041 The default value is HALF_INTEGER.
   2042 
   2043 If HALF_INTEGER, the fractional part of the position will be 0.5.
   2044 If INTEGER, the fractional part of the position will be 0.0.
   2045 
   2046 Note that this does not affect the set of fragments generated by
   2047 rasterization, which is instead controlled by gl_rasterization_rules in the
   2048 rasterizer.
   2049 
   2050 OpenGL defaults to HALF_INTEGER, and is configurable with the
   2051 GL_ARB_fragment_coord_conventions extension.
   2052 
   2053 DirectX 9 uses INTEGER.
   2054 DirectX 10 uses HALF_INTEGER.
   2055 
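To make the combined effect of FS_COORD_ORIGIN and FS_COORD_PIXEL_CENTER concrete, the following C sketch computes the X and Y position a fragment shader would see for a given pixel. The enum, struct and function names are hypothetical; pixels are identified here by a column and a row counted from the top-left corner of a window ``height`` pixels tall, which is a choice made only for this example.

```c
typedef enum { ORIGIN_UPPER_LEFT, ORIGIN_LOWER_LEFT } fs_coord_origin;
typedef enum { CENTER_HALF_INTEGER, CENTER_INTEGER } fs_coord_pixel_center;

typedef struct { float x, y; } frag_xy;

/* col and row are counted from the top-left corner of the window. */
static frag_xy frag_position(unsigned col, unsigned row, unsigned height,
                             fs_coord_origin origin,
                             fs_coord_pixel_center center)
{
    /* HALF_INTEGER puts pixel centers at .5, INTEGER at .0 */
    float frac = (center == CENTER_HALF_INTEGER) ? 0.5f : 0.0f;
    /* LOWER_LEFT counts rows from the bottom of the window instead. */
    unsigned y = (origin == ORIGIN_UPPER_LEFT) ? row : height - 1 - row;
    frag_xy p = { (float)col + frac, (float)y + frac };
    return p;
}
```

With the defaults (UPPER_LEFT, HALF_INTEGER), the top-left pixel is at (0.5, 0.5). With LOWER_LEFT and INTEGER in a 480-pixel-tall window, the same pixel is at (0, 479).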
   2056 FS_COLOR0_WRITES_ALL_CBUFS
   2057 """"""""""""""""""""""""""
   2058 Specifies that writes to the fragment shader color 0 are replicated to all
   2059 bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
   2060 fragData is directed to a single color buffer, but fragColor is broadcast.
   2061 
   2062 VS_PROHIBIT_UCPS
   2063 """"""""""""""""""""""""""
   2064 If this property is set on the program bound to the shader stage before the
   2065 fragment shader, user clip planes should have no effect (be disabled) even if
   2066 that shader does not write to any clip distance outputs and the rasterizer's
   2067 clip_plane_enable is non-zero.
   2068 This property is only supported by drivers that also support shader clip
   2069 distance outputs.
   2070 This is useful for APIs that don't have UCPs and where clip distances written
   2071 by a shader cannot be disabled.
   2072 
   2073 
   2074 Texture Sampling and Texture Formats
   2075 ------------------------------------
   2076 
   2077 This table shows how texture image components are returned as (x,y,z,w) tuples
   2078 by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
   2079 :opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
   2080 well.
   2081 
   2082 +--------------------+--------------+--------------------+--------------+
   2083 | Texture Components | Gallium      | OpenGL             | Direct3D 9   |
   2084 +====================+==============+====================+==============+
   2085 | R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
   2086 +--------------------+--------------+--------------------+--------------+
   2087 | RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
   2088 +--------------------+--------------+--------------------+--------------+
   2089 | RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
   2090 +--------------------+--------------+--------------------+--------------+
   2091 | RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
   2092 +--------------------+--------------+--------------------+--------------+
   2093 | A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
   2094 +--------------------+--------------+--------------------+--------------+
   2095 | L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
   2096 +--------------------+--------------+--------------------+--------------+
   2097 | LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
   2098 +--------------------+--------------+--------------------+--------------+
   2099 | I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
   2100 +--------------------+--------------+--------------------+--------------+
   2101 | UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
   2102 |                    |              | [#envmap-bumpmap]_ |              |
   2103 +--------------------+--------------+--------------------+--------------+
   2104 | Z                  | XXX TBD      | (z, z, z, 1)       | (0, z, 0, 1) |
   2105 |                    |              | [#depth-tex-mode]_ |              |
   2106 +--------------------+--------------+--------------------+--------------+
   2107 | S                  | (s, s, s, s) | unknown            | unknown      |
   2108 +--------------------+--------------+--------------------+--------------+
   2109 
   2110 .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
   2111 .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
   2112    or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
   2113
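
The Gallium column of the table above can be read as a function from the stored components to an (x, y, z, w) tuple. The C sketch below encodes only the rows with a defined Gallium value (UV and Z are marked TBD and are left out); ``tex_components``, ``texel4`` and ``gallium_expand`` are hypothetical names, and ``c`` is assumed to hold the stored channels in the order the format name lists them.

```c
typedef struct { float x, y, z, w; } texel4;

typedef enum {
    FMT_R, FMT_RG, FMT_RGB, FMT_RGBA, FMT_A, FMT_L, FMT_LA, FMT_I, FMT_S
} tex_components;

/* c: stored channel values in format order, e.g. {l, a} for FMT_LA. */
static texel4 gallium_expand(tex_components fmt, const float *c)
{
    switch (fmt) {
    case FMT_R:    return (texel4){ c[0], 0.0f, 0.0f, 1.0f };
    case FMT_RG:   return (texel4){ c[0], c[1], 0.0f, 1.0f };
    case FMT_RGB:  return (texel4){ c[0], c[1], c[2], 1.0f };
    case FMT_RGBA: return (texel4){ c[0], c[1], c[2], c[3] };
    case FMT_A:    return (texel4){ 0.0f, 0.0f, 0.0f, c[0] };
    case FMT_L:    return (texel4){ c[0], c[0], c[0], 1.0f };
    case FMT_LA:   return (texel4){ c[0], c[0], c[0], c[1] };
    case FMT_I:    return (texel4){ c[0], c[0], c[0], c[0] };
    case FMT_S:    return (texel4){ c[0], c[0], c[0], c[0] };
    }
    return (texel4){ 0.0f, 0.0f, 0.0f, 0.0f };  /* unknown format */
}
```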