TGSI
====

TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
for describing shaders. Since Gallium is inherently shaderful, shaders are
an important part of the API. TGSI is the only intermediate representation
used by all drivers.

Basics
------

All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
floating-point four-component vectors. An opcode may have up to one
destination register, known as *dst*, and between zero and three source
registers, called *src0* through *src2*, or simply *src* if there is only
one.

Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
components as integers. Other instructions permit using registers as
two-component vectors with double precision; see :ref:`Double Opcodes`.

When an instruction has a scalar result, the result is usually copied into
each of the components of *dst*. When this happens, the result is said to be
*replicated* to *dst*. :opcode:`RCP` is one such instruction.

Instruction Set
---------------

Core ISA
^^^^^^^^^^^^^^^^^^^^^^^^^

These opcodes are guaranteed to be available regardless of the driver being
used.

.. opcode:: ARL - Address Register Load

.. math::

  dst.x = \lfloor src.x\rfloor

  dst.y = \lfloor src.y\rfloor

  dst.z = \lfloor src.z\rfloor

  dst.w = \lfloor src.w\rfloor


.. opcode:: MOV - Move

.. math::

  dst.x = src.x

  dst.y = src.y

  dst.z = src.z

  dst.w = src.w


.. opcode:: LIT - Light Coefficients

.. math::

  dst.x = 1

  dst.y = max(src.x, 0)

  dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0

  dst.w = 1


.. opcode:: RCP - Reciprocal

This instruction replicates its result.

.. math::

  dst = \frac{1}{src.x}


.. opcode:: RSQ - Reciprocal Square Root

This instruction replicates its result.

.. math::

  dst = \frac{1}{\sqrt{|src.x|}}


.. opcode:: EXP - Approximate Exponential Base 2

.. math::

  dst.x = 2^{\lfloor src.x\rfloor}

  dst.y = src.x - \lfloor src.x\rfloor

  dst.z = 2^{src.x}

  dst.w = 1


.. opcode:: LOG - Approximate Logarithm Base 2

.. math::

  dst.x = \lfloor\log_2{|src.x|}\rfloor

  dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}

  dst.z = \log_2{|src.x|}

  dst.w = 1


.. opcode:: MUL - Multiply

.. math::

  dst.x = src0.x \times src1.x

  dst.y = src0.y \times src1.y

  dst.z = src0.z \times src1.z

  dst.w = src0.w \times src1.w


.. opcode:: ADD - Add

.. math::

  dst.x = src0.x + src1.x

  dst.y = src0.y + src1.y

  dst.z = src0.z + src1.z

  dst.w = src0.w + src1.w


.. opcode:: DP3 - 3-component Dot Product

This instruction replicates its result.

.. math::

  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z


.. opcode:: DP4 - 4-component Dot Product

This instruction replicates its result.

.. math::

  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w


.. opcode:: DST - Distance Vector

.. math::

  dst.x = 1

  dst.y = src0.y \times src1.y

  dst.z = src0.z

  dst.w = src1.w


.. opcode:: MIN - Minimum

.. math::

  dst.x = min(src0.x, src1.x)

  dst.y = min(src0.y, src1.y)

  dst.z = min(src0.z, src1.z)

  dst.w = min(src0.w, src1.w)


.. opcode:: MAX - Maximum

.. math::

  dst.x = max(src0.x, src1.x)

  dst.y = max(src0.y, src1.y)

  dst.z = max(src0.z, src1.z)

  dst.w = max(src0.w, src1.w)


.. opcode:: SLT - Set On Less Than

.. math::

  dst.x = (src0.x < src1.x) ? 1 : 0

  dst.y = (src0.y < src1.y) ? 1 : 0

  dst.z = (src0.z < src1.z) ? 1 : 0

  dst.w = (src0.w < src1.w) ? 1 : 0


.. opcode:: SGE - Set On Greater Equal Than

.. math::

  dst.x = (src0.x >= src1.x) ? 1 : 0

  dst.y = (src0.y >= src1.y) ? 1 : 0

  dst.z = (src0.z >= src1.z) ? 1 : 0

  dst.w = (src0.w >= src1.w) ? 1 : 0


.. opcode:: MAD - Multiply And Add

.. math::

  dst.x = src0.x \times src1.x + src2.x

  dst.y = src0.y \times src1.y + src2.y

  dst.z = src0.z \times src1.z + src2.z

  dst.w = src0.w \times src1.w + src2.w


.. opcode:: SUB - Subtract

.. math::

  dst.x = src0.x - src1.x

  dst.y = src0.y - src1.y

  dst.z = src0.z - src1.z

  dst.w = src0.w - src1.w


.. opcode:: LRP - Linear Interpolate

.. math::

  dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x

  dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y

  dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z

  dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w


.. opcode:: CND - Condition

.. math::

  dst.x = (src2.x > 0.5) ? src0.x : src1.x

  dst.y = (src2.y > 0.5) ? src0.y : src1.y

  dst.z = (src2.z > 0.5) ? src0.z : src1.z

  dst.w = (src2.w > 0.5) ? src0.w : src1.w


.. opcode:: DP2A - 2-component Dot Product And Add

.. math::

  dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x

  dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x

  dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x

  dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x


.. opcode:: FRC - Fraction

.. math::

  dst.x = src.x - \lfloor src.x\rfloor

  dst.y = src.y - \lfloor src.y\rfloor

  dst.z = src.z - \lfloor src.z\rfloor

  dst.w = src.w - \lfloor src.w\rfloor


.. opcode:: CLAMP - Clamp

.. math::

  dst.x = clamp(src0.x, src1.x, src2.x)

  dst.y = clamp(src0.y, src1.y, src2.y)

  dst.z = clamp(src0.z, src1.z, src2.z)

  dst.w = clamp(src0.w, src1.w, src2.w)


.. opcode:: FLR - Floor

This is identical to :opcode:`ARL`.

.. math::

  dst.x = \lfloor src.x\rfloor

  dst.y = \lfloor src.y\rfloor

  dst.z = \lfloor src.z\rfloor

  dst.w = \lfloor src.w\rfloor


.. opcode:: ROUND - Round

.. math::

  dst.x = round(src.x)

  dst.y = round(src.y)

  dst.z = round(src.z)

  dst.w = round(src.w)


.. opcode:: EX2 - Exponential Base 2

This instruction replicates its result.

.. math::

  dst = 2^{src.x}


.. opcode:: LG2 - Logarithm Base 2

This instruction replicates its result.

.. math::

  dst = \log_2{src.x}


.. opcode:: POW - Power

This instruction replicates its result.

.. math::

  dst = src0.x^{src1.x}

.. opcode:: XPD - Cross Product

.. math::

  dst.x = src0.y \times src1.z - src1.y \times src0.z

  dst.y = src0.z \times src1.x - src1.z \times src0.x

  dst.z = src0.x \times src1.y - src1.x \times src0.y

  dst.w = 1


.. opcode:: ABS - Absolute

.. math::

  dst.x = |src.x|

  dst.y = |src.y|

  dst.z = |src.z|

  dst.w = |src.w|


.. opcode:: RCC - Reciprocal Clamped

This instruction replicates its result.

XXX cleanup on aisle three

.. math::

  dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)


.. opcode:: DPH - Homogeneous Dot Product

This instruction replicates its result.

.. math::

  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w


.. opcode:: COS - Cosine

This instruction replicates its result.

.. math::

  dst = \cos{src.x}


.. opcode:: DDX - Derivative Relative To X

.. math::

  dst.x = partialx(src.x)

  dst.y = partialx(src.y)

  dst.z = partialx(src.z)

  dst.w = partialx(src.w)


.. opcode:: DDY - Derivative Relative To Y

.. math::

  dst.x = partialy(src.x)

  dst.y = partialy(src.y)

  dst.z = partialy(src.z)

  dst.w = partialy(src.w)


.. opcode:: KILP - Predicated Discard

  discard


.. opcode:: PK2H - Pack Two 16-bit Floats

  TBD


.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars

  TBD


.. opcode:: PK4B - Pack Four Signed 8-bit Scalars

  TBD


.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars

  TBD


.. opcode:: RFL - Reflection Vector

.. math::

  dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x

  dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y

  dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z

  dst.w = 1

.. note::

   Considered for removal.


.. opcode:: SEQ - Set On Equal

.. math::

  dst.x = (src0.x == src1.x) ? 1 : 0

  dst.y = (src0.y == src1.y) ? 1 : 0

  dst.z = (src0.z == src1.z) ? 1 : 0

  dst.w = (src0.w == src1.w) ? 1 : 0


.. opcode:: SFL - Set On False

This instruction replicates its result.

.. math::

  dst = 0

.. note::

   Considered for removal.


.. opcode:: SGT - Set On Greater Than

.. math::

  dst.x = (src0.x > src1.x) ? 1 : 0

  dst.y = (src0.y > src1.y) ? 1 : 0

  dst.z = (src0.z > src1.z) ? 1 : 0

  dst.w = (src0.w > src1.w) ? 1 : 0


.. opcode:: SIN - Sine

This instruction replicates its result.

.. math::

  dst = \sin{src.x}


.. opcode:: SLE - Set On Less Equal Than

.. math::

  dst.x = (src0.x <= src1.x) ? 1 : 0

  dst.y = (src0.y <= src1.y) ? 1 : 0

  dst.z = (src0.z <= src1.z) ? 1 : 0

  dst.w = (src0.w <= src1.w) ? 1 : 0


.. opcode:: SNE - Set On Not Equal

.. math::

  dst.x = (src0.x != src1.x) ? 1 : 0

  dst.y = (src0.y != src1.y) ? 1 : 0

  dst.z = (src0.z != src1.z) ? 1 : 0

  dst.w = (src0.w != src1.w) ? 1 : 0


.. opcode:: STR - Set On True

This instruction replicates its result.

.. math::

  dst = 1


.. opcode:: TEX - Texture Lookup

.. math::

  coord = src0

  bias = 0.0

  dst = texture_sample(unit, coord, bias)

For array textures, src0.y contains the slice for 1D and src0.z contains
the slice for 2D.
For shadow textures with no arrays, src0.z contains the reference value.
For shadow textures with arrays, src0.z contains the reference value for
1D arrays, and src0.w contains the reference value for 2D arrays.
There is no way to pass a bias in the .w value for shadow arrays, and GLSL
doesn't allow this.
GLSL does allow cube shadow maps to take a bias value, and we have to
determine how this will look in TGSI.

.. opcode:: TXD - Texture Lookup with Derivatives

.. math::

  coord = src0

  ddx = src1

  ddy = src2

  bias = 0.0

  dst = texture_sample_deriv(unit, coord, bias, ddx, ddy)


.. opcode:: TXP - Projective Texture Lookup

.. math::

  coord.x = src0.x / src0.w

  coord.y = src0.y / src0.w

  coord.z = src0.z / src0.w

  coord.w = src0.w

  bias = 0.0

  dst = texture_sample(unit, coord, bias)


.. opcode:: UP2H - Unpack Two 16-Bit Floats

  TBD

.. note::

   Considered for removal.

.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars

  TBD

.. note::

   Considered for removal.

.. opcode:: UP4B - Unpack Four Signed 8-Bit Values

  TBD

.. note::

   Considered for removal.

.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars

  TBD

.. note::

   Considered for removal.

.. opcode:: X2D - 2D Coordinate Transformation

.. math::

  dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y

  dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w

  dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y

  dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w

.. note::

   Considered for removal.


.. opcode:: ARA - Address Register Add

  TBD

.. note::

   Considered for removal.

.. opcode:: ARR - Address Register Load With Round

.. math::

  dst.x = round(src.x)

  dst.y = round(src.y)

  dst.z = round(src.z)

  dst.w = round(src.w)


.. opcode:: BRA - Branch

  pc = target

.. note::

   Considered for removal.

.. opcode:: CAL - Subroutine Call

  push(pc)
  pc = target


.. opcode:: RET - Subroutine Call Return

  pc = pop()


.. opcode:: SSG - Set Sign

.. math::

  dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0

  dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0

  dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0

  dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0


.. opcode:: CMP - Compare

.. math::

  dst.x = (src0.x < 0) ? src1.x : src2.x

  dst.y = (src0.y < 0) ? src1.y : src2.y

  dst.z = (src0.z < 0) ? src1.z : src2.z

  dst.w = (src0.w < 0) ? src1.w : src2.w


.. opcode:: KIL - Conditional Discard

.. math::

  if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
    discard
  endif


.. opcode:: SCS - Sine Cosine

.. math::

  dst.x = \cos{src.x}

  dst.y = \sin{src.x}

  dst.z = 0

  dst.w = 1


.. opcode:: TXB - Texture Lookup With Bias

.. math::

  coord.x = src.x

  coord.y = src.y

  coord.z = src.z

  coord.w = 1.0

  bias = src.w

  dst = texture_sample(unit, coord, bias)


.. opcode:: NRM - 3-component Vector Normalise

.. math::

  dst.x = \frac{src.x}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}

  dst.y = \frac{src.y}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}

  dst.z = \frac{src.z}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}

  dst.w = 1


.. opcode:: DIV - Divide

.. math::

  dst.x = \frac{src0.x}{src1.x}

  dst.y = \frac{src0.y}{src1.y}

  dst.z = \frac{src0.z}{src1.z}

  dst.w = \frac{src0.w}{src1.w}


.. opcode:: DP2 - 2-component Dot Product

This instruction replicates its result.

.. math::

  dst = src0.x \times src1.x + src0.y \times src1.y


.. opcode:: TXL - Texture Lookup With explicit LOD

.. math::

  coord.x = src0.x

  coord.y = src0.y

  coord.z = src0.z

  coord.w = 1.0

  lod = src0.w

  dst = texture_sample(unit, coord, lod)


.. opcode:: BRK - Break

  TBD


.. opcode:: IF - If

  TBD


.. opcode:: ELSE - Else

  TBD


.. opcode:: ENDIF - End If

  TBD


.. opcode:: PUSHA - Push Address Register On Stack

  push(src.x)
  push(src.y)
  push(src.z)
  push(src.w)

.. note::

   Considered for cleanup.

.. note::

   Considered for removal.

.. opcode:: POPA - Pop Address Register From Stack

  dst.w = pop()
  dst.z = pop()
  dst.y = pop()
  dst.x = pop()

.. note::

   Considered for cleanup.

.. note::

   Considered for removal.


Compute ISA
^^^^^^^^^^^^^^^^^^^^^^^^

These opcodes are primarily provided for special-use computational shaders.
Support for these opcodes is indicated by a special pipe capability bit (TBD).

XXX so let's discuss it, yeah?

.. opcode:: CEIL - Ceiling

.. math::

  dst.x = \lceil src.x\rceil

  dst.y = \lceil src.y\rceil

  dst.z = \lceil src.z\rceil

  dst.w = \lceil src.w\rceil


.. opcode:: I2F - Integer To Float

.. math::

  dst.x = (float) src.x

  dst.y = (float) src.y

  dst.z = (float) src.z

  dst.w = (float) src.w


.. opcode:: NOT - Bitwise Not

.. math::

  dst.x = ~src.x

  dst.y = ~src.y

  dst.z = ~src.z

  dst.w = ~src.w


.. opcode:: TRUNC - Truncate

.. math::

  dst.x = trunc(src.x)

  dst.y = trunc(src.y)

  dst.z = trunc(src.z)

  dst.w = trunc(src.w)


.. opcode:: SHL - Shift Left

.. math::

  dst.x = src0.x << src1.x

  dst.y = src0.y << src1.x

  dst.z = src0.z << src1.x

  dst.w = src0.w << src1.x


.. opcode:: SHR - Shift Right

.. math::

  dst.x = src0.x >> src1.x

  dst.y = src0.y >> src1.x

  dst.z = src0.z >> src1.x

  dst.w = src0.w >> src1.x


.. opcode:: AND - Bitwise And

.. math::

  dst.x = src0.x & src1.x

  dst.y = src0.y & src1.y

  dst.z = src0.z & src1.z

  dst.w = src0.w & src1.w


.. opcode:: OR - Bitwise Or

.. math::

  dst.x = src0.x | src1.x

  dst.y = src0.y | src1.y

  dst.z = src0.z | src1.z

  dst.w = src0.w | src1.w


.. opcode:: MOD - Modulus

.. math::

  dst.x = src0.x \bmod src1.x

  dst.y = src0.y \bmod src1.y

  dst.z = src0.z \bmod src1.z

  dst.w = src0.w \bmod src1.w


.. opcode:: XOR - Bitwise Xor

.. math::

  dst.x = src0.x \oplus src1.x

  dst.y = src0.y \oplus src1.y

  dst.z = src0.z \oplus src1.z

  dst.w = src0.w \oplus src1.w


.. opcode:: UCMP - Integer Conditional Move

.. math::

  dst.x = src0.x ? src1.x : src2.x

  dst.y = src0.y ? src1.y : src2.y

  dst.z = src0.z ? src1.z : src2.z

  dst.w = src0.w ? src1.w : src2.w


.. opcode:: UARL - Integer Address Register Load

  Moves the contents of the source register, assumed to be an integer, into the
  destination register, which is assumed to be an address (ADDR) register.


.. opcode:: IABS - Integer Absolute Value

.. math::

  dst.x = |src.x|

  dst.y = |src.y|

  dst.z = |src.z|

  dst.w = |src.w|


.. opcode:: SAD - Sum Of Absolute Differences

.. math::

  dst.x = |src0.x - src1.x| + src2.x

  dst.y = |src0.y - src1.y| + src2.y

  dst.z = |src0.z - src1.z| + src2.z

  dst.w = |src0.w - src1.w| + src2.w


.. opcode:: TXF - Texel Fetch (as per NV_gpu_shader4), extract a single texel
   from a specified texture image. The source sampler may
   not be a CUBE or SHADOW.
   src 0 is a four-component signed integer vector used to
   identify the single texel accessed. 3 components + level.
   src 1 is a 3-component constant signed integer vector,
   with each component limited to the range
   -8..+8 (hw only seems to deal with this range, interface
   allows for up to unsigned int).
   TXF(uint_vec coord, int_vec offset).


.. opcode:: TXQ - Texture Size Query (as per NV_gpu_program4).
   Retrieves the dimensions of the texture
   depending on the target: for 1D (width), 2D/RECT/CUBE
   (width, height), 3D (width, height, depth),
   1D array (width, layers), 2D array (width, height, layers).

.. math::

  lod = src0

  dst.x = texture_width(unit, lod)

  dst.y = texture_height(unit, lod)

  dst.z = texture_depth(unit, lod)


.. opcode:: CONT - Continue

  TBD

.. note::

   Support for CONT is determined by a special capability bit,
   ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.


Geometry ISA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These opcodes are only supported in geometry shaders; they have no meaning
in any other type of shader.

.. opcode:: EMIT - Emit

  TBD


.. opcode:: ENDPRIM - End Primitive

  TBD


GLSL ISA
^^^^^^^^^^

These opcodes are part of :term:`GLSL`'s opcode set. Support for these
opcodes is determined by a special capability bit, ``GLSL``.

.. opcode:: BGNLOOP - Begin a Loop

  TBD


.. opcode:: BGNSUB - Begin Subroutine

  TBD


.. opcode:: ENDLOOP - End a Loop

  TBD


.. opcode:: ENDSUB - End Subroutine

  TBD


.. opcode:: NOP - No Operation

  Do nothing.


.. opcode:: NRM4 - 4-component Vector Normalise

This instruction replicates its result.

.. math::

  dst = \frac{src.x}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}


ps_2_x
^^^^^^^^^^^^

XXX wait what

.. opcode:: CALLNZ - Subroutine Call If Not Zero

  TBD


.. opcode:: IFC - If

  TBD


.. opcode:: BREAKC - Break Conditional

  TBD

.. _doubleopcodes:

Double ISA
^^^^^^^^^^^^^^^

The double-precision opcodes reinterpret four-component vectors into
two-component vectors with doubled precision in each component.

Support for these opcodes is XXX undecided. :T

.. opcode:: DADD - Add

.. math::

  dst.xy = src0.xy + src1.xy

  dst.zw = src0.zw + src1.zw


.. opcode:: DDIV - Divide

.. math::

  dst.xy = src0.xy / src1.xy

  dst.zw = src0.zw / src1.zw

.. opcode:: DSEQ - Set on Equal

.. math::

  dst.xy = src0.xy == src1.xy ? 1.0F : 0.0F

  dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F

.. opcode:: DSLT - Set on Less than

.. math::

  dst.xy = src0.xy < src1.xy ? 1.0F : 0.0F

  dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F

.. opcode:: DFRAC - Fraction

.. math::

  dst.xy = src.xy - \lfloor src.xy\rfloor

  dst.zw = src.zw - \lfloor src.zw\rfloor


.. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components

Like the ``frexp()`` routine in many math libraries, this opcode stores the
exponent of its source to ``dst0``, and the significand to ``dst1``, such that
:math:`dst1 \times 2^{dst0} = src`.

.. math::

  dst0.xy = exp(src.xy)

  dst1.xy = frac(src.xy)

  dst0.zw = exp(src.zw)

  dst1.zw = frac(src.zw)

.. opcode:: DLDEXP - Multiply Number by Integral Power of 2

This opcode is the inverse of :opcode:`DFRACEXP`.
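
The ``frexp()``-style relationship between :opcode:`DFRACEXP` and
:opcode:`DLDEXP` can be checked numerically. The sketch below is only an
analogy for a single double-precision value, using Python's standard
``math.frexp`` and ``math.ldexp``. The helper names ``dfracexp`` and
``dldexp`` are hypothetical, and the significand range and special-value
handling of the actual opcodes are assumptions that this document does not
specify.

```python
import math


def dfracexp(src):
    """Return (exponent, significand), mirroring (dst0, dst1).

    Assumption: the opcode splits a value the way C's frexp() does.
    """
    # math.frexp returns (m, e) such that src == m * 2**e
    significand, exponent = math.frexp(src)
    return exponent, significand


def dldexp(src0, src1):
    """Return src0 * 2**src1 for an integral src1."""
    return math.ldexp(src0, src1)


# Round trip: splitting and recombining gives back the input.
exponent, significand = dfracexp(12.0)
restored = dldexp(significand, exponent)
```

Here ``restored`` equals the original input, which is the property
:math:`dst1 \times 2^{dst0} = src` stated for :opcode:`DFRACEXP`.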

.. math::

  dst.xy = src0.xy \times 2^{src1.xy}

  dst.zw = src0.zw \times 2^{src1.zw}

.. opcode:: DMIN - Minimum

.. math::

  dst.xy = min(src0.xy, src1.xy)

  dst.zw = min(src0.zw, src1.zw)

.. opcode:: DMAX - Maximum

.. math::

  dst.xy = max(src0.xy, src1.xy)

  dst.zw = max(src0.zw, src1.zw)

.. opcode:: DMUL - Multiply

.. math::

  dst.xy = src0.xy \times src1.xy

  dst.zw = src0.zw \times src1.zw


.. opcode:: DMAD - Multiply And Add

.. math::

  dst.xy = src0.xy \times src1.xy + src2.xy

  dst.zw = src0.zw \times src1.zw + src2.zw


.. opcode:: DRCP - Reciprocal

.. math::

  dst.xy = \frac{1}{src.xy}

  dst.zw = \frac{1}{src.zw}

.. opcode:: DSQRT - Square Root

.. math::

  dst.xy = \sqrt{src.xy}

  dst.zw = \sqrt{src.zw}


.. _samplingopcodes:

Resource Sampling Opcodes
^^^^^^^^^^^^^^^^^^^^^^^^^

These opcodes closely follow the semantics of the respective Direct3D
instructions. If in doubt, double-check the Direct3D documentation.

.. opcode:: SAMPLE - Using provided address, sample data from the
   specified texture using the filtering mode identified
   by the given sampler. The source data may come from
   any resource type other than buffers.
   SAMPLE dst, address, sampler_view, sampler
   e.g.
   SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]

.. opcode:: SAMPLE_I - Simplified alternative to the SAMPLE instruction.
   Using the provided integer address, SAMPLE_I fetches data
   from the specified sampler view without any filtering.
   The source data may come from any resource type other
   than CUBE.
   SAMPLE_I dst, address, sampler_view
   e.g.
   SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]
   The 'address' is specified as unsigned integers.
   If the 'address' is out of range [0...(# texels - 1)] the
   result of the fetch is always 0 in all components.
   As such the instruction doesn't honor address wrap
   modes; in cases where that behavior is desirable the
   'SAMPLE' instruction should be used.
   address.w always provides an unsigned integer mipmap
   level. If the value is out of the range then the
   instruction always returns 0 in all components.
   address.yz are ignored for buffers and 1d textures.
   address.z is ignored for 1d texture arrays and 2d
   textures.
   For 1D texture arrays address.y provides the array
   index (also as unsigned integer). If the value is
   out of the range of available array indices
   [0... (array size - 1)] then the opcode always returns
   0 in all components.
   For 2D texture arrays address.z provides the array
   index, otherwise it exhibits the same behavior as in
   the case for 1D texture arrays.
   The exact semantics of the source address are presented
   in the table below:
   resource type          X     Y     Z     W
   -------------          ------------------------
   PIPE_BUFFER            x                 ignored
   PIPE_TEXTURE_1D        x                 mpl
   PIPE_TEXTURE_2D        x     y           mpl
   PIPE_TEXTURE_3D        x     y     z     mpl
   PIPE_TEXTURE_RECT      x     y           mpl
   PIPE_TEXTURE_CUBE      not allowed as source
   PIPE_TEXTURE_1D_ARRAY  x     idx         mpl
   PIPE_TEXTURE_2D_ARRAY  x     y     idx   mpl

   Where 'mpl' is a mipmap level and 'idx' is the
   array index.

.. opcode:: SAMPLE_I_MS - Just like SAMPLE_I but allows fetching data from
   multi-sampled surfaces.

.. opcode:: SAMPLE_B - Just like the SAMPLE instruction with the
   exception that an additional bias is applied to the
   level of detail computed as part of the instruction
   execution.
   SAMPLE_B dst, address, sampler_view, sampler, lod_bias
   e.g.
   SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x

.. opcode:: SAMPLE_C - Similar to the SAMPLE instruction but it
   performs a comparison filter. The operands to SAMPLE_C
   are identical to SAMPLE, except that there is an additional
   float32 operand, reference value, which must be a register
   with single-component, or a scalar literal.
   SAMPLE_C makes the hardware use the current sampler's
   compare_func (in pipe_sampler_state) to compare
   reference value against the red component value for the
   source resource at each texel that the currently configured
   texture filter covers based on the provided coordinates.
   SAMPLE_C dst, address, sampler_view.r, sampler, ref_value
   e.g.
   SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x

.. opcode:: SAMPLE_C_LZ - Same as SAMPLE_C, but LOD is 0 and derivatives
   are ignored. The LZ stands for level-zero.
   SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value
   e.g.
   SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x


.. opcode:: SAMPLE_D - SAMPLE_D is identical to the SAMPLE opcode except
   that the derivatives for the source address in the x
   direction and the y direction are provided by extra
   parameters.
   SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y
   e.g.
   SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]

.. opcode:: SAMPLE_L - SAMPLE_L is identical to the SAMPLE opcode except
   that the LOD is provided directly as a scalar value,
   representing no anisotropy. The A channel of the source
   address is used as the LOD.
   SAMPLE_L dst, address, sampler_view, sampler
   e.g.
   SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0]

.. opcode:: GATHER4 - Gathers the four texels to be used in a bi-linear
   filtering operation and packs them into a single register.
   Only works with 2D, 2D array, cubemaps, and cubemap arrays.
   For 2D textures, only the addressing modes of the sampler and
   the top level of any mip pyramid are used. Set W to zero.
   It behaves like the SAMPLE instruction, but a filtered
   sample is not generated. The four samples that contribute
   to filtering are placed into xyzw in counter-clockwise order,
   starting with the (u,v) texture coordinate delta at the
   following locations (-, +), (+, +), (+, -), (-, -), where
   the magnitude of the deltas is half a texel.


.. opcode:: SVIEWINFO - Query the dimensions of a given sampler view.
   dst receives width, height, depth or array size and
   number of mipmap levels. The dst can have a writemask
   which specifies which information the caller is
   interested in.
   SVIEWINFO dst, src_mip_level, sampler_view
   e.g.
   SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]
   src_mip_level is an unsigned integer scalar. If it's
   out of range then 0 is returned for width, height and
   depth/array size, but the total number of mipmap levels
   is still returned correctly for the given sampler view.
   The returned width, height and depth values are for
   the mipmap level selected by the src_mip_level and
   are in the number of texels.
   For 1d texture array width is in dst.x, array size
   is in dst.y and dst.zw are always 0.

.. opcode:: SAMPLE_POS - Query the position of a given sample.
   dst receives float4 (x, y, 0, 0) indicating where the
   sample is located. If the resource is not a multi-sample
   resource and not a render target, the result is 0.

.. opcode:: SAMPLE_INFO - dst receives number of samples in x.
   If the resource is not a multi-sample resource and
   not a render target, the result is 0.


.. _resourceopcodes:

Resource Access Opcodes
^^^^^^^^^^^^^^^^^^^^^^^

.. opcode:: LOAD - Fetch data from a shader resource

  Syntax: ``LOAD dst, resource, address``

  Example: ``LOAD TEMP[0], RES[0], TEMP[1]``

  Using the provided integer address, LOAD fetches data
  from the specified buffer or texture without any
  filtering.

  The 'address' is specified as a vector of unsigned
  integers. If the 'address' is out of range the result
  is unspecified.

  Only the first mipmap level of a resource can be read
  from using this instruction.

  For 1D or 2D texture arrays, the array index is
  provided as an unsigned integer in address.y or
  address.z, respectively. address.yz are ignored for
  buffers and 1D textures. address.z is ignored for 1D
  texture arrays and 2D textures. address.w is always
  ignored.

.. opcode:: STORE - Write data to a shader resource

  Syntax: ``STORE resource, address, src``

  Example: ``STORE RES[0], TEMP[0], TEMP[1]``

  Using the provided integer address, STORE writes data
  to the specified buffer or texture.

  The 'address' is specified as a vector of unsigned
  integers. If the 'address' is out of range the result
  is unspecified.

  Only the first mipmap level of a resource can be
  written to using this instruction.

  For 1D or 2D texture arrays, the array index is
  provided as an unsigned integer in address.y or
  address.z, respectively. address.yz are ignored for
  buffers and 1D textures. address.z is ignored for 1D
  texture arrays and 2D textures. address.w is always
  ignored.


.. _threadsyncopcodes:

Inter-thread synchronization opcodes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These opcodes are intended for communication between threads running
within the same compute grid. For now they're only valid in compute
programs.

.. opcode:: MFENCE - Memory fence

  Syntax: ``MFENCE resource``

  Example: ``MFENCE RES[0]``

  This opcode forces strong ordering between any memory access
  operations that affect the specified resource. This means that
  previous loads and stores (and only those) will be performed and
  visible to other threads before the program execution continues.


.. opcode:: LFENCE - Load memory fence

  Syntax: ``LFENCE resource``

  Example: ``LFENCE RES[0]``

  Similar to MFENCE, but it only affects the ordering of memory loads.


.. opcode:: SFENCE - Store memory fence

  Syntax: ``SFENCE resource``

  Example: ``SFENCE RES[0]``

  Similar to MFENCE, but it only affects the ordering of memory stores.


.. opcode:: BARRIER - Thread group barrier

  ``BARRIER``

  This opcode suspends the execution of the current thread until all
  the remaining threads in the working group reach the same point of
  the program. Results are unspecified if any of the remaining
  threads terminates or never reaches an executed BARRIER instruction.


.. _atomopcodes:

Atomic opcodes
^^^^^^^^^^^^^^

These opcodes provide atomic variants of some common arithmetic and
logical operations. In this context atomicity means that another
concurrent memory access operation that affects the same memory
location is guaranteed to be performed strictly before or after the
entire execution of the atomic operation.

For the moment they're only valid in compute programs.

.. opcode:: ATOMUADD - Atomic integer addition

  Syntax: ``ATOMUADD dst, resource, offset, src``

  Example: ``ATOMUADD TEMP[0], RES[0], TEMP[1], TEMP[2]``

  The following operation is performed atomically on each component:

.. math::

  dst_i = resource[offset]_i

  resource[offset]_i = dst_i + src_i


.. opcode:: ATOMXCHG - Atomic exchange

  Syntax: ``ATOMXCHG dst, resource, offset, src``

  Example: ``ATOMXCHG TEMP[0], RES[0], TEMP[1], TEMP[2]``

  The following operation is performed atomically on each component:

.. math::

  dst_i = resource[offset]_i

  resource[offset]_i = src_i


.. opcode:: ATOMCAS - Atomic compare-and-exchange

  Syntax: ``ATOMCAS dst, resource, offset, cmp, src``

  Example: ``ATOMCAS TEMP[0], RES[0], TEMP[1], TEMP[2], TEMP[3]``

  The following operation is performed atomically on each component:

.. math::

  dst_i = resource[offset]_i

  resource[offset]_i = (dst_i == cmp_i ? src_i : dst_i)


.. opcode:: ATOMAND - Atomic bitwise And

  Syntax: ``ATOMAND dst, resource, offset, src``

  Example: ``ATOMAND TEMP[0], RES[0], TEMP[1], TEMP[2]``

  The following operation is performed atomically on each component:

.. math::

  dst_i = resource[offset]_i

  resource[offset]_i = dst_i \& src_i


.. opcode:: ATOMOR - Atomic bitwise Or

  Syntax: ``ATOMOR dst, resource, offset, src``

  Example: ``ATOMOR TEMP[0], RES[0], TEMP[1], TEMP[2]``

  The following operation is performed atomically on each component:

.. math::

  dst_i = resource[offset]_i

  resource[offset]_i = dst_i | src_i


.. opcode:: ATOMXOR - Atomic bitwise Xor

  Syntax: ``ATOMXOR dst, resource, offset, src``

  Example: ``ATOMXOR TEMP[0], RES[0], TEMP[1], TEMP[2]``

  The following operation is performed atomically on each component:

.. math::

  dst_i = resource[offset]_i

  resource[offset]_i = dst_i \oplus src_i


.. 
opcode:: ATOMUMIN - Atomic unsigned minimum 1670 1671 Syntax: ``ATOMUMIN dst, resource, offset, src`` 1672 1673 Example: ``ATOMUMIN TEMP[0], RES[0], TEMP[1], TEMP[2]`` 1674 1675 The following operation is performed atomically on each component: 1676 1677 .. math:: 1678 1679 dst_i = resource[offset]_i 1680 1681 resource[offset]_i = (dst_i < src_i ? dst_i : src_i) 1682 1683 1684 .. opcode:: ATOMUMAX - Atomic unsigned maximum 1685 1686 Syntax: ``ATOMUMAX dst, resource, offset, src`` 1687 1688 Example: ``ATOMUMAX TEMP[0], RES[0], TEMP[1], TEMP[2]`` 1689 1690 The following operation is performed atomically on each component: 1691 1692 .. math:: 1693 1694 dst_i = resource[offset]_i 1695 1696 resource[offset]_i = (dst_i > src_i ? dst_i : src_i) 1697 1698 1699 .. opcode:: ATOMIMIN - Atomic signed minimum 1700 1701 Syntax: ``ATOMIMIN dst, resource, offset, src`` 1702 1703 Example: ``ATOMIMIN TEMP[0], RES[0], TEMP[1], TEMP[2]`` 1704 1705 The following operation is performed atomically on each component: 1706 1707 .. math:: 1708 1709 dst_i = resource[offset]_i 1710 1711 resource[offset]_i = (dst_i < src_i ? dst_i : src_i) 1712 1713 1714 .. opcode:: ATOMIMAX - Atomic signed maximum 1715 1716 Syntax: ``ATOMIMAX dst, resource, offset, src`` 1717 1718 Example: ``ATOMIMAX TEMP[0], RES[0], TEMP[1], TEMP[2]`` 1719 1720 The following operation is performed atomically on each component: 1721 1722 .. math:: 1723 1724 dst_i = resource[offset]_i 1725 1726 resource[offset]_i = (dst_i > src_i ? dst_i : src_i) 1727 1728 1729 1730 Explanation of symbols used 1731 ------------------------------ 1732 1733 1734 Functions 1735 ^^^^^^^^^^^^^^ 1736 1737 1738 :math:`|x|` Absolute value of `x`. 1739 1740 :math:`\lceil x \rceil` Ceiling of `x`. 1741 1742 clamp(x,y,z) Clamp x between y and z. 1743 (x < y) ? y : (x > z) ? z : x 1744 1745 :math:`\lfloor x\rfloor` Floor of `x`. 1746 1747 :math:`\log_2{x}` Logarithm of `x`, base 2. 1748 1749 max(x,y) Maximum of x and y. 1750 (x > y) ? 
x : y 1751 1752 min(x,y) Minimum of x and y. 1753 (x < y) ? x : y 1754 1755 partialx(x) Derivative of x relative to fragment's X. 1756 1757 partialy(x) Derivative of x relative to fragment's Y. 1758 1759 pop() Pop from stack. 1760 1761 :math:`x^y` `x` to the power `y`. 1762 1763 push(x) Push x on stack. 1764 1765 round(x) Round x. 1766 1767 trunc(x) Truncate x, i.e. drop the fraction bits. 1768 1769 1770 Keywords 1771 ^^^^^^^^^^^^^ 1772 1773 1774 discard Discard fragment. 1775 1776 pc Program counter. 1777 1778 target Label of target instruction. 1779 1780 1781 Other tokens 1782 --------------- 1783 1784 1785 Declaration 1786 ^^^^^^^^^^^ 1787 1788 1789 Declares a register that is will be referenced as an operand in Instruction 1790 tokens. 1791 1792 File field contains register file that is being declared and is one 1793 of TGSI_FILE. 1794 1795 UsageMask field specifies which of the register components can be accessed 1796 and is one of TGSI_WRITEMASK. 1797 1798 The Local flag specifies that a given value isn't intended for 1799 subroutine parameter passing and, as a result, the implementation 1800 isn't required to give any guarantees of it being preserved across 1801 subroutine boundaries. As it's merely a compiler hint, the 1802 implementation is free to ignore it. 1803 1804 If Dimension flag is set to 1, a Declaration Dimension token follows. 1805 1806 If Semantic flag is set to 1, a Declaration Semantic token follows. 1807 1808 If Interpolate flag is set to 1, a Declaration Interpolate token follows. 1809 1810 If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows. 1811 1812 1813 Declaration Semantic 1814 ^^^^^^^^^^^^^^^^^^^^^^^^ 1815 1816 Vertex and fragment shader input and output registers may be labeled 1817 with semantic information consisting of a name and index. 1818 1819 Follows Declaration token if Semantic bit is set. 

Since its purpose is to link a shader with other stages of the pipeline,
it is valid to follow only those Declaration tokens that declare a register
in either the INPUT or the OUTPUT file.

SemanticName field contains the semantic name of the register being declared.
There is no default value.

SemanticIndex is an optional subscript that can be used to distinguish
different register declarations with the same semantic name. The default value
is 0.

The meanings of the individual semantic names are explained in the following
sections.

TGSI_SEMANTIC_POSITION
""""""""""""""""""""""

For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
output register which contains the homogeneous vertex position in the clip
space coordinate system. After clipping, the X, Y and Z components of the
vertex will be divided by the W value to get normalized device coordinates.

For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
fragment shader input contains the fragment's window position. The X
component starts at zero and always increases from left to right.
The Y component starts at zero and always increases, but Y=0 may indicate
either the top or the bottom of the window, depending on the fragment
coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
The Z coordinate ranges from 0 to 1 to represent depth from the front
to the back of the Z buffer. The W component contains the reciprocal
of the interpolated vertex position W component.

Fragment shaders may also declare an output register with
TGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows
the fragment shader to change the fragment's Z position.
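
The arithmetic described for TGSI_SEMANTIC_POSITION can be sketched in a few
lines of Python. This is only an illustrative model, not driver code: the
function names ``clip_to_ndc`` and ``fragment_position_w`` are hypothetical,
and clipping and the viewport transform are deliberately left out.

```python
def clip_to_ndc(clip):
    """Divide X, Y and Z of a clip-space position by W.

    `clip` is an (x, y, z, w) tuple as written by the vertex shader to its
    TGSI_SEMANTIC_POSITION output. Clipping is assumed to have happened
    already, so W is expected to be non-zero here.
    """
    x, y, z, w = clip
    if w == 0.0:
        raise ValueError("W must be non-zero for the perspective divide")
    return (x / w, y / w, z / w)


def fragment_position_w(interpolated_w):
    """W of the fragment position: reciprocal of the interpolated vertex W."""
    return 1.0 / interpolated_w
```

For example, the clip-space position ``(2.0, -4.0, 1.0, 2.0)`` maps to the
normalized device coordinates ``(1.0, -2.0, 0.5)``.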



TGSI_SEMANTIC_COLOR
"""""""""""""""""""

For vertex shader outputs or fragment shader inputs/outputs, this
label indicates that the register contains an R,G,B,A color.

Several shader inputs/outputs may contain colors so the semantic index
is used to distinguish them. For example, color[0] may be the diffuse
color while color[1] may be the specular color.

This label is needed so that the flat/smooth shading can be applied
to the right interpolants during rasterization.



TGSI_SEMANTIC_BCOLOR
""""""""""""""""""""

Back-facing colors are only used for back-facing polygons, and are only valid
in vertex shader outputs. After rasterization, all polygons are front-facing
and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
so all BCOLORs effectively become regular COLORs in the fragment shader.


TGSI_SEMANTIC_FOG
"""""""""""""""""

Vertex shader inputs and outputs and fragment shader inputs may be
labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
a fog coordinate in the form (F, 0, 0, 1). Typically, the fragment
shader will use the fog coordinate to compute a fog blend factor which
is used to blend the normal fragment color with a constant fog color.

Only the first component matters when writing from the vertex shader;
the driver will ensure that the coordinate is in this format when used
as a fragment shader input.


TGSI_SEMANTIC_PSIZE
"""""""""""""""""""

Vertex shader input and output registers may be labeled with
TGSI_SEMANTIC_PSIZE to indicate that the register contains a point size
in the form (S, 0, 0, 1). The point size controls the width or diameter
of points for rasterization. This label cannot be used in fragment
shaders.

When using this semantic, be sure to set the appropriate state in the
:ref:`rasterizer` first.


TGSI_SEMANTIC_GENERIC
"""""""""""""""""""""

All vertex/fragment shader inputs/outputs not labeled with any other
semantic label can be considered to be generic attributes. Typical
uses of generic inputs/outputs are texcoords and user-defined values.


TGSI_SEMANTIC_NORMAL
""""""""""""""""""""

Indicates that a vertex shader input is a normal vector. This is
typically only used for legacy graphics APIs.


TGSI_SEMANTIC_FACE
""""""""""""""""""

This label applies to fragment shader inputs only and indicates that
the register contains front/back-face information of the form (F, 0,
0, 1). The first component will be positive when the fragment belongs
to a front-facing polygon, and negative when the fragment belongs to a
back-facing polygon.


TGSI_SEMANTIC_EDGEFLAG
""""""""""""""""""""""

For vertex shaders, this semantic label indicates that an input or
output is a boolean edge flag. The register layout is [F, x, x, x]
where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader
simply copies the edge flag input to the edgeflag output.

Edge flags are used to control which lines or points are actually
drawn when the polygon mode converts triangles/quads/polygons into
points or lines.

TGSI_SEMANTIC_STENCIL
""""""""""""""""""""""

For fragment shaders, this semantic label indicates that an output
is a writable stencil reference value. Only the Y component is writable.
This allows the fragment shader to change the fragment's stencilref value.


Declaration Interpolate
^^^^^^^^^^^^^^^^^^^^^^^

This token is only valid for fragment shader INPUT declarations.

The Interpolate field specifies the way input is being interpolated by
the rasteriser and is one of TGSI_INTERPOLATE_*.

The CylindricalWrap bitfield specifies which register components
should be subject to cylindrical wrapping when interpolating by the
rasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component
should be interpolated according to cylindrical wrapping rules.


Declaration Sampler View
^^^^^^^^^^^^^^^^^^^^^^^^

Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.

``DCL SVIEW[#], resource, type(s)``

Declares a shader input sampler view and assigns it to a SVIEW[#]
register.

resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.

type must be 1 or 4 entries (if specifying on a per-component
level) out of UNORM, SNORM, SINT, UINT and FLOAT.


Declaration Resource
^^^^^^^^^^^^^^^^^^^^

Follows Declaration token if file is TGSI_FILE_RESOURCE.

``DCL RES[#], resource [, WR] [, RAW]``

Declares a shader input resource and assigns it to a RES[#]
register.

resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
2DArray.

If the RAW keyword is not specified, the texture data will be
subject to conversion, swizzling and scaling as required to yield
the specified data type from the physical data format of the bound
resource.

If the RAW keyword is specified, no channel conversion will be
performed: the values read for each of the channels (X,Y,Z,W) will
correspond to consecutive words in the same order and format
they're found in memory. No element-to-address conversion will be
performed either: the value of the provided X coordinate will be
interpreted in byte units instead of texel units. The result of
accessing a misaligned address is undefined.
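
As a rough illustration of RAW addressing, the Python sketch below models a
resource as a plain byte string and reads four consecutive words starting at
a byte address. Several details are assumptions made only for this example
and are not stated by the text above: words are taken to be 32 bits wide,
little-endian and unsigned, the required alignment is taken to be one word,
and the helper names ``raw_load`` and ``texel_to_byte_address`` are
hypothetical. Because a misaligned access is undefined, the model simply
rejects it.

```python
import struct

WORD_SIZE = 4  # assumption for this sketch: 32-bit words


def texel_to_byte_address(texel_index, bytes_per_texel):
    """Convert a texel-unit X coordinate into the byte units used by RAW."""
    return texel_index * bytes_per_texel


def raw_load(memory, byte_address):
    """Model of a RAW read: X, Y, Z, W come from four consecutive words.

    No channel conversion is applied; the words are returned in the order
    they are found in memory.
    """
    if byte_address % WORD_SIZE != 0:
        # Undefined in TGSI; this model chooses to raise instead.
        raise ValueError("misaligned RAW access")
    return struct.unpack_from("<4I", memory, byte_address)


# Eight 32-bit words holding the values 10..17.
memory = struct.pack("<8I", *range(10, 18))
```

With this buffer, ``raw_load(memory, 8)`` yields the words at byte offsets 8,
12, 16 and 20, which is the same location that texel index 2 of a
4-byte-per-texel format would address in texel units.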

Usage of the STORE opcode is only allowed if the WR (writable) flag
is set.


Properties
^^^^^^^^^^^^^^^^^^^^^^^^


Properties are general directives that apply to the whole TGSI program.

FS_COORD_ORIGIN
"""""""""""""""

Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
The default value is UPPER_LEFT.

If UPPER_LEFT, the position will be (0,0) at the upper left corner and
increase downward and rightward.
If LOWER_LEFT, the position will be (0,0) at the lower left corner and
increase upward and rightward.

OpenGL defaults to LOWER_LEFT, and is configurable with the
GL_ARB_fragment_coord_conventions extension.

DirectX 9/10 use UPPER_LEFT.

FS_COORD_PIXEL_CENTER
"""""""""""""""""""""

Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
The default value is HALF_INTEGER.

If HALF_INTEGER, the fractional part of the position will be 0.5.
If INTEGER, the fractional part of the position will be 0.0.

Note that this does not affect the set of fragments generated by
rasterization, which is instead controlled by gl_rasterization_rules in the
rasterizer.

OpenGL defaults to HALF_INTEGER, and is configurable with the
GL_ARB_fragment_coord_conventions extension.

DirectX 9 uses INTEGER.
DirectX 10 uses HALF_INTEGER.

FS_COLOR0_WRITES_ALL_CBUFS
""""""""""""""""""""""""""
Specifies that writes to the fragment shader color 0 are replicated to all
bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
fragData is directed to a single color buffer, but fragColor is broadcast.
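
To make the two coordinate properties FS_COORD_ORIGIN and
FS_COORD_PIXEL_CENTER more concrete, the following Python sketch computes the
X and Y of the fragment position for a given pixel. It is only a model of the
conventions described above: the function ``fragment_xy`` and its parameters
are hypothetical, and the pixel is identified by its column and by its row
counted from the top of the window.

```python
def fragment_xy(col, row_from_top, height,
                origin="UPPER_LEFT", pixel_center="HALF_INTEGER"):
    """Return (x, y) of the fragment position for one pixel.

    `height` is the window height in pixels. `origin` and `pixel_center`
    take the values of the two properties of the same name.
    """
    if pixel_center == "HALF_INTEGER":
        offset = 0.5
    elif pixel_center == "INTEGER":
        offset = 0.0
    else:
        raise ValueError("unknown pixel center convention")

    if origin == "UPPER_LEFT":
        row = row_from_top               # Y increases downward
    elif origin == "LOWER_LEFT":
        row = height - 1 - row_from_top  # Y increases upward
    else:
        raise ValueError("unknown coordinate origin")

    return (col + offset, row + offset)
```

With the defaults, the top-left pixel of a 480-pixel-high window is at
``(0.5, 0.5)``; with LOWER_LEFT the same pixel is at ``(0.5, 479.5)``.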

VS_PROHIBIT_UCPS
""""""""""""""""""""""""""
If this property is set on the program bound to the shader stage before the
fragment shader, user clip planes should have no effect (be disabled) even if
that shader does not write to any clip distance outputs and the rasterizer's
clip_plane_enable is non-zero.
This property is only supported by drivers that also support shader clip
distance outputs.
This is useful for APIs that don't have UCPs and where clip distances written
by a shader cannot be disabled.


Texture Sampling and Texture Formats
------------------------------------

This table shows how texture image components are returned as (x,y,z,w) tuples
by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
well.

+--------------------+--------------+--------------------+--------------+
| Texture Components | Gallium      | OpenGL             | Direct3D 9   |
+====================+==============+====================+==============+
| R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
+--------------------+--------------+--------------------+--------------+
| RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
+--------------------+--------------+--------------------+--------------+
| RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
+--------------------+--------------+--------------------+--------------+
| RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
+--------------------+--------------+--------------------+--------------+
| A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
+--------------------+--------------+--------------------+--------------+
| L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
+--------------------+--------------+--------------------+--------------+
| LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
+--------------------+--------------+--------------------+--------------+
| I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
+--------------------+--------------+--------------------+--------------+
| UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
|                    |              | [#envmap-bumpmap]_ |              |
+--------------------+--------------+--------------------+--------------+
| Z                  | XXX TBD      | (z, z, z, 1)       | (0, z, 0, 1) |
|                    |              | [#depth-tex-mode]_ |              |
+--------------------+--------------+--------------------+--------------+
| S                  | (s, s, s, s) | unknown            | unknown      |
+--------------------+--------------+--------------------+--------------+

.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
   or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
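
The Gallium column of the table above can be read as a small lookup from the
set of stored components to the returned (x, y, z, w) tuple. The Python
sketch below encodes only the rows whose Gallium behaviour the table defines;
UV and Z are left out because they are marked as TBD. The names
``GALLIUM_SWIZZLE`` and ``expand_texel`` are hypothetical and exist only for
this illustration.

```python
# Each entry lists, per output channel, either the name of a stored
# component or the constant that fills the channel.
GALLIUM_SWIZZLE = {
    "R":    ("r", 0.0, 0.0, 1.0),
    "RG":   ("r", "g", 0.0, 1.0),
    "RGB":  ("r", "g", "b", 1.0),
    "RGBA": ("r", "g", "b", "a"),
    "A":    (0.0, 0.0, 0.0, "a"),
    "L":    ("l", "l", "l", 1.0),
    "LA":   ("l", "l", "l", "a"),
    "I":    ("i", "i", "i", "i"),
    "S":    ("s", "s", "s", "s"),
}


def expand_texel(components, texel):
    """Expand stored texel components into an (x, y, z, w) tuple.

    `components` is a key of GALLIUM_SWIZZLE and `texel` maps component
    names (for example "r" or "a") to their sampled values.
    """
    pattern = GALLIUM_SWIZZLE[components]
    return tuple(texel[p] if isinstance(p, str) else p for p in pattern)
```

For instance, a single-channel R texel with value 0.25 expands to
``(0.25, 0.0, 0.0, 1.0)``, and an LA texel replicates luminance into x, y and
z while passing alpha through in w.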