Name

    MESA_shader_integer_functions

Name Strings

    GL_MESA_shader_integer_functions

Contact

    Ian Romanick <ian.d.romanick (a] intel.com>

Contributors

    All the contributors of GL_ARB_gpu_shader5

Status

    Supported by all GLSL 1.30 capable drivers in Mesa 12.1 and later

Version

    Version 2, July 7, 2016

Number

    TBD

Dependencies

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    Specification.

    This extension is written against Version 1.50 (Revision 09) of the OpenGL
    Shading Language Specification.

    GLSL 1.30 is required.

    This extension interacts with ARB_gpu_shader5.

    This extension interacts with ARB_gpu_shader_fp64.

    This extension interacts with NV_gpu_shader5.

Overview

    GL_ARB_gpu_shader5 extends GLSL in a number of useful ways. Much of this
    added functionality requires significant hardware support. There are many
    aspects, however, that can be easily implemented on any GPU with "real"
    integer support (as opposed to simulating integers using floating point
    calculations).

    This extension provides a set of new features to the OpenGL Shading
    Language to support capabilities of these GPUs, extending the capabilities
    of version 1.30 of the OpenGL Shading Language.
    Shaders using the new functionality provided by this extension should
    enable this functionality via the construct

      #extension GL_MESA_shader_integer_functions : require     (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for implicitly converting signed integer types to unsigned
        types, as well as more general implicit conversion and function
        overloading infrastructure to support new data types introduced by
        other extensions;

      * new built-in functions supporting:

        * splitting a floating-point number into a significand and exponent
          (frexp), or building a floating-point number from a significand and
          exponent (ldexp);

        * integer bitfield manipulation, including functions to find the
          position of the most or least significant set bit, count the number
          of one bits, and bitfield insertion, extraction, and reversal;

        * extended integer precision math, including add with carry, subtract
          with borrow, and extended multiplication.

    The resulting extension is a strict subset of GL_ARB_gpu_shader5.

IP Status

    No known IP claims.

New Procedures and Functions

    None

New Tokens

    None

Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    None.

Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.

Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Modifications to The OpenGL Shading Language Specification, Version 1.50
(Revision 09)

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_MESA_shader_integer_functions : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_MESA_shader_integer_functions        1


    Modify Section 4.1.10, Implicit Conversions, p. 27

    (modify table of implicit conversions)

                                Can be implicitly
        Type of expression        converted to
        ---------------------   -----------------
        int                     uint, float
        ivec2                   uvec2, vec2
        ivec3                   uvec3, vec3
        ivec4                   uvec4, vec4

        uint                    float
        uvec2                   vec2
        uvec3                   vec3
        uvec4                   vec4

    (modify second paragraph of the section) No implicit conversions are
    provided to convert from unsigned to signed integer types or from
    floating-point to integer types. There are no implicit array or structure
    conversions.

    (insert before the final paragraph of the section) When performing
    implicit conversion for binary operators, there may be multiple data types
    to which the two operands can be converted. For example, when adding an
    int value to a uint value, both values can be implicitly converted to uint
    and float. In such cases, a floating-point type is chosen if either
    operand has a floating-point type. Otherwise, an unsigned integer type is
    chosen if either operand has an unsigned integer type. Otherwise, a
    signed integer type is chosen.


    Modify Section 5.9, Expressions, p. 57

    (modify bulleted list as follows, adding support for implicit conversion
    between signed and unsigned types)

    Expressions in the shading language are built from the following:

    * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
      types, and all matrix types.

    ...

    * The operator modulus (%) operates on signed or unsigned integer scalars
      or vectors. If the fundamental types of the operands do not match, the
      conversions from Section 4.1.10 "Implicit Conversions" are applied to
      produce matching types. ...


    Modify Section 6.1, Function Definitions, p. 63

    (modify description of overloading, beginning at the top of p. 64)

    Function names can be overloaded. The same function name can be used for
    multiple functions, as long as the parameter types differ. If a function
    name is declared twice with the same parameter types, then the return
    types and all qualifiers must also match, and it is the same function
    being declared. For example,

      vec4 f(in vec4 x, out vec4 y);        // (A)
      vec4 f(in vec4 x, out uvec4 y);       // (B) okay, different argument type
      vec4 f(in ivec4 x, out uvec4 y);      // (C) okay, different argument type

      int  f(in vec4 x, out ivec4 y);       // error, only return type differs
      vec4 f(in vec4 x, in vec4 y);         // error, only qualifier differs
      vec4 f(const in vec4 x, out vec4 y);  // error, only qualifier differs

    When function calls are resolved, an exact type match for all the
    arguments is sought. If an exact match is found, all other functions are
    ignored, and the exact match is used. If no exact match is found, then
    the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
    applied to find a match. Mismatched types on input parameters (in or
    inout or default) must have a conversion from the calling argument type
    to the formal parameter type.
    Mismatched types on output parameters (out or inout) must have a
    conversion from the formal parameter type to the calling argument type.

    If implicit conversions can be used to find more than one matching
    function, a single best-matching function is sought. To determine a best
    match, the conversions between calling argument and formal parameter
    types are compared for each function argument and pair of matching
    functions. After these comparisons are performed, each pair of matching
    functions is compared. A function definition A is considered a better
    match than function definition B if:

      * for at least one function argument, the conversion for that argument
        in A is better than the corresponding conversion in B; and

      * there is no function argument for which the conversion in B is better
        than the corresponding conversion in A.

    If a single function definition is considered a better match than every
    other matching function definition, it will be used. Otherwise, a
    semantic error occurs and the shader will fail to compile.

    To determine whether the conversion for a single argument in one match is
    better than that for another match, the following rules are applied, in
    order:

      1. An exact match is better than a match involving any implicit
         conversion.

      2. A match involving an implicit conversion from float to double is
         better than a match involving any other implicit conversion.

      3. A match involving an implicit conversion from either int or uint to
         float is better than a match involving an implicit conversion from
         either int or uint to double.

    If none of the rules above apply to a particular pair of conversions,
    neither conversion is considered better than the other.

    For the function prototypes (A), (B), and (C) above, the following
    examples show how the rules apply to different sets of calling argument
    types:

      f(vec4, vec4);        // exact match of vec4 f(in vec4 x, out vec4 y)
      f(vec4, uvec4);       // exact match of vec4 f(in vec4 x, out uvec4 y)
      f(vec4, ivec4);       // matched to vec4 f(in vec4 x, out vec4 y)
                            //   (C) not relevant, can't convert vec4 to
                            //   ivec4. (A) better than (B) for 2nd
                            //   argument (rule 2), same on first argument.
      f(ivec4, vec4);       // NOT matched. All three match by implicit
                            //   conversion. (C) is better than (A) and (B)
                            //   on the first argument. (A) is better than
                            //   (B) and (C).


    Modify Section 8.3, Common Functions, p. 84

    (add support for single-precision frexp and ldexp functions)

    Syntax:

      genType frexp(genType x, out genIType exp);
      genType ldexp(genType x, in genIType exp);

    The function frexp() splits each single-precision floating-point number in
    <x> into a binary significand, a floating-point number in the range [0.5,
    1.0), and an integral exponent of two, such that:

      x = significand * 2 ^ exponent

    The significand is returned by the function; the exponent is returned in
    the parameter <exp>. For a floating-point value of zero, the significand
    and exponent are both zero. For a floating-point value that is an
    infinity or is not a number, the results of frexp() are undefined.

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value returned by the function and the value
    written to <exp> are vectors with the same number of components as <x>.
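
    As an illustration only, the per-component behavior of frexp() described
    above can be modeled in C. This is a sketch, not a normative definition:
    it assumes the C library's frexpf() follows the same convention (a
    significand whose magnitude lies in [0.5, 1.0), and zero for a zero
    input), and the helper name frexp_model is hypothetical and not part of
    this extension or of any API.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar model of GLSL frexp(): returns the significand and
 * writes the power-of-two exponent to *exp_out, so that
 * x == significand * 2^exponent.
 * Assumption: relies on C's frexpf(), which yields 0 and exponent 0 for a
 * zero input. Infinities and NaNs are undefined in the extension text and
 * are not given any particular meaning here either. */
static float frexp_model(float x, int32_t *exp_out)
{
    int e = 0;
    float significand = frexpf(x, &e);
    *exp_out = (int32_t)e;
    return significand;
}
```

    For instance, frexp_model(8.0f, &e) returns 0.5 and stores 4 in e, since
    8 = 0.5 * 2^4. A negative input yields a negative significand with the
    same exponent as its absolute value.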

    The function ldexp() builds a single-precision floating-point number from
    each significand component in <x> and the corresponding integral exponent
    of two in <exp>, returning:

      significand * 2 ^ exponent

    If this product is too large to be represented as a single-precision
    floating-point value, the result is considered undefined.

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value passed in <exp> and returned by the
    function are vectors with the same number of components as <x>.


    (add support for new integer built-in functions)

    Syntax:

      genIType bitfieldExtract(genIType value, int offset, int bits);
      genUType bitfieldExtract(genUType value, int offset, int bits);

      genIType bitfieldInsert(genIType base, genIType insert, int offset,
                              int bits);
      genUType bitfieldInsert(genUType base, genUType insert, int offset,
                              int bits);

      genIType bitfieldReverse(genIType value);
      genUType bitfieldReverse(genUType value);

      genIType bitCount(genIType value);
      genIType bitCount(genUType value);

      genIType findLSB(genIType value);
      genIType findLSB(genUType value);

      genIType findMSB(genIType value);
      genIType findMSB(genUType value);

    The function bitfieldExtract() extracts bits <offset> through
    <offset>+<bits>-1 from each component in <value>, returning them in the
    least significant bits of the corresponding component of the result. For
    unsigned data types, the most significant bits of the result will be set
    to zero. For signed data types, the most significant bits will be set to
    the value of bit <offset>+<bits>-1. If <bits> is zero, the result will be
    zero. The result will be undefined if <offset> or <bits> is negative, or
    if the sum of <offset> and <bits> is greater than the number of bits used
    to store the operand.
    Note that for vector versions of bitfieldExtract(), a single pair of
    <offset> and <bits> values is shared for all components.

    The function bitfieldInsert() inserts the <bits> least significant bits of
    each component of <insert> into the corresponding component of <base>.
    The result will have bits numbered <offset> through <offset>+<bits>-1
    taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
    directly from the corresponding bits of <base>. If <bits> is zero, the
    result will simply be <base>. The result will be undefined if <offset> or
    <bits> is negative, or if the sum of <offset> and <bits> is greater than
    the number of bits used to store the operand. Note that for vector
    versions of bitfieldInsert(), a single pair of <offset> and <bits> values
    is shared for all components.

    The function bitfieldReverse() reverses the bits of <value>. The bit
    numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
    <value>, where <bits> is the total number of bits used to represent
    <value>.

    The function bitCount() returns the number of one bits in the binary
    representation of <value>.

    The function findLSB() returns the bit number of the least significant one
    bit in the binary representation of <value>. If <value> is zero, -1 will
    be returned.

    The function findMSB() returns the bit number of the most significant bit
    in the binary representation of <value>. For positive integers, the
    result will be the bit number of the most significant one bit. For
    negative integers, the result will be the bit number of the most
    significant zero bit. For a <value> of zero or negative one, -1 will be
    returned.
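
    The integer functions described above can be sketched as a scalar
    reference model in C. This is illustrative only and assumes 32-bit
    operands, two's complement conversion from uint32_t to int32_t, and
    arguments inside the defined ranges (the undefined cases are not
    handled). All helper names below are hypothetical and are not part of
    this extension or of any API.

```c
#include <stdint.h>

/* Mask with the low <bits> bits set; valid for 0 <= bits <= 32. */
static uint32_t mask_low(int bits)
{
    return bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

/* Unsigned extract: upper bits of the result are zero. */
static uint32_t bitfield_extract_u(uint32_t value, int offset, int bits)
{
    if (bits == 0) return 0u;
    return (value >> offset) & mask_low(bits);
}

/* Signed extract: upper bits replicate bit offset+bits-1 of the input. */
static int32_t bitfield_extract_i(int32_t value, int offset, int bits)
{
    if (bits == 0) return 0;
    uint32_t field = ((uint32_t)value >> offset) & mask_low(bits);
    uint32_t sign = 1u << (bits - 1);
    return (int32_t)((field ^ sign) - sign);   /* sign-extend the field */
}

/* Bits offset..offset+bits-1 come from insert, the rest from base. */
static uint32_t bitfield_insert_u(uint32_t base, uint32_t insert,
                                  int offset, int bits)
{
    if (bits == 0) return base;
    uint32_t mask = mask_low(bits) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}

/* Result bit n is taken from input bit 31-n. */
static uint32_t bitfield_reverse_u(uint32_t value)
{
    uint32_t result = 0u;
    for (int n = 0; n < 32; n++)
        result |= ((value >> (31 - n)) & 1u) << n;
    return result;
}

/* Number of one bits. */
static int bit_count(uint32_t value)
{
    int count = 0;
    for (; value != 0u; value &= value - 1u)   /* clear lowest set bit */
        count++;
    return count;
}

/* Bit number of the least significant one bit, or -1 for zero. */
static int find_lsb(uint32_t value)
{
    if (value == 0u) return -1;
    int n = 0;
    while ((value & 1u) == 0u) { value >>= 1; n++; }
    return n;
}

/* Bit number of the most significant one bit, or -1 for zero. */
static int find_msb_u(uint32_t value)
{
    int n = -1;
    while (value != 0u) { value >>= 1; n++; }
    return n;
}

/* Signed variant: for negative inputs, find the most significant zero bit. */
static int find_msb_i(int32_t value)
{
    return find_msb_u(value < 0 ? ~(uint32_t)value : (uint32_t)value);
}
```

    For example, bitfield_extract_u(0xABCD1234, 8, 8) yields 0x12, while
    bitfield_extract_i(0x00000F00, 8, 4) yields -1 because the extracted
    field 0xF has its top bit set and is sign-extended.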


    (support for unsigned integer add/subtract with carry-out)

    Syntax:

      genUType uaddCarry(genUType x, genUType y, out genUType carry);
      genUType usubBorrow(genUType x, genUType y, out genUType borrow);

    The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
    <y>, returning the sum modulo 2^32. The value <carry> is set to zero if
    the sum was less than 2^32, or one otherwise.

    The function usubBorrow() subtracts the 32-bit unsigned integer or vector
    <y> from <x>, returning the difference if non-negative, or 2^32 plus the
    difference otherwise. The value <borrow> is set to zero if x >= y, or
    one otherwise.


    (support for signed and unsigned multiplies, with 32-bit inputs and a
    64-bit result spanning two 32-bit outputs)

    Syntax:

      void umulExtended(genUType x, genUType y, out genUType msb,
                        out genUType lsb);
      void imulExtended(genIType x, genIType y, out genIType msb,
                        out genIType lsb);

    The functions umulExtended() and imulExtended() multiply 32-bit unsigned
    or signed integers or vectors <x> and <y>, producing a 64-bit result. The
    32 least significant bits are returned in <lsb>; the 32 most significant
    bits are returned in <msb>.


GLX Protocol

    None.

Dependencies on ARB_gpu_shader_fp64

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language. If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    If ARB_gpu_shader_fp64 or a similar extension introducing new data types
    is not supported, the function overloading rule in the GLSL specification
    preferring promotion of an input parameter from a smaller type to a larger
    type is never applicable, as all data types are of the same size. That
    rule and the example referring to "double" should be removed.


Dependencies on NV_gpu_shader5

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language. If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    If NV_gpu_shader5 is supported, integer data types are supported with four
    different precisions (8-, 16-, 32-, and 64-bit) and floating-point data
    types are supported with three different precisions (16-, 32-, and
    64-bit). The extension adds the following rule for output parameters,
    which is similar to the one present in this extension for input
    parameters:

      5. If the formal parameters in both matches are output parameters, a
         conversion from a type with a larger number of bits per component is
         better than a conversion from a type with a smaller number of bits
         per component. For example, a conversion from an "int16_t" formal
         parameter type to "int" is better than one from an "int8_t" formal
         parameter type to "int".

    Such a rule is not provided in this extension because there is no
    combination of types in this extension and ARB_gpu_shader_fp64 where this
    rule has any effect.


Errors

    None


New State

    None

New Implementation Dependent State

    None

Issues

    (1) What should this extension be called?

      UNRESOLVED.
      This extension borrows from GL_ARB_gpu_shader5, so creating
      some sort of a play on that name would be viable. However, nothing in
      this extension should require SM5 hardware, so such a name would be a
      little misleading and weird.

      Since the primary purpose is to add integer related functions from
      GL_ARB_gpu_shader5, call this extension GL_MESA_shader_integer_functions
      for now.

    (2) Why is some of the formatting in this extension weird?

      RESOLVED: This extension is formatted to minimize the differences (as
      reported by 'diff --side-by-side -W180') with the GL_ARB_gpu_shader5
      specification.

    (3) Should ldexp and frexp be included?

      RESOLVED: Yes. Few GPUs have native instructions to implement these
      functions. These are generally implemented using existing GLSL built-in
      functions and the other functions provided by this extension.

    (4) Should umulExtended and imulExtended be included?

      RESOLVED: Yes. These functions should be implementable on any GPU that
      can support the rest of this extension, but the implementation may be
      complex. The implementation on a GPU that only supports 32bit x 32bit =
      32bit multiplication would be quite expensive. However, many GPUs
      (including OpenGL 4.0 GPUs that already support this function) have a
      32bit x 16bit = 48bit multiplier. The implementation there is only
      trivially more expensive than regular 32bit multiplication.

    (5) Should the pack and unpack functions be included?

      RESOLVED: No. These functions are already available via
      GL_ARB_shading_language_packing.

    (6) Should the "BitsTo" functions be included?

      RESOLVED: No. These functions are already available via
      GL_ARB_shader_bit_encoding.

Revision History

    Rev.      Date      Author    Changes
    ----  -----------  --------  -----------------------------------------
     2     7-Jul-2016  idr       Fix typo in #extension line
     1    20-Jun-2016  idr       Initial version based on GL_ARB_gpu_shader5.