<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY rfc3550 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml'>
<!ENTITY rfc3711 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3711.xml'>
<!ENTITY rfc3551 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3551.xml'>
<!ENTITY rfc4288 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4288.xml'>
<!ENTITY rfc4855 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4855.xml'>
<!ENTITY rfc4566 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml'>
<!ENTITY rfc3264 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml'>
<!ENTITY rfc2974 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2974.xml'>
<!ENTITY rfc2326 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2326.xml'>
<!ENTITY rfc3555 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3555.xml'>
<!ENTITY rfc5576 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5576.xml'>
<!ENTITY rfc6562 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6562.xml'>
<!ENTITY rfc6716 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6716.xml'>
<!ENTITY nbsp "&#160;">
]>

<rfc category="std" ipr="trust200902" docName="draft-ietf-payload-rtp-opus-01">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc strict="yes" ?>
<?rfc toc="yes" ?>
<?rfc tocdepth="3" ?>
<?rfc tocappendix='no' ?>
<?rfc tocindent='yes' ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc compact="no" ?>
<?rfc subcompact="yes" ?>
<?rfc iprnotified="yes" ?>

<front>
<title abbrev="RTP Payload Format for Opus Codec">
RTP Payload Format for Opus Speech and
Audio Codec
</title>

<author fullname="Julian Spittka" initials="J." surname="Spittka">
  <address>
    <email>jspittka@gmail.com</email>
  </address>
</author>

<author initials='K.' surname='Vos' fullname='Koen Vos'>
  <organization>Skype Technologies S.A.</organization>
  <address>
    <postal>
      <street>3210 Porter Drive</street>
      <code>94304</code>
      <city>Palo Alto</city>
      <region>CA</region>
      <country>USA</country>
    </postal>
    <email>koenvos74@gmail.com</email>
  </address>
</author>

<author initials="JM" surname="Valin" fullname="Jean-Marc Valin">
  <organization>Mozilla</organization>
  <address>
    <postal>
      <street>650 Castro Street</street>
      <city>Mountain View</city>
      <region>CA</region>
      <code>94041</code>
      <country>USA</country>
    </postal>
    <email>jmvalin@jmvalin.ca</email>
  </address>
</author>

<date day='2' month='August' year='2013' />

<abstract>
  <t>
  This document defines the Real-time Transport Protocol (RTP) payload
  format for packetization of Opus-encoded
  speech and audio data, which is essential for integrating the codec in the
  most compatible way. It also describes the media type registration
  for the RTP payload format.
  </t>
</abstract>
</front>

<middle>
<section title='Introduction'>
  <t>
  The Opus codec is a speech and audio codec developed within the
  IETF Internet Wideband Audio Codec working group (codec). The codec
  has a very low algorithmic delay and
  is highly scalable in terms of audio bandwidth, bitrate, and
  complexity. Further, it provides different modes to efficiently encode speech signals
  as well as music signals, making it well suited to
  various applications using the Internet or similar networks.
  </t>
  <t>
  This document defines the Real-time Transport Protocol (RTP)
  <xref target="RFC3550"/> payload format for packetization
  of Opus-encoded speech and audio data, which is essential for
  integrating the Opus codec in the
  most compatible way. It also describes the media type registration for
  the RTP payload format. More information on the Opus
  codec can be obtained from <xref target="RFC6716"/>.
  </t>
</section>

<section title='Conventions, Definitions and Acronyms used in this document'>
  <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in <xref target="RFC2119"/>.</t>
  <t>
  <list style='hanging'>
    <t hangText="CBR:"> Constant bitrate</t>
    <t hangText="CPU:"> Central Processing Unit</t>
    <t hangText="DTX:"> Discontinuous transmission</t>
    <t hangText="FEC:"> Forward error correction</t>
    <t hangText="IP:"> Internet Protocol</t>
    <t hangText="samples:"> Speech or audio samples (usually per channel)</t>
    <t hangText="SDP:"> Session Description Protocol</t>
    <t hangText="VBR:"> Variable bitrate</t>
  </list>
  </t>
  <section title='Audio Bandwidth'>
    <t>
    Throughout this document, we refer to the following definitions:
    </t>
    <texttable anchor='bandwidth_definitions'>
      <ttcol align='center'>Abbreviation</ttcol>
      <ttcol align='center'>Name</ttcol>
      <ttcol align='center'>Bandwidth (Hz)</ttcol>
      <ttcol align='center'>Sampling rate (Hz)</ttcol>
      <c>nb</c>
      <c>Narrowband</c>
      <c>0 - 4000</c>
      <c>8000</c>

      <c>mb</c>
      <c>Mediumband</c>
      <c>0 - 6000</c>
      <c>12000</c>

      <c>wb</c>
      <c>Wideband</c>
      <c>0 - 8000</c>
      <c>16000</c>

      <c>swb</c>
      <c>Super-wideband</c>
      <c>0 - 12000</c>
      <c>24000</c>

      <c>fb</c>
      <c>Fullband</c>
      <c>0 - 20000</c>
      <c>48000</c>

      <postamble>
      Audio bandwidth naming
      </postamble>
    </texttable>
  </section>
</section>

<section title='Opus Codec'>
  <t>
  The Opus <xref target="RFC6716"/> speech and audio codec has been developed to encode speech
  signals as well as audio signals. One of two modes, a voice mode
  or an audio mode, may be chosen to allow the most efficient coding
  depending on the type of input signal, the sampling frequency of the
  input signal, and the specific application.
  </t>

  <t>
  The voice mode allows efficient encoding of voice signals at lower
  bitrates, while the audio mode is optimized for audio signals at medium and
  higher bitrates.
  </t>

  <t>
  The Opus speech and audio codec is highly scalable in terms of audio
  bandwidth, bitrate, and complexity. Further, Opus allows
  transmitting stereo signals.
  </t>

  <section title='Network Bandwidth'>
    <t>
    Opus supports all bitrates from 6 kb/s to 510 kb/s.
    The bitrate can be changed dynamically within that range.
    All
    other parameters being
    equal, a higher bitrate results in higher quality.
    </t>
    <section title='Recommended Bitrate' anchor='bitrate_by_bandwidth'>
      <t>
      For a frame size of
      20 ms, these
      are the bitrate "sweet spots" for Opus in various configurations:

      <list style="symbols">
        <t>8-12 kb/s for NB speech,</t>
        <t>16-20 kb/s for WB speech,</t>
        <t>28-40 kb/s for FB speech,</t>
        <t>48-64 kb/s for FB mono music, and</t>
        <t>64-128 kb/s for FB stereo music.</t>
      </list>
      </t>
    </section>
    <section title='Variable versus Constant Bit Rate' anchor='variable-vs-constant-bitrate'>
      <t>
      For the same average bitrate, variable bitrate (VBR) can achieve higher quality
      than constant bitrate (CBR). For the majority of voice transmission applications, VBR
      is the best choice.
      One reason for choosing CBR is the potential
      information leak that <spanx style='emph'>may</spanx> occur when encrypting the
      compressed stream. See <xref target="RFC6562"/> for guidelines on when VBR is
      appropriate for encrypted audio communications. If an existing
      VBR stream needs to be converted to CBR for security reasons, the Opus padding
      mechanism described in <xref target="RFC6716"/> is the RECOMMENDED way to achieve padding
      because the RTP padding bit is unencrypted.</t>

      <t>
      The bitrate can be adjusted at any point in time. To avoid congestion,
      the average bitrate SHOULD be adjusted to the available
      network capacity. If no target bitrate is specified, the bitrates specified in
      <xref target='bitrate_by_bandwidth'/> are RECOMMENDED.
      </t>

    </section>

    <section title='Discontinuous Transmission (DTX)'>

      <t>
      The Opus codec may, as described in <xref target='variable-vs-constant-bitrate'/>,
      be operated with an adaptive bitrate. In that case, the bitrate
      will automatically be reduced for certain input signals, such as periods
      of silence. During continuous transmission, the bitrate will be
      reduced when the input signal allows it, but the transmission
      to the receiver itself will never be interrupted. Therefore, the
      received signal will maintain the same high level of quality over the
      full duration of a transmission while minimizing the average
      bitrate over time.
      </t>

      <t>
      In cases where the bitrate of Opus needs to be reduced even
      further, or in cases where only constant bitrate is available,
      the Opus encoder may be set to use discontinuous
      transmission (DTX), where parts of the encoded signal that
      correspond to periods of silence in the input speech or audio signal
      are not transmitted to the receiver.
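      As a sketch only, a receiver that prefers constant bitrate together
      with DTX might signal this with the "cbr" and "usedtx" media type
      parameters registered in this document; the port 54312 and the
      dynamic payload type 101 are assumptions borrowed from the SDP
      examples in this document:
      <figure>
      <artwork>
        <![CDATA[
    m=audio 54312 RTP/AVP 101
    a=rtpmap:101 opus/48000/2
    a=fmtp:101 cbr=1; usedtx=1
        ]]>
      </artwork>
      </figure>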
      </t>

      <t>
      On the receiving side, the non-transmitted parts will be handled by a
      frame loss concealment unit in the Opus decoder, which generates a
      comfort noise signal to replace the non-transmitted parts of the
      speech or audio signal.
      </t>

      <t>
      The DTX mode of Opus will have a slightly lower speech or audio
      quality than the continuous mode. Therefore, it is RECOMMENDED to
      use Opus in the continuous mode unless constraints on network
      capacity are severe. The DTX mode can be engaged for operation
      with either adaptive or constant bitrate.
      </t>

    </section>

  </section>

  <section title='Complexity'>

    <t>
    Complexity can be scaled to optimize for CPU resources in real time, mostly as
    a trade-off between audio quality and bitrate. Also, different modes of Opus have different complexity.
    </t>

  </section>

  <section title="Forward Error Correction (FEC)">

    <t>
    The voice mode of Opus allows for "in-band" forward error correction (FEC)
    data to be embedded into the bit stream of Opus. This FEC scheme adds
    redundant information about the previous packet (n-1) to the current
    output packet n. For
    each frame, the encoder decides whether to use FEC based on (1) an
    externally-provided estimate of the channel's packet loss rate; (2) an
    externally-provided estimate of the channel's capacity; (3) the
    sensitivity of the audio or speech signal to packet loss; and (4) whether
    the receiving decoder has indicated it can take advantage of "in-band"
    FEC information. The decision to send "in-band" FEC information is
    entirely controlled by the encoder and therefore no special precautions
    for the payload have to be taken.
    </t>

    <t>
    On the receiving side, the decoder can take advantage of this
    additional information when, in case of a packet loss, the next packet
    is available.
    In order to use the FEC data, the jitter buffer needs
    to provide access to payloads with the FEC data. The decoder API function
    has a flag to indicate that an FEC frame rather than a regular frame should
    be decoded. If no FEC data is available for the current frame, the decoder
    will consider the frame lost and invoke the frame loss concealment.
    </t>

    <t>
    If the FEC scheme is not implemented on the receiving side, FEC
    SHOULD NOT be used, as it leads to an inefficient use of network
    resources. Decoder support for FEC SHOULD be indicated at the time a
    session is set up.
    </t>

  </section>

  <section title='Stereo Operation'>

    <t>
    Opus allows for transmission of stereo audio signals. This operation
    is signaled in-band in the Opus payload and no special arrangement
    is required in the payload format. Any implementation of the Opus
    decoder MUST be capable of receiving stereo signals, although it MAY
    decode those signals as mono.
    </t>
    <t>
    If a decoder cannot take advantage of the benefits of a stereo signal,
    this SHOULD be indicated at the time a session is set up. In that case,
    the sending side SHOULD NOT send stereo signals, as doing so leads to an
    inefficient use of the network.
    </t>

  </section>

</section>

<section title='Opus RTP Payload Format' anchor='opus-rtp-payload-format'>
  <t>The payload format for Opus consists of the RTP header and Opus payload
  data.</t>
  <section title='RTP Header Usage'>
    <t>The format of the RTP header is specified in <xref target="RFC3550"/>. The Opus
    payload format uses the fields of the RTP header consistent with this
    specification.</t>

    <t>The payload length of Opus is an integer number of octets and
    therefore no padding is required.
    The payload MAY be padded by an
    integer number of octets according to <xref target="RFC3550"/>.</t>

    <t>The marker bit (M) of the RTP header is used in accordance with
    Section 4.1 of <xref target="RFC3551"/>.</t>

    <t>The RTP payload type for Opus has not been assigned statically and is
    expected to be assigned dynamically.</t>

    <t>The receiving side MUST be prepared to receive duplicates of RTP
    packets. Exactly one of those payloads MUST be provided to the Opus decoder
    for decoding, and the others MUST be discarded.</t>

    <t>Opus supports 5 different audio bandwidths, which may be adjusted during
    the duration of a call. The RTP timestamp clock frequency is defined as
    the highest supported sampling frequency of Opus, i.e. 48000 Hz, for all
    modes and sampling rates of Opus. The unit
    for the timestamp is samples per single (mono) channel. The RTP timestamp corresponds to the
    sample time of the first encoded sample in the encoded frame. For sampling
    rates lower than 48000 Hz, the number of samples has to be multiplied by
    the multiplier given in <xref target="fs-upsample-factors"/> to determine
    the RTP timestamp.</t>

    <texttable anchor='fs-upsample-factors' title="Timestamp multiplier">
      <ttcol align='center'>fs (Hz)</ttcol>
      <ttcol align='center'>Multiplier</ttcol>
      <c>8000</c>
      <c>6</c>
      <c>12000</c>
      <c>4</c>
      <c>16000</c>
      <c>3</c>
      <c>24000</c>
      <c>2</c>
      <c>48000</c>
      <c>1</c>
    </texttable>
  </section>

  <section title='Payload Structure'>
    <t>
    The Opus encoder can be set to output encoded frames representing 2.5, 5, 10, 20,
    40, or 60 ms of speech or audio data. Further, an arbitrary number of frames can be
    combined into a packet. The maximum packet length is limited to the amount of encoded
    data representing 120 ms of speech or audio data.
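    For illustration, the timestamp increment of such a packet can be
    worked out from the multipliers in <xref target="fs-upsample-factors"/>.
    The following sketch assumes a hypothetical encoder that runs at
    16000 Hz and combines three 20 ms frames into one packet; these values
    are assumptions chosen for the example, not requirements of this format:
    <figure>
    <artwork>
      <![CDATA[
    samples per frame  = 16000 Hz * 0.020 s     =  320
    ts incr per frame  = 320 samples * 3        =  960
    ts incr per packet = 960 * 3 frames (60 ms) = 2880
      ]]>
    </artwork>
    </figure>
    The per-frame value also follows directly from the 48000 Hz clock
    (48000 * 0.020 = 960), whatever the sampling rate of the encoder.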
    Packetization of the encoded data
    is done entirely by the Opus encoder; therefore, exactly one packet output by the Opus
    encoder MUST be used as a payload.
    </t>

    <t><xref target='payload-structure'/> shows the structure combined with the RTP header.</t>

    <figure anchor="payload-structure"
            title="Payload Structure with RTP header">
      <artwork>
        <![CDATA[
+----------+--------------+
|RTP Header| Opus Payload |
+----------+--------------+
        ]]>
      </artwork>
    </figure>

    <t>
    <xref target='opus-packetization'/> shows the supported frame sizes in
    milliseconds of encoded speech or audio data for the voice and audio modes
    (Mode) and sampling rates (fs) of Opus, and how the timestamp needs to
    be incremented for packetization (ts incr). If the Opus encoder
    outputs multiple encoded frames into a single packet, the timestamp
    increments of the combined frames have to be added up.
    </t>

    <texttable anchor='opus-packetization' title="Supported Opus frame
     sizes and timestamp increments">
      <ttcol align='center'>Mode</ttcol>
      <ttcol align='center'>fs</ttcol>
      <ttcol align='center'>2.5</ttcol>
      <ttcol align='center'>5</ttcol>
      <ttcol align='center'>10</ttcol>
      <ttcol align='center'>20</ttcol>
      <ttcol align='center'>40</ttcol>
      <ttcol align='center'>60</ttcol>
      <c>ts incr</c>
      <c>all</c>
      <c>120</c>
      <c>240</c>
      <c>480</c>
      <c>960</c>
      <c>1920</c>
      <c>2880</c>
      <c>voice</c>
      <c>nb/mb/wb/swb/fb</c>
      <c></c>
      <c></c>
      <c>x</c>
      <c>x</c>
      <c>x</c>
      <c>x</c>
      <c>audio</c>
      <c>nb/wb/swb/fb</c>
      <c>x</c>
      <c>x</c>
      <c>x</c>
      <c>x</c>
      <c></c>
      <c></c>
    </texttable>

  </section>

</section>

<section title='Congestion Control'>

  <t>The adaptive nature of the Opus codec allows for efficient
  congestion control.</t>

  <t>The target bitrate of Opus can be adjusted at any
  point in time,
  thus allowing for efficient congestion control. Furthermore, the amount
  of speech or audio data encoded in a
  single packet can be used for congestion control, since the packet transmission
  rate is inversely proportional to the amount of audio per packet. A lower packet
  transmission rate reduces the amount of header overhead but at the same
  time increases latency and error sensitivity, so it should be used with care.</t>

  <t>It is RECOMMENDED that congestion control is applied during the
  transmission of Opus-encoded data.</t>
</section>

<section title='IANA Considerations'>
  <t>One media subtype (audio/opus) has been defined and registered as
  described in the following section.</t>

  <section title='Opus Media Type Registration'>
    <t>Media type registration is done according to <xref
    target="RFC4288"/> and <xref target="RFC4855"/>.<vspace
    blankLines='1'/></t>

    <t>Type name: audio<vspace blankLines='1'/></t>
    <t>Subtype name: opus<vspace blankLines='1'/></t>

    <t>Required parameters:</t>
    <t><list style="hanging">
      <t hangText="rate:"> The RTP timestamp is incremented with a
      48000 Hz clock rate for all modes of Opus and all sampling
      frequencies. For audio sampling rates other than 48000 Hz, the rate
      has to be adjusted to 48000 Hz according to <xref target="fs-upsample-factors"/>.
      </t>
    </list></t>

    <t>Optional parameters:</t>

    <t><list style="hanging">
      <t hangText="maxplaybackrate:">
      a hint about the maximum output sampling rate, in Hz, that the receiver is
      capable of rendering.
      The decoder MUST be capable of decoding
      any audio bandwidth, but due to hardware limitations only signals
      up to the specified sampling rate can be played back.
      Sending signals
      with higher audio bandwidth results in higher than necessary network
      usage and encoding complexity, so an encoder SHOULD NOT encode
      frequencies above the audio bandwidth specified by maxplaybackrate.
      This parameter can take any value between 8000 and 48000, although
      commonly the value will match one of the Opus bandwidths
      (<xref target="bandwidth_definitions"/>).
      By default, the receiver is assumed to have no limitations, i.e. 48000.
      <vspace blankLines='1'/>
      </t>

      <t hangText="sprop-maxcapturerate:">
      a hint about the maximum input sampling rate that the sender is likely to produce.
      This is not a guarantee that the sender will never send any higher bandwidth
      (e.g. it could send a pre-recorded prompt that uses a higher bandwidth), but it
      indicates to the receiver that frequencies above this maximum can safely be discarded.
      This parameter is useful to avoid wasting receiver resources by operating the audio
      processing pipeline (e.g. echo cancellation) at a higher rate than necessary.
      This parameter can take any value between 8000 and 48000, although
      commonly the value will match one of the Opus bandwidths
      (<xref target="bandwidth_definitions"/>).
      By default, the sender is assumed to have no limitations, i.e. 48000.
      <vspace blankLines='1'/>
      </t>

      <t hangText="maxptime:"> the maximum duration of media represented
      by a packet (according to Section 6 of
      <xref target="RFC4566"/>) that the decoder wants to receive, in
      milliseconds rounded up to the next full integer value.
      Possible values are 3, 5, 10, 20, 40,
      and 60, or an arbitrary multiple of Opus frame sizes rounded up to
      the next full integer value, up to a maximum value of 120 as
      defined in <xref target='opus-rtp-payload-format'/>. If no value is
      specified, 120 is assumed as default.
      This value is a recommendation
      by the decoding side to ensure the best
      performance for the decoder. The decoder MUST be
      capable of accepting any allowed packet size to
      ensure maximum compatibility.
      <vspace blankLines='1'/></t>

      <t hangText="ptime:"> the preferred duration of media represented
      by a packet (according to
      Section 6 of <xref target="RFC4566"/>) that the decoder wants to
      receive, in milliseconds rounded up to the next full integer value.
      Possible values are
      3, 5, 10, 20, 40, or 60, or an arbitrary multiple of Opus frame sizes
      rounded up to the next full integer value, up to a maximum
      value of 120 as defined in <xref
      target='opus-rtp-payload-format'/>. If no value is
      specified, 20 is assumed as default. If ptime is greater than
      maxptime, ptime MUST be ignored. This parameter MAY be changed
      during a session. This value is a recommendation by the decoding
      side to ensure the best
      performance for the decoder. The decoder MUST be
      capable of accepting any allowed packet size to
      ensure maximum compatibility.
      <vspace blankLines='1'/></t>

      <t hangText="minptime:"> the minimum duration of media represented
      by a packet (according to Section 6 of <xref
      target="RFC4566"/>) that SHOULD
      be encapsulated in a received packet, in
      milliseconds rounded up to the next full integer value.
      Possible values are 3, 5, 10, 20, 40, and 60,
      or an arbitrary multiple of Opus frame sizes rounded up to the next
      full integer value, up to a maximum value of 120
      as defined in <xref target='opus-rtp-payload-format'/>. If no value is
      specified, 3 is assumed as default. This value is a recommendation
      by the decoding side to ensure the best
      performance for the decoder. The decoder MUST be
      capable of accepting any allowed packet size to
      ensure maximum compatibility.
      <vspace blankLines='1'/></t>

      <t hangText="maxaveragebitrate:"> specifies the maximum average
      receive bitrate of a session in bits per second (b/s). The actual
      value of the bitrate may vary, as it depends on the
      characteristics of the media in a packet. Note that the maximum
      average bitrate MAY be modified dynamically during a session. Any
      positive integer is allowed, but values outside the range between
      6000 and 510000 SHOULD be ignored. If no value is specified, the
      maximum value specified in <xref target='bitrate_by_bandwidth'/>
      for the corresponding mode of Opus and corresponding maxplaybackrate
      will be the default.<vspace blankLines='1'/></t>

      <t hangText="stereo:">
      specifies whether the decoder prefers receiving stereo or mono signals.
      Possible values are 1 and 0, where 1 specifies that stereo signals are preferred
      and 0 specifies that only mono signals are preferred.
      Independent of the stereo parameter, every receiver MUST be able to receive and
      decode stereo signals, but sending stereo signals to a receiver that signaled a
      preference for mono signals may result in higher than necessary network
      utilisation and encoding complexity. If no value is specified, mono
      is assumed (stereo=0).<vspace blankLines='1'/>
      </t>

      <t hangText="sprop-stereo:">
      specifies whether the sender is likely to produce stereo audio.
      Possible values are 1 and 0, where 1 specifies that stereo signals are likely to
      be sent, and 0 specifies that the sender will likely only send mono.
      This is not a guarantee that the sender will never send stereo audio
      (e.g. it could send a pre-recorded prompt that uses stereo), but it
      indicates to the receiver that the received signal can be safely downmixed to mono.
      This parameter is useful to avoid wasting receiver resources by operating the audio
      processing pipeline (e.g. echo cancellation) in stereo when not necessary.
      If no value is specified, mono
      is assumed (sprop-stereo=0).<vspace blankLines='1'/>
      </t>

      <t hangText="cbr:">
      specifies whether the decoder prefers the use of a constant bitrate over a
      variable bitrate. Possible values are 1 and 0, where 1 specifies constant
      bitrate and 0 specifies variable bitrate. If no value is specified, cbr
      is assumed to be 0. Note that the maximum average bitrate may still be
      changed, e.g. to adapt to changing network conditions.<vspace blankLines='1'/>
      </t>

      <t hangText="useinbandfec:"> specifies whether the decoder has the capability to
      take advantage of the Opus in-band FEC. Possible values are 1 and 0. It is RECOMMENDED to provide
      0 in case FEC cannot be utilized on the receiving side. If no
      value is specified, useinbandfec is assumed to be 0.
      This parameter is only a preference and the receiver MUST be able to process
      packets that include FEC information, even if it means the FEC part is discarded.
      <vspace blankLines='1'/></t>

      <t hangText="usedtx:"> specifies whether the decoder prefers the use of
      DTX. Possible values are 1 and 0. If no value is specified, usedtx
      is assumed to be 0.<vspace blankLines='1'/></t>
    </list></t>

    <t>Encoding considerations:<vspace blankLines='1'/></t>
    <t><list style="hanging">
      <t>The Opus media type is framed and consists of binary data according
      to Section 4.8 in <xref target="RFC4288"/>.</t>
    </list></t>

    <t>Security considerations: </t>
    <t><list style="hanging">
      <t>See <xref target='security-considerations'/> of this document.</t>
    </list></t>

    <t>Interoperability considerations: none<vspace blankLines='1'/></t>
    <t>Published specification: none<vspace blankLines='1'/></t>

    <t>Applications that use this media type: </t>
    <t><list style="hanging">
      <t>Any application that requires the transport of
      speech or audio data may use this media type.
      Examples include,
      but are not limited to, audio and video conferencing, Voice over IP, and
      media streaming.</t>
    </list></t>

    <t>Person &amp; email address to contact for further information:</t>
    <t><list style="hanging">
      <t>SILK Support silksupport@skype.net</t>
      <t>Jean-Marc Valin jmvalin@jmvalin.ca</t>
    </list></t>

    <t>Intended usage: COMMON<vspace blankLines='1'/></t>

    <t>Restrictions on usage:<vspace blankLines='1'/></t>

    <t><list style="hanging">
      <t>For transfer over RTP, the RTP payload format (<xref
      target='opus-rtp-payload-format'/> of this document) SHALL be
      used.</t>
    </list></t>

    <t>Author:</t>
    <t><list style="hanging">
      <t>Julian Spittka jspittka@gmail.com<vspace blankLines='1'/></t>
      <t>Koen Vos koenvos74@gmail.com<vspace blankLines='1'/></t>
      <t>Jean-Marc Valin jmvalin@jmvalin.ca<vspace blankLines='1'/></t>
    </list></t>

    <t> Change controller: TBD</t>
  </section>

  <section title='Mapping to SDP Parameters'>
    <t>The information described in the media type specification has a
    specific mapping to fields in the Session Description Protocol (SDP)
    <xref target="RFC4566"/>, which is commonly used to describe RTP
    sessions. When SDP is used to specify sessions employing the Opus codec,
    the mapping is as follows:</t>

    <t>
    <list style="symbols">
      <t>The media type ("audio") goes in SDP "m=" as the media name.</t>

      <t>The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
      name.
      The RTP clock rate in "a=rtpmap" MUST be 48000 and the number of
      channels MUST be 2.</t>

      <t>The OPTIONAL media type parameters "ptime" and "maxptime" are
      mapped to "a=ptime" and "a=maxptime" attributes, respectively, in the
      SDP.</t>

      <t>The OPTIONAL media type parameters "maxaveragebitrate",
      "maxplaybackrate", "minptime", "stereo", "cbr", "useinbandfec", and
      "usedtx", when present, MUST be included in the "a=fmtp" attribute
      in the SDP, expressed as a media type string in the form of a
      semicolon-separated list of parameter=value pairs (e.g.,
      maxaveragebitrate=20000). They MUST NOT be specified in an
      SSRC-specific "fmtp" source-level attribute (as defined in
      Section 6.3 of <xref target="RFC5576"/>).</t>

      <t>The OPTIONAL media type parameters "sprop-maxcapturerate"
      and "sprop-stereo" MAY be mapped to the "a=fmtp" SDP attribute by
      copying them directly from the media type parameter string as part
      of the semicolon-separated list of parameter=value pairs (e.g.,
      sprop-stereo=1). These same OPTIONAL media type parameters MAY also
      be specified using an SSRC-specific "fmtp" source-level attribute
      as described in Section 6.3 of <xref target="RFC5576"/>.
      They MAY be specified in both places, in which case the parameter
      in the source-level attribute overrides the one found on the
      "a=fmtp" line.
      The value of any parameter that is not specified in
      a source-level attribute MUST be taken from the "a=fmtp"
      line, if it is present there.</t>

    </list>
    </t>

    <t>Below are some examples of SDP session descriptions for Opus:</t>

    <t>Example 1: Standard mono session with 48000 Hz clock rate</t>
    <figure>
      <artwork>
        <![CDATA[
    m=audio 54312 RTP/AVP 101
    a=rtpmap:101 opus/48000/2
        ]]>
      </artwork>
    </figure>


    <t>Example 2: 16000 Hz clock rate, maximum packet size of 40 ms,
    recommended packet size of 40 ms, maximum average bitrate of 20000 bps,
    prefers to receive stereo but only plans to send mono, FEC is allowed,
    DTX is not allowed</t>

    <figure>
      <artwork>
        <![CDATA[
    m=audio 54312 RTP/AVP 101
    a=rtpmap:101 opus/48000/2
    a=fmtp:101 maxplaybackrate=16000; sprop-maxcapturerate=16000;
    maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0
    a=ptime:40
    a=maxptime:40
        ]]>
      </artwork>
    </figure>

    <t>Example 3: Two-way full-band stereo preferred</t>

    <figure>
      <artwork>
        <![CDATA[
    m=audio 54312 RTP/AVP 101
    a=rtpmap:101 opus/48000/2
    a=fmtp:101 stereo=1; sprop-stereo=1
        ]]>
      </artwork>
    </figure>


    <section title='Offer-Answer Model Considerations for Opus'>

      <t>When using the offer-answer procedure described in <xref
      target="RFC3264"/> to negotiate the use of Opus, the following
      considerations apply:</t>

      <t><list style="symbols">

        <t>Opus supports several clock rates. For signaling purposes only
        the highest, i.e. 48000, is used. The actual clock rate of the
        corresponding media is signaled inside the payload and is not
        subject to this payload format description. The decoder MUST be
        capable of decoding every received clock rate.
        An example
        is shown below:

        <figure>
          <artwork>
            <![CDATA[
    m=audio 54312 RTP/AVP 100
    a=rtpmap:100 opus/48000/2
            ]]>
          </artwork>
        </figure>
        </t>

        <t>The "ptime" and "maxptime" parameters are unidirectional
        receive-only parameters and typically will not compromise
        interoperability; however, depending on the values set, the
        performance of the application may suffer. <xref
        target="RFC3264"/> defines the SDP offer-answer handling of the
        "ptime" parameter. The "maxptime" parameter MUST be handled in the
        same way.</t>

        <t>
        The "minptime" parameter is a unidirectional
        receive-only parameter and typically will not compromise
        interoperability; however, depending on the value set,
        the performance of the application may suffer, so the parameter should be
        set with care.
        </t>

        <t>
        The "maxplaybackrate" parameter is a unidirectional receive-only
        parameter that reflects limitations of the local receiver. The sender
        on the other side SHOULD NOT send with an audio bandwidth higher than
        "maxplaybackrate", as this would lead to inefficient use of network resources.
        The "maxplaybackrate" parameter does not
        affect interoperability. Also, this parameter SHOULD NOT be used
        to adjust the audio bandwidth as a function of the bitrate, as this
        is the responsibility of the Opus encoder implementation.
        </t>

        <t>The "maxaveragebitrate" parameter is a unidirectional receive-only
        parameter that reflects limitations of the local receiver. The sender
        on the other side MUST NOT send with an average bitrate higher than
        "maxaveragebitrate", as it might overload the network and/or
        receiver.
The "maxaveragebitrate" parameter typically will not
compromise interoperability; however, depending on the value set,
the performance of the application may suffer, so the parameter should
be set with care.</t>

<t>The "sprop-maxcapturerate" and "sprop-stereo" parameters are
unidirectional sender-only parameters that reflect limitations of
the sender side.
They allow the receiver to set up a reduced-complexity audio
processing pipeline if the sender is not planning to use the full
range of Opus's capabilities.
Neither "sprop-maxcapturerate" nor "sprop-stereo" affects
interoperability, and the receiver MUST be capable of receiving any signal.
</t>

<t>
The "stereo" parameter is a unidirectional receive-only
parameter.
</t>

<t>
The "cbr" parameter is a unidirectional receive-only
parameter.
</t>

<t>The "useinbandfec" parameter is a unidirectional receive-only
parameter.</t>

<t>The "usedtx" parameter is a unidirectional receive-only
parameter.</t>

<t>Any unknown parameter in an offer MUST be ignored by the receiver
and MUST be removed from the answer.</t>

</list></t>
</section>

<section title='Declarative SDP Considerations for Opus'>

<t>For declarative use of SDP with Opus, such as in the Session Announcement Protocol
(SAP) <xref target="RFC2974"/> and RTSP <xref target="RFC2326"/>,
the following needs to be considered:</t>

<t><list style="symbols">

<t>The values for "maxptime", "ptime", "minptime", "maxplaybackrate", and
"maxaveragebitrate" should be selected carefully to ensure that
reasonable performance can be achieved for the participants of a session.</t>

<t>
The values for "maxptime", "ptime", and "minptime" of the payload
format configuration are recommendations by the decoding side to ensure
the best performance for the decoder.
The decoder MUST be
capable of accepting any allowed packet size to
ensure maximum compatibility.
</t>

<t>All other parameters of the payload format configuration are declarative,
and a participant MUST use the configurations that are provided for
the session. More than one configuration may be provided if necessary
by declaring multiple RTP payload types; however, the number of types
should be kept small.</t>
</list></t>
</section>
</section>
</section>

<section title='Security Considerations' anchor='security-considerations'>

<t>All RTP packets using the payload format defined in this specification
are subject to the general security considerations discussed in the RTP
specification <xref target="RFC3550"/> and in any applicable profile,
e.g., <xref target="RFC3711"/> or <xref target="RFC3551"/>.</t>

<t>This payload format transports Opus-encoded speech or audio data;
hence, security issues include confidentiality, integrity protection, and
authentication of the speech or audio itself. The Opus payload format does
not have any built-in security mechanisms. Any suitable external
mechanisms, such as SRTP <xref target="RFC3711"/>, MAY be used.</t>

<t>This payload format and the Opus encoding do not exhibit any
significant non-uniformity in the receiver-end computational load and thus
are unlikely to pose a denial-of-service threat due to the receipt of
pathological datagrams.</t>
</section>

<section title='Acknowledgements'>
<t>TBD</t>
</section>
</middle>

<back>
<references title="Normative References">
&rfc2119;
&rfc3550;
&rfc3711;
&rfc3551;
&rfc4288;
&rfc4855;
&rfc4566;
&rfc3264;
&rfc2974;
&rfc2326;
&rfc5576;
&rfc6562;
&rfc6716;
</references>

</back>
</rfc>