<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
<title>Ogg Documentation</title>

<style type="text/css">
body {
  margin: 0 18px 0 18px;
  padding-bottom: 30px;
  font-family: Verdana, Arial, Helvetica, sans-serif;
  color: #333333;
  font-size: .8em;
}

a {
  color: #3366cc;
}

img {
  border: 0;
}

#xiphlogo {
  margin: 30px 0 16px 0;
}

#content p {
  line-height: 1.4;
}

h1, h1 a, h2, h2 a, h3, h3 a {
  font-weight: bold;
  color: #ff9900;
  margin: 1.3em 0 8px 0;
}

h1 {
  font-size: 1.3em;
}

h2 {
  font-size: 1.2em;
}

h3 {
  font-size: 1.1em;
}

li {
  line-height: 1.4;
}

#copyright {
  margin-top: 30px;
  line-height: 1.5em;
  text-align: center;
  font-size: .8em;
  color: #888888;
  clear: both;
}
</style>

</head>

<body>

<div id="xiphlogo">
  <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
</div>

<h1>Ogg bitstream overview</h1>

This document serves as a starting point for understanding the design
and implementation of the Ogg container format. If you're new to Ogg
or merely want a high-level technical overview, start reading here.
Other documents linked from the <a href="index.html">index page</a>
give distilled technical descriptions and references of the container
mechanisms. This document is intended to aid understanding.

<h2>Container format design points</h2>

<p>Ogg is intended to be a simplest-possible container, concerned only
with framing, ordering, and interleave.
It can be used as a stream delivery
mechanism, for media file storage, or as a building block toward
implementing a more complex, non-linear container (for example, see
the <a href="skeleton.html">Skeleton</a> or <a
href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>).

<p>The Ogg container is not intended to be a monolithic
'kitchen-sink'. It exists only to frame and deliver in-order stream
data and as such is vastly simpler than most other containers.
Elementary and multiplexed streams are both constructed entirely from a
single building block (an Ogg page) composed of a page header (eight
fields totalling twenty-seven bytes), a list of packet lengths
(up to 255 bytes), and payload data (up to 65025 bytes). The structure
of every page is the same. There are no optional fields or alternate
encodings.

<p>Stream and media metadata is contained in Ogg and not built into
the Ogg container itself. Metadata is thus compartmentalized and
layered rather than part of a monolithic design, an especially good
idea as no two groups seem able to agree on what a complete or
complete-enough metadata set should be. In this way, the container and
container implementation are isolated from unnecessary design flux.

<h3>Streaming</h3>

<p>The Ogg container is primarily a streaming format,
encapsulating chronological, time-linear mixed media into a single
delivery stream or file. The design is such that an application can
always encode and/or decode all features of a bitstream in one pass
with no seeking and minimal buffering. Seeking to provide optimized
encoding (such as two-pass encoding) or interactive decoding (such as
scrubbing or instant replay) is not disallowed or discouraged; however,
no container feature requires nonlinear access of the bitstream.
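<p>The page building block described above can be illustrated with a
little arithmetic. The following is a minimal sketch, assuming the
27-byte fixed header and the lacing rule given in the
<a href="framing.html">framing specification</a> (each packet length is
written as a run of 255s ended by a value below 255). The function and
constant names are illustrative only and are not part of libogg.</p>

<pre>
```python
HEADER_BYTES = 27          # fixed page header (assumed from the framing spec)
MAX_SEGMENTS = 255         # at most 255 lacing bytes per page
MAX_SEGMENT_BYTES = 255    # each lacing byte covers up to 255 payload bytes


def lacing_values(packet_len):
    """Return the lacing bytes that encode a single packet length."""
    full, rest = divmod(packet_len, MAX_SEGMENT_BYTES)
    # A packet ends at the first lacing value below 255, so a length that
    # is an exact multiple of 255 is terminated by a trailing 0.
    return [MAX_SEGMENT_BYTES] * full + [rest]


# Largest possible page: header + full lacing table + full payload.
MAX_PAGE_BYTES = HEADER_BYTES + MAX_SEGMENTS + MAX_SEGMENTS * MAX_SEGMENT_BYTES
```
</pre>

<p>Under these assumptions a 600-byte packet is described by three
lacing bytes, a zero-length packet by a single 0, and no page can
exceed 65307 bytes.</p>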

<h3>Variable Bit Rate, Variable Payload Size</h3>

<p>Ogg is designed to contain any size data payload with bounded,
predictable efficiency. Ogg packets have no maximum size and a
zero-byte minimum size. There is no restriction on size changes from
packet to packet. Variable size packets do not require the use of any
optional or additional container features. There is no optimal
suggested packet size, though special consideration was paid to make
sure 50-200 byte packets were no less efficient than larger packet
sizes. The original design criterion was a 2% overhead at 50 byte
packets, dropping to a maximum working overhead of 1% with larger
packets, and a typical working overhead of 0.5-0.7% for most practical
uses.

<h3>Simple pagination</h3>

<p>Ogg is a byte-aligned container with no context-dependent, optional
or variable-length fields. Ogg requires no repacking of codec data.
The page structure is written out in-line as packet data is submitted
to the streaming abstraction. In addition, it is possible to
implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
is done in the Tremor sourcebase).

<h3>Capture</h3>

<p>Ogg is designed for efficient and immediate stream capture with
high confidence. Although packets have no size limit in Ogg, pages
are a maximum of just under 64kB, meaning that any Ogg stream can be
captured with confidence after seeing 128kB of data or less [worst
case; typical figure is 6kB] from any random starting point in the
stream.

<h3>Seeking</h3>

<p>Ogg implements simple coarse- and fine-grained seeking by design.

<p>Coarse seeking may be performed by simply 'moving the tone arm' to a
new position and 'dropping the needle'. Rapid capture with
accompanying timecode from any location in an Ogg file is guaranteed
by the stream design.
From the acquisition of the first timecode,
all data needed to play back from that time code forward is ahead of
the stream cursor.

<p>Ogg implements full sample-granularity seeking using an
interpolated bisection search built on the capture and timecode
mechanisms used by coarse seeking. As above, once a search finds
the desired timecode, all data needed to play back from that time code
forward is ahead of the stream cursor.

<p>Both coarse and fine seeking use the page structure and sequencing
inherent to the Ogg format. All Ogg streams are fully seekable from
creation; seekability is unaffected by truncation or missing data, and
is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor
heuristic.

<p>Seeking without use of an index is a major point of the Ogg
design. There are several reasons why Ogg forgoes an index:

<ul>

<li>It must be possible to create an Ogg stream in a single pass, and
an index requires either two passes to create, or the index must be
tacked onto the end of a live stream after the stream is finished.
Both methods run afoul of other design constraints.

<li>An index is only marginally useful in Ogg for the complexity
added; it adds no new functionality and seldom improves performance
noticeably. Empirical testing shows that indexless interpolation
search does not require many more seeks in practice than using an
index would.

<li>'Optional' indexes encourage lazy implementations that can seek
only when indexes are present, or that implement indexless seeking
only by building an internal index after reading the entire file
beginning to end. This has been the fate of other containers that
specify optional indexing.

</ul>

<h3>Simple multiplexing</h3>

<p>Ogg multiplexes streams by interleaving pages from multiple elementary streams into a
multiplexed stream in time order. The multiplexed pages are not
altered. Muxing an Ogg AV stream out of separate audio,
video and data streams is akin to shuffling several decks of cards
together into a single deck; the cards themselves remain unchanged.
Demultiplexing is similarly simple (as the cards are marked).

<p>The goal of this design is to make the mux/demux operation as
trivial as possible to allow live streaming systems to build and
rebuild streams on the fly with minimal CPU usage and no additional
storage or latency requirements.

<h3>Continuous and Discontinuous Media</h3>

<p>Ogg streams belong to one of two categories, "Continuous" streams and
"Discontinuous" streams.

<p>A stream that provides a gapless, time-continuous media type with a
fine-grained timebase is considered to be 'Continuous'. A continuous
stream should never be starved of data. Examples of continuous data
types include broadcast audio and video.

<p>A stream that delivers data in a potentially irregular pattern or
with widely spaced timing gaps is considered to be 'Discontinuous'. A
discontinuous stream may be best thought of as data representing
scattered events; although they happen in order, they are typically
unconnected data often located far apart. One example of a
discontinuous stream type is captioning, such as <a
href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's
possible to design captions as a continuous stream type, it's most
natural to think of captions as widely spaced pieces of text with
little happening between.

<p>The fundamental reason for the distinction between continuous and
discontinuous streams concerns buffering.
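<p>The card-shuffling picture of multiplexing given earlier can be
sketched in a few lines. This is only an illustration under stated
assumptions: the <code>Page</code> tuple, its fields and the sample
timestamps are hypothetical, each input list is assumed to be in time
order already, and the ordering rules for header pages are ignored.</p>

<pre>
```python
import heapq
from typing import NamedTuple


class Page(NamedTuple):
    time: float   # timestamp derived from the granule position (assumed)
    serial: int   # stream serial number stamped on the page
    seq: int      # page sequence number within its logical stream


def mux(*streams):
    """Interleave already-ordered elementary streams by page time."""
    # Pages are passed through untouched; only their order is decided here.
    return list(heapq.merge(*streams, key=lambda page: page.time))


def demux(pages):
    """Deal pages back out to per-stream lists using the serial number."""
    out = {}
    for page in pages:
        out.setdefault(page.serial, []).append(page)
    return out


audio = [Page(0.5, 1, 0), Page(1.0, 1, 1), Page(1.5, 1, 2)]
video = [Page(0.4, 2, 0), Page(1.2, 2, 1)]
muxed = mux(audio, video)
```
</pre>

<p>Because the pages themselves are never modified, demultiplexing the
result returns exactly the original per-stream page lists.</p>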

<h3>Buffering</h3>

<p>A continuous stream is, by definition, gapless. Ogg buffering is based
on the simple premise of never allowing an active continuous stream
to starve for data during decode; buffering works ahead until all
continuous streams in a physical stream have data ready and no further.

<p>Discontinuous stream data is not assumed to be predictable. The
buffering design takes discontinuous data 'as it comes' rather than
working ahead to look for future discontinuous data for a potentially
unbounded period. Thus, the buffering process makes no attempt to fill
discontinuous stream buffers; their pages simply 'fall out' of the
stream when continuous streams are handled properly.

<p>Buffering requirements in this design need not be explicitly
declared or managed in the encoded stream. The decoder simply reads as
much data as is necessary to keep all continuous stream types gapless
and no more, with discontinuous data processed as it arrives in the
continuous data. Buffering is implicitly optimal for the given
stream. Because all pages of all data types are stamped with absolute
timing information within the stream, inter-stream synchronization
timing is always maintained without the need for explicitly declared
buffer-ahead hinting.

<h3>Codec metadata</h3>

<p>Ogg does not replicate codec-specific metadata into the mux layer
in an attempt to make the mux and codec layer implementations 'fully
separable'. Things like specific timebase, keyframing strategy, frame
duration, etc, do not appear in the Ogg container. The mux layer is,
instead, expected to query a codec through a standardized interface,
left to the implementation, for this data when it is needed.
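<p>Since the interface is left to the implementation, the following is
only one possible shape for it, not anything defined by Ogg or libogg.
The <code>CodecMapping</code> protocol, the <code>FixedRateAudio</code>
stub and the <code>page_time</code> helper are all hypothetical names.
The stub assumes a codec whose position value simply counts samples at
a fixed rate.</p>

<pre>
```python
from fractions import Fraction
from typing import Protocol


class CodecMapping(Protocol):
    """Hypothetical interface a mux layer could query; not part of Ogg."""

    continuous: bool

    def granule_to_time(self, granulepos: int) -> Fraction: ...


class FixedRateAudio:
    """Toy codec stub: the position value counts samples at a fixed rate."""

    continuous = True

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate

    def granule_to_time(self, granulepos):
        # Exact rational time in seconds, so no rounding error accumulates.
        return Fraction(granulepos, self.sample_rate)


def page_time(codec: CodecMapping, granulepos: int) -> Fraction:
    """The mux layer asks the codec rather than interpreting the value."""
    return codec.granule_to_time(granulepos)


stub = FixedRateAudio(48000)
```
</pre>

<p>In this sketch the mux layer never needs to know the sample rate or
any other timebase detail; a codec with a different timebase would
supply its own <code>granule_to_time</code>.</p>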

<p>Though modern design wisdom usually prefers to predict all possible
needs of current and future codecs then embed these dependencies and
the required metadata into the container itself, this strategy
increases container specification complexity, fragility, and rigidity.
The mux and codec implementations become more independent, but the
specifications become less independent. A codec can't do what a
container hasn't already provided for. New codecs are harder to
support, and you can do fewer useful things with the ones you've
already got (e.g., try to make a good splitter without using any codecs.
You're stuck splitting at keyframes only, or building yet another new
mechanism into the container layer to mark what frames to skip
displaying).

<p>Ogg's design goes the opposite direction, where the specification
is to be as simple, easy to understand, and 'proofed' against novel
codecs as possible. When an Ogg mux layer requires codec-specific
information, it queries the codec (or a codec stub). This trades a
more complex implementation for a simpler, more flexible
specification.

<h3>Stream structure metadata</h3>

<p>The Ogg container itself does not define a metadata system for
declaring the structure and interrelations between multiple media
types in a muxed stream. That is, the Ogg container itself does not
specify data like 'which stream is the subtitle stream?' or 'which
video stream is the primary angle?'. This metadata still exists, but
is stored in the Ogg container rather than being built into the Ogg
container. Xiph specifies the 'Skeleton' metadata format for Ogg
streams, but this decoupling of container and stream structure
metadata means it is possible to use Ogg with any metadata
specification without altering the container itself, or without stream
structure metadata at all.

<h3>Frame accurate absolute position</h3>

<p>Every Ogg page is stamped with a 64-bit 'granule position' that
serves as an absolute timestamp for mux and seeking. A few nifty
little tricks are usually also embedded in the granpos state, but
we'll leave those aside for the moment (strictly speaking, they're
part of each codec's mapping, not Ogg).

<p>As mentioned above, granule positions are mapped into
absolute timestamps by the codec, rather than being a hard timestamp.
This allows maximally efficient use of the available 64 bits to
address every sample/frame position without approximation while
supporting new and previously unknown timebase encodings without
needing to extend or update the mux layer. When a codec needs a novel
timebase, it simply brings the code for that mapping along with it.
This is not a theoretical curiosity; new, wholly novel timebases were
deployed with the adoption of both Theora and Dirac. "Rolling INTRA"
(keyframeless video) also benefits from novel use of the granule
position.

<h2>Ogg stream arrangement</h2>

<h3>Packets, pages, and bitstreams</h3>

<p>Ogg codecs use <em>packets</em>. Packets are octet payloads of
raw, compressed data, containing the data needed for a single
decompressed unit, e.g., one video frame. Packets have no maximum size
and may be zero length. They do not have any high-level structure or
boundary information; strung together, the unframed packets form a
<em>logical bitstream</em> of apparently random bytes with no internal
landmarks.

<p>Logical bitstream packets are grouped and framed into Ogg pages
along with a unique stream <em>serial number</em> to produce a
<em>physical bitstream</em>. An <em>elementary stream</em> is a
physical bitstream containing only the pages framing a single logical
bitstream.
Each page is a self-contained entity, although a packet may
be split and encoded across one or more pages. The page decode
mechanism is designed to recognize, verify and handle single pages at
a time from the overall bitstream.

<p><a href="framing.html">Ogg Bitstream Framing</a> specifies
the page format of an Ogg bitstream, the packet coding process
and elementary bitstreams in detail.

<h3>Multiplexed bitstreams</h3>

<p>Multiple logical/elementary bitstreams can be combined into a single
<em>multiplexed bitstream</em> by interleaving whole pages from each
contributing elementary stream in time order. The result is a single
physical stream that multiplexes and frames multiple logical streams.
Each logical stream is identified by the unique stream serial number
stamped in its pages. A physical stream may include a 'meta-header'
(such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its
own Ogg page at the beginning of the physical stream. A decoder
recovers the original logical/elementary bitstreams out of the
physical bitstream by taking the pages in order from the physical
bitstream and redirecting them into the appropriate logical decoding
entity.

<p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies
proper multiplexing of an Ogg bitstream in detail.

<h3>Chaining</h3>

<p>Multiple Ogg physical bitstreams may be concatenated into a single new
stream; this is <em>chaining</em>.
The bitstreams do not overlap; the
final page of a given logical bitstream is immediately followed by the
initial page of the next.</p>

<p>Each logical bitstream in a chain must have a unique serial number
within the scope of the full physical bitstream, not only within a
particular <em>link</em> or <em>segment</em> of the chain.</p>

<h3>Continuous and discontinuous streams</h3>

<p>Within Ogg, each stream must be declared (by the codec) to be
continuous- or discontinuous-time. Most codecs treat all streams they
use as either inherently continuous- or discontinuous-time, although
this is not a requirement. A codec may, as part of its mapping, choose
according to data in the initial header.

<p>Continuous-time pages are stamped by end-time; discontinuous pages
are stamped by begin-time. Pages in a multiplexed stream are
interleaved in order of the time stamp regardless of stream type.
Both continuous and discontinuous logical streams are used to seek
within a physical stream; however, only continuous streams are used to
determine buffering depth. Because discontinuous streams are stamped
by start time, they will always 'fall out' in time when buffering
tracks only the continuous streams. See 'Examples' for an
illustration of the buffering mechanism.

<h2>Mapping Requirements</h2>

<p>Each codec is allowed some freedom in deciding how its logical
bitstream is encapsulated into an Ogg bitstream (even if it is a
trivial mapping, e.g., 'plop the packets in and go'). This is the
codec's <em>mapping</em>. Ogg imposes a few mapping requirements
on any codec.

<p>The <a href="framing.html">framing specification</a> defines
'beginning of stream' and 'end of stream' page markers via a header
flag (it is possible for a stream to consist of a single page).
A
correct stream always consists of an integer number of pages, an easy
requirement given the variable size nature of pages.</p>

<p>The first page of an elementary Ogg bitstream consists of a single,
small 'initial header' packet that must include sufficient information
to identify the exact CODEC type. From this initial header, the codec
must also be able to determine its timebase and whether it is a
continuous- or discontinuous-time stream. The initial header must fit
on a single page. If a codec makes use of auxiliary headers (for
example, Vorbis uses two auxiliary headers), these headers must follow
the initial header immediately. The last header finishes its page;
data begins on a fresh page.

<p>As an example, Ogg Vorbis places the name and revision of the
Vorbis CODEC, the audio rate and the audio quality into this initial
header. Comments and detailed codec setup appear in the larger
auxiliary headers.</p>

<h2>Multiplexing Requirements</h2>

<p>Multiplexing requirements within Ogg are straightforward. When
constructing a single-link (unchained) physical bitstream consisting
of multiple elementary streams:

<ol>

<li> The initial header for each stream appears in sequence, each
header on a single page. All initial headers must appear with no
intervening data (no auxiliary header pages or packets, no data pages
or packets). Order of the initial headers is unspecified. The
'beginning of stream' flag is set on each initial header.

<li> All auxiliary headers for all streams must follow. Order
is unspecified. The final auxiliary header of each stream must flush
its page.

<li>Data pages for each stream follow, interleaved in time order.

<li>The final page of each stream sets the 'end of stream' flag.
Unlike initial pages, terminal pages for the logical bitstreams need
not occur contiguously; indeed it may not be possible for them to do so.
</ol>

<p>Each grouped bitstream must have a unique serial number within the
scope of the physical bitstream.</p>

<h3>Chaining and multiplexing</h3>

<p>Multiplexed and/or unmultiplexed bitstreams may be chained
consecutively. Such a physical bitstream obeys all the rules of both
chained and multiplexed streams. Each link, when unchained, must
stand on its own as a valid physical bitstream. Chained streams do
not mix; a new segment may not begin until all streams in the
preceding segment have terminated. </p>

<h2>Examples</h2>

<em>[More to come shortly; this section is currently being revised and expanded]</em>

<p>Below, we present an example of a multiplexed and chained bitstream:</p>

<p><img src="stream.png" alt="stream"/></p>

<p>In this example, we see pages from five total logical bitstreams
multiplexed into a physical bitstream. Note the following
characteristics:</p>

<ol>
<li>Multiplexed bitstreams in a given link begin together; all of the
initial pages must appear before any data pages. When concurrently
multiplexed groups are chained, the new group does not begin until all
the bitstreams in the previous group have terminated.</li>

<li>The ordering of pages of concurrently multiplexed bitstreams is
governed by timestamp (not shown here); there is no regular
interleaving order. Pages within a logical bitstream appear in
sequence order.</li>
</ol>

<div id="copyright">
  The Xiph Fish Logo is a
  trademark (™) of Xiph.Org.<br/>

  These pages © 1994 - 2010 Xiph.Org. All rights reserved.
</div>

</body>
</html>