      1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
      2 <html>
      3 <head>
      4 
      5 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
      6 <title>Ogg Documentation</title>
      7 
      8 <style type="text/css">
      9 body {
     10   margin: 0 18px 0 18px;
     11   padding-bottom: 30px;
     12   font-family: Verdana, Arial, Helvetica, sans-serif;
     13   color: #333333;
     14   font-size: .8em;
     15 }
     16 
     17 a {
     18   color: #3366cc;
     19 }
     20 
     21 img {
     22   border: 0;
     23 }
     24 
     25 #xiphlogo {
     26   margin: 30px 0 16px 0;
     27 }
     28 
     29 #content p {
     30   line-height: 1.4;
     31 }
     32 
     33 h1, h1 a, h2, h2 a, h3, h3 a {
     34   font-weight: bold;
     35   color: #ff9900;
     36   margin: 1.3em 0 8px 0;
     37 }
     38 
     39 h1 {
     40   font-size: 1.3em;
     41 }
     42 
     43 h2 {
     44   font-size: 1.2em;
     45 }
     46 
     47 h3 {
     48   font-size: 1.1em;
     49 }
     50 
     51 li {
     52   line-height: 1.4;
     53 }
     54 
     55 #copyright {
     56   margin-top: 30px;
     57   line-height: 1.5em;
     58   text-align: center;
     59   font-size: .8em;
     60   color: #888888;
     61   clear: both;
     62 }
     63 </style>
     64 
     65 </head>
     66 
     67 <body>
     68 
     69 <div id="xiphlogo">
     70   <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
     71 </div>
     72 
     73 <h1>Ogg bitstream overview</h1>
     74 
<p>This document serves as a starting point for understanding the design
     76 and implementation of the Ogg container format.  If you're new to Ogg
     77 or merely want a high-level technical overview, start reading here.
     78 Other documents linked from the <a href="index.html">index page</a>
     79 give distilled technical descriptions and references of the container
     80 mechanisms.  This document is intended to aid understanding.
     81 
     82 <h2>Container format design points</h2>
     83 
     84 <p>Ogg is intended to be a simplest-possible container, concerned only
     85 with framing, ordering, and interleave. It can be used as a stream delivery
     86 mechanism, for media file storage, or as a building block toward
     87 implementing a more complex, non-linear container (for example, see
     88 the <a href="skeleton.html">Skeleton</a> or <a
     89 href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>).
     90 
     91 <p>The Ogg container is not intended to be a monolithic
     92 'kitchen-sink'.  It exists only to frame and deliver in-order stream
     93 data and as such is vastly simpler than most other containers.
Elementary and multiplexed streams are both constructed entirely from a
single building block (an Ogg page) composed of eight fields
totalling twenty-seven bytes (the page header), a list of packet lengths
(up to 255 bytes), and payload data (up to 65025 bytes).  The structure
     98 of every page is the same.  There are no optional fields or alternate
     99 encodings.
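<p>As an illustration of how little there is to a page, the following
Python sketch unpacks the fixed header and uses the list of packet
lengths to locate the payload.  It is a minimal sketch, not the
reference implementation: the name <code>parse_page</code> and the
dictionary keys are invented for this example, the field layout is
taken from the <a href="framing.html">framing specification</a>, and
the checksum is read but not verified.

```python
import struct

# Fixed 27-byte header, little-endian: capture pattern, version,
# header-type flags, granule position, stream serial number,
# page sequence number, checksum, number of length entries.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")

def parse_page(buf, offset=0):
    """Parse one page at `offset`; return (fields, payload, next_offset)."""
    (magic, version, flags, granulepos,
     serial, seqno, crc, nsegs) = PAGE_HEADER.unpack_from(buf, offset)
    if magic != b"OggS" or version != 0:
        raise ValueError("not an Ogg page")
    table_start = offset + PAGE_HEADER.size
    lengths = buf[table_start:table_start + nsegs]   # one byte per entry
    body_start = table_start + nsegs
    body_end = body_start + sum(lengths)              # payload size
    fields = {"flags": flags, "granulepos": granulepos,
              "serial": serial, "seqno": seqno, "crc": crc}
    return fields, buf[body_start:body_end], body_end
```

<p>Because every page has the same shape, the returned
<code>next_offset</code> is all a reader needs in order to step to the
following page.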
    100 
    101 <p>Stream and media metadata is contained in Ogg and not built into
    102 the Ogg container itself.  Metadata is thus compartmentalized and
    103 layered rather than part of a monolithic design, an especially good
    104 idea as no two groups seem able to agree on what a complete or
    105 complete-enough metadata set should be. In this way, the container and
    106 container implementation are isolated from unnecessary design flux.
    107 
    108 <h3>Streaming</h3> 
    109 
    110 <p>The Ogg container is primarily a streaming format,
    111 encapsulating chronological, time-linear mixed media into a single
    112 delivery stream or file. The design is such that an application can
    113 always encode and/or decode all features of a bitstream in one pass
    114 with no seeking and minimal buffering.  Seeking to provide optimized
    115 encoding (such as two-pass encoding) or interactive decoding (such as
scrubbing or instant replay) is not disallowed or discouraged; however,
    117 no container feature requires nonlinear access of the bitstream.
    118 
    119 <h3>Variable Bit Rate, Variable Payload Size</h3>
    120 
    121 <p>Ogg is designed to contain any size data payload with bounded,
    122 predictable efficiency.  Ogg packets have no maximum size and a
    123 zero-byte minimum size.  There is no restriction on size changes from
    124 packet to packet. Variable size packets do not require the use of any
    125 optional or additional container features.  There is no optimal
    126 suggested packet size, though special consideration was paid to make
    127 sure 50-200 byte packets were no less efficient than larger packet
sizes.  The original design criterion was a 2% overhead at 50 byte
    129 packets, dropping to a maximum working overhead of 1% with larger
    130 packets, and a typical working overhead of .5-.7% for most practical
    131 uses. 
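<p>The arithmetic behind these figures can be approximated with a few
lines of Python.  This is a rough model under stated assumptions, not
a measurement: it assumes a long run of equal-size, non-empty packets,
pages that always use all 255 length entries, and the length encoding
described in the <a href="framing.html">framing specification</a> (one
entry per full 255 bytes plus a final entry below 255).  The function
names are invented for this example.

```python
HEADER_BYTES = 27    # fixed page header
MAX_ENTRIES = 255    # entries in one page's list of packet lengths

def length_entries(packet_size):
    """Entries needed to record one packet's length: one per full
    255 bytes, plus a final entry below 255 that ends the packet."""
    return packet_size // 255 + 1

def overhead(packet_size):
    """Container bytes per payload byte, with the page header cost
    spread evenly over the 255 entries of a completely full page."""
    header_share = HEADER_BYTES / MAX_ENTRIES
    return length_entries(packet_size) * (1 + header_share) / packet_size

for size in (50, 200, 4096):
    print(f"{size:5d} byte packets: {overhead(size):.2%}")
```

<p>Under these assumptions the model gives roughly 2.2% for 50 byte
packets and about 0.5% for 200 and 4096 byte packets.  Real muxers
usually flush pages before they are completely full, so working
overhead is somewhat higher than this best case.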
    132 
    133 <h3>Simple pagination</h3>
    134 
    135 <p>Ogg is a byte-aligned container with no context-dependent, optional
    136 or variable-length fields.  Ogg requires no repacking of codec data.
    137 The page structure is written out in-line as packet data is submitted
    138 to the streaming abstraction.  In addition, it is possible to
    139 implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
    140 is done in the Tremor sourcebase).
    141 
    142 <h3>Capture</h3>
    143 
    144 <p>Ogg is designed for efficient and immediate stream capture with
    145 high confidence.  Although packets have no size limit in Ogg, pages
    146 are a maximum of just under 64kB meaning that any Ogg stream can be
    147 captured with confidence after seeing 128kB of data or less [worst
    148 case; typical figure is 6kB] from any random starting point in the
    149 stream.
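<p>Capture can be sketched as: scan for the capture pattern, read the
candidate page's length from its header, and accept the page only if
its checksum verifies.  The Python below is an illustrative sketch
rather than the reference implementation; the function names are
invented here, and the checksum parameters (generator polynomial
0x04c11db7, zero initial value, no bit reversal, computed with the
checksum field zeroed) are those given in the
<a href="framing.html">framing specification</a>.

```python
import struct

def _build_table():
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            r = ((r << 1) ^ 0x04C11DB7) if r & 0x80000000 else (r << 1)
            r &= 0xFFFFFFFF
        table.append(r)
    return table

CRC_TABLE = _build_table()

def ogg_crc(data):
    """CRC-32 as used by Ogg pages: initial value 0, most significant bit first."""
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ CRC_TABLE[(crc >> 24) ^ byte]
    return crc

def capture(buf, start=0):
    """Return the offset of the first verified page at or after `start`, or -1."""
    pos = buf.find(b"OggS", start)
    while pos != -1:
        if pos + 27 <= len(buf):
            nsegs = buf[pos + 26]                      # number of length entries
            table_end = pos + 27 + nsegs
            end = table_end + sum(buf[pos + 27:table_end])
            if end <= len(buf):
                page = bytearray(buf[pos:end])
                stored = struct.unpack_from("<I", page, 22)[0]
                page[22:26] = b"\0\0\0\0"              # checksum field is zeroed
                if ogg_crc(page) == stored:
                    return pos
        pos = buf.find(b"OggS", pos + 1)               # false match, keep scanning
    return -1
```

<p>The checksum is what lets a reader reject a capture pattern that
merely happens to occur inside payload data or a damaged page.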
    150 
    151 <h3>Seeking</h3>
    152 
    153 <p>Ogg implements simple coarse- and fine-grained seeking by design.
    154 
    155 <p>Coarse seeking may be performed by simply 'moving the tone arm' to a
    156 new position and 'dropping the needle'.  Rapid capture with
    157 accompanying timecode from any location in an Ogg file is guaranteed
    158 by the stream design.  From the acquisition of the first timecode,
    159 all data needed to play back from that time code forward is ahead of
    160 the stream cursor.
    161 
    162 <p>Ogg implements full sample-granularity seeking using an
    163 interpolated bisection search built on the capture and timecode
    164 mechanisms used by coarse seeking.  As above, once a search finds
    165 the desired timecode, all data needed to play back from that time code
    166 forward is ahead of the stream cursor.
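<p>The search described above can be sketched in Python as follows.
This is a simplified model, not the algorithm of any particular
library: the stream is represented as a list of
<code>(byte_offset, granule_position)</code> pairs, the inner
<code>capture</code> function stands in for scanning forward to the
next page from an arbitrary byte position, granule positions are
assumed to start near zero and never decrease, and the
<code>CHUNK</code> back-off is a made-up tuning value.

```python
import bisect

CHUNK = 4096  # hypothetical back-off so a probe tends to land before the target

def seek_page(pages, total_bytes, target):
    """Return (index of the first page whose granule position is at
    least `target`, or None if there is none; number of probes made).
    `pages` must be non-empty and sorted by byte offset."""
    offsets = [off for off, _ in pages]

    def capture(byte_pos):
        # Stand-in for scanning forward from byte_pos to the next page.
        i = bisect.bisect_left(offsets, byte_pos)
        return i if i < len(pages) else None

    lo, hi = 0, total_bytes             # byte window still to be searched
    lo_gp, hi_gp = 0, pages[-1][1]      # granule positions at the window edges
    best, probes = None, 0
    while lo < hi:
        span = hi_gp - lo_gp
        frac = (target - lo_gp) / span if span > 0 else 0.5
        frac = min(1.0, max(0.0, frac))
        guess = lo + int(frac * (hi - lo)) - CHUNK   # interpolate, then back off
        guess = max(lo, min(guess, hi - 1))
        probes += 1
        i = capture(guess)
        if i is None or offsets[i] >= hi:
            hi = guess                               # no page starts in [guess, hi)
        elif pages[i][1] >= target:
            best, hi, hi_gp = i, guess, pages[i][1]  # candidate; look earlier
        else:
            lo, lo_gp = offsets[i] + 1, pages[i][1]  # too early; look later
    return best, probes
```

<p>Each probe shrinks the byte window, so the loop always terminates,
and the page it returns is one from which playback can proceed
forward without further seeking.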
    167 
    168 <p>Both coarse and fine seeking use the page structure and sequencing
    169 inherent to the Ogg format.  All Ogg streams are fully seekable from
    170 creation; seekability is unaffected by truncation or missing data, and
    171 is tolerant of gross corruption.  Seek operations are neither 'fuzzy' nor
    172 heuristic.
    173 
    174 <p>Seeking without use of an index is a major point of the Ogg
    175 design. There are several reasons why Ogg forgoes an index:
    176 			  
    177 <ul>
    178 
    179 <li>It must be possible to create an Ogg stream in a single pass, and
    180 an index requires either two passes to create, or the index must be
    181 tacked onto the end of a live stream after the stream is finished.
    182 Both methods run afoul of other design constraints.
    183 
    184 <li>An index is only marginally useful in Ogg for the complexity
    185 added; it adds no new functionality and seldom improves performance
    186 noticeably.  Empirical testing shows that indexless interpolation
    187 search does not require many more seeks in practice than using an
    188 index would.
    189 
    190 <li>'Optional' indexes encourage lazy implementations that can seek
    191 only when indexes are present, or that implement indexless seeking
    192 only by building an internal index after reading the entire file
    193 beginning to end.  This has been the fate of other containers that
    194 specify optional indexing.
    195 
    196 </ul>
    197 
    198 <h3>Simple multiplexing</h3>
    199 
    200 <p>Ogg multiplexes streams by interleaving pages from multiple elementary streams into a
    201 multiplexed stream in time order.  The multiplexed pages are not
    202 altered.  Muxing an Ogg AV stream out of separate audio,
    203 video and data streams is akin to shuffling several decks of cards
    204 together into a single deck; the cards themselves remain unchanged.
    205 Demultiplexing is similarly simple (as the cards are marked).
    206 
    207 <p>The goal of this design is to make the mux/demux operation as
    208 trivial as possible to allow live streaming systems to build and
    209 rebuild streams on the fly with minimal CPU usage and no additional
    210 storage or latency requirements.
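<p>The card-shuffling analogy maps directly onto a merge of
already-sorted sequences.  The Python sketch below is illustrative
only: the <code>Page</code> tuple and its <code>time</code> field are
stand-ins invented for this example (in a real stream the time comes
from the codec's interpretation of the granule position), and the
header ordering rules described later in this document are ignored.

```python
import heapq
from collections import defaultdict, namedtuple

# Hypothetical page record: timestamp, stream serial number, sequence number.
Page = namedtuple("Page", "time serial seqno")

def mux(*streams):
    """Interleave elementary streams, each already in time order,
    into one stream in time order.  Pages are passed through unaltered."""
    return list(heapq.merge(*streams, key=lambda page: page.time))

def demux(pages):
    """Sort pages back into per-stream lists using the serial number."""
    streams = defaultdict(list)
    for page in pages:
        streams[page.serial].append(page)
    return dict(streams)

audio = [Page(0.0, 1, 0), Page(0.5, 1, 1), Page(1.0, 1, 2)]
video = [Page(0.2, 2, 0), Page(0.9, 2, 1)]
muxed = mux(audio, video)
```

<p>Neither operation inspects or rewrites page contents, which is why
mux and demux need no additional storage beyond the pages in flight.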
    211 
    212 <h3>Continuous and Discontinuous Media</h3>
    213 
    214 <p>Ogg streams belong to one of two categories, "Continuous" streams and
    215 "Discontinuous" streams.
    216 
    217 <p>A stream that provides a gapless, time-continuous media type with a
    218 fine-grained timebase is considered to be 'Continuous'. A continuous
    219 stream should never be starved of data. Examples of continuous data
    220 types include broadcast audio and video.
    221 
    222 <p>A stream that delivers data in a potentially irregular pattern or
    223 with widely spaced timing gaps is considered to be 'Discontinuous'. A
    224 discontinuous stream may be best thought of as data representing
    225 scattered events; although they happen in order, they are typically
unconnected data often located far apart. One example of a
discontinuous stream type would be captioning such as <a
    228 href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's
    229 possible to design captions as a continuous stream type, it's most
    230 natural to think of captions as widely spaced pieces of text with
    231 little happening between.
    232 
    233 <p>The fundamental reason for distinction between continuous and
    234 discontinuous streams concerns buffering.
    235 
    236 <h3>Buffering</h3>
    237 
    238 <p>A continuous stream is, by definition, gapless. Ogg buffering is based
    239 on the simple premise of never allowing an active continuous stream
    240 to starve for data during decode; buffering works ahead until all
    241 continuous streams in a physical stream have data ready and no further.
    242 
    243 <p>Discontinuous stream data is not assumed to be predictable. The
    244 buffering design takes discontinuous data 'as it comes' rather than
    245 working ahead to look for future discontinuous data for a potentially
    246 unbounded period. Thus, the buffering process makes no attempt to fill
    247 discontinuous stream buffers; their pages simply 'fall out' of the
    248 stream when continuous streams are handled properly.
    249 
    250 <p>Buffering requirements in this design need not be explicitly
    251 declared or managed in the encoded stream. The decoder simply reads as
    252 much data as is necessary to keep all continuous stream types gapless
    253 and no more, with discontinuous data processed as it arrives in the
    254 continuous data. Buffering is implicitly optimal for the given
    255 stream. Because all pages of all data types are stamped with absolute
    256 timing information within the stream, inter-stream synchronization
    257 timing is always maintained without the need for explicitly declared
    258 buffer-ahead hinting.
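<p>A minimal sketch of this buffering rule in Python follows.  It
assumes pages are simple <code>(serial, data)</code> tuples and that
the caller already knows which serial numbers belong to continuous
streams; the function name and the serial numbers are invented for
this example.

```python
from collections import deque

def fill_buffers(pages, continuous, buffers):
    """Pull pages from the physical stream until every continuous
    stream has at least one page buffered, then stop.  Returns False
    if the stream ran out first.  Pages of discontinuous streams are
    stored as they are encountered; they are never searched for."""
    while any(not buffers.get(serial) for serial in continuous):
        page = next(pages, None)
        if page is None:
            return False
        buffers.setdefault(page[0], deque()).append(page)
    return True

# Hypothetical serials: 1 = audio, 2 = video (continuous), 3 = captions.
physical = iter([(1, "a0"), (3, "caption"), (1, "a1"), (2, "v0"), (1, "a2")])
buffers = {}
ready = fill_buffers(physical, {1, 2}, buffers)
```

<p>In this example reading stops as soon as the first video page
arrives; the caption page has 'fallen out' along the way, and the last
audio page is left unread.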
    259 
    260 <h3>Codec metadata</h3>
    261 
    262 <p>Ogg does not replicate codec-specific metadata into the mux layer
    263 in an attempt to make the mux and codec layer implementations 'fully
    264 separable'.  Things like specific timebase, keyframing strategy, frame
    265 duration, etc, do not appear in the Ogg container.  The mux layer is,
    266 instead, expected to query a codec through a standardized interface,
    267 left to the implementation, for this data when it is needed.
    268 
    269 <p>Though modern design wisdom usually prefers to predict all possible
    270 needs of current and future codecs then embed these dependencies and
    271 the required metadata into the container itself, this strategy
    272 increases container specification complexity, fragility, and rigidity.
    273 The mux and codec implementations become more independent, but the
    274 specifications become less independent. A codec can't do what a
    275 container hasn't already provided for.  New codecs are harder to
    276 support, and you can do fewer useful things with the ones you've
    277 already got (eg, try to make a good splitter without using any codecs.
    278 You're stuck splitting at keyframes only, or building yet another new
    279 mechanism into the container layer to mark what frames to skip
    280 displaying).
    281 
    282 <p>Ogg's design goes the opposite direction, where the specification
    283 is to be as simple, easy to understand, and 'proofed' against novel
    284 codecs as possible.  When an Ogg mux layer requires codec-specific
    285 information, it queries the codec (or a codec stub).  This trades a
    286 more complex implementation for a simpler, more flexible
    287 specification.
    288 
    289 <h3>Stream structure metadata</h3>
    290 
    291 <p>The Ogg container itself does not define a metadata system for
    292 declaring the structure and interrelations between multiple media
    293 types in a muxed stream.  That is, the Ogg container itself does not
specify data like 'which stream is the subtitle stream?' or 'which
video stream is the primary angle?'.  This metadata still exists, but
is carried as data inside the Ogg container rather than being built
into the container format itself.  Xiph specifies the 'Skeleton'
metadata format for Ogg
    298 streams, but this decoupling of container and stream structure
    299 metadata means it is possible to use Ogg with any metadata
    300 specification without altering the container itself, or without stream
    301 structure metadata at all.
    302 
    303 <h3>Frame accurate absolute position</h3>
    304 
    305 <p>Every Ogg page is stamped with a 64 bit 'granule position' that
    306 serves as an absolute timestamp for mux and seeking.  A few nifty
    307 little tricks are usually also embedded in the granpos state, but
    308 we'll leave those aside for the moment (strictly speaking, they're
    309 part of each codec's mapping, not Ogg).
    310 
<p>As mentioned above, granule positions are mapped into
    312 absolute timestamps by the codec, rather than being a hard timestamp.
    313 This allows maximally efficient use of the available 64 bits to
    314 address every sample/frame position without approximation while
    315 supporting new and previously unknown timebase encodings without
    316 needing to extend or update the mux layer.  When a codec needs a novel
    317 timebase, it simply brings the code for that mapping along with it.
    318 This is not a theoretical curiosity; new, wholly novel timebases were
    319 deployed with the adoption of both Theora and Dirac.  "Rolling INTRA"
    320 (keyframeless video) also benefits from novel use of the granule
    321 position.
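<p>To make the idea of a codec-supplied mapping concrete, here are two
simplified mappings sketched in Python.  The first treats the granule
position as a running sample count, as an audio codec might.  The
second splits the 64 bits into a keyframe number and a count of frames
since that keyframe, in the spirit of the video mappings mentioned
above.  Both are illustrations under assumed parameters (the function
names, <code>shift</code>, and the frame rate arguments are invented
here); the authoritative definitions live in each codec's own
specification.

```python
from fractions import Fraction

def sample_count_time(granulepos, sample_rate):
    """Granule position is a count of samples; time is count / rate."""
    return Fraction(granulepos, sample_rate)

def split_granule_time(granulepos, shift, fps_num, fps_den):
    """High bits hold the frame number of the last keyframe, the low
    `shift` bits hold the number of frames since that keyframe.
    Returns (time in seconds, keyframe number)."""
    keyframe = granulepos >> shift
    since_key = granulepos & ((1 << shift) - 1)
    frame = keyframe + since_key
    return Fraction(frame * fps_den, fps_num), keyframe
```

<p>The mux layer never needs to know which of these schemes is in use;
it only asks the codec to turn a granule position into a time.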
    322 
    323 <h2>Ogg stream arrangement</h2>
    324 
    325 <h3>Packets, pages, and bitstreams</h3>
    326 
    327 <p>Ogg codecs use <em>packets</em>.  Packets are octet payloads of
    328 raw, compressed data, containing the data needed for a single
    329 decompressed unit, eg, one video frame. Packets have no maximum size
    330 and may be zero length. They do not have any high-level structure or
    331 boundary information; strung together, the unframed packets form a
    332 <em>logical bitstream</em> of apparently random bytes with no internal
    333 landmarks.
    334 
    335 <p>Logical bitstream packets are grouped and framed into Ogg pages
    336 along with a unique stream <em>serial number</em> to produce a
    337 <em>physical bitstream</em>.  An <em>elementary stream</em> is a
    338 physical bitstream containing only the pages framing a single logical
    339 bitstream. Each page is a self contained entity, although a packet may
    340 be split and encoded across one or more pages. The page decode
    341 mechanism is designed to recognize, verify and handle single pages at
    342 a time from the overall bitstream.
    343 
    344 <p><a href="framing.html">Ogg Bitstream Framing</a> specifies
    345 the page format of an Ogg bitstream, the packet coding process
    346 and elementary bitstreams in detail.
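<p>As a brief illustration of how unbounded packet sizes fit into a
list of one-byte lengths, the Python sketch below encodes a packet
length as a run of entries and recovers packet boundaries from such a
list.  It follows the scheme in the framing specification (an entry of
255 means the packet continues; any smaller entry, including zero,
ends it); the function names are invented for this example.

```python
def length_entries(packet_len):
    """Encode a packet length: one 255 per full 255 bytes, then a
    final entry below 255 (possibly 0) that marks the packet's end."""
    return [255] * (packet_len // 255) + [packet_len % 255]

def packet_lengths(entries):
    """Decode one page's entries.  Returns (lengths of the packets
    that end on this page, bytes of a packet that continues onto the
    next page)."""
    lengths, current = [], 0
    for value in entries:
        current += value
        if value < 255:
            lengths.append(current)
            current = 0
    return lengths, current
```

<p>A non-zero second return value is the case where a packet spans
pages: the page ran out of entries before the packet's terminating
entry was written.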
    347 
    348 <h3>Multiplexed bitstreams</h3>
    349 
    350 <p>Multiple logical/elementary bitstreams can be combined into a single
    351 <em>multiplexed bitstream</em> by interleaving whole pages from each
    352 contributing elementary stream in time order. The result is a single
    353 physical stream that multiplexes and frames multiple logical streams.
    354 Each logical stream is identified by the unique stream serial number
    355 stamped in its pages.  A physical stream may include a 'meta-header'
    356 (such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its
    357 own Ogg page at the beginning of the physical stream. A decoder
    358 recovers the original logical/elementary bitstreams out of the
    359 physical bitstream by taking the pages in order from the physical
    360 bitstream and redirecting them into the appropriate logical decoding
    361 entity.
    362 
    363 <p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies
    364 proper multiplexing of an Ogg bitstream in detail.
    365 
    366 <h3>Chaining</h3>
    367 
    368 <p>Multiple Ogg physical bitstreams may be concatenated into a single new
    369 stream; this is <em>chaining</em>. The bitstreams do not overlap; the
    370 final page of a given logical bitstream is immediately followed by the
    371 initial page of the next.</p>
    372 
    373 <p>Each logical bitstream in a chain must have a unique serial number
    374 within the scope of the full physical bitstream, not only within a
    375 particular <em>link</em> or <em>segment</em> of the chain.</p>
    376 
    377 <h3>Continuous and discontinuous streams</h3>
    378 
    379 <p>Within Ogg, each stream must be declared (by the codec) to be
    380 continuous- or discontinuous-time.  Most codecs treat all streams they
    381 use as either inherently continuous- or discontinuous-time, although
    382 this is not a requirement. A codec may, as part of its mapping, choose
    383 according to data in the initial header.
    384 
    385 <p>Continuous-time pages are stamped by end-time, discontinuous pages
    386 are stamped by begin-time.  Pages in a multiplexed stream are
    387 interleaved in order of the time stamp regardless of stream type.
    388 Both continuous and discontinuous logical streams are used to seek
    389 within a physical stream, however only continuous streams are used to
    390 determine buffering depth; because discontinuous streams are stamped
    391 by start time, they will always 'fall out' in time when buffering
    392 tracks only the continuous streams.  See 'Examples' for an
    393 illustration of the buffering mechanism.
    394 
    395 <h2>Mapping Requirements</h2>
    396 
    397 <p>Each codec is allowed some freedom in deciding how its logical
    398 bitstream is encapsulated into an Ogg bitstream (even if it is a
    399 trivial mapping, eg, 'plop the packets in and go'). This is the
    400 codec's <em>mapping</em>. Ogg imposes a few mapping requirements
    401 on any codec.
    402 
    403 <p>The <a href="framing.html">framing specification</a> defines
    404 'beginning of stream' and 'end of stream' page markers via a header
    405 flag (it is possible for a stream to consist of a single page). A
    406 correct stream always consists of an integer number of pages, an easy
    407 requirement given the variable size nature of pages.</p>
    408 
    409 <p>The first page of an elementary Ogg bitstream consists of a single,
    410 small 'initial header' packet that must include sufficient information
    411 to identify the exact CODEC type. From this initial header, the codec
    412 must also be able to determine its timebase and whether or not it is a
    413 continuous- or discontinuous-time stream.  The initial header must fit
    414 on a single page. If a codec makes use of auxiliary headers (for
    415 example, Vorbis uses two auxiliary headers), these headers must follow
    416 the initial header immediately.  The last header finishes its page;
    417 data begins on a fresh page.
    418 
    419 <p>As an example, Ogg Vorbis places the name and revision of the
    420 Vorbis CODEC, the audio rate and the audio quality into this initial
    421 header.  Comments and detailed codec setup appears in the larger
    422 auxiliary headers.</p>
    423 
    424 <h2>Multiplexing Requirements</h2>
    425 
    426 <p>Multiplexing requirements within Ogg are straightforward. When
    427 constructing a single-link (unchained) physical bitstream consisting
    428 of multiple elementary streams:
    429 
    430 <ol>
    431 
    432 <li> The initial header for each stream appears in sequence, each
    433 header on a single page.  All initial headers must appear with no
    434 intervening data (no auxiliary header pages or packets, no data pages
    435 or packets).  Order of the initial headers is unspecified. The
    436 'beginning of stream' flag is set on each initial header.
    437 
    438 <li> All auxiliary headers for all streams must follow.  Order
    439 is unspecified.  The final auxiliary header of each stream must flush
    440 its page.
    441 
    442 <li>Data pages for each stream follow, interleaved in time order. 
    443 
    444 <li>The final page of each stream sets the 'end of stream' flag.
    445 Unlike initial pages, terminal pages for the logical bitstreams need
    446 not occur contiguously; indeed it may not be possible for them to do so.
</ol>
    448 
    449 <p>Each grouped bitstream must have a unique serial number within the
    450 scope of the physical bitstream.</p>
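<p>The ordering rules above can be expressed as a small checker.  The
Python below is a sketch under simplifying assumptions: each page is a
dictionary whose <code>kind</code> key ('initial', 'aux' or 'data') is
supplied by the caller, since telling auxiliary headers from data
requires codec knowledge, and time ordering of data pages is not
checked for the same reason.  All names are invented for this example.

```python
RANK = {"initial": 0, "aux": 1, "data": 2}

def check_link(pages):
    """Return a list of rule violations for one unchained link."""
    errors = []
    section = 0                      # furthest section reached so far
    started, ended = set(), set()
    for n, page in enumerate(pages):
        serial, kind = page["serial"], page["kind"]
        if RANK[kind] < section:
            errors.append(f"page {n}: {kind} page after a later section began")
        section = max(section, RANK[kind])
        if kind == "initial":
            if not page["bos"]:
                errors.append(f"page {n}: initial header lacks 'beginning of stream'")
            if serial in started:
                errors.append(f"page {n}: serial number {serial} is not unique")
            started.add(serial)
        else:
            if page["bos"]:
                errors.append(f"page {n}: 'beginning of stream' on a later page")
            if serial not in started:
                errors.append(f"page {n}: stream {serial} has no initial header")
        if serial in ended:
            errors.append(f"page {n}: page follows 'end of stream' of {serial}")
        if page["eos"]:
            ended.add(serial)
    for serial in sorted(started - ended):
        errors.append(f"stream {serial}: no 'end of stream' page")
    return errors
```

<p>An empty result means the page sequence satisfies the structural
rules that can be checked without consulting a codec.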
    451 
<h3>Chaining and multiplexing</h3>
    453 
    454 <p>Multiplexed and/or unmultiplexed bitstreams may be chained
    455 consecutively. Such a physical bitstream obeys all the rules of both
    456 chained and multiplexed streams.  Each link, when unchained, must
    457 stand on its own as a valid physical bitstream.  Chained streams do
    458 not mix; a new segment may not begin until all streams in the
    459 preceding segment have terminated. </p>
    460 
    461 <h2>Examples</h2>
    462 
    463 <em>[More to come shortly; this section is currently being revised and expanded]</em>
    464 
    465 <p>Below, we present an example of a multiplexed and chained bitstream:</p>
    466 
    467 <p><img src="stream.png" alt="stream"/></p>
    468 
    469 <p>In this example, we see pages from five total logical bitstreams
    470 multiplexed into a physical bitstream. Note the following
    471 characteristics:</p>
    472 
    473 <ol>
    474 <li>Multiplexed bitstreams in a given link begin together; all of the
    475 initial pages must appear before any data pages. When concurrently
    476 multiplexed groups are chained, the new group does not begin until all
    477 the bitstreams in the previous group have terminated.</li>
    478 
    479 <li>The ordering of pages of concurrently multiplexed bitstreams is
governed by timestamp (not shown here); there is no regular
    481 interleaving order.  Pages within a logical bitstream appear in
    482 sequence order.</li>
    483 </ol>
    484 
    485 <div id="copyright">
    486   The Xiph Fish Logo is a
    487   trademark (&trade;) of Xiph.Org.<br/>
    488 
    489   These pages &copy; 1994 - 2010 Xiph.Org. All rights reserved.
    490 </div>
    491 
    492 </body>
    493 </html>
    494