% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents.  Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
 \item
  A meta-headerless Ogg file encapsulates the Vorbis I packets.

 \item
  The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).

 \item
  The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).

\end{itemize}


This does not mean that Vorbis cannot currently be multiplexed
with other media types into a multi-stream Ogg file.  At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a `Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream.  A compliant `Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).


\subsubsection{MIME type}

The MIME type of an Ogg file depends on the context.  Specifically, complex
multimedia and applications should use \literal{application/ogg},
while visual media should use \literal{video/ogg}, and audio should use
\literal{audio/ogg}.  Vorbis data encapsulated in Ogg may appear
in any of those types.  RTP encapsulated Vorbis should use
\literal{audio/vorbis} + \literal{audio/vorbis-config}.


\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}

\item
  The first Vorbis packet (the identification header), which
  uniquely identifies a stream as Vorbis audio, is placed alone in the
  first page of the logical Ogg stream.  This results in a first Ogg
  page of exactly 58 bytes at the very beginning of the logical stream.


\item
  This first page is marked `beginning of stream' in the page flags.


\item
  The second and third Vorbis packets (comment and setup
  headers) may span one or more pages beginning on the second page of
  the logical stream.  However many pages they span, the third header
  packet finishes the page on which it ends.  The next (first audio) packet
  must begin on a fresh page.


     89 
     90 \item
     91   The granule position of these first pages containing only headers is zero.
     92 
     93 
     94 \item
     95   The first audio packet of the logical stream begins a fresh Ogg page.
     96 
     97 
\item
  Packets are placed into Ogg pages in order until the end of stream.


\item
  The last page is marked `end of stream' in the page flags.


\item
  Vorbis packets may span page boundaries.


\item
  The granule position of pages containing Vorbis audio is in units
  of PCM audio samples (per channel; a stereo stream's granule position
  does not increment at twice the speed of a mono stream).


\item
  The granule position of a page represents the end PCM sample
  position of the last packet \emph{completed} on that
  page.  The `last PCM sample' is the last complete sample returned by
  decode, not an internal sample awaiting lapping with a
  subsequent block.  A page that is entirely spanned by a single
  packet (that completes on a subsequent page) has no granule
  position, and the granule position is set to `-1'.


  Note that the last decoded (fully lapped) PCM sample from a packet
  is not necessarily the middle sample from that block. If, e.g., the
  current Vorbis packet encodes a ``long block'' and the next Vorbis
  packet encodes a ``short block'', the last decodable sample from the
  current packet will be at position (3*long\_block\_length/4) -
  (short\_block\_length/4).


    133 
    134 \item
    135     The granule (PCM) position of the first page need not indicate
    136     that the stream started at position zero.  Although the granule
    137     position belongs to the last completed packet on the page and a
    138     valid granule position must be positive, by
    139     inference it may indicate that the PCM position of the beginning
    140     of audio is positive or negative.
    141 
    142 
  \begin{itemize}
    \item
        A positive starting value simply indicates that this stream begins at
        some positive time offset, potentially within a larger
        program. This is a common case when connecting to the middle
        of a broadcast stream.

    \item
        A negative value indicates that
        output samples preceding time zero should be discarded during
        decoding; this technique is used to allow sample-granularity
        editing of the stream start time of already-encoded Vorbis
        streams.  The number of samples to be discarded must not exceed
        the overlap-add span of the first two audio packets.

  \end{itemize}


    In both of these cases in which the initial audio PCM starting
    offset is nonzero, the second finished audio packet must flush the
    page on which it appears and the third packet must begin a fresh page.
    This ensures that the decoder can always perform PCM position
    adjustments before needing to return any PCM data from synthesis,
    resulting in correct positioning information without any additional
    seeking logic.


  \begin{note}
    Failure to do so should, at worst, cause a
    decoder implementation to return incorrect positioning information
    for seeking operations at the very beginning of the stream.
  \end{note}


\item
  A granule position on the final page in a stream that indicates
  less audio data than the final packet would normally return is used to
  end the stream at a point that does not fall exactly on a frame
  boundary.  The difference
  between the actual available data returned and the declared amount
  indicates how many trailing samples to discard from the decoding
  process.

\end{itemize}
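% Worked example: size of the first Ogg page.

The 58-byte size of the first page quoted in the list above can be
checked by hand.  The following is an illustrative sketch rather than
normative text; it assumes the field widths given in the Ogg framing
spec (a 27-byte fixed page header followed by one lacing value per
segment) and in the identification header layout described elsewhere
in this specification.  The identification header consists of the
packet type (1 byte), the string `vorbis' (6 bytes), the version (4),
the channel count (1), the sample rate (4), three bitrate fields (4
each), one byte holding both blocksize exponents, and one byte holding
the framing flag.  Because the packet is shorter than 255 bytes, it
occupies a single segment and needs only one lacing value.

\begin{displaymath}
\begin{array}{rcl}
\mbox{identification header} & = & 1 + 6 + 4 + 1 + 4 + 4 + 4 + 4 + 1 + 1 = 30 \mbox{ bytes} \\
\mbox{first page} & = & 27 + 1 + 30 = 58 \mbox{ bytes}
\end{array}
\end{displaymath}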
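% Worked example: granule position arithmetic.

The granule position rules in the list above reduce to a few lines of
arithmetic.  The examples below are an illustrative sketch, not
normative text; the symbols $G$ and $N$ and all numeric values are
introduced here for the example only.

For a long block followed by a short block, assume block lengths of
2048 and 256 samples.  The last decodable sample of the long block is
then at position

\begin{displaymath}
3 \cdot 2048 / 4 - 256 / 4 = 1536 - 64 = 1472 .
\end{displaymath}

For the stream start, let $G_{\mathrm{first}}$ be the granule position
of the first audio page and $N_{\mathrm{first}}$ the number of PCM
samples that decode returns for the packets completed on that page.
The inferred starting position is their difference; when it is
negative, that many leading samples are discarded.  With the
hypothetical values $G_{\mathrm{first}} = 960$ and
$N_{\mathrm{first}} = 1024$:

\begin{displaymath}
G_{\mathrm{first}} - N_{\mathrm{first}} = 960 - 1024 = -64
\quad \Rightarrow \quad \mbox{discard 64 leading samples.}
\end{displaymath}

For the stream end, let $G_{\mathrm{prev}}$ be the most recent valid
granule position before the final page, $N_{\mathrm{last}}$ the number
of samples that the packets completed on the final page would normally
return, and $G_{\mathrm{last}}$ the granule position of the final
page.  With the hypothetical values $G_{\mathrm{prev}} = 441000$,
$N_{\mathrm{last}} = 4096$ and $G_{\mathrm{last}} = 443000$:

\begin{displaymath}
G_{\mathrm{prev}} + N_{\mathrm{last}} - G_{\mathrm{last}}
  = 441000 + 4096 - 443000 = 2096
\quad \Rightarrow \quad \mbox{discard 2096 trailing samples.}
\end{displaymath}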