% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these background
documents. Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
\item
A meta-headerless Ogg file encapsulates the Vorbis I packets.

\item
The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).

\item
The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).

\end{itemize}


This does not mean that Vorbis cannot be multiplexed with other media
types into a multi-stream Ogg file. At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a `Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream. A compliant `Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).
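
The chaining and multiplexing restrictions above can be checked from
page headers alone. The following C fragment is a minimal sketch rather
than part of any reference implementation: the \literal{page\_info}
record and the function name are hypothetical, and it assumes the caller
has already parsed each page's serial number and \literal{header\_type}
flags (0x02 for beginning of stream, 0x04 for end of stream, as in the
Ogg framing spec).

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* Ogg page header_type flag bits, as defined in the Ogg framing spec. */
#define OGG_FLAG_CONTINUED 0x01
#define OGG_FLAG_BOS       0x02
#define OGG_FLAG_EOS       0x04

/* Hypothetical record holding the two page-header fields this check
 * needs; a real demuxer would parse them from each page header. */
typedef struct {
    uint32_t serialno;     /* bitstream serial number */
    uint8_t  header_type;  /* header_type flag bits */
} page_info;

/* Returns 1 if the pages form a chain of unmultiplexed links: each link
 * opens with a BOS page, every page in the link carries the same serial
 * number, and the link closes with an EOS page before the next BOS.
 * Returns 0 otherwise. Checks framing only, not the codec. */
int is_chained_unmultiplexed(const page_info *pages, size_t n)
{
    int in_link = 0;
    uint32_t serial = 0;

    for (size_t i = 0; i < n; i++) {
        const page_info *p = &pages[i];
        if (!in_link) {
            /* Between links, only a new link's BOS page is legal. */
            if (!(p->header_type & OGG_FLAG_BOS))
                return 0;
            in_link = 1;
            serial = p->serialno;
        } else if ((p->header_type & OGG_FLAG_BOS) || p->serialno != serial) {
            /* A second BOS or a foreign serial inside a link means
             * the link is multiplexed. */
            return 0;
        }
        if (p->header_type & OGG_FLAG_EOS)
            in_link = 0;
    }
    /* Reject empty input and streams that stop before a final EOS page. */
    return n > 0 && !in_link;
}
\end{verbatim}

Such a check covers framing only; it does not confirm that the first
packet of each link is actually a Vorbis identification header.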


\subsubsection{MIME type}

The MIME type of Ogg files depends on the context. Specifically, complex
multimedia content and applications should use \literal{application/ogg},
visual media should use \literal{video/ogg}, and audio should use
\literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear
in any of those types. RTP-encapsulated Vorbis should use
\literal{audio/vorbis} + \literal{audio/vorbis-config}.


\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}

\item
The first Vorbis packet (the identification header), which
uniquely identifies a stream as Vorbis audio, is placed alone in the
first page of the logical Ogg stream. This results in a first Ogg
page of exactly 58 bytes at the very beginning of the logical stream.


\item
This first page is marked `beginning of stream' in the page flags.


\item
The second and third Vorbis packets (comment and setup
headers) may span one or more pages beginning on the second page of
the logical stream. However many pages they span, the third header
packet finishes the page on which it ends. The next (first audio) packet
must begin on a fresh page.


\item
The granule position of these first pages containing only headers is zero.


\item
The first audio packet of the logical stream begins a fresh Ogg page.


\item
Packets are placed into Ogg pages in order until the end of stream.


\item
The last page is marked `end of stream' in the page flags.


\item
Vorbis packets may span page boundaries.


\item
The granule position of pages containing Vorbis audio is in units
of PCM audio samples (per channel; a stereo stream's granule position
does not increment at twice the speed of a mono stream).


\item
The granule position of a page represents the end PCM sample
position of the last packet \emph{completed} on that
page. The `last PCM sample' is the last complete sample returned by
decode, not an internal sample awaiting lapping with a
subsequent block. A page that is entirely spanned by a single
packet (that completes on a subsequent page) has no granule
position, and the granule position is set to $-1$.


Note that the last decoded (fully lapped) PCM sample from a packet
is not necessarily the middle sample from that block. If, e.g., the
current Vorbis packet encodes a ``long block'' and the next Vorbis
packet encodes a ``short block'', the last decodable sample from the
current packet is at position
$(3 \cdot \mathit{long\_block\_length}/4) - (\mathit{short\_block\_length}/4)$.


\item
The granule (PCM) position of the first audio page need not indicate
that the stream started at position zero. The granule position
belongs to the last completed packet on the page, and a
valid granule position must be positive; however, by subtracting the
number of samples those completed packets decode to, a decoder may
infer that the PCM position of the beginning of audio is positive
or negative.


\begin{itemize}
\item
A positive starting value simply indicates that this stream begins at
some positive time offset, potentially within a larger
program. This is a common case when connecting to the middle
of a broadcast stream.

\item
A negative value indicates that
output samples preceding time zero should be discarded during
decoding; this technique is used to allow sample-granularity
editing of the stream start time of already-encoded Vorbis
streams. The number of samples to be discarded must not exceed
the overlap-add span of the first two audio packets.

\end{itemize}


In both of these cases in which the initial audio PCM starting
offset is nonzero, the second finished audio packet must flush the
page on which it appears, and the third packet must begin a fresh page.
This ensures that the decoder can always perform PCM position
adjustments before needing to return any PCM data from synthesis,
resulting in correct positioning information without any additional
seeking logic.


\begin{note}
Failure to do so should, at worst, cause a
decoder implementation to return incorrect positioning information
for seeking operations at the very beginning of the stream.
\end{note}


\item
A granule position on the final page of a stream that indicates
fewer samples than the final packet would normally return is used to
end the stream at a position other than a frame boundary. The difference
between the amount of data actually returned and the declared amount
indicates how many trailing samples to discard from the decoding
process.

\end{itemize}
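
The 58-byte size of the first page follows from the Ogg page layout: a
27-byte fixed page header, a segment table holding one lacing value per
full 255-byte segment of payload plus a terminating value, and the
30-byte Vorbis identification header itself. The sketch below reproduces
that arithmetic; the helper names are hypothetical, and it assumes one
complete packet per page with at most 255 lacing values.

\begin{verbatim}
#include <stddef.h>

#define OGG_PAGE_HEADER_FIXED 27 /* bytes before the segment table */

/* Vorbis I identification header: 1 (packet type) + 6 ("vorbis")
 * + 4 (version) + 1 (channels) + 4 (sample rate) + 3*4 (bitrates)
 * + 1 (two 4-bit blocksize exponents) + 1 (framing bit, padded) = 30. */
#define VORBIS_ID_HEADER_LEN 30

/* Lacing values needed for one complete packet of len bytes: one 255
 * per full 255-byte segment, plus a terminating value below 255
 * (which is 0 when len is an exact multiple of 255). */
static size_t lacing_values(size_t len)
{
    return len / 255 + 1;
}

/* Size in bytes of an Ogg page carrying exactly one complete packet,
 * assuming the packet fits on a single page. */
size_t single_packet_page_size(size_t packet_len)
{
    return OGG_PAGE_HEADER_FIXED + lacing_values(packet_len) + packet_len;
}
\end{verbatim}

For the identification header this gives $27 + 1 + 30 = 58$ bytes.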
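
The long-block/short-block case described in the granule position rules
follows from the Vorbis I window shape: the fully lapped part of a
block ends where its right overlap with the next block begins. The
sketch below computes that offset within the current block; the
function name is hypothetical, and it assumes the Vorbis I case of
exactly two blocksizes, short (\literal{blocksize\_0}) and long
(\literal{blocksize\_1}).

\begin{verbatim}
/* Offset, from the start of the current block, at which its finished
 * (fully lapped) output ends; samples before this offset need no
 * further lapping with the next block. A long block followed by a
 * short block ends at 3n/4 - next_n/4; in every other case the right
 * overlap starts at the block centre, n/2. Vorbis blocksizes are
 * powers of two, so the divisions are exact. */
unsigned long finished_end(unsigned long n, unsigned long next_n)
{
    if (n > next_n)
        return 3 * n / 4 - next_n / 4;
    return n / 2;
}
\end{verbatim}

Assuming, for example, blocksizes of 256 and 2048, a long block
followed by a short block has finished output up to offset
$3 \cdot 2048/4 - 256/4 = 1472$, whereas a long block followed by
another long block ends at its middle, offset 1024.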
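
The start and end adjustments described in the encapsulation rules
reduce to simple arithmetic on granule positions. The sketch below
assumes the decoder has already counted how many PCM samples (per
channel) the packets completed on the first audio page decode to, and
the PCM position reached after decoding the final packet; all function
names are hypothetical.

\begin{verbatim}
#include <stdint.h>

/* PCM position of the first sample of audio, inferred from the first
 * audio page: its granule position marks the end of the last packet
 * completed on that page, so subtract what those packets decode to. */
int64_t inferred_start(int64_t first_granule, int64_t samples_on_first_page)
{
    return first_granule - samples_on_first_page;
}

/* Leading samples to discard: nonzero only when the inferred start is
 * negative, i.e. when output precedes time zero. */
int64_t leading_discard(int64_t first_granule, int64_t samples_on_first_page)
{
    int64_t start = inferred_start(first_granule, samples_on_first_page);
    return start < 0 ? -start : 0;
}

/* Trailing samples to discard: the excess of the position actually
 * reached by decoding over the final page's granule position. */
int64_t trailing_discard(int64_t decoded_end_position, int64_t final_granule)
{
    int64_t excess = decoded_end_position - final_granule;
    return excess > 0 ? excess : 0;
}
\end{verbatim}

For example, a first audio page with granule position 1000 whose
completed packets decode to 1472 samples implies a start position of
$-472$, so 472 leading samples are discarded; a final granule position
of 10000 against a decoded end position of 10240 leaves 240 trailing
samples to discard.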