Home | History | Annotate | Download | only in PDB
      1 =====================================
      2 The MSF File Format
      3 =====================================
      4 
      5 .. contents::
      6    :local:
      7 
      8 .. _msf_layout:
      9 
     10 File Layout
     11 ===========
     12 
     13 The MSF file format consists of the following components:
     14 
     15 1. :ref:`msf_superblock`
     16 2. :ref:`msf_freeblockmap` (also know as Free Page Map, or FPM)
     17 3. Data
     18 
     19 Each component is stored as an indexed block, the length of which is specified
     20 in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the
     21 following pattern (sometimes referred to as an "interval"):
     22 
     23 1. 1 block of data
     24 2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)
     25 3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)
     26 4. ``SuperBlock::BlockSize - 3`` blocks of data
     27 
     28 In the first interval, the first data block is used to store
     29 :ref:`msf_superblock`.
     30 
     31 The following diagram demonstrates the general layout of the file (\| denotes
     32 the end of an interval, and is for visualization purposes only):
     33 
     34 +-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
     35 | Block Index | 0                     | 1                | 2                | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
     36 +=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
     37 | Meaning     | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data     | \| | Data | FPM1 | FPM2 | Data        | \| | ... |
     38 +-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
     39 
     40 The file may end after any block, including immediately after a FPM1.
     41 
     42 .. note::
     43   LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf"
     44   variant), so the rest of this document will assume a block size of 4096.
     45 
     46 .. _msf_superblock:
     47 
     48 The Superblock
     49 ==============
     50 At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
     51 follows:
     52 
     53 .. code-block:: c++
     54 
     55   struct SuperBlock {
     56     char FileMagic[sizeof(Magic)];
     57     ulittle32_t BlockSize;
     58     ulittle32_t FreeBlockMapBlock;
     59     ulittle32_t NumBlocks;
     60     ulittle32_t NumDirectoryBytes;
     61     ulittle32_t Unknown;
     62     ulittle32_t BlockMapAddr;
     63   };
     64 
     65 - **FileMagic** - Must be equal to ``"Microsoft C / C++ MSF 7.00\\r\\n"``
     66   followed by the bytes ``1A 44 53 00 00 00``.
     67 - **BlockSize** - The block size of the internal file system.  Valid values are
     68   512, 1024, 2048, and 4096 bytes.  Certain aspects of the MSF file layout vary
     69   depending on the block sizes.  For the purposes of LLVM, we handle only block
     70   sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
     71 - **FreeBlockMapBlock** - The index of a block within the file, at which begins
     72   a bitfield representing the set of all blocks within the file which are "free"
     73   (i.e. the data within that block is not used).  See :ref:`msf_freeblockmap` for
     74   more information.
     75   **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
     76 - **NumBlocks** - The total number of blocks in the file.  ``NumBlocks * BlockSize``
     77   should equal the size of the file on disk.
     78 - **NumDirectoryBytes** - The size of the stream directory, in bytes.  The stream
     79   directory contains information about each stream's size and the set of blocks
     80   that it occupies.  It will be described in more detail later.
     81 - **BlockMapAddr** - The index of a block within the MSF file.  At this block is
     82   an array of ``ulittle32_t``'s listing the blocks that the stream directory
     83   resides on.  For large MSF files, the stream directory (which describes the
     84   block layout of each stream) may not fit entirely on a single block.  As a
     85   result, this extra layer of indirection is introduced, whereby this block
     86   contains the list of blocks that the stream directory occupies, and the stream
     87   directory itself can be stitched together accordingly.  The number of
     88   ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
     89 
     90 .. _msf_freeblockmap:
     91 
     92 The Free Block Map
     93 ==================
     94 
     95 The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
     96 series of blocks which contains a bit flag for every block in the file. The
     97 flag will be set to 0 if the block is in use, and 1 if the block is unused.
     98 
     99 Each file contains two FPMs, one of which is active at any given time. This
    100 feature is designed to support incremental and atomic updates of the underlying
    101 MSF file. While writing to an MSF file, if the active FPM is FPM1, you can
    102 write your new modified bitfield to FPM2, and vice versa. Only when you commit
    103 the file to disk do you need to swap the value in the SuperBlock to point to
    104 the new ``FreeBlockMapBlock``.
    105 
    106 The Free Block Maps are stored as a series of single blocks thoughout the file
    107 at intervals of BlockSize. Because each FPM block is of size ``BlockSize``
    108 bytes, it contains 8 times as many bits as an interval has blocks. This means
    109 that the first block of each FPM refers to the first 8 intervals of the file
    110 (the first 32768 blocks), the second block of each FPM refers to the next 8
    111 blocks, and so on. This results in far more FPM blocks being present than are
    112 required, but in order to maintain backwards compatibility the format must stay
    113 this way.
    114 
    115 The Stream Directory
    116 ====================
    117 The Stream Directory is the root of all access to the other streams in an MSF
    118 file.  Beginning at byte 0 of the stream directory is the following structure:
    119 
    120 .. code-block:: c++
    121 
    122   struct StreamDirectory {
    123     ulittle32_t NumStreams;
    124     ulittle32_t StreamSizes[NumStreams];
    125     ulittle32_t StreamBlocks[NumStreams][];
    126   };
    127 
    128 And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
    129 Note that each of the last two arrays is of variable length, and in particular
    130 that the second array is jagged.
    131 
    132 **Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
    133 streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
    134 
    135 Stream 0: ceil(1000 / 4096) = 1 block
    136 
    137 Stream 1: ceil(8000 / 4096) = 2 blocks
    138 
    139 Stream 2: ceil(16000 / 4096) = 4 blocks
    140 
    141 Stream 3: ceil(9000 / 4096) = 3 blocks
    142 
    143 In total, 10 blocks are used.  Let's see what the stream directory might look
    144 like:
    145 
    146 .. code-block:: c++
    147 
    148   struct StreamDirectory {
    149     ulittle32_t NumStreams = 4;
    150     ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
    151     ulittle32_t StreamBlocks[][] = {
    152       {4},
    153       {5, 6},
    154       {11, 9, 7, 8},
    155       {10, 15, 12}
    156     };
    157   };
    158 
    159 In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
    160 would equal ``60``, and ``SuperBlock->BlockMapAddr`` would be an array of one
    161 ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
    162 
    163 Note also that the streams are discontiguous, and that part of stream 3 is in the
    164 middle of part of stream 2.  You cannot assume anything about the layout of the
    165 blocks!
    166 
    167 Alignment and Block Boundaries
    168 ==============================
    169 As may be clear by now, it is possible for a single field (whether it be a high
    170 level record, a long string field, or even a single ``uint16``) to begin and
    171 end in separate blocks.  For example, if the block size is 4096 bytes, and a
    172 ``uint16`` field begins at the last byte of the current block, then it would
    173 need to end on the first byte of the next block.  Since blocks are not
    174 necessarily contiguously laid out in the file, this means that both the consumer
    175 and the producer of an MSF file must be prepared to split data apart
    176 accordingly.  In the aforementioned example, the high byte of the ``uint16``
    177 would be written to the last byte of block N, and the low byte would be written
    178 to the first byte of block N+1, which could be tens of thousands of bytes later
    179 (or even earlier!) in the file, depending on what the stream directory says.
    180