An overview of memory management in QEMU:

I. RAM Management:
==================

I.1. RAM Address space:
-----------------------

All pages of virtual RAM used by QEMU at runtime are allocated from
contiguous blocks in a specific abstract "RAM address space".
|ram_addr_t| is the type of block addresses in this space.

A single block of contiguous RAM is allocated with 'qemu_ram_alloc()', which
takes a size in bytes, and allocates the pages through mmap() in the QEMU
host process. It also sets up the corresponding KVM / Xen / HAX mappings,
depending on each accelerator's specific needs.

Each block has a name, which is used for snapshot support.

'qemu_ram_alloc_from_ptr()' can also be used to allocate a new RAM
block, by passing its content explicitly (this can be useful for pages of
ROM).

'qemu_get_ram_ptr()' translates a 'ram_addr_t' into the corresponding
address in the QEMU host process. 'qemu_ram_addr_from_host()' does the
opposite (i.e. it translates a host address into a ram_addr_t if possible,
or returns an error).
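
The two translations can be pictured with a toy block table. This is only
a minimal sketch and not QEMU's implementation: the ToyRamBlock structure
and the toy_* functions are hypothetical names invented for illustration,
and two small static buffers stand in for the mmap()'d host pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_ram_addr_t;   /* stand-in for ram_addr_t */

/* Hypothetical block descriptor: a contiguous range of the RAM
 * address space backed by contiguous host memory. */
typedef struct {
    toy_ram_addr_t offset;  /* start in the RAM address space */
    size_t         size;    /* size in bytes */
    uint8_t       *host;    /* start of the backing host memory */
} ToyRamBlock;

static uint8_t toy_mem_a[0x2000];
static uint8_t toy_mem_b[0x1000];

static ToyRamBlock toy_blocks[] = {
    { 0x0000, sizeof(toy_mem_a), toy_mem_a },
    { 0x2000, sizeof(toy_mem_b), toy_mem_b },
};
#define TOY_NUM_BLOCKS (sizeof(toy_blocks) / sizeof(toy_blocks[0]))

/* RAM address -> host pointer, or NULL if no block contains it. */
static void *toy_get_ram_ptr(toy_ram_addr_t addr)
{
    for (size_t i = 0; i < TOY_NUM_BLOCKS; i++) {
        const ToyRamBlock *b = &toy_blocks[i];
        if (addr >= b->offset && addr - b->offset < b->size) {
            return b->host + (addr - b->offset);
        }
    }
    return NULL;
}

/* Host pointer -> RAM address; returns 0 on success, -1 on error. */
static int toy_ram_addr_from_host(const void *ptr, toy_ram_addr_t *out)
{
    uintptr_t p = (uintptr_t)ptr;
    assert(out != NULL);
    for (size_t i = 0; i < TOY_NUM_BLOCKS; i++) {
        const ToyRamBlock *b = &toy_blocks[i];
        uintptr_t base = (uintptr_t)b->host;
        if (p >= base && p - base < b->size) {
            *out = b->offset + (p - base);
            return 0;
        }
    }
    return -1;
}
```

In this sketch a host pointer that does not fall inside any block is
reported as an error, which mirrors the "if possible" wording above.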

Note that ram_addr_t addresses are an internal implementation detail of
QEMU, i.e. the virtual CPU never sees their values directly; it relies
instead on addresses in its virtual physical address space, described
in section II. below.

As an example, when emulating an Android/x86 virtual device, the following
RAM space is being used:

  0x0000_0000 ... 0x1000_0000   "pc.ram"
  0x1000_0000 ... 0x1002_0000   "bios.bin"
  0x1002_0000 ... 0x1004_0000   "pc.rom"
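
The layout above can be written down as a small table. The sketch below
assumes that each range is half-open (the end address is the start of the
next block), which the adjacent bounds suggest but the listing does not
state. ToyRamRange and the toy_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one named range of the RAM address space.
 * The range is assumed to be half-open: [start, end). */
typedef struct {
    uint32_t    start;
    uint32_t    end;
    const char *name;
} ToyRamRange;

/* The Android/x86 layout listed above. */
static const ToyRamRange toy_layout[] = {
    { 0x00000000u, 0x10000000u, "pc.ram"   },
    { 0x10000000u, 0x10020000u, "bios.bin" },
    { 0x10020000u, 0x10040000u, "pc.rom"   },
};

/* Return the name of the block containing 'addr', or NULL if none. */
static const char *toy_block_name(uint32_t addr)
{
    size_t n = sizeof(toy_layout) / sizeof(toy_layout[0]);
    for (size_t i = 0; i < n; i++) {
        if (addr >= toy_layout[i].start && addr < toy_layout[i].end) {
            return toy_layout[i].name;
        }
    }
    return NULL;
}

/* Size of a range in bytes. */
static uint32_t toy_range_size(const ToyRamRange *r)
{
    assert(r != NULL && r->end >= r->start);
    return r->end - r->start;
}
```

Under that assumption "pc.ram" is 0x1000_0000 bytes (256 MiB) long, and
"bios.bin" and "pc.rom" are 0x2_0000 bytes (128 KiB) each.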


I.2. RAM Dirty tracking:
------------------------

QEMU also associates with each RAM page an 8-bit 'dirty' bitmap. The
main idea is that whenever a page is written to, the value 0xff is
written to the page's 'dirty' bitmap. Various clients can later inspect
some of the flags and clear them. For example:

  VGA_DIRTY_FLAG (0x1) is typically used by framebuffer drivers to detect
  which pages of video RAM were touched since the latest VSYNC. The driver
  typically copies the pixel values to the real QEMU output, then clears
  the bits. This avoids needless copies when nothing changed in the
  framebuffer.

  MIGRATION_DIRTY_FLAG (0x8) is used to track modified RAM pages during
  live migration (i.e. moving a QEMU virtual machine from one host to
  another).

  CODE_DIRTY_FLAG (0x2) is a bit more special, and is used to support
  self-modifying code properly. More on this later.
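
The set-all-then-clear-one pattern described above can be sketched as
follows. The flag values come from the text; the 4 KiB page size, the
16-page toy RAM and the toy_* function names are assumptions made for
this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_BITS   12    /* assumed page size: 4 KiB */
#define TOY_NUM_PAGES   16    /* toy RAM size, in pages */

/* Flag values as listed in the text. */
#define VGA_DIRTY_FLAG        0x01
#define CODE_DIRTY_FLAG       0x02
#define MIGRATION_DIRTY_FLAG  0x08

/* One byte of dirty flags per RAM page. */
static uint8_t toy_dirty[TOY_NUM_PAGES];

/* Called on every write: mark the page dirty for all clients at once. */
static void toy_mark_dirty(uint32_t ram_addr)
{
    size_t page = ram_addr >> TOY_PAGE_BITS;
    assert(page < TOY_NUM_PAGES);
    toy_dirty[page] = 0xff;
}

/* Return 1 if 'flag' was set for the page, and clear that flag only. */
static int toy_test_and_clear_dirty(uint32_t ram_addr, uint8_t flag)
{
    size_t page = ram_addr >> TOY_PAGE_BITS;
    assert(page < TOY_NUM_PAGES);
    int was_dirty = (toy_dirty[page] & flag) != 0;
    toy_dirty[page] &= (uint8_t)~flag;
    return was_dirty;
}
```

Because each client clears only its own bit, a framebuffer driver that
consumes VGA_DIRTY_FLAG does not hide the write from live migration.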


II. The physical address space:
===============================

This is the address space that the virtual CPU can read from / write to.
|hwaddr| is the type of addresses in this space, which is decomposed
into 'pages'. Each page in the address space is either unassigned, or
mapped to a specific kind of memory region.

See |phys_page_find()| and |phys_page_find_alloc()| in translate-all.c for
the implementation details.


II.1. Memory region types:
--------------------------

There are several memory region types:

  - Regions of RAM pages.
  - Regions of ROM pages (similar to RAM, but cannot be written to).
  - Regions of I/O pages, used to communicate with virtual hardware.

Virtual devices can register a new I/O region type by calling
|cpu_register_io_memory()|. This function allows them to provide
callbacks that will be invoked every time the virtual CPU reads from
or writes to any page of the corresponding type.

The memory region type of a given page is encoded using PAGE_BITS bits
in the following format:

        +-------------------------------+
        |    mem_type_index     | flags |
        +-------------------------------+

Where |mem_type_index| is a unique value identifying a given memory
region type, and |flags| is a 3-bit bitmap used to store flags that are
only relevant for I/O pages.
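
Packing and unpacking this format can be sketched as below. The sketch
assumes that |flags| sits in the lowest 3 bits with |mem_type_index|
directly above it, as the diagram suggests; the toy_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FLAGS_BITS  3                             /* |flags| is 3 bits wide */
#define TOY_FLAGS_MASK  ((1u << TOY_FLAGS_BITS) - 1)  /* 0x7 */

/* Combine an index and a flags bitmap into one region type value. */
static uint32_t toy_mem_type_make(uint32_t index, uint32_t flags)
{
    assert((flags & ~TOY_FLAGS_MASK) == 0);
    return (index << TOY_FLAGS_BITS) | flags;
}

/* Extract the index part of a region type value. */
static uint32_t toy_mem_type_index(uint32_t mem_type)
{
    return mem_type >> TOY_FLAGS_BITS;
}

/* Extract the flags part of a region type value. */
static uint32_t toy_mem_type_flags(uint32_t mem_type)
{
    return mem_type & TOY_FLAGS_MASK;
}
```

For instance, index 4 with flag bit 0x1 set packs to 0x21 under this
layout.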

The following memory region type values are important:

  IO_MEM_RAM (mem_type_index=0, flags=0):
    Used for regular RAM pages, always all zero on purpose.

  IO_MEM_ROM (mem_type_index=1, flags=0):
    Used for ROM pages.

  IO_MEM_UNASSIGNED (mem_type_index=2, flags=0):
    Used to identify unassigned pages of the physical address space.

  IO_MEM_NOTDIRTY (mem_type_index=3, flags=0):
    Used to implement tracking of dirty RAM pages. This is essentially
    used for RAM pages that have not been written to yet.

Any mem_type_index value of 4 or higher corresponds to a device-specific
I/O memory region type (i.e. with custom read/write callbacks and a
corresponding 'opaque' value), and can also use the following bits
in |flags|:

  IO_MEM_ROMD (0x1):
    Used for ROM-like I/O pages, i.e. they are backed by a page from
    the RAM address space, but writing to them triggers a device-specific
    write callback (instead of being ignored or faulting the CPU).

  IO_MEM_SUBPAGE (0x2):
    Used to indicate that not all addresses in this page map to the same
    I/O region type / callbacks.

  IO_MEM_SUBWIDTH (0x4):
    Probably obsolete. Set to indicate that the corresponding I/O region
    type doesn't support reading/writing values of all possible sizes
    (1, 2 and 4 bytes). This seems to be never used by the current code.

Note that cpu_register_io_memory() returns a new memory region type value.
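
A registration table of this kind might look like the sketch below. It
is not the real cpu_register_io_memory(), whose signature is not shown
here: toy_register_io_memory(), the callback types and the 16-entry
limit are all hypothetical, and the returned value assumes the index
is stored above 3 flag bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FLAGS_BITS    3   /* width of |flags|, per the text */
#define TOY_FIRST_DEVICE  4   /* 0..3 are RAM, ROM, UNASSIGNED, NOTDIRTY */
#define TOY_MAX_TYPES     16  /* arbitrary limit for this sketch */

/* Hypothetical callback signatures. */
typedef uint32_t (*ToyReadFn)(void *opaque, uint32_t io_addr);
typedef void (*ToyWriteFn)(void *opaque, uint32_t io_addr, uint32_t value);

typedef struct {
    ToyReadFn  read;
    ToyWriteFn write;
    void      *opaque;
} ToyIoType;

static ToyIoType toy_io_types[TOY_MAX_TYPES];
static uint32_t  toy_next_index = TOY_FIRST_DEVICE;

/* Register callbacks and return the new memory region type value
 * (index above the flag bits, flags = 0), or -1 if the table is full. */
static int toy_register_io_memory(ToyReadFn read, ToyWriteFn write,
                                  void *opaque)
{
    if (toy_next_index >= TOY_MAX_TYPES) {
        return -1;
    }
    uint32_t index = toy_next_index++;
    toy_io_types[index].read = read;
    toy_io_types[index].write = write;
    toy_io_types[index].opaque = opaque;
    return (int)(index << TOY_FLAGS_BITS);
}

/* Dispatch a read to the callbacks registered for 'mem_type'. */
static uint32_t toy_io_read(int mem_type, uint32_t io_addr)
{
    uint32_t index = (uint32_t)mem_type >> TOY_FLAGS_BITS;
    assert(index >= TOY_FIRST_DEVICE && index < toy_next_index);
    const ToyIoType *t = &toy_io_types[index];
    return t->read(t->opaque, io_addr);
}

/* Example device: reads return a base value plus the I/O address. */
static uint32_t toy_dev_read(void *opaque, uint32_t io_addr)
{
    return *(uint32_t *)opaque + io_addr;
}

/* Example device: writes replace the base value. */
static void toy_dev_write(void *opaque, uint32_t io_addr, uint32_t value)
{
    (void)io_addr;
    *(uint32_t *)opaque = value;
}
```

In this sketch the first registered device receives index 4, so the
first returned type value is 4 << 3.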


II.2. Physical address map:
---------------------------

QEMU maintains for each assigned page in the physical address space
two values:

  |phys_offset|, a combination of ram address and memory region type.

  |region_offset|, an optional offset into the region backing the
  page. This is only useful for I/O pages.
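
As a rough picture of this per-page record, the sketch below uses a flat
array indexed by page number. This is an assumption made for brevity and
says nothing about the structure that phys_page_find() actually walks.
The 4 KiB page size, the 8-page address space, the ToyPhysPage type and
the toy_* functions are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_BITS          12         /* assumed page size: 4 KiB */
#define TOY_NUM_PAGES          8          /* toy physical address space */
#define TOY_IO_MEM_UNASSIGNED  (2u << 3)  /* mem_type_index=2, flags=0 */

typedef struct {
    uint32_t phys_offset;    /* RAM address combined with region type */
    uint32_t region_offset;  /* only meaningful for I/O pages */
    int      assigned;       /* 0 until the page is mapped */
} ToyPhysPage;

static ToyPhysPage toy_phys_map[TOY_NUM_PAGES];

/* Record the two values for the page containing 'hwaddr'. */
static void toy_map_page(uint64_t hwaddr, uint32_t phys_offset,
                         uint32_t region_offset)
{
    uint64_t page = hwaddr >> TOY_PAGE_BITS;
    assert(page < TOY_NUM_PAGES);
    toy_phys_map[page].phys_offset = phys_offset;
    toy_phys_map[page].region_offset = region_offset;
    toy_phys_map[page].assigned = 1;
}

/* Return the phys_offset of the page containing 'hwaddr'. Pages that
 * are unassigned or out of range yield TOY_IO_MEM_UNASSIGNED. */
static uint32_t toy_phys_offset(uint64_t hwaddr)
{
    uint64_t page = hwaddr >> TOY_PAGE_BITS;
    if (page >= TOY_NUM_PAGES || !toy_phys_map[page].assigned) {
        return TOY_IO_MEM_UNASSIGNED;
    }
    return toy_phys_map[page].phys_offset;
}
```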

The |phys_offset| value has several encodings which require
further clarification:

  - Generally speaking, a phys_offset value is decomposed into
    the following bit fields:

      +-----------------------------------------------------+
      |         high_addr               |     mem_type      |
      +-----------------------------------------------------+

    where |mem_type| is a PAGE_BITS-bit memory region type as described
    previously, and |high_addr| may contain the high bits of a
    ram_addr_t address for RAM-backed pages.
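
The split can be sketched with two masks. PAGE_BITS is assumed to be 12
here (4 KiB pages), which the text does not specify, and the toy_* names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS  12                            /* assumed value */
#define TOY_PAGE_MASK  ((1u << TOY_PAGE_BITS) - 1)   /* 0xfff */

/* High bits: page-aligned ram_addr_t for RAM-backed pages. */
static uint32_t toy_high_addr(uint32_t phys_offset)
{
    return phys_offset & ~TOY_PAGE_MASK;
}

/* Low PAGE_BITS bits: the memory region type. */
static uint32_t toy_mem_type(uint32_t phys_offset)
{
    return phys_offset & TOY_PAGE_MASK;
}

/* Build a phys_offset from a page-aligned RAM address and a type. */
static uint32_t toy_phys_offset_make(uint32_t ram_addr, uint32_t mem_type)
{
    assert((ram_addr & TOY_PAGE_MASK) == 0);   /* must be page-aligned */
    assert((mem_type & ~TOY_PAGE_MASK) == 0);  /* must fit in PAGE_BITS */
    return ram_addr | mem_type;
}
```

Since RAM addresses of pages are page-aligned, their low PAGE_BITS bits
are free to carry the region type without losing information.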

More specifically:

  - Unassigned pages always have the special value IO_MEM_UNASSIGNED
    (high_addr=0, mem_type=IO_MEM_UNASSIGNED).

  - RAM pages have mem_type=0 (i.e. IO_MEM_RAM) while high_addr holds
    the high bits of the corresponding ram_addr_t. Hence, a simple call to
    qemu_get_ram_ptr(phys_offset) will return the corresponding
    address in host QEMU memory.

    This is the reason why IO_MEM_RAM is always 0:

    RAM page phys_offset value:
      +-----------------------------------------------------+
      |   high_addr                     |           0       |
      +-----------------------------------------------------+


  - ROM pages are like RAM pages, but have mem_type=IO_MEM_ROM.
    QEMU ensures that writing to such a page is a no-op, except on
    some target architectures, like Sparc, where this may cause a CPU
    fault.

    ROM page phys_offset value:
      +-----------------------------------------------------+
      |   high_addr                     |     IO_MEM_ROM    |
      +-----------------------------------------------------+

  - Dirty RAM page tracking is implemented by using special
    phys_offset values with mem_type=IO_MEM_NOTDIRTY. Note that these
    values do not appear directly in the physical page map, but in
    the CPU TLB cache (explained later).

    non-dirty RAM page phys_offset value (CPU TLB cache only):
      +-----------------------------------------------------+
      |   high_addr                     |  IO_MEM_NOTDIRTY  |
      +-----------------------------------------------------+

  - Other pages are I/O pages, and their high_addr value will
    be 0 / ignored:

    I/O page phys_offset value:
      +----------------------------------------------------------+
      |  0                              | mem_type_index | flags |
      +----------------------------------------------------------+

    Note that when reading from or writing to I/O pages, the lowest
    PAGE_BITS bits of the corresponding hwaddr value will be added
    to the page's |region_offset| value. This new address is passed
    to the read/write callback as the 'i/o address' for the operation.
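
The 'i/o address' computation from the note above amounts to one mask
and one addition. In this sketch PAGE_BITS is assumed to be 12 and
toy_io_address() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS  12                            /* assumed value */
#define TOY_PAGE_MASK  ((1u << TOY_PAGE_BITS) - 1)   /* 0xfff */

/* Address handed to the device callback: the offset of 'hwaddr'
 * within its page, plus the page's region_offset. */
static uint32_t toy_io_address(uint64_t hwaddr, uint32_t region_offset)
{
    return (uint32_t)(hwaddr & TOY_PAGE_MASK) + region_offset;
}
```

For a device spanning two pages, giving the second page a region_offset
of 0x1000 would let the callback see one contiguous range: an access at
offset 0x004 in the second page arrives as i/o address 0x1004.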

  - As a special exception, if the I/O page's IO_MEM_ROMD flag is
    set, then high_addr is not 0, but the high bits of the corresponding
    ram_addr_t backing the page's contents on reads. On write operations
    though, the I/O region type's write callback will be called instead.

    ROMD I/O page phys_offset value:
      +----------------------------------------------------------+
      |  high_addr                      | mem_type_index | flags |
      +----------------------------------------------------------+

    Note that |region_offset| is ignored when reading from such pages;
    it's only used when writing to the I/O page.
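
Putting the cases of this section together, an access can be classified
from the phys_offset value alone. The sketch below assumes PAGE_BITS is
12 and that flags occupy the low 3 bits of the region type; the TOY_*
constants mirror the values of section II.1 under that assumption, and
ToyAccess and toy_classify() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS   12                            /* assumed value */
#define TOY_PAGE_MASK   ((1u << TOY_PAGE_BITS) - 1)
#define TOY_FLAGS_BITS  3

/* Region type values from section II.1: (index << 3) | flags. */
#define TOY_IO_MEM_RAM         (0u << TOY_FLAGS_BITS)
#define TOY_IO_MEM_ROM         (1u << TOY_FLAGS_BITS)
#define TOY_IO_MEM_UNASSIGNED  (2u << TOY_FLAGS_BITS)
#define TOY_IO_MEM_NOTDIRTY    (3u << TOY_FLAGS_BITS)
#define TOY_IO_MEM_ROMD        0x1u

typedef enum {
    TOY_ACCESS_RAM,         /* direct access to host memory at high_addr */
    TOY_ACCESS_IGNORED,     /* ROM write: no-op (or fault on some targets) */
    TOY_ACCESS_UNASSIGNED,  /* nothing is mapped at this page */
    TOY_ACCESS_CALLBACK     /* device read/write callback */
} ToyAccess;

/* Classify an access to a page of the physical page map. */
static ToyAccess toy_classify(uint32_t phys_offset, int is_write)
{
    uint32_t mem_type = phys_offset & TOY_PAGE_MASK;

    /* NOTDIRTY values live in the CPU TLB only, never in the map. */
    assert(mem_type != TOY_IO_MEM_NOTDIRTY);

    if (mem_type == TOY_IO_MEM_RAM) {
        return TOY_ACCESS_RAM;
    }
    if (mem_type == TOY_IO_MEM_ROM) {
        return is_write ? TOY_ACCESS_IGNORED : TOY_ACCESS_RAM;
    }
    if (mem_type == TOY_IO_MEM_UNASSIGNED) {
        return TOY_ACCESS_UNASSIGNED;
    }
    if (mem_type & TOY_IO_MEM_ROMD) {
        /* ROMD: reads come from backing RAM, writes go to the device. */
        return is_write ? TOY_ACCESS_CALLBACK : TOY_ACCESS_RAM;
    }
    return TOY_ACCESS_CALLBACK;
}
```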