An overview of memory management in QEMU:

I. RAM Management:
==================

I.1. RAM Address space:
-----------------------

All pages of virtual RAM used by QEMU at runtime are allocated from
contiguous blocks in a specific abstract "RAM address space".
|ram_addr_t| is the type of block addresses in this space.

A single block of contiguous RAM is allocated with 'qemu_ram_alloc()', which
takes a size in bytes, and allocates the pages through mmap() in the QEMU
host process. It also sets up the corresponding KVM / Xen / HAX mappings,
depending on each accelerator's specific needs.

Each block has a name, which is used for snapshot support.

'qemu_ram_alloc_from_ptr()' can also be used to allocate a new RAM
block, by passing its content explicitly (can be useful for pages of
ROM).

'qemu_get_ram_ptr()' will translate a 'ram_addr_t' into the corresponding
address in the QEMU host process. 'qemu_ram_addr_from_host()' does the
opposite (i.e. translates a host address into a ram_addr_t if possible,
or returns an error).

Note that ram_addr_t addresses are an internal implementation detail of
QEMU, i.e. the virtual CPU never sees their values directly; it relies
instead on addresses in its virtual physical address space, described
in section II. below.

As an example, when emulating an Android/x86 virtual device, the following
RAM space is being used:

  0x0000_0000 ... 0x1000_0000   "pc.ram"
  0x1000_0000 ... 0x1002_0000   "bios.bin"
  0x1002_0000 ... 0x1004_0000   "pc.rom"


I.2. RAM Dirty tracking:
------------------------

QEMU also associates with each RAM page an 8-bit 'dirty' bitmap. The
main idea is that whenever a page is written to, the value 0xff is
written to the page's 'dirty' bitmap. Various clients can later inspect
some of the flags and clear them.
For example:

  VGA_DIRTY_FLAG (0x1) is typically used by framebuffer drivers to detect
  which pages of video RAM were touched since the latest VSYNC. The driver
  typically copies the pixel values to the real QEMU output, then clears
  the bits. This is very useful to avoid needless copies if nothing
  changed in the framebuffer.

  MIGRATION_DIRTY_FLAG (0x8) is used to track modified RAM pages during
  live migration (i.e. moving a QEMU virtual machine from one host to
  another).

  CODE_DIRTY_FLAG (0x2) is a bit more special, and is used to support
  self-modifying code properly. More on this later.


II. The physical address space:
===============================

Represents the address space that the virtual CPU can read from / write to.
|hwaddr| is the type of addresses in this space, which is decomposed
into 'pages'. Each page in the address space is either unassigned, or
mapped to a specific kind of memory region.

See |phys_page_find()| and |phys_page_find_alloc()| in translate-all.c for
the implementation details.


II.1. Memory region types:
--------------------------

There are several memory region types:

  - Regions of RAM pages.
  - Regions of ROM pages (similar to RAM, but cannot be written to).
  - Regions of I/O pages, used to communicate with virtual hardware.

    Virtual devices can register a new I/O region type by calling
    |cpu_register_io_memory()|. This function allows them to provide
    callbacks that will be invoked every time the virtual CPU reads from
    or writes to any page of the corresponding type.
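
The callback mechanism described above can be pictured with a small,
self-contained C sketch. It is NOT the real |cpu_register_io_memory()|
interface: every toy_* name, the callback prototypes and the table size
are hypothetical and exist only to illustrate the idea of a table of
per-type read/write callbacks paired with a device-specific 'opaque'
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback prototypes; these are NOT the real QEMU ones. */
typedef uint32_t (*toy_io_read_fn)(void *opaque, uint64_t addr);
typedef void (*toy_io_write_fn)(void *opaque, uint64_t addr, uint32_t value);

typedef struct {
    toy_io_read_fn  read;
    toy_io_write_fn write;
    void           *opaque;  /* device state handed back to the callbacks */
} ToyIoType;

#define TOY_MAX_IO_TYPES 16  /* arbitrary limit chosen for this sketch */

static ToyIoType toy_io_types[TOY_MAX_IO_TYPES];
static int toy_io_type_count;

/* Record a new set of callbacks; return its table index, or -1 if full. */
static int toy_register_io_type(toy_io_read_fn read, toy_io_write_fn write,
                                void *opaque)
{
    if (read == NULL || write == NULL ||
        toy_io_type_count >= TOY_MAX_IO_TYPES) {
        return -1;
    }
    toy_io_types[toy_io_type_count].read = read;
    toy_io_types[toy_io_type_count].write = write;
    toy_io_types[toy_io_type_count].opaque = opaque;
    return toy_io_type_count++;
}

/* Dispatch a read on a page of the given type to the device callback. */
static uint32_t toy_io_read(int type, uint64_t addr)
{
    assert(type >= 0 && type < toy_io_type_count);
    return toy_io_types[type].read(toy_io_types[type].opaque, addr);
}

/* Dispatch a write on a page of the given type to the device callback. */
static void toy_io_write(int type, uint64_t addr, uint32_t value)
{
    assert(type >= 0 && type < toy_io_type_count);
    toy_io_types[type].write(toy_io_types[type].opaque, addr, value);
}

/* Example device: a single 32-bit register, whatever the address. */
typedef struct {
    uint32_t reg;
} ToyDevice;

static uint32_t toy_dev_read(void *opaque, uint64_t addr)
{
    (void)addr;
    return ((ToyDevice *)opaque)->reg;
}

static void toy_dev_write(void *opaque, uint64_t addr, uint32_t value)
{
    (void)addr;
    ((ToyDevice *)opaque)->reg = value;
}
```

A device would call toy_register_io_type(toy_dev_read, toy_dev_write, &dev)
once, keep the returned index, and every later access to a page of that
type would go through toy_io_read() / toy_io_write().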

The memory region type of a given page is encoded using PAGE_BITS bits
in the following format:

    +-------------------------------+
    |  mem_type_index  |   flags    |
    +-------------------------------+

Where |mem_type_index| is a unique value identifying a given memory
region type, and |flags| is a 3-bit bitmap used to store flags that are
only relevant for I/O pages.

The following memory region type values are important:

  IO_MEM_RAM (mem_type_index=0, flags=0):
    Used for regular RAM pages, always all zero on purpose.

  IO_MEM_ROM (mem_type_index=1, flags=0):
    Used for ROM pages.

  IO_MEM_UNASSIGNED (mem_type_index=2, flags=0):
    Used to identify unassigned pages of the physical address space.

  IO_MEM_NOTDIRTY (mem_type_index=3, flags=0):
    Used to implement tracking of dirty RAM pages. This is essentially
    used for RAM pages that have not been written to yet.

Any mem_type_index value of 4 or higher corresponds to a device-specific
I/O memory region type (i.e. with custom read/write callbacks and a
corresponding 'opaque' value), and can also use the following bits
in |flags|:

  IO_MEM_ROMD (0x1):
    Used for ROM-like I/O pages, i.e. they are backed by a page from
    the RAM address space, but writing to them triggers a device-specific
    write callback (instead of being ignored or faulting the CPU).

  IO_MEM_SUBPAGE (0x2):
    Used to indicate that not all addresses in this page map to the same
    I/O region type / callbacks.

  IO_MEM_SUBWIDTH (0x4):
    Probably obsolete. Set to indicate that the corresponding I/O region
    type doesn't support reading/writing values of all possible sizes
    (1, 2 and 4 bytes). This seems to be never used by the current code.

Note that cpu_register_io_memory() returns a new memory region type value.

II.2. Physical address map:
---------------------------

QEMU maintains for each assigned page in the physical address space
two values:

  |phys_offset|, a combination of RAM address and memory region type.

  |region_offset|, an optional offset into the region backing the
  page. This is only useful for I/O pages.

The |phys_offset| value has several encodings that require further
clarification:

  - Generally speaking, a phys_offset value is decomposed into
    the following bit fields:

      +-----------------------------------------------------+
      |            high_addr              |    mem_type     |
      +-----------------------------------------------------+

    where |mem_type| is a PAGE_BITS memory region type as described
    previously, and |high_addr| may contain the high bits of a
    ram_addr_t address for RAM-backed pages.

More specifically:

  - Unassigned pages always have the special value IO_MEM_UNASSIGNED
    (high_addr=0, mem_type=IO_MEM_UNASSIGNED).

  - RAM pages have mem_type=0 (i.e. IO_MEM_RAM) while high_addr holds
    the high bits of the corresponding ram_addr_t. Hence, a simple call to
    qemu_get_ram_ptr(phys_offset) will return the corresponding
    address in host QEMU memory.

    This is the reason why IO_MEM_RAM is always 0:

      RAM page phys_offset value:
      +-----------------------------------------------------+
      |            high_addr              |        0        |
      +-----------------------------------------------------+


  - ROM pages are like RAM pages, but have mem_type=IO_MEM_ROM.
    QEMU ensures that writing to such a page is a no-op, except that on
    some target architectures, like Sparc, it may cause a CPU fault.

      ROM page phys_offset value:
      +-----------------------------------------------------+
      |            high_addr              |   IO_MEM_ROM    |
      +-----------------------------------------------------+

  - Dirty RAM page tracking is implemented by using special
    phys_offset values with mem_type=IO_MEM_NOTDIRTY. Note that these
    values do not appear directly in the physical page map, but in
    the CPU TLB cache (explained later).

      non-dirty RAM page phys_offset value (CPU TLB cache only):
      +-----------------------------------------------------+
      |            high_addr              | IO_MEM_NOTDIRTY |
      +-----------------------------------------------------+

  - Other pages are I/O pages, and their high_addr value will
    be 0 / ignored:

      I/O page phys_offset value:
      +----------------------------------------------------------+
      |                0             | mem_type_index |  flags   |
      +----------------------------------------------------------+

    Note that when reading from or writing to I/O pages, the lowest
    PAGE_BITS bits of the corresponding hwaddr value will be added
    to the page's |region_offset| value. This new address is passed
    to the read/write callback as the 'i/o address' for the operation.

  - As a special exception, if the I/O page's IO_MEM_ROMD flag is
    set, then high_addr is not 0, but the high bits of the corresponding
    ram_addr_t backing the page's contents on reads. On write operations
    though, the I/O region type's write callback will be called instead.

      ROMD I/O page phys_offset value:
      +----------------------------------------------------------+
      |          high_addr           | mem_type_index |  flags   |
      +----------------------------------------------------------+

    Note that |region_offset| is ignored when reading from such pages;
    it is only used when writing to the I/O page.
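
The phys_offset layouts above can be summarized with a few helper
functions. The following C fragment is only a sketch written for this
overview, not code taken from QEMU: the toy_* / TOY_* names are
hypothetical, and the value 12 for PAGE_BITS (4 KiB pages) is an
assumption, since this document does not state it. The region type
values simply follow the (mem_type_index, flags) pairs listed in
section II.1, with |flags| occupying the 3 lowest bits.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS 12  /* ASSUMPTION: 4 KiB pages, not stated in the text */
#define TOY_PAGE_MASK ((UINT64_C(1) << TOY_PAGE_BITS) - 1)
#define TOY_FLAG_BITS 3   /* |flags| is a 3-bit bitmap */
#define TOY_FLAG_MASK ((1u << TOY_FLAG_BITS) - 1)

/* Region type values built from the (mem_type_index, flags=0) pairs. */
#define TOY_IO_MEM_RAM        (0u << TOY_FLAG_BITS)
#define TOY_IO_MEM_ROM        (1u << TOY_FLAG_BITS)
#define TOY_IO_MEM_UNASSIGNED (2u << TOY_FLAG_BITS)
#define TOY_IO_MEM_NOTDIRTY   (3u << TOY_FLAG_BITS)

#define TOY_IO_MEM_ROMD 0x1u  /* flag bit for ROM-like I/O pages */

/* High bits: the page-aligned ram_addr_t for RAM-backed pages. */
static uint64_t toy_high_addr(uint64_t phys_offset)
{
    return phys_offset & ~TOY_PAGE_MASK;
}

/* Low PAGE_BITS bits: the memory region type. */
static unsigned toy_mem_type(uint64_t phys_offset)
{
    return (unsigned)(phys_offset & TOY_PAGE_MASK);
}

static unsigned toy_mem_type_index(uint64_t phys_offset)
{
    return toy_mem_type(phys_offset) >> TOY_FLAG_BITS;
}

static unsigned toy_flags(uint64_t phys_offset)
{
    return toy_mem_type(phys_offset) & TOY_FLAG_MASK;
}

/* Build a phys_offset from its three fields. */
static uint64_t toy_make_phys_offset(uint64_t high_addr, unsigned index,
                                     unsigned flags)
{
    /* The type index and flags must fit in the low PAGE_BITS bits. */
    assert(((uint64_t)index << TOY_FLAG_BITS) <= TOY_PAGE_MASK);
    return (high_addr & ~TOY_PAGE_MASK) |
           ((uint64_t)index << TOY_FLAG_BITS) |
           (flags & TOY_FLAG_MASK);
}

/* 'i/o address' passed to callbacks: region_offset + offset within page. */
static uint64_t toy_io_address(uint64_t region_offset, uint64_t hwaddr)
{
    return region_offset + (hwaddr & TOY_PAGE_MASK);
}
```

With these helpers, a RAM page's phys_offset is just its page-aligned
ram_addr_t (the type bits are all zero), while a ROMD I/O page carries
both a non-zero high_addr and a mem_type_index of 4 or higher with the
ROMD flag set.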