Here's how mount actually works:

The mount command calls the mount system call, which has five arguments you
can see on the "man 2 mount" page:

  int mount(const char *source, const char *target, const char *filesystemtype,
            unsigned long mountflags, const void *data);

The command "mount -t ext2 /dev/sda1 /path/to/mntpoint -o ro,noatime"
parses its command line arguments to feed them into those five system call
arguments. In this example, the source is "/dev/sda1", the target is
"/path/to/mntpoint", and the filesystemtype is "ext2".

The other two syscall arguments (mountflags and data) come from the
"-o option,option,option" argument. The mountflags argument goes to the VFS
(explained below), and the data argument is passed to the filesystem driver.
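
As a rough sketch, the example command above boils down to one call along
these lines. The struct and the function names (mount_args, example_args,
do_mount) are made up for illustration, and passing NULL as data assumes no
filesystem specific options were left over after the VFS flags were removed.

```c
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical container for the five syscall arguments (not a real API). */
struct mount_args {
  const char *source, *target, *filesystemtype;
  unsigned long mountflags;
  const void *data;
};

/* What "mount -t ext2 /dev/sda1 /path/to/mntpoint -o ro,noatime" would
   plausibly pass: ro and noatime are both VFS flags, so nothing is left
   over for the driver's data string. */
struct mount_args example_args(void)
{
  struct mount_args args = {
    "/dev/sda1", "/path/to/mntpoint", "ext2",
    MS_RDONLY | MS_NOATIME, NULL
  };

  return args;
}

/* Hand the collected arguments to the system call. Returns 0 on success,
   or -1 with errno set (typically EPERM without root privileges). */
int do_mount(const struct mount_args *args)
{
  return mount(args->source, args->target, args->filesystemtype,
               args->mountflags, args->data);
}
```

A caller would fill in the struct once and then call do_mount(&args),
checking errno when it returns -1.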

The mount command's options string is a list of comma separated values. If
there's more than one -o argument on the mount command line, they get glued
together (in order) with a comma. The mount command also checks the file
/etc/fstab for default options, and the options you specify on the command
line get appended to those defaults (if any). Most other command line mount
flags are just synonyms for adding option flags (for example
"mount -o remount -w" is equivalent to "mount -o remount,rw"). Behind the
scenes they all get appended to the -o string and fed to a common parser.
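
The gluing step might look something like the following. This is only a
sketch: add_opts is a hypothetical helper name, and a real mount command
would also merge in the /etc/fstab defaults first.

```c
#include <stddef.h>
#include <string.h>

/* Append one -o argument to the accumulated option string, inserting a
   comma if something is already there. Returns 0 on success, or -1 if the
   result would not fit in size bytes (opts is then left unchanged). */
int add_opts(char *opts, size_t size, const char *more)
{
  size_t used = strlen(opts);

  if (!*more) return 0;
  if (used + (used ? 1 : 0) + strlen(more) + 1 > size) return -1;
  if (used) opts[used++] = ',';
  strcpy(opts + used, more);

  return 0;
}
```

With this, "mount -o remount -w" would turn into add_opts(opts, size,
"remount") followed by add_opts(opts, size, "rw"), leaving "remount,rw" in
opts for the common parser.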

VFS stands for "Virtual File System" and is the common infrastructure shared
by different filesystems. It handles common things like making the filesystem
read only. The mount command assembles an option string to supply to the "data"
argument of the mount syscall, but first it parses it for VFS options
(ro,noexec,nodev,nosuid,noatime...) each of which corresponds to a flag
from #include <sys/mount.h>. The mount command removes those options from the
string and sets the corresponding bit in mountflags, then the remaining options
(if any) form the data argument for the filesystem driver.
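
Here is a minimal sketch of that split. The function name split_opts and
the five entry table are assumptions for illustration; a real mount command
recognizes many more VFS options than these.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* A few VFS options and their <sys/mount.h> bits (sample only). */
static const struct { const char *name; unsigned long bit; } vfs_opts[] = {
  {"ro", MS_RDONLY}, {"noexec", MS_NOEXEC}, {"nodev", MS_NODEV},
  {"nosuid", MS_NOSUID}, {"noatime", MS_NOATIME},
};

/* Walk a comma separated option string. Options found in the table set a
   bit in *flags, "rw" clears MS_RDONLY, and everything else is copied (in
   order, comma separated) into data for the filesystem driver. Returns 0
   on success, or -1 if data (size bytes) is too small. */
int split_opts(const char *opts, unsigned long *flags, char *data, size_t size)
{
  size_t count = sizeof(vfs_opts) / sizeof(*vfs_opts), used = 0;

  if (!size) return -1;
  *data = 0;
  while (*opts) {
    size_t len = strcspn(opts, ","), i;

    /* Look for an exact match among the known VFS option names. */
    for (i = 0; i < count; i++)
      if (strlen(vfs_opts[i].name) == len
          && !strncmp(opts, vfs_opts[i].name, len)) break;

    if (i < count) *flags |= vfs_opts[i].bit;
    else if (len == 2 && !strncmp(opts, "rw", 2))
      *flags &= ~(unsigned long)MS_RDONLY;
    else if (len) {
      if (used + (used ? 1 : 0) + len + 1 > size) return -1;
      if (used) data[used++] = ',';
      memcpy(data + used, opts, len);
      used += len;
      data[used] = 0;
    }

    opts += len;
    if (*opts == ',') opts++;
  }

  return 0;
}
```

Feeding it "ro,noatime,errors=remount-ro" sets MS_RDONLY and MS_NOATIME in
flags and leaves "errors=remount-ro" in data.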

A few quick implementation details: the mountflag MS_SILENT gets set by
default even if there's nothing in /etc/fstab. Some actions (such as --bind
and --move mounts, i.e. -o bind and -o move) are just VFS actions and don't
require any specific filesystem at all. The "-o remount" flag requires looking
up the filesystem in /proc/mounts and reassembling the full option string,
because you don't _just_ pass in the changed flags but have to reassemble
the complete new filesystem state to give the system call. Some of the options
in /etc/fstab are for the mount command (such as "user", which only does
anything if the mount command has the suid bit set) and don't get passed
through to the system call.
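
Two of those details can be sketched in a few lines. The helper names
(lookup_mount_opts, bind_mount) are hypothetical, and using the C library's
getmntent() family from <mntent.h> to read the table is just one plausible
way to do the /proc/mounts lookup, not necessarily what any particular
mount command does.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Find the entry mounted on dir in a mounts table (normally /proc/mounts)
   and copy its option string into opts. If several entries share the same
   directory, the last one listed wins. Returns 0 if an entry was copied,
   or -1 if none was found, the table could not be opened, or no matching
   entry fit in size bytes. */
int lookup_mount_opts(const char *table, const char *dir, char *opts,
                      size_t size)
{
  FILE *fp = setmntent(table, "r");
  struct mntent *entry;
  int result = -1;

  if (!fp) return -1;
  while ((entry = getmntent(fp))) {
    if (strcmp(entry->mnt_dir, dir)) continue;
    if (strlen(entry->mnt_opts) >= size) continue;
    strcpy(opts, entry->mnt_opts);
    result = 0;
  }
  endmntent(fp);

  return result;
}

/* A bind mount is a pure VFS action: no filesystem type or data needed.
   Returns 0 on success, -1 with errno set on failure. */
int bind_mount(const char *from, const char *to)
{
  return mount(from, to, NULL, MS_BIND, NULL);
}
```

A remount would then start from the string lookup_mount_opts() returns,
apply the requested changes, and pass the complete result back through the
system call together with MS_REMOUNT.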

When mounting a new filesystem, the "filesystemtype" argument to the mount
system call specifies which filesystem driver to use. All the loaded drivers
are listed in /proc/filesystems, but calling mount can also trigger a module
load request to add another. A filesystem driver is responsible for putting
files and subdirectories under the mount point: any time you open, close,
read, write, truncate, list the contents of a directory, move, or delete a
file, you're talking to a filesystem driver to do it. (Or when you call
ioctl(), stat(), statvfs(), utime()...)
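
Checking whether a driver is currently listed is a simple text scan. In the
sketch below, find_filesystem is a hypothetical name; the line format it
assumes (an optional "nodev" marker, a tab, then the driver name) is what
/proc/filesystems uses, where "nodev" marks drivers that don't need a block
device.

```c
#include <stdio.h>
#include <string.h>

/* Scan a list in /proc/filesystems format for the driver called name.
   Returns 1 if it is listed (setting *nodev to 1 if the line carries the
   "nodev" marker, 0 otherwise), or 0 if it is not listed. */
int find_filesystem(FILE *list, const char *name, int *nodev)
{
  char line[256];

  while (fgets(line, sizeof(line), list)) {
    char *tab = strchr(line, '\t');

    if (!tab) continue;
    tab[1 + strcspn(tab + 1, "\n")] = 0;   /* strip the newline */
    if (strcmp(tab + 1, name)) continue;
    *nodev = tab != line;   /* anything before the tab is the marker */

    return 1;
  }

  return 0;
}
```

Normally the list argument would come from fopen("/proc/filesystems", "r").
A driver that isn't listed may still be available as a module, so a miss
here doesn't mean the mount will fail.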

Different drivers implement different filesystems, which fall into four
categories:

1) Block device backed filesystems, such as ext2 and vfat.

This kind of filesystem driver acts as a lens to look at a block device
through. The source argument for block backed filesystems is a path to a
block device, such as "/dev/hda1", which stores the contents of the
filesystem in a fixed block of sequential storage, and there's a separate
driver providing that block device.

Block backed filesystems are the "conventional" filesystem type most people
think of when they mount things. The name means that the "backing store"
(where the data lives when the system is switched off) is on a block device.

2) Server backed filesystems, such as cifs/samba or fuse.

These drivers convert filesystem operations into a sequential stream of
bytes, which they can send through a pipe to talk to a program. The filesystem
server could be a local Filesystem in Userspace daemon (connected to a local
process through a pipe filehandle), behind a network socket (CIFS and v9fs),
behind a char device (/dev/ttyS0), and so on. The common attribute is there's
some program on the other end sending and receiving a sequential bytestream.
The backing store is a server somewhere, and the filesystem driver is talking
to a process that reads and writes data in some known protocol.

The source argument for these filesystems indicates where the filesystem
lives. It's often in a URL-like format for network filesystems, but it's
really just a blob of data that the filesystem driver understands.

A lot of server backed filesystems want to open their own connection so they
don't have to pass their data through a persistent local userspace process,
not really for performance reasons but because in low memory situations a
chicken-and-egg situation can develop where all the process's pages have
been swapped out but the filesystem needs to write data to its backing
store in order to free up memory so it can swap the process's pages back in.
If this mechanism is providing the root filesystem, this can deadlock and
freeze the system solid. So while you _can_ pass some of them a filehandle,
more often than not you don't.

These are also known as "pipe backed" filesystems (or "network filesystems"
because that's a common case, although a network doesn't need to be involved).
Conceptually they're char device backed filesystems (analogous to the block
device backed ones), but you don't commonly specify a character device in
/dev when mounting them because you're talking to a specific server process,
not a whole machine.

3) Ram backed filesystems, such as ramfs and tmpfs.

These are very simple filesystems that don't implement a backing store. Data
written to these gets stored in the disk cache, and the driver ignores requests
to flush it to backing store (reporting all the pages as pinned and
unfreeable).

These drivers essentially mount the VFS's page/dentry cache as if it were a
filesystem. (Page cache stores file contents, dentry cache stores directory
entries.) They grow and shrink dynamically, as needed: when you write files
into them they allocate more memory to store the data, and when you delete
files the memory is freed.

There's a simple one (ramfs) that does only that, and a more complex one (tmpfs)
which adds a size limitation (by default 50% of physical memory, but it's
adjustable as a mount option) so the system doesn't run out of memory and
lock up if you "cat /dev/zero > file", and can also report how much space is
remaining when asked (ramfs always says 0 bytes free). The other thing tmpfs
does is write its data out to swap space (like processes do) when the system
is under memory pressure.
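
Because the size limit is a filesystem specific option, it travels in the
data string rather than in mountflags. A minimal sketch, assuming the
hypothetical helper names tmpfs_data and mount_tmpfs and a limit given in
kilobytes:

```c
#include <stdio.h>
#include <sys/mount.h>

/* Build the data string that caps a tmpfs at limit_kb kilobytes.
   Returns 0 on success, or -1 if buf (size bytes) is too small. */
int tmpfs_data(char *buf, size_t size, unsigned long limit_kb)
{
  int len = snprintf(buf, size, "size=%luk", limit_kb);

  return (len < 0 || (size_t)len >= size) ? -1 : 0;
}

/* Mount a size limited tmpfs on target. There is no backing device to
   name, so the source string "tmpfs" is just a conventional label.
   Returns 0 on success, -1 with errno set on failure. */
int mount_tmpfs(const char *target, unsigned long limit_kb)
{
  char data[32];

  if (tmpfs_data(data, sizeof(data), limit_kb)) return -1;

  return mount("tmpfs", target, "tmpfs", 0, data);
}
```

Calling mount_tmpfs("/mnt/scratch", 65536) would ask for a 64 megabyte
limit (the path is only an example), which is roughly what
"mount -t tmpfs -o size=65536k tmpfs /mnt/scratch" requests.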

Note that "ramdisk" is not the same as "ramfs". The ramdisk driver uses a
chunk of memory to implement a block device, and then you can format that
block device and mount it with a block device backed filesystem driver.
(This is the same "two device drivers" approach you always have with block
backed filesystems: one driver provides /dev/ram0 and the second driver mounts
it as vfat.) Ram disks are significantly less efficient than ramfs,
allocating a fixed amount of memory up front for the block device instead of
dynamically resizing as files are written into and deleted from the
page and dentry caches the way ramfs does.

Note: initramfs cpio, tmpfs as rootfs.

4) Synthetic filesystems, such as proc, sysfs, devpts...

These filesystems don't have any backing store either, because they don't
store arbitrary data the way the first three types of filesystems do.

Instead they present artificial contents, which can represent processes or
hardware or anything the driver writer wants them to show. Listing or reading
from these files calls a driver function that produces whatever output it's
programmed to, and writing to these files submits data to the driver which
can do anything it wants with it.

Synthetic filesystems are often implemented to provide monitoring and control
knobs for parts of the operating system. It's an alternative to adding more
system calls (or ioctl, sysctl, etc), and provides a more human friendly user
interface which programs can use but which users can also interact with
directly from the command line via "cat" and redirecting the output of
"echo" into special files.
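
From a program's point of view these knobs are ordinary files, so plain
stdio is enough to use them. The two helpers below (read_knob and
write_knob are made up names) are a sketch of the programmatic equivalent
of "cat file" and "echo value > file".

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a file into buf, without the trailing newline.
   Returns 0 on success, -1 on failure. */
int read_knob(const char *path, char *buf, int size)
{
  FILE *fp = fopen(path, "r");
  int ok;

  if (!fp) return -1;
  ok = fgets(buf, size, fp) != NULL;
  fclose(fp);
  if (!ok) return -1;
  buf[strcspn(buf, "\n")] = 0;

  return 0;
}

/* Write a string to a file. Returns 0 on success, -1 on failure. */
int write_knob(const char *path, const char *value)
{
  FILE *fp = fopen(path, "w");
  int ok;

  if (!fp) return -1;
  ok = fputs(value, fp) >= 0;
  if (fclose(fp)) ok = 0;   /* buffered data may only be rejected here */

  return ok ? 0 : -1;
}
```

For example, read_knob("/proc/sys/kernel/hostname", buf, sizeof(buf))
should fetch the current hostname on a system with proc mounted on /proc.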


Those are the four types of filesystems: backing store can be a fixed length
block of storage, backing store can be some server the driver connects to,
there can be no backing store at all so the files merely reside in the disk
cache, or the filesystem driver can just make up its contents programmatically.

And that's how filesystems get mounted, using the mount system call which has
five arguments. The "filesystemtype" argument specifies the driver implementing
one of those filesystems, and the "source" and "data" arguments get fed to
that driver. The "target" and "mountflags" arguments get parsed (and handled)
by the generic VFS infrastructure. (The filesystem driver can peek at the
VFS data, but generally doesn't need to care. The VFS tells the filesystem
what to do, in response to what userspace said to do.)