Introduction:

The Flexible Filesystem Benchmark (FFSB) is a filesystem performance
measurement tool.  It is a multi-threaded application (using
pthreads), written entirely in C with cross-platform portability in
mind.  It differs from other filesystem benchmarks in that the user
may supply a profile to create custom workloads, while most other
filesystem benchmarks use a fixed set of workloads.

As of version 5.1, it supports seven different basic operations,
multiple groups of threads with different operation mixtures,
operation across multiple filesystems, and filesystem aging prior to
benchmarking.


Differences from version 4.0 and older:

Version 5.0 and above represent an almost total rewrite, and many
things have changed.  In version 5.0 and above, FFSB moved to a
time-regulated run instead of doing a set number of different
operations and timing the whole thing.  This is primarily to better
deal with the use of multiple threadgroups, which would otherwise not
be synchronized at termination time.

Additionally, the FFSB configuration file format has changed in
version 5.0, although we do support old-style configuration files
along with a run time passed on the command line.  In this mode,
version 5.0 and above ignores the iterations parameter and simply
uses the time specified on the command line.

Behaviorally, most of the old operations are the same: sequential
reads and sequential writes work as they did before.  One change in
version 5.0 is that the skip-read behavior (reading, seeking forward
a fixed amount, then reading again) has been removed.  We now support
fully randomized reads and writes from random offsets within the file.

Version 4.0 didn't support overwrites (only appends), so we interpret
writes in old config files as append operations.

On Linux, CPU utilization information will only be accurate on
systems using NPTL.  Older LinuxThreads systems will probably only see
zeros for CPU utilization because LinuxThreads does not comply with
POSIX.  Version 4.0 and older could be recompiled to work on
LinuxThreads, but in 5.0 and later we no longer support this.

We no longer support the "outputfile" argument on the command line.

One should simply use tee or similar to capture the output.  FFSB
unbuffers standard out for this purpose, and errors are sent to
standard error.

Global options:

There are eight valid global options placed at the beginning of the
profile.  Three of them are required: num_filesystems (number of
filesystems), num_threadgroups (number of threadgroups), and time
(running time of the benchmark).  The other five options are:

directio   - each call to open will be made using O_DIRECT
alignio    - aligns all block operations for random reads and writes
             on 4k boundaries
bufferedio - currently ignored; it is intended to use libc
             fread/fwrite instead of just unix read and write calls
verbose    - currently ignored
callout    - calls an external command and waits for its termination
             before FFSB begins the benchmark phase.
             This is useful for synchronizing distributed clients,
             starting profilers, etc.

They must be specified in the above order (num_filesystems,
num_threadgroups, time, directio, alignio, bufferedio, verbose,
callout).



Filesystems:

Filesystems are specified to FFSB in the form of a directory.  FFSB
assumes that the filesystem is mounted at this directory and will not
do any verification of this fact beyond ensuring it can read/write to
the location.  So be careful to ensure something with enough space to
handle the dataset is in fact mounted at the specified location.

In the filesystem clause of the profile, one may set the starting
number of files and directories as well as a minimum and maximum
filesize for the filesystem.  One may also specify the blocksize
used for creating the files separately in the filesystem clause.

Also, if a filesystem is to be aged, a special threadgroup clause may
be embedded in a filesystem clause to specify the operation mixture
and number of threads used to age the filesystem.  This threadgroup is
run until filesystem utilization reaches the specified amount.
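The stopping condition for aging can be pictured as a simple
utilization check.  The sketch below is illustrative Python, not
FFSB's C code: the helper names fs_utilization and aging_done are
hypothetical, and using statvfs() as the source of the numbers is an
assumption about how utilization might be measured.

```python
import os

def fs_utilization(path):
    """Return the fraction (0.0 to 1.0) of blocks in use on the filesystem holding path."""
    st = os.statvfs(path)
    if st.f_blocks == 0:
        return 0.0  # some pseudo-filesystems report no blocks at all
    return (st.f_blocks - st.f_bfree) / st.f_blocks

def aging_done(path, desired_util):
    """True once utilization has reached the target, e.g. 0.20 for 20% full."""
    return fs_utilization(path) >= desired_util
```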

Inheritance -- if you are using multiple filesystems, all attributes
except the location should be inherited from the previous filesystem.
This is done to make it easier to add groups of similar filesystems.
In this case, only the location is required in the filesystem clause.
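As a rough model of inheritance, each clause can be treated as a set
of key/value pairs that is filled in from the clause before it.  The
Python below is only a sketch of that idea: resolve_inheritance and
the /mnt/test0 and /mnt/test1 paths are made up for illustration, and
letting a later clause override individual attributes is an
assumption, not documented FFSB behavior.

```python
def resolve_inheritance(clauses):
    """Fill in missing attributes of each clause from the previous resolved clause.

    clauses is a list of dicts in profile order; "location" is never inherited.
    """
    resolved = []
    for i, clause in enumerate(clauses):
        if "location" not in clause:
            raise ValueError(f"filesystem{i}: location is required")
        merged = {}
        if resolved:
            merged = {k: v for k, v in resolved[-1].items() if k != "location"}
        merged.update(clause)  # values given in this clause win
        resolved.append(merged)
    return resolved

filesystems = resolve_inheritance([
    {"location": "/mnt/test0", "num_files": 10, "num_dirs": 1,
     "min_filesize": 4096, "max_filesize": 4096},
    {"location": "/mnt/test1"},  # inherits everything but the location
])
```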

As of version 5.1, filesystem re-use is supported if a given
filesystem hasn't been modified beyond its original specifications
(number of files and directories is correct, and file sizes are within
specifications).  This can be a huge time saver if one wishes to do
multiple runs on the same data-set without altering it during a run,
because the fileset doesn't need to be recreated before each run.

To do this, specify "reuse=1" in the filesystem clause, and FFSB will
verify the fileset first, and if it checks out it will use it.
Otherwise, it will remove everything and re-create the filesets for
that filesystem.
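The checks described above (file count, directory count, and file
sizes within bounds) can be sketched as follows.  This is a
hypothetical Python helper, not FFSB's verification code; in
particular it assumes every file and directory under the location
belongs to the fileset, which may not match FFSB's on-disk layout.

```python
import os

def fileset_reusable(root, num_files, num_dirs, min_filesize, max_filesize):
    """Return True if the tree under root matches the expected counts and size bounds."""
    files = dirs = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            if not (min_filesize <= size <= max_filesize):
                return False  # a file grew or shrank outside the spec
            files += 1
    return files == num_files and dirs == num_dirs
```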

Threadgroups:

An arbitrary number of threadgroups with differing numbers of threads
and operation mixes can be specified.  The operations are specified
using a weighting for each operation; if an operation isn't specified,
its weighting is assumed to be zero (not used).
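One way to read the weightings is as relative probabilities: an
operation with weight 4 is chosen about four times as often as one
with weight 1.  The Python sketch below illustrates that reading
only; pick_operation is a hypothetical helper, and FFSB's actual
selection code may differ.

```python
import random

def pick_operation(weights, rng=random):
    """Pick an operation name with probability proportional to its weight."""
    active = {op: w for op, w in weights.items() if w > 0}
    if not active:
        raise ValueError("at least one operation needs a weight greater than 0")
    ops = list(active)
    return rng.choices(ops, weights=[active[op] for op in ops], k=1)[0]

# Unspecified operations behave like weight 0 and are never chosen.
weights = {"read": 4, "append": 1, "delete": 0}
counts = {op: 0 for op in weights}
rng = random.Random(0)
for _ in range(10000):
    counts[pick_operation(weights, rng)] += 1
```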

"Think-time" for a threadgroup may also be specified in millisecond
amounts using the "op_delay" parameter, where every thread will wait
for the specified amount between each operation.

Operations:

All operations begin by randomly selecting a filesystem from the list
of filesystems specified in the profile.  The distribution aims to be
uniform across all filesystems.


The seven operations are:

reads   - read() calls with an overall amount and a blocksize;
          operates on existing files.  Care must be taken to ensure
          that the read amount is smaller than the size of any possible
          file.

          If read_random is specified, then each individual block
          will be read starting from a random point within the file,
          and this will continue until the entire amount specified has
          been read.  The offset of each random block will be totally
          random to the byte level, unless the "alignio" global
          parameter is on, in which case the reads will be 4096-byte
          aligned.  Aligned I/O is generally recommended.

readall - very similar to reads above, except it doesn't take an
          amount; it simply reads the entire file sequentially using
          the read_blocksize.  This is useful for situations where
          different filesystems have differently sized files, and
          sequential read patterns across all filesystems are desired.

writes  - write() calls with an overall amount and a blocksize.
          This is an overwrite operation and will not enlarge an
          existing file; again, one must be careful not to specify a
          write amount that is larger than any possible file in the
          data set.

          If write_random is specified, then each individual block
          will be written starting from a random point within the file,
          and this will continue until the entire amount specified has
          been written out.  The offset of each random block will be
          totally random to the byte level, unless the "alignio" global
          parameter is on, in which case the writes will be 4096-byte
          aligned.  Aligned I/O is generally recommended.

          If the fsync_file parameter for the threadgroup is non-zero,
          then after all of the write calls are finished, fsync() will
          be called on the file descriptor before the file is closed.

creates - creates a file using an open() call and chooses its size
          randomly within the constraints (min_filesize and
          max_filesize) for the selected filesystem.  Write operations
          will be done using the same blocksize as is specified for the
          write operation.

deletes - calls unlink() on a filename and removes it from the
          internal data structures.  One must be careful to ensure
          there are enough files to delete at all times or else the
          benchmark will terminate.

appends - calls write() using the append flag with an overall amount
          and a blocksize to be appended onto a randomly chosen file.

metas   - this is actually a mix of several different directory
          operations.  Each "meta" operation consists of two directory
          creates, one directory remove, and a directory rename.
          These operations are all carried out separately from the
          other six operations.
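The random read and write offsets described above can be illustrated
with a short Python sketch.  The function names are hypothetical, and
two details are assumptions rather than documented FFSB behavior:
that each block is kept entirely inside the file, and that alignment
is done by rounding the offset down to a 4096-byte boundary.

```python
import random

ALIGNMENT = 4096  # alignio aligns random I/O on 4k boundaries

def random_block_offset(file_size, blocksize, alignio, rng=random):
    """Choose a start offset so that a blocksize-long access stays inside the file."""
    if blocksize > file_size:
        raise ValueError("blocksize must not exceed the file size")
    offset = rng.randint(0, file_size - blocksize)  # inclusive upper bound
    if alignio:
        offset -= offset % ALIGNMENT  # round down to a 4096-byte boundary
    return offset

def random_offsets(file_size, total, blocksize, alignio, rng=random):
    """Yield one offset per block until the whole amount has been covered."""
    done = 0
    while done < total:
        yield random_block_offset(file_size, blocksize, alignio, rng)
        done += blocksize
```

For example, list(random_offsets(65536, 40960, 4096, True)) produces
ten offsets, each a multiple of 4096.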

Operation accounting:

Each operation which uses a blocksize (reads, writes, creates, and
appends) counts each read/write of a blocksize as an operation, whereas
deletes and metas are considered single operations.
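As a worked example of this accounting, a write of 40960 bytes with a
4096-byte blocksize is credited as 10 operations, while a delete is
credited as 1.  The Python helper below is hypothetical and only
restates that arithmetic; rounding a partial final block up to a full
operation is an assumption.

```python
import math

BLOCK_OPS = {"reads", "writes", "creates", "appends"}

def ops_counted(op, size=0, blocksize=0):
    """Number of operations credited for one invocation of op."""
    if op in BLOCK_OPS:
        return math.ceil(size / blocksize)  # one per blocksize-sized read/write
    return 1  # deletes and metas count once

write_ops = ops_counted("writes", size=40960, blocksize=4096)   # 10
delete_ops = ops_counted("deletes")                             # 1
```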

Running the benchmark:

There are three phases to running the benchmark: aging, fileset
creation, and the benchmark phase.

The create phase is carried out across all filesystems simultaneously
with one dedicated thread per filesystem.

After the create phase, sync() is called to ensure all dirty data gets
written out before the benchmark phase begins, and sync() is again
called at the end of the benchmark phase.  The time in sync() at the
end of the benchmark phase is counted as part of the benchmark phase.

Caveats/Holes/Bugs:

Aging and aging across multiple filesystems simultaneously haven't
been tested very much.

If *any* i/o operation or system call/libc call fails, the benchmark
will terminate immediately.

The parser doesn't handle malformed or incorrect profiles very well
(or at all).

The parser doesn't check to make sure all of the appropriate options
have been specified.  For example, if writes are specified in a
threadgroup but write_blocksize isn't specified, the parser won't catch
it, but the benchmark run will fail later on.


Configuration Files (new style):

New-style configuration allows for arbitrary newlines between lines,
and comments using '#' at the start of a line.  It also allows tabs
and other whitespace before and after configuration parameters.

The new-style configuration file is broken up into three main parts:

global parameters, filesystems, and threadgroups

The sections must be in the above order.

Global parameters:

Global parameters are described above; the first three are always
required.  Example:

----------

num_filesystems=1
num_threadgroups=1
time=30                 # time is in seconds

directio=0              # don't use direct io
alignio=1               # align random IOs to 4k
bufferedio=0            # this does nothing right now
verbose=0               # this does nothing right now

                        # calls an external command and waits
                        # everything until the newline is taken
                        # so you can have arbitrary parameters
callout=synchronize.sh myhostname

---------

All of these must appear in this order, though you can leave out the
optional ones.
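If profiles are generated by a script, the ordering rule is easy to
enforce by always emitting the options from a fixed list.  The Python
below is a sketch of that approach; render_globals is a hypothetical
helper and not part of FFSB.

```python
GLOBAL_ORDER = ["num_filesystems", "num_threadgroups", "time",
                "directio", "alignio", "bufferedio", "verbose", "callout"]
REQUIRED = GLOBAL_ORDER[:3]

def render_globals(options):
    """Return the global section as text, in the order the profile expects."""
    missing = [name for name in REQUIRED if name not in options]
    if missing:
        raise ValueError("missing required global options: " + ", ".join(missing))
    unknown = [name for name in options if name not in GLOBAL_ORDER]
    if unknown:
        raise ValueError("unknown global options: " + ", ".join(unknown))
    return "\n".join(f"{name}={options[name]}"
                     for name in GLOBAL_ORDER if name in options)

# The dict order does not matter; the output order is fixed.
text = render_globals({"time": 30, "alignio": 1,
                       "num_threadgroups": 1, "num_filesystems": 1})
```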

Filesystems:

Filesystems describe different logical sets of files residing in
different directories.  There is no strict requirement that they
actually be on different filesystems, only that the directory
specified already exists.

Filesystems are specified by a clause with a filesystem number like
this:

[filesystem0]
    location=/mnt/testing/
    num_files=10
    num_dirs=1
    max_filesize=4096
    min_filesize=4096
[end0]


The clause must always begin with [filesystemX] and end with [endX]
where X is the number of that filesystem.

You should start with X = 0, and increment by one for each following
filesystem.  If they are out of order, things will likely break.
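Since the parser does little validation, it can help to check clause
numbering before a run.  The Python below is a sketch of such a
pre-flight check, not part of FFSB: check_clauses is a hypothetical
helper, and treating everything after a '#' as a comment is an
assumption based on the examples in this document.

```python
import re

OPEN = re.compile(r"^\[(filesystem|threadgroup)(\d+)\]$")
CLOSE = re.compile(r"^\[end(\d+)\]$")

def check_clauses(profile_text):
    """Check that clauses close properly and top-level numbers count up from 0."""
    stack = []                                  # open clauses as (kind, number)
    expected = {"filesystem": 0, "threadgroup": 0}
    for raw in profile_text.splitlines():
        line = raw.split("#", 1)[0].strip()     # drop comments and whitespace
        m = OPEN.match(line)
        if m:
            kind, num = m.group(1), int(m.group(2))
            if not stack:                       # nested aging clauses are not sequenced
                if num != expected[kind]:
                    raise ValueError(f"expected [{kind}{expected[kind]}], got {line}")
                expected[kind] += 1
            stack.append((kind, num))
            continue
        m = CLOSE.match(line)
        if m:
            if not stack or stack[-1][1] != int(m.group(1)):
                raise ValueError(f"unmatched {line}")
            stack.pop()
    if stack:
        raise ValueError(f"missing [end{stack[-1][1]}]")
    return expected                             # number of clauses seen per kind
```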

The required information for each filesystem is: location, num_files,
num_dirs, max_filesize, and min_filesize.  Beyond those the following
four options are supported:

reuse=1 # check the filesystem to see if it is reusable

    # filesystem aging, three components required
    # takes agefs=1 to turn it on
    # then a valid threadgroup specification
    # then a desired utilization

agefs=1 # age the filesystem according to the following threadgroup
    [threadgroup0]
        num_threads=10
        write_size=40960
        write_blocksize=4096
        create_weight=10
        append_weight=10
        delete_weight=1
    [end0]
desired_util=0.20       # In this case, age until the fs is 20% full

create_blocksize=4096   # specify the blocksize to write()
                        # for creating the fileset, defaults to 4096

age_blocksize=4096      # specify the blocksize to write() for aging


Also, to allow lazy people to use lots of filesystems, we support
filesystem inheritance, which simply copies all options but the
location from the previous filesystem clause if nothing is specified.
Obviously, this doesn't work for filesystem0. (May not work for aging
either?)

Full-blown filesystem clause example:

----

[filesystem0]

    # required parts

    location=/home/sonny/tmp
    num_files=100
    num_dirs=100
    max_filesize=65536
    min_filesize=4096

    # aging part
    agefs=0
    [threadgroup0]
        num_threads=10
        write_size=40960
        write_blocksize=4096
        create_weight=10
        append_weight=10
        delete_weight=1
    [end0]
    desired_util=0.02               # age until 2% full

    # other optional commands

    create_blocksize=1024           # use a small create blocksize
    age_blocksize=1024              # and a small aging blocksize
    reuse=0                         # don't reuse it
[end0]

--

Threadgroups:

Threadgroups are very similar to filesystems in that any number of
them can be specified in clauses, and they must be in order starting
with threadgroup0.

Example:

---

[threadgroup0]
    num_threads=32
    read_weight=4
    append_weight=1

    write_size=4096
    write_blocksize=4096

    read_size=4096
    read_blocksize=4096
[end0]

---

In a threadgroup clause, num_threads is required and must be at least
1.  Then, at least one operation must be given a weight greater than 0
for the threadgroup to be valid.  Operations can be given a weighting
of 0, and in this case they are ignored.

Certain operations will also require other commands.  For example, if
read_weight is greater than zero, then one must also include a
read_size and a read_blocksize.  Here's the table of requirements and
options:


Operation        Requirements                          Options
--               --                                    --
read_weight      read_size, read_blocksize             read_random
readall_weight   read_blocksize                        none
write_weight     write_size, write_blocksize           write_random, fsync_file
create_weight    write_blocksize or create_blocksize   none
append_weight    write_blocksize, write_size           none
delete_weight    none                                  none
meta_weight      none                                  none
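Because the parser does not verify these requirements, the table can
be turned into a small pre-flight check.  The Python below is a
sketch, not FFSB code: missing_requirements is a hypothetical helper,
and looking up create_blocksize in the same dictionary as the
threadgroup options is a simplification, since that option belongs to
the filesystem clause.

```python
REQUIREMENTS = {
    "read_weight": [("read_size",), ("read_blocksize",)],
    "readall_weight": [("read_blocksize",)],
    "write_weight": [("write_size",), ("write_blocksize",)],
    "create_weight": [("write_blocksize", "create_blocksize")],  # either one
    "append_weight": [("write_blocksize",), ("write_size",)],
    "delete_weight": [],
    "meta_weight": [],
}

def missing_requirements(threadgroup):
    """List unmet requirements for every operation with a weight above zero."""
    problems = []
    for op, groups in REQUIREMENTS.items():
        if threadgroup.get(op, 0) <= 0:
            continue  # weight 0 or unspecified: operation is not used
        for alternatives in groups:
            if not any(name in threadgroup for name in alternatives):
                problems.append(f"{op} needs {' or '.join(alternatives)}")
    return problems

# The threadgroup0 example above passes the check.
example = {"num_threads": 32, "read_weight": 4, "append_weight": 1,
           "write_size": 4096, "write_blocksize": 4096,
           "read_size": 4096, "read_blocksize": 4096}
problems = missing_requirements(example)
```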


Other threadgroup options:

op_delay=10  # specify a wait between operations in milliseconds

bindfs=3     # This allows you to restrict a threadgroup's operation
             # to a specific filesystem number.  Currently only
             # binding to one specific filesystem is supported
    414