.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.IR prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }"
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number on [0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.BI substr( s , " m" , " n\fB)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fB)
splits the string
.I s
into array elements
.IB a [1] ,
.IB a [2] ,
\&...,
.IB a [ n ] ,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB )
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
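.PP
A minimal sketch of checking the return value of
.BR getline ;
the file name
.B notes.txt
and the variable names are assumptions made for illustration.
Comparing the result with 0 matters because an unreadable file
yields \-1, which counts as true in a bare
.B while
test.
.EX
BEGIN {
	file = "notes.txt"	# hypothetical input file
	# r is 1 per record read, 0 at end of file, -1 on error
	while ((r = (getline line < file)) > 0)
		n++
	if (r < 0)
		print "cannot read " file
	else {
		print n + 0, "lines in " file
		close(file)
	}
}
.EE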
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a relop is any of the six relational operators in C,
and a matchop is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs .
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN	{ FS = ",[ \et]*|[ \et]+" }
	{ print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
.nf
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
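.PP
A minimal sketch of the associative arrays and the
.B for (var in array)
loop described above, assuming the first field of each line is a key
worth counting; the names
.B count
and
.B k
are illustrative, and the order in which the keys are visited is not specified.
.EX
.nf
	{ count[$1]++ }		# tally each distinct first field
END	{ for (k in count) print k, count[k] }
.fi
.EE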
.PP
.EX
.nf
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.fi
.EE
.SH SEE ALSO
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988.  ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
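.PP
The two conversion idioms noted at the start of this section can be sketched
as follows; the variable names are illustrative only.
.EX
BEGIN {
	x = "10"; y = "9"	# string constants, so compared as strings
	if (x < y) print "string order: 10 sorts before 9"
	if (x + 0 > y + 0) print "numeric order: 10 exceeds 9"
	n = 3
	s = n ""	# concatenating "" yields the string "3"
	print length(s)
}
.EE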