.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
.PP
A pattern-action statement has the form:
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
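.PP
For example, the following shell command
(a sketch that assumes a POSIX shell and an
.I awk
on the search path; the three input lines are invented for illustration)
runs a program with one pattern-action statement, whose action is a
sequence of two statements, followed by an
.B END
action.
It counts the lines whose second field exceeds 100, sums those fields,
and should print
.BR "2 350" .
.EX
{ echo a 150; echo b 50; echo c 200; } |
awk '$2 > 100 { n++; total += $2 }
     END      { print n, total }'
.EE
.PP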
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]\fR'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next #\fR skip remaining patterns on this input line\fP
nextfile #\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
delete\fI array\fP #\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] \fR)
or fields.
Variables are initialized to the null string.
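.PP
The following sketch (again assuming a POSIX shell and an
.I awk
on the search path) shows these rules at work:
.B ^
for exponentiation, concatenation by juxtaposition,
and an uninitialized variable
.I y
that compares equal both to 0 and to the null string.
It should print
.BR "n=1024 1 1" .
.EX
awk 'BEGIN { x = 2 ^ 10; s = "n=" x; print s, (y == 0), (y == "") }'
.EE
.PP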
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > " file
or
.BI >> " file
is present or on a pipe if
.BI | " cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the
.I format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
.PP
The mathematical functions
.BR atan2 ,
.BR cos ,
.BR exp ,
.BR log ,
.BR sin ,
and
.B sqrt
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
number of elements in an array for an array argument,
or length of
.B $0
if no argument.
.TP
.B rand
random number on [0,1).
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value.
.TP
\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
the
.IR n -character
substring of
.I s
that begins at position
.I m
counted from 1.
If no
.IR n ,
use the rest of the string.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
splits the string
.I s
into array elements
.IB a [1] \fR,
.IB a [2] \fR,
\&...,
.IB a [ n ] \fR,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
\fBsub(\fIr\fB, \fIt \fR[\fB, \fIs\^\fR]\fB)\fR
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
\fBgsub(\fIr\fB, \fIt \fR[\fB, \fIs\^\fR]\fB)\fR
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB)
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status: \-1 upon error,
.IR cmd 's
exit status upon a normal exit,
256 +
.I sig
upon death-by-signal, where
.I sig
is the number of the murdering signal,
or 512 +
.I sig
if there was a core dump.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < " file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.B ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
They may appear multiple times in a program and execute
in the order they are read by
.IR awk .
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B ARGC
argument count, assignable.
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames.
.TP
.B CONVFMT
conversion format used when converting numbers to strings
(default
.BR "%.6g" ).
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.TP
.B FILENAME
the name of the current input file.
.TP
.B FNR
ordinal number of the current record in the current file.
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\fR.
.TP
.BR NF
number of fields in the current record.
.TP
.B NR
ordinal number of the current record.
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" ).
.TP
.B OFS
output field separator (default space).
.TP
.B ORS
output record separator (default newline).
.TP
.B RLENGTH
the length of a string matched by
.BR match .
.TP
.B RS
input record separator (default newline).
.TP
.B RSTART
the start position of a string matched by
.BR match .
.TP
.B SUBSEP
separates multiple subscripts (default 034).
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or spaces and tabs.
.PP
.EX
.nf
      { s += $1 }
END   { print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN { # Simulate echo(1)
      for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
      printf "\en"
      exit }
.fi
.EE
.SH SEE ALSO
.IR grep (1),
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.IR "The AWK Programming Language" ,
Addison-Wesley, 1988. ISBN 0-201-07981-X.
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
.br
POSIX-standard interval expressions in regular expressions are not supported.
.br
Only eight-bit character sets are handled correctly.