.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
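.PP
As an illustrative sketch (the colon-separated input is an assumption,
not something this manual prescribes), the following program sets
.B FS
in a
.B BEGIN
action and prints the first and last field of each line;
.B $NF
is the last field because
.B NF
holds the number of fields:
.EX
BEGIN	{ FS = ":" }
	{ print $1, $NF }
.EE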
.PP
A pattern-action statement has the form:
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]\fR'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x  [ i ] \fR)
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
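.PP
A brief sketch of multiple subscripts (the array and variable names
are illustrative only): the combined subscript is a single string,
so it can be taken apart again by splitting on
.BR SUBSEP .
.EX
BEGIN {
	a["x", "y"] = 1
	for (k in a) {
		split(k, part, SUBSEP)	# recover the two constituents
		print part[1], part[2]
	}
}
.EE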
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > " file
or
.BI >> " file
is present or on a pipe if
.BI | " cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the
.I format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
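.PP
As a minimal sketch of these rules (assuming a
.IR sort (1)
command is available), the first field of every record is written
to one pipe, which is closed in the
.B END
action so that the command can finish and emit its output:
.EX
	{ print $1 | "sort" }
END	{ close("sort") }
.EE
Because both statements use the same command string,
they refer to the same open pipe.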
.PP
The mathematical functions
.BR atan2 ,
.BR cos ,
.BR exp ,
.BR log ,
.BR sin ,
and
.B sqrt
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
number of elements in an array for an array argument,
or length of
.B $0
if no argument.
.TP
.B rand
random number on [0,1).
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value.
.TP
\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
the
.IR n -character
substring of
.I s
that begins at position
.I m
counted from 1.
If no
.IR n ,
use the rest of the string.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
splits the string
.I s
into array elements
.IB a [1] \fR,
.IB a [2] \fR,
\&...,
.IB a [ n ] \fR,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
\fBsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
\fBgsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB)
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status. This will be \-1 upon error,
.IR cmd 's
exit status upon a normal exit,
256 +
.I sig
upon death-by-signal, where
.I sig
is the number of the murdering signal,
or 512 +
.I sig
if there was a core dump.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
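.PP
A short sketch that combines several of these functions
(the sample string and variable names are made up for illustration):
.EX
BEGIN {
	s = "alpha,beta,gamma"
	n = split(s, a, ",")		# n is the number of pieces
	print n, a[2]
	if (match(s, /beta/))		# sets RSTART and RLENGTH
		print RSTART, RLENGTH
	count = gsub(/a/, "A", s)	# s is modified in place
	print count, toupper(substr(s, 1, 5))
}
.EE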
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < " file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
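.PP
As a sketch of reading command output with
.B getline
(assuming a
.IR date (1)
command is available; the variable names are illustrative),
the loop tests for a result greater than 0 so that it stops
on both end of file and error:
.EX
BEGIN {
	cmd = "date"
	while ((cmd | getline line) > 0)
		print "now:", line
	close(cmd)
}
.EE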
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.B ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
They may appear multiple times in a program and execute
in the order they are read by
.IR awk .
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B ARGC
argument count, assignable.
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames.
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" ).
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.TP
.B FILENAME
the name of the current input file.
.TP
.B FNR
ordinal number of the current record in the current file.
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\fR.
.TP
.BR NF
number of fields in the current record.
.TP
.B NR
ordinal number of the current record.
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" ).
.TP
.B OFS
output field separator (default space).
.TP
.B ORS
output record separator (default newline).
.TP
.B RLENGTH
the length of a string matched by
.BR match .
.TP
.B RS
input record separator (default newline).
.TP
.B RSTART
the start position of a string matched by
.BR match .
.TP
.B SUBSEP
separates multiple subscripts (default 034).
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
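.PP
A minimal sketch of the local-variable idiom (the function name
.B max
and the extra parameters
.B i
and
.B m
are illustrative only).
The caller passes two arguments; the excess parameters serve as locals.
The extra spaces in the parameter list are only a visual convention:
.EX
function max(arr, n,    i, m) {
	m = arr[1]
	for (i = 2; i <= n; i++)
		if (arr[i] > m) m = arr[i]
	return m
}
BEGIN	{ n = split("3 9 4", v); print max(v, n) }
.EE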
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or spaces and tabs.
.PP
.EX
.nf
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.fi
.EE
.SH SEE ALSO
.IR grep (1),
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.IR "The AWK Programming Language" ,
Addison-Wesley, 1988.  ISBN 0-201-07981-X.
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
.br
POSIX-standard interval expressions in regular expressions are not supported.
.br
Only eight-bit character sets are handled correctly.