.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
awk
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.IR prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
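.PP
As a sketch of how these options combine
(the file name
.B data.txt
and the variable
.B min
are illustrative assumptions, not part of
.IR awk ),
the following command splits each line on colons and prints the first field
of every line whose third field is at least
.BR min :
.IP
.EX
awk \-F: \-v min=100 '$3 >= min { print $1 }' data.txt
.EE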
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }"
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x  [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
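.PP
A minimal sketch of subscripting, assuming input lines whose first two
fields are a name and a number (this layout is an illustrative assumption):
.IP
.EX
{ total[$1] += $2; seen[$1, NR] = 1 }
END { for (name in total) print name, total[name] }
.EE
.IP
Here
.B seen
is indexed by the first field and the record number joined by
.BR SUBSEP ;
the order in which
.B for (name in total)
visits the subscripts is unspecified.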
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR printf (3)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
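.PP
For example, the sketch below writes each line to a file named after its
first field and pipes a per-name count through
.IR sort (1);
the
.B .txt
suffix and the
.B sort
command are illustrative assumptions.
The file name is a parenthesized expression, as described above.
.IP
.EX
{ print $0 > ($1 ".txt"); count[$1]++ }
END {
	for (k in count) print k, count[k] | "sort"
	close("sort")
}
.EE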
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number on [0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.BI substr( s , " m" , " n\fB)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fB)
splits the string
.I s
into array elements
.IB a [1] ,
.IB a [2] ,
\&...,
.IB a [ n ] ,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB )
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
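.PP
A short sketch combining several of these functions,
assuming input lines of the form
.B key=value
(an illustrative layout):
.IP
.EX
{
	gsub(/[ \et]+/, "")	# remove blanks and tabs from $0
	n = split($0, part, "=")	# part[1] is the key, part[2] the value
	if (n == 2 && match(part[2], /[0-9]+/))
		print toupper(part[1]), substr(part[2], RSTART, RLENGTH)
}
.EE
.IP
For each such line this prints the key in upper case followed by the
first run of digits found in the value.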
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline"
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
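.PP
A typical loop, sketched here with
.IR date (1)
and a file named
.B names
as illustrative assumptions, tests the return value explicitly so that
an error (\-1) does not cause an endless loop:
.IP
.EX
BEGIN {
	"date" | getline now		# first line of output from date
	close("date")
	while ((getline line < "names") > 0)
		n++
	close("names")
	print now, n + 0, "lines"
}
.EE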
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
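.PP
For instance, in the sketch below (the field layout and the variable
.B re
are illustrative assumptions) the second rule uses an isolated regular
expression applied to the whole line, and the third uses a string variable
as a regular expression:
.IP
.EX
BEGIN { re = "^[0-9]+$" }
/^#/ { next }	# skip lines that begin with #
$2 ~ re && $1 !~ /^tmp/ { print $1 }
.EE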
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a relop is any of the six relational operators in C,
and a matchop is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
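.PP
A sketch of this idiom follows; the function name
.B max
and the habit of setting off the extra parameters with added
white space are illustrative conventions, not requirements of
.IR awk :
.IP
.EX
# max(arr): largest value in arr; i and m act as locals
function max(arr,    i, m) {
	for (i in arr)
		if (m == "" || arr[i] > m)
			m = arr[i]
	return m
}
{ v[NR] = $1 + 0 }
END { print max(v) }
.EE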
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
.nf
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.fi
.EE
.SH SEE ALSO
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988.  ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
.br
The scope rules for variables in functions are a botch;
the syntax is worse.