The awk Command
Introduction
awk is a pattern-matching file processor. The simplest way to visualize what awk does is to think of a large colon-delimited text file: awk processes each line of the file, splits the content into fields, and then performs operations on those fields.
While most of the Unix commands we write guides for have a one-track mind, awk really doesn't have a standard 'format' or a use that is vastly more common than another. awk can be used like 'cut', to isolate fields in a file, or it can act as a fully functional, programmatic text processor. We'll give examples for both.
For now, we'll stick with a simple introduction.
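As a rough illustration of the 'cut'-like use mentioned above, the sketch below feeds three made-up colon-delimited records to awk and prints the first and third fields. The sample data and the `sample` and `result` variable names are assumptions for the example, not taken from any real file.

```shell
# Made-up colon-delimited sample data (hypothetical, loosely passwd-style).
sample='root:x:0:/root
alice:x:1000:/home/alice
bob:x:1001:/home/bob'

# -F: sets the field separator to a colon; $1 and $3 are the 1st and 3rd fields.
# The comma in the print statement separates the two values with a space.
result=$(printf '%s\n' "$sample" | awk -F: '{print $1, $3}')
echo "$result"
```

Doing the same with 'cut' would be `cut -d: -f1,3`, except that cut keeps the colon between the two fields while awk lets you choose the output format.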
awk Format:
There are two formats for awk: one allows you to write the AWK code right in the command line, the other allows you to use an AWK program file.
AWK code in command line:
awk <program> <input file>
Where a 'program' is formatted like this:
'/pattern/ {action}'
AWK code contained in local file:
awk -f <awkcode.awk> <inputfile>
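To make the two formats concrete, here is a minimal sketch that runs the same one-line program both ways and shows that the results match. The file names (`input.log`, `second.awk`), the sample log lines, and the shell variable names are all hypothetical and exist only for this example.

```shell
# Work in a throwaway directory; all file names here are hypothetical.
workdir=$(mktemp -d)
printf 'error disk full\ninfo all good\nerror out of memory\n' > "$workdir/input.log"

# Format 1: the program is given inline, in single quotes, before the input file.
inline=$(awk '/error/ {print $2}' "$workdir/input.log")

# Format 2: the same program is stored in a file and loaded with -f.
printf '/error/ {print $2}\n' > "$workdir/second.awk"
fromfile=$(awk -f "$workdir/second.awk" "$workdir/input.log")

echo "$inline"
echo "$fromfile"
rm -r "$workdir"
```

Both commands print the second field of every line containing "error". The single quotes in the inline form keep the shell from interpreting `$2` itself.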
Quick Links!
A Programmer's Introduction to AWK
awk's List of Operators
awk, nawk, gawk - What's the difference?
A simple awk example
Use awk to reformat the default unix date command
A Programmer's Intro to awk
- awk processes file input
- awk processes that input one line at a time
- You can specify a pattern in the command. If awk finds the pattern on a line, it operates on that line; if the pattern isn't found, awk moves on to the next line.
- You can program your awk operations right in the command line, or use a file to hold it all.
- awk has some variables you should be aware of
- $0 - This variable contains the entire current line that awk is processing
- $1, $2, $3, $4, etc - These variables contain the first, second, third, fourth fields of the current line.
- Awk's default field separator is whitespace (any run of spaces or tabs), but you can change that with the -F option.
- NF - Contains the number of fields that exist on the current line. $NF (note the dollar sign) represents the last field in a record / line of data. Great for looping over data.
- NR - Contains the number of records (lines) that awk has read so far, i.e. the current line number. Also great for loops.
- FILENAME - Contains the name of the file that awk is processing.
- FS - Field Separator. The default field separator is whitespace, but you can change it with this variable.
There are, indeed, more 'special variables' but their use is not nearly as common as the variables mentioned above.
- awk allows the following operators:
- Ternary (mini if statement): ? :
- Increment and Decrement (Pre or Postfix): ++ --
- Exponents (2^5 is 2 to the 5th power): ^ **
- Logical Operators: && ||
- Add, Subtract: + -
- Unary plus, minus, NOT: + - !
- Multiply, Divide, Modulus: * / %
- Comparison: < <= == != > >=
- Assignment: = += -= *= /= %= ^= **=
- If a command is associated with the BEGIN keyword, it's executed BEFORE any lines of the input file are processed. In the command line this would look like:
awk 'BEGIN {myvar=3} ...'
In a .awk file this may look like:
#!/usr/bin/awk -f
BEGIN{scriptDirectory="/scripts" }
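- If a command is associated with the END keyword, it's executed AFTER all lines of the input file have been processed. In the command line this would look like:
awk ' ... END {print "Script Complete!"}'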
In a .awk file this may look like:
#!/usr/bin/awk -f
...
...
END{print results}  # plain variable name; a leading $ would make this a field reference
- awk supports C-style for loops:
for(i=0;i<9;i++){
    printf("output!");
}
- and while loops:
while (i <= 20) {
    print i;
    i+=2;
}
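The pieces covered in this section (BEGIN and END blocks, FS, NF, NR, $NF, the ternary operator, and a for loop) can be combined in one short program. The following is a minimal sketch, assuming made-up comma-separated input; the names `data`, `report`, `rowsum`, `unit`, and `total` are hypothetical and chosen only for this example.

```shell
# Made-up comma-separated input: three lines with 3, 2 and 1 fields.
data='1,2,3
10,20
5'

report=$(printf '%s\n' "$data" | awk '
BEGIN { FS = "," }                  # runs once, before any input is read
{
    rowsum = 0
    for (i = 1; i <= NF; i++) {     # NF is the field count of the current line
        rowsum += $i                # $i is the i-th field
    }
    total += rowsum                 # unset variables start out as 0
    unit = (NF == 1) ? "field" : "fields"   # ternary operator
    # NR is the number of the line currently being processed
    print "line " NR ": " NF " " unit ", last=" $NF ", sum=" rowsum
}
END { print "lines=" NR " total=" total }   # runs once, after the last line
')
echo "$report"
```

With the sample data above this prints:

```
line 1: 3 fields, last=3, sum=6
line 2: 2 fields, last=20, sum=30
line 3: 1 field, last=5, sum=5
lines=3 total=41
```

Setting FS in a BEGIN block has the same effect here as passing `-F,` on the command line.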
awk, nawk, gawk - What's the difference?
In short:
These days, 'awk' should work just fine on most Linux / Unix systems. If your distribution ships several implementations (awk, nawk, gawk, etc.), there's typically a symlink from 'awk' to whichever incarnation your distribution uses.
Linux distributions can also rely on 'gawk' being present.
Unix systems typically just use 'awk'.
Debian and Ubuntu users can reliably use 'mawk', which is a very fast implementation based on a bytecode interpreter.
- awk was written by and named after: Alfred Aho, Peter Weinberger, and Brian Kernighan. (1977)
- Between 1985 and 1996, the original authors set out to expand the language, and Brian Kernighan continues to maintain the codebase to this day (Aug 2007). This version of awk is known as nawk or 'new awk'.
- gawk was released by the Free Software Foundation as part of the GNU project and is the most common Linux version of the language.
Simple, yet thorough, example
This is the simplest awk example I can think of that both does justice to what the language actually does and doesn't completely confuse most folks. This example scans through each line of an input file and prints the second field, a dash, the number of fields on that line, and the current line number. Enjoy.
awk '{print $2 " - " NF " " NR}' input.log
Output:
- 1 1
System - 6 2
system: - 3 3
- 1 4
Checking - 7 5
- 1 6
System - 6 7
system: - 3 8
- 1 9
Checking - 7 10
Remember that we're printing the second field on each line ($2), the Number of Fields (NF), and the Number of Records read so far (or the current row number if you prefer) (NR).
The rows of output without any data before the dash simply didn't have anything in the second field for those records.
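To try this without a log file of your own, the sketch below runs the same kind of print statement over four made-up lines. The sample text and the `log` and `out` variable names are assumptions for illustration and are not the contents of the input.log used above.

```shell
# Made-up input: the first line has only one field, so $2 is empty there.
log='=====
The System has been checked today
file system: ok
Now Checking for new updates and patches'

# Print the second field, a dash, the field count, and the line number.
out=$(printf '%s\n' "$log" | awk '{print $2 " - " NF " " NR}')
echo "$out"
```

This prints:

```
 - 1 1
System - 6 2
system: - 3 3
Checking - 7 4
```

The first row starts with a space because an empty $2 is followed directly by the " - " string.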
Common Uses
Reformat the date using awk
While the default unix date is handy, it's a bit verbose for many uses:
Mon Sep 3 13:42:23 EDT 2007
Just a few seconds with awk can