🖥️awk

➡️This is a command-line reference manual for commands and command combinations that you don’t use often enough to remember. This cheatsheet explains the awk command, with important options and switches, using examples.

▁ ▂ ▃ ▄ ꧁ 🔴☠ COMMANDLINE-KUNGFU WITH CHEATSHEETS ☠🔴꧂▅ ▃ ▂ ▁

#   █████╗ ██╗    ██╗██╗  ██╗
#  ██╔══██╗██║    ██║██║ ██╔╝
#  ███████║██║ █╗ ██║█████╔╝
#  ██╔══██║██║███╗██║██╔═██╗
#  ██║  ██║╚███╔███╔╝██║  ██╗
#  ╚═╝  ╚═╝ ╚══╝╚══╝ ╚═╝  ╚═╝

awk
    Aho, Weinberger, and Kernighan
    The awk scripting language was named after its authors, Alfred Aho, Peter Weinberger, and Brian Kernighan.

In awk jargon, each line of a file is a record. You can change the way awk splits files into records, but by default it splits on newlines. The awk variable $1 contains the first field in a record, $2 the second, $3 the third, etc. $0 contains all fields.
awk uses -F to specify a field separator. For example, -F: would say to use a colon as the field separator. sed and awk both use -f <filename> to specify a file of commands.
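
# A quick illustration of fields and -F (illustrative input): print the field count, then the first and last fields:
echo 'root:x:0:0:root:/root:/bin/bash' | awk -F: '{print NF, $1, $NF}'   # -> 7 root /bin/bash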

# awk - pattern-directed scanning and processing language
# if you're just looking to take columns, take a look at `cut`

# awk normally uses whitespace (any run of spaces or tabs) as the delimiter.
# To force the delimiter to be a tab:
awk -F\\t '{ print $0 }' file.txt

# taking columns with awk
awk -F\\t '{ print $1 }' file.txt
awk -F\\t '{ print $1"\t"$2 }' file.txt

# matching on conditionals
awk -F\\t '$1 == 1 { print $0 }' file.txt > matches_one.txt
awk -F\\t '$1 != 1 { print $0 }' file.txt > does_not_match_one.txt

# take all but the first column (sed will eliminate whitespace at beginning)
awk -F\\t '{ $1=""; print $0 }' file.txt | sed 's/^\s//'

# sum integers from a file or stdin, one integer per line:
printf '1\n2\n3\n' | awk '{ sum += $1} END {print sum}'

# sum the fields on each line, using a specific character as the separator
printf '1:2:3' | awk -F ":" '{print $1+$2+$3}'

# print a multiplication table
seq 9 | sed 'H;g' | awk -v RS='' '{for(i=1;i<=NF;i++)printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")}'
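# (how it works: sed 'H;g' turns line N of seq's output into an N-line block,
#  and RS='' makes awk read each blank-line-separated block as one record,
#  so record N has fields 1..N and NR is the row number)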

# Specify output separator character
printf '1 2 3' | awk 'BEGIN {OFS=":"}; {print $1,$2,$3}'

# search for a paragraph (blank-line-separated record) containing a string
awk -v RS='' '/42B/' file

# display only first column from multi-column text
echo "first-column  second-column  third-column" | awk '{print $1}'

#==============================##==============================#
# CMD AWK						       #
#==============================##==============================#

awk '{ print $1}' access.log.2016-05-08 | sort | uniq -c | sort -nr | head -n 10
# Find the top 10 IP addresses accessing your Apache web server.
#    awk  - prints the first field (the client IP) of every line in access.log.2016-05-08.
#    sort - sorts the lines; the -n option compares lines numerically and the -r option reverses the order.
#    uniq - reports repeated lines; the -c option prefixes each line with its number of occurrences.
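
# Roughly the same report in awk alone, counting IPs in an array (a sketch; assumes the client IP is field 1 as above):
awk '{c[$1]++} END {for (ip in c) print c[ip], ip}' access.log.2016-05-08 | sort -rn | head -n 10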


awk -F'","' '{print $3}' data.csv
#Use a multi-character field separator to get field 3 out of a CSV file that uses double quoted fields.

awk '$9!~/^[23]/{print $4}' access_log | cut -c1-12 | uniq -c
#Show the number of UNsuccessful requests per day. (Not HTTP code 2XX or 3XX)

awk -F, '{print sqrt($4^2)}' data.csv
# Get the absolute value of the 4th column of numbers using the square root of the square trick.

awk -F, '((37.19 < $7 && $7 < 37.23) && (-115.81 < $8 && $8 < -115.73))' gpsdata.csv
#Print lines from file where GPS coords are in range.

awk '{for (i=2;i<=13;i++) {printf "%d-%02d,%d\n",$1,i-1,$i}}' data-table.txt
#Convert values from x=year,y=month table to linear CSV.

awk '{ print substr($0, index($0,$3)) }' mail.log
#Print all from 3rd field to end of line. Very useful for log parsing.

# Generate a list of the emails($3) of speakers($8) in an attendee file.
awk -F\\t -v IGNORECASE=1 '$8~/speaker/{print $3}' attendees.tsv 
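# (IGNORECASE only has an effect in gawk; in other awks it is just an ordinary variable)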

# Get a list of 10/8 internal IPs showing up in col 3 or 4.
awk '$3~/^src:10\./{print $3};$4~/^dest:10\./{print $4}' network.log |sort |uniq 

awk -F: '{print $1 ":" $2}' messages | uniq -c
#Count syslog hits per minute in your messages log file. Useful for doing quick stats.

awk '{print $4}' apache_log|sort -n|cut -c1-15|uniq -c|awk '{b="";for(i=0;i<$1/10;i++){b=b"#"}; print $0 " " b;}'
# Request by hour graph.

awk '$9 == "404" {print $7}' access.log |sort|uniq -c|sort -rn| head -n 50
# list top 50 404 in descending order.

awk 'length < 140' quotes.txt
# Print only the lines from quotes.txt that are shorter than 140 characters.

awk '$10==404 {print $7}' access_log
# Print out the file requested for CLF log entries with HTTP 404 status code.

awk '{a[$1] += $10} END {for (h in a) print h " " a[h]}' access_log | sort -k 2 -nr | head -10
# Display top bandwidth hogs on website.
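# (in common/combined log format $1 is the client IP and $10 the bytes sent; adjust the field numbers to your log format)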

awk '!(NR % 10)' file
#Take every 10th line from a file: 

awk NF
#Delete blank lines

awk 'length > 64'
#Print only the lines that are 65 characters in length or longer

awk '{ total = total + NF }; END { print total+0 }'
#Awk script to count the number of words in a file

a2p
#Perl has a utility a2p to convert awk scripts into perl scripts.

awk '!a[$0]++' 
#Remove duplicate lines:

/foo/,/bar/
#Awk can use a pair of regular expressions as a range, just like sed. In both languages, /foo/,/bar/ matches the same lines.

# MAC address conversion using only awk. 
awk 'BEGIN{FS=""}{for(i=1;i<=NF;i+=2){ r=r $i $(i+1)":"}}END{sub(/:$/,"",r);print r}' file
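# e.g. (illustrative input; the script expects a single line of hex digits, and splitting into
# single characters with an empty FS is a common extension: gawk, mawk):
echo 0013a9bf0c02 | awk 'BEGIN{FS=""}{for(i=1;i<=NF;i+=2){ r=r $i $(i+1)":"}}END{sub(/:$/,"",r);print r}'   # -> 00:13:a9:bf:0c:02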

awk '{if (t!=$0){print;t=$0}}' file
#Remove duplicate lines *without sorting*. Low memory version. That !a[$0]++ construct can get big.

awk '!a[$0]++' file
#Remove duplicate lines without sorting 'file'. $0 means whole line in awk. 'a' is an array. So print if not in array.

awk 'length > max { max=length;maxline=$0 } END { print maxline; }' quotes.txt 
# Print the longest line in quotes.txt 

awk -F':' '!max{max=$2;}{r="";i=s=.025*$2/max;while(i-->0)r=r"-";printf "%40s | %4d | %s %s",$1,$2,r,"\n";}' 
# Histogram generator by @dez_blanchfield

 
awk '{s+=$3} END {print s}' data.txt 
# Sum numbers in the third column of data.txt. 

awk '($3 == "64.39.106.131") || ($1 ~ /^#/)' conn.log
# Search the 3rd field of conn.log for an IP address, and also print comment/header lines (those starting with #).

awk '/^From:/{print;nextfile}' * 
# Print only the first From: line in each mail message file. 
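# (nextfile is an extension: gawk, mawk, and recent BWK awk support it, but older POSIX awks may not)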

awk '{print $7}' access_log | sort | uniq -c | sort -rn | head -100 
# Display top 100 files accessed on website. 

awk '{print $1}' data.txt 
# Print out just the first column (whitespace separated) of data.txt


#----------------------------------------------///
# awk cheat sheet
#-----------------------------------------------------------------------///

HANDY ONE-LINE SCRIPTS FOR AWK                               30 April 2008
Compiled by Eric Pement - eric [at] pement.org               version 0.27

Latest version of this file (in English) is usually at:
   http://www.pement.org/awk/awk1line.txt

This file will also be available in other languages:
   Chinese  - http://ximix.org/translation/awk1line_zh-CN.txt   

USAGE:

   Unix: awk '/pattern/ {print "$1"}'    # standard Unix shells
DOS/Win: awk '/pattern/ {print "$1"}'    # compiled with DJGPP, Cygwin
         awk "/pattern/ {print \"$1\"}"  # GnuWin32, UnxUtils, Mingw

Note that the DJGPP compilation (for DOS or Windows-32) permits an awk
script to follow Unix quoting syntax '/like/ {"this"}'. HOWEVER, if the
command interpreter is CMD.EXE or COMMAND.COM, single quotes will not
protect the redirection arrows (<, >) nor do they protect pipes (|).
These are special symbols which require "double quotes" to protect them
from interpretation as operating system directives. If the command
interpreter is bash, ksh or another Unix shell, then single and double
quotes will follow the standard Unix usage.

Users of MS-DOS or Microsoft Windows must remember that the percent
sign (%) is used to indicate environment variables, so this symbol must
be doubled (%%) to yield a single percent sign visible to awk.

If a script will not need to be quoted in Unix, DOS, or CMD, then I
normally omit the quote marks. If an example is peculiar to GNU awk,
the command 'gawk' will be used. Please notify me if you find errors or
new commands to add to this list (total length under 65 characters). I
usually try to put the shortest script first. To conserve space, I
normally use '1' instead of '{print}' to print each line. Either one
will work.

FILE SPACING:

 # double space a file
 awk '1;{print ""}'
 awk 'BEGIN{ORS="\n\n"};1'

 # double space a file which already has blank lines in it. Output file
 # should contain no more than one blank line between lines of text.
 # NOTE: On Unix systems, DOS lines which have only CRLF (\r\n) are
 # often treated as non-blank, and thus 'NF' alone will return TRUE.
 awk 'NF{print $0 "\n"}'

 # triple space a file
 awk '1;{print "\n"}'

NUMBERING AND CALCULATIONS:

 # precede each line by its line number FOR THAT FILE (left alignment).
 # Using a tab (\t) instead of space will preserve margins.
 awk '{print FNR "\t" $0}' files*

 # precede each line by its line number FOR ALL FILES TOGETHER, with tab.
 awk '{print NR "\t" $0}' files*

 # number each line of a file (number on left, right-aligned)
 # Double the percent signs if typing from the DOS command prompt.
 awk '{printf("%5d : %s\n", NR,$0)}'

 # number each line of file, but only print numbers if line is not blank
 # Remember caveats about Unix treatment of \r (mentioned above)
 awk 'NF{$0=++a " :" $0};1'
 awk '{print (NF? ++a " :" :"") $0}'

 # count lines (emulates "wc -l")
 awk 'END{print NR}'

 # print the sums of the fields of every line
 awk '{s=0; for (i=1; i<=NF; i++) s=s+$i; print s}'

 # add all fields in all lines and print the sum
 awk '{for (i=1; i<=NF; i++) s=s+$i}; END{print s}'

 # print every line after replacing each field with its absolute value
 awk '{for (i=1; i<=NF; i++) if ($i < 0) $i = -$i; print }'
 awk '{for (i=1; i<=NF; i++) $i = ($i < 0) ? -$i : $i; print }'

 # print the total number of fields ("words") in all lines
 awk '{ total = total + NF }; END {print total}' file

 # print the total number of lines that contain "Beth"
 awk '/Beth/{n++}; END {print n+0}' file

 # print the largest first field and the line that contains it
 # Intended for finding the longest string in field #1
 awk '$1 > max {max=$1; maxline=$0}; END{ print max, maxline}'

 # print the number of fields in each line, followed by the line
 awk '{ print NF ":" $0 } '

 # print the last field of each line
 awk '{ print $NF }'

 # print the last field of the last line
 awk '{ field = $NF }; END{ print field }'

 # print every line with more than 4 fields
 awk 'NF > 4'

 # print every line where the value of the last field is > 4
 awk '$NF > 4'

STRING CREATION:

 # create a string of a specific length (e.g., generate 513 spaces)
 awk 'BEGIN{while (a++<513) s=s " "; print s}'

 # insert a string of specific length at a certain character position
 # Example: insert 49 spaces after column #6 of each input line.
 gawk --re-interval 'BEGIN{while(a++<49)s=s " "};{sub(/^.{6}/,"&" s)};1'
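 # (gawk 4.0 and later enable interval expressions by default, so
 #  --re-interval is only needed with older gawk versions)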

ARRAY CREATION:

 # These next 2 entries are not one-line scripts, but the technique
 # is so handy that it merits inclusion here.
 
 # create an array named "month", indexed by numbers, so that month[1]
 # is 'Jan', month[2] is 'Feb', month[3] is 'Mar' and so on.
 split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")

 # create an array named "mdigit", indexed by strings, so that
 # mdigit["Jan"] is 1, mdigit["Feb"] is 2, etc. Requires "month" array
 for (i=1; i<=12; i++) mdigit[month[i]] = i
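
 # A self-contained example using both arrays (the printed values are illustrative):
 awk 'BEGIN{n=split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",month," ");
            for(i=1;i<=n;i++) mdigit[month[i]]=i;
            print month[2], mdigit["Sep"]}'        # prints: Feb 9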

TEXT CONVERSION AND SUBSTITUTION:

 # IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
 awk '{sub(/\r$/,"")};1'   # assumes EACH line ends with Ctrl-M

 # IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format
 awk '{sub(/$/,"\r")};1'

 # IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format
 awk 1

 # IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
 # Cannot be done with DOS versions of awk, other than gawk:
 gawk -v BINMODE="w" '1' infile >outfile

 # Use "tr" instead.
 tr -d \r <infile >outfile            # GNU tr version 1.22 or higher

 # delete leading whitespace (spaces, tabs) from front of each line
 # aligns all text flush left
 awk '{sub(/^[ \t]+/, "")};1'

 # delete trailing whitespace (spaces, tabs) from end of each line
 awk '{sub(/[ \t]+$/, "")};1'

 # delete BOTH leading and trailing whitespace from each line
 awk '{gsub(/^[ \t]+|[ \t]+$/,"")};1'
 awk '{$1=$1};1'           # also removes extra space between fields

 # insert 5 blank spaces at beginning of each line (make page offset)
 awk '{sub(/^/, "     ")};1'

 # align all text flush right on a 79-column width
 awk '{printf "%79s\n", $0}' file*

 # center all text on a 79-character width
 awk '{l=length();s=int((79-l)/2); printf "%"(s+l)"s\n",$0}' file*

 # substitute (find and replace) "foo" with "bar" on each line
 awk '{sub(/foo/,"bar")}; 1'           # replace only 1st instance
 gawk '{$0=gensub(/foo/,"bar",4)}; 1'  # replace only 4th instance
 awk '{gsub(/foo/,"bar")}; 1'          # replace ALL instances in a line

 # substitute "foo" with "bar" ONLY for lines which contain "baz"
 awk '/baz/{gsub(/foo/, "bar")}; 1'

 # substitute "foo" with "bar" EXCEPT for lines which contain "baz"
 awk '!/baz/{gsub(/foo/, "bar")}; 1'

 # change "scarlet" or "ruby" or "puce" to "red"
 awk '{gsub(/scarlet|ruby|puce/, "red")}; 1'

 # reverse order of lines (emulates "tac")
 awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file*

 # if a line ends with a backslash, append the next line to it (fails if
 # there are multiple lines ending with backslash...)
 awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1' file*

 # print and sort the login names of all users
 awk -F ":" '{print $1 | "sort" }' /etc/passwd

 # print the first 2 fields, in opposite order, of every line
 awk '{print $2, $1}' file

 # switch the first 2 fields of every line
 awk '{temp = $1; $1 = $2; $2 = temp}' file

 # print every line, deleting the second field of that line
 awk '{ $2 = ""; print }'

 # print in reverse order the fields of every line
 awk '{for (i=NF; i>0; i--) printf("%s ",$i);print ""}' file

 # concatenate every 5 lines of input, using a comma separator
 # between fields
 awk 'ORS=NR%5?",":"\n"' file
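 # (the assignment itself serves as the pattern; its value is a non-empty
 #  string, which counts as true, so every line is printed with the ORS just set)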

## SELECTIVE PRINTING OF CERTAIN LINES:

 # print first 10 lines of file (emulates behavior of "head")
 awk 'NR < 11'

 # print first line of file (emulates "head -1")
 awk 'NR>1{exit};1'

  # print the last 2 lines of a file (emulates "tail -2")
 awk '{y=x "\n" $0; x=$0};END{print y}'

 # print the last line of a file (emulates "tail -1")
 awk 'END{print}'

 # print only lines which match regular expression (emulates "grep")
 awk '/regex/'

 # print only lines which do NOT match regex (emulates "grep -v")
 awk '!/regex/'

 # print any line where field #5 is equal to "abc123"
 awk '$5 == "abc123"'

 # print only those lines where field #5 is NOT equal to "abc123"
 # This will also print lines which have less than 5 fields.
 awk '$5 != "abc123"'
 awk '!($5 == "abc123")'

 # matching a field against a regular expression
 awk '$7  ~ /^[a-f]/'    # print line if field #7 matches regex
 awk '$7 !~ /^[a-f]/'    # print line if field #7 does NOT match regex

 # print the line immediately before a regex, but not the line
 # containing the regex
 awk '/regex/{print x};{x=$0}'
 awk '/regex/{print (NR==1 ? "match on line 1" : x)};{x=$0}'

 # print the line immediately after a regex, but not the line
 # containing the regex
 awk '/regex/{getline;print}'

 # grep for AAA and BBB and CCC (in any order on the same line)
 awk '/AAA/ && /BBB/ && /CCC/'

 # grep for AAA and BBB and CCC (in that order)
 awk '/AAA.*BBB.*CCC/'

 # print only lines of 65 characters or longer
 awk 'length > 64'

 # print only lines of less than 65 characters
 awk 'length < 64'

 # print section of file from regular expression to end of file
 awk '/regex/,0'
 awk '/regex/,EOF'

 # print section of file based on line numbers (lines 8-12, inclusive)
 awk 'NR==8,NR==12'

 # print line number 52
 awk 'NR==52'
 awk 'NR==52 {print;exit}'          # more efficient on large files

 # print section of file between two regular expressions (inclusive)
 awk '/Iowa/,/Montana/'             # case sensitive

## SELECTIVE DELETION OF CERTAIN LINES:

 # delete ALL blank lines from a file (same as "grep '.' ")
 awk NF
 awk '/./'

 # remove duplicate, consecutive lines (emulates "uniq")
 awk 'a !~ $0; {a=$0}'

 # remove duplicate, nonconsecutive lines
 awk '!a[$0]++'                     # most concise script
 awk '!($0 in a){a[$0];print}'      # most efficient script

## CREDITS AND THANKS:

# Special thanks to the late Peter S. Tillier (U.K.) for helping me with
# the first release of this FAQ file, and to Daniel Jana, Yisu Dong, and
# others for their suggestions and corrections.

# For additional syntax instructions, including the way to apply editing
# commands from a disk file instead of the command line, consult:

  # "sed & awk, 2nd Edition," by Dale Dougherty and Arnold Robbins
  # (O'Reilly, 1997)

  # "UNIX Text Processing," by Dale Dougherty and Tim O'Reilly (Hayden
  # Books, 1987)

  # "GAWK: Effective awk Programming," 3d edition, by Arnold D. Robbins
  # (O'Reilly, 2003) or at http://www.gnu.org/software/gawk/manual/

# To fully exploit the power of awk, one must understand "regular
# expressions." For detailed discussion of regular expressions, see
# "Mastering Regular Expressions, 3d edition" by Jeffrey Friedl (O'Reilly,
# 2006).

# The info and manual ("man") pages on Unix systems may be helpful (try
# "man awk", "man nawk", "man gawk", "man regexp", or the section on
# regular expressions in "man ed").

# USE OF '\t' IN awk SCRIPTS: For clarity in documentation, I have used
# '\t' to indicate a tab character (0x09) in the scripts.  All versions of
# awk should recognize this abbreviation.

#---end of file---

#####
# How to get started using awk
################################

# awk, sed, and grep are three of my favorite tools in the Linux or UNIX
# command line. They are all pretty powerful. Here we'll look at how to get
# cracking with awk to help you ease into using it, and then at some useful
# awk one-liners to make things a bit more fun.
#
# AWK is a programming language designed for processing text-based data,
# either in files or data streams. It was created at Bell Labs in the 1970s.
# Although it's quite old, don't be fooled by its age: it is extremely
# powerful and efficient at what it does.
#
# Let's get our hands dirty. Before we delve into the more complex workings
# of awk, let's get you started on its basics. We'll create and use a dummy
# file for this exercise. You can use pretty much any text file, such as a
# log from your system. I will be using sample output from one of my
# favorite system monitoring tools, dstat.
#
# This kind of output is ideal for awk to handle: awk is great with comma-
# or tab-separated content. You'll see why soon. Either create some similar
# data or copy and paste the sample output shown below into a dummy file
# called something like test.txt. Launch a terminal window on your Linux
# computer; almost all flavors of Linux ship with awk, and if yours somehow
# lacks it, install it. Then, from the directory where you stored test.txt,
# type the following:

awk '{print}' test.txt
# The output should contain the entire contents of the text file. What's the fun in that?

# Now let’s see how you can pick a column and print just that one. Execute the following command:
awk '{print $1}' test.txt

# Now we are asking awk to print just the first column of the text file. It will automatically figure out that the file is tab-separated and print just the first column of the contents. You should see something like this in the output:
	----total-cpu-usage----
	usr
	5
	13
	8
	0
	1
	1
	1
	0
	1
	1

# You can do the same for any column you like. If you want awk to print the third column, change the above command to:
awk '{print $3}' test.txt
# You can also have awk print multiple columns. So if you want the first, third, and seventh columns printed, add them to the command separated by commas.

awk '{print $1, $3, $7}' test.txt
# would do the trick for you:

	----total-cpu-usage---- -net/total-
	usr idl read
	5 93 154k
	13 87 0
	8 92 0
	0 99 0
	1 97 0
	1 98 0
	1 99 0
	0 99 0
	1 99 0
	1 100 0

# If you have a trickier file, like /etc/passwd, where the data is separated by colons rather than spaces or tabs, awk doesn't pick that up automatically. In such cases you can give awk the correct separator. Use a command like this to print the first column of the file:

awk -F':' '{print $1}' /etc/passwd

# This command will give you an output of the usernames of all the users on your system:

	apple
	mango
	banana
	watermelon
	kiwi
	orange

# You can do the same with any other type of separator. You can also use awk to parse your log files. For example, if you want to view all the IP addresses and the related web URLs that have been accessed on your web server, you can use awk to parse your web server's access log to get this information. Use the following command:

awk '$9 == 200 {print $1, $7}' access.log

	199.63.142.250 /2008/10/my-5-favourite-hangouts/
	220.180.94.221 /2009/02/querious-a-mysql-client-for-the-mac/
	67.190.114.46 /2009/05/
	173.234.43.110 /2009/01/bicycle-rental/
	173.234.38.110 /wp-comments-post.php

# With parsing like this you can figure out if someone is visiting your website a lot, as they may be scraping or stealing content. You can also sort this information. Say you wanted to know how many times a particular IP address visited your website:

awk '$9 == 200 {print $1}' access.log | sort | uniq -c | sort -nr

	46 122.248.161.1
	35 122.248.161.2
	26 65.202.21.10
	24 67.195.111.46
	19 144.36.231.111
	18 59.183.121.71
	
	
# Addresses in sed specify which lines to act on. They can be line numbers or regular expressions.
# In awk, records are split into fields by a separator. By default fields are separated by white space.
# In awk jargon, each line of a file is a record. You can change the way awk splits files into records, but by default it splits on newlines.
# The awk variable $1 contains the first field in a record, $2 the second, $3 the third, etc. $0 contains all fields.

# awk uses -F to specify a field separator. For example, -F: would say to use a colon as the field separator.

# sed and awk both use -f <filename> to specify a file of commands.

# The -n option tells sed to only print output when told to by the p command.
# Perl has a utility a2p to convert awk scripts into perl scripts.

# The awk variable NR contains the current record number.
# The awk variable NF contains the number of fields in the current record.

# Awk can use a pair of regular expressions as a range, just like sed. In both languages, /foo/,/bar/ matches the same lines.
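
# For example (illustrative):
printf 'a b\nc d e\n' | awk '{print NR, NF}'   # -> "1 2" then "2 3"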

# awk and sed are like the fuck and shit of the "unix language". Very powerful and can be used nearly anywhere in a sentence.

|gawk '{printf("%s %s\n",strftime("%Y-%m-%d_%T", $1),$0)}' 
# Prefix each line with the local time based on epoch time in 1st column.

awk '{sum+=$1;n++;if (n==3){print sum/3 "\t" $0;n=0;sum=0}}' garage-temp.log 
# Print the running average of the last 3 temp values in front

#=======================================================================================================
# AWK File Parsing One Liners
#=======================================================================================================

gawk '{ print $1 }' tabfile.txt | sort | uniq > firstcolvals.txt
# Get a sorted list of values from the first column of a tab-separated file

gawk 'BEGIN { FS="--" } ; { print $1 }' data.txt | sort | uniq > firstcolvals.txt
# Get a sorted list of column values from a file with fields split by "--"

gawk 'BEGIN { FS="\t" } ; { print $2 "\t" $1 }' data.txt  | sort -nr > sorted.txt
# Reverse fields in a tab-separated file and sort it by the numeric field

gawk '{ freq[$0]++ } ; END { sort = "sort --key=2" ; for (word in freq) printf "%s\t%d\n", word, freq[word] | sort ; close(sort) }' allprops.txt
# Count how many times each whole line occurs in allprops.txt and print line/count pairs, sorted by the count column

gawk 'BEGIN { FS="--" } ; { freq[$1]++ } ; END { sort = "sort" ; for (word in freq) printf "%s\t%d\n", word, freq[word] | sort ; close(sort) }' data.txt
# Extract the first field, count its occurrences, and output word/count pairs sorted by word

gawk 'BEGIN { FS="--" } ; { freq[$1]++ } ; END { sort = "sort --key=2 -nr" ; for (word in freq) printf "%s\t%d\n", word, freq[word] | sort ; close(sort) }' data.txt
# Extract the first field, count its occurrences, and output word/count pairs sorted by count
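
# The same frequency counting also works without the in-script sort pipe, letting the shell sort instead (a sketch, using data.txt as above):
gawk '{ freq[$0]++ } END { for (word in freq) printf "%s\t%d\n", word, freq[word] }' data.txt | sort -k2 -nr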

#.----------------------------------------------------------------------------.#
#|                                                                            |#
#|                       _      __   _   _                                    |#
#|                      | |    /_ | | | (_)                                   |#
#|     __ _  __      __ | | __  | | | |  _   _ __     ___                     |#
#|    / _` | \ \ /\ / / | |/ /  | | | | | | | '_ \   / _ \                    |#
#|   | (_| |  \ V  V /  |   <   | | | | | | | | | | |  __/                    |#
#|    \__,_|   \_/\_/   |_|\_\  |_| |_| |_| |_| |_|  \___|                    |#
#|                                                                            |#
#|                                                                            |#
#|                                                                            |#
#|                 HowTo about using awk one liners                           |#
#|                                                                            |#
#'----------------------------------------------------------------------------'#
#| oTTo ([email protected]), 20170122                                          |#
#| http://www.php-faq.eu   -  good coders code, great reuse                   |#
#|                                                                            |#
#| Released under the GNU Free Document License                               |#
#'----------------------------------------------------------------------------'#
#
# http://patorjk.com/software/taag/

awk cheat sheet
gistfile1.txt
HANDY ONE-LINE SCRIPTS FOR AWK                               30 April 2008
Compiled by Eric Pement - eric [at] pement.org               version 0.27

Latest version of this file (in English) is usually at:
   http://www.pement.org/awk/awk1line.txt

#> USAGE:

   Unix: awk '/pattern/ {print "$1"}'    # standard Unix shells
DOS/Win: awk '/pattern/ {print "$1"}'    # compiled with DJGPP, Cygwin
         awk "/pattern/ {print \"$1\"}"  # GnuWin32, UnxUtils, Mingw

# Note that the DJGPP compilation (for DOS or Windows-32) permits an awk script to follow Unix quoting syntax '/like/ {"this"}'. HOWEVER, if the command interpreter is CMD.EXE or COMMAND.COM, single quotes will not protect the redirection arrows (<, >) nor do they protect pipes (|). These are special symbols which require "double quotes" to protect them from interpretation as operating system directives. If the command interpreter is bash, ksh or another Unix shell, then single and double quotes will follow the standard Unix usage.

# Users of MS-DOS or Microsoft Windows must remember that the percent sign (%) is used to indicate environment variables, so this symbol must be doubled (%%) to yield a single percent sign visible to awk.

# If a script will not need to be quoted in Unix, DOS, or CMD, then I normally omit the quote marks. If an example is peculiar to GNU awk, the command 'gawk' will be used. Please notify me if you find errors or new commands to add to this list (total length under 65 characters). I usually try to put the shortest script first. To conserve space, I normally use '1' instead of '{print}' to print each line. Either one will work.

#> FILE SPACING:

# double space a file
awk '1;{print ""}'
awk 'BEGIN{ORS="\n\n"};1'

# double space a file which already has blank lines in it. Output file should contain no more than one blank line between lines of text. NOTE: On Unix systems, DOS lines which have only CRLF (\r\n) are often treated as non-blank, and thus 'NF' alone will return TRUE.
awk 'NF{print $0 "\n"}'

# triple space a file
awk '1;{print "\n"}'

#>NUMBERING AND CALCULATIONS:

# precede each line by its line number FOR THAT FILE (left alignment). Using a tab (\t) instead of space will preserve margins.
awk '{print FNR "\t" $0}' files*

# precede each line by its line number FOR ALL FILES TOGETHER, with tab.
awk '{print NR "\t" $0}' files*

# number each line of a file (number on left, right-aligned) Double the percent signs if typing from the DOS command prompt.
awk '{printf("%5d : %s\n", NR,$0)}'

# number each line of file, but only print numbers if line is not blank Remember caveats about Unix treatment of \r (mentioned above)
awk 'NF{$0=++a " :" $0};1'
awk '{print (NF? ++a " :" :"") $0}'

# count lines (emulates "wc -l")
awk 'END{print NR}'

# print the sums of the fields of every line
awk '{s=0; for (i=1; i<=NF; i++) s=s+$i; print s}'

# add all fields in all lines and print the sum
awk '{for (i=1; i<=NF; i++) s=s+$i}; END{print s}'

# print every line after replacing each field with its absolute value
awk '{for (i=1; i<=NF; i++) if ($i < 0) $i = -$i; print }'
awk '{for (i=1; i<=NF; i++) $i = ($i < 0) ? -$i : $i; print }'

# print the total number of fields ("words") in all lines
awk '{ total = total + NF }; END {print total}' file

# print the total number of lines that contain "Beth"
awk '/Beth/{n++}; END {print n+0}' file

# print the largest first field and the line that contains it Intended for finding the longest string in field #1
awk '$1 > max {max=$1; maxline=$0}; END{ print max, maxline}'

# print the number of fields in each line, followed by the line
awk '{ print NF ":" $0 } '

# print the last field of each line
awk '{ print $NF }'

# print the last field of the last line
awk '{ field = $NF }; END{ print field }'

# print every line with more than 4 fields
awk 'NF > 4'

# print every line where the value of the last field is > 4
awk '$NF > 4'

STRING CREATION:

# create a string of a specific length (e.g., generate 513 spaces)
awk 'BEGIN{while (a++<513) s=s " "; print s}'

# insert a string of specific length at a certain character position Example: insert 49 spaces after column #6 of each input line.
gawk --re-interval 'BEGIN{while(a++<49)s=s " "};{sub(/^.{6}/,"&" s)};1'

# ARRAY CREATION:
#-----------------#

# These next 2 entries are not one-line scripts, but the technique is so handy that it merits inclusion here. create an array named "month", indexed by numbers, so that month[1] is 'Jan', month[2] is 'Feb', month[3] is 'Mar' and so on.
split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")

# create an array named "mdigit", indexed by strings, so that mdigit["Jan"] is 1, mdigit["Feb"] is 2, etc. Requires "month" array
for (i=1; i<=12; i++) mdigit[month[i]] = i

# TEXT CONVERSION AND SUBSTITUTION:
#-----------------------------------#

# IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format - assumes EACH line ends with Ctrl-M
awk '{sub(/\r$/,"")};1'

# IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format
awk '{sub(/$/,"\r")};1'

# IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format
awk 1

# IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format - Cannot be done with DOS versions of awk, other than gawk:
gawk -v BINMODE="w" '1' infile >outfile

# Use "tr" instead. - GNU tr version 1.22 or higher
tr -d \r <infile >outfile

# delete leading whitespace (spaces, tabs) from front of each line aligns all text flush left
awk '{sub(/^[ \t]+/, "")};1'

# delete trailing whitespace (spaces, tabs) from end of each line
awk '{sub(/[ \t]+$/, "")};1'

# delete BOTH leading and trailing whitespace from each line
awk '{gsub(/^[ \t]+|[ \t]+$/,"")};1'

# also removes extra space between fields
awk '{$1=$1};1'

# insert 5 blank spaces at beginning of each line (make page offset)
awk '{sub(/^/, "     ")};1'

# align all text flush right on a 79-column width
awk '{printf "%79s\n", $0}' file*

# center all text on a 79-character width
awk '{l=length();s=int((79-l)/2); printf "%"(s+l)"s\n",$0}' file*

# substitute (find and replace) "foo" with "bar" on each line
awk '{sub(/foo/,"bar")}; 1'           # replace only 1st instance
gawk '{$0=gensub(/foo/,"bar",4)}; 1'  # replace only 4th instance
awk '{gsub(/foo/,"bar")}; 1'          # replace ALL instances in a line

# substitute "foo" with "bar" ONLY for lines which contain "baz"
awk '/baz/{gsub(/foo/, "bar")}; 1'

# substitute "foo" with "bar" EXCEPT for lines which contain "baz"
awk '!/baz/{gsub(/foo/, "bar")}; 1'

# change "scarlet" or "ruby" or "puce" to "red"
awk '{gsub(/scarlet|ruby|puce/, "red")}; 1'

# reverse order of lines (emulates "tac")
awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file*

# if a line ends with a backslash, append the next line to it (fails if there are multiple lines ending with backslash...)
awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1' file*

# print and sort the login names of all users
awk -F ":" '{print $1 | "sort" }' /etc/passwd

# print the first 2 fields, in opposite order, of every line
awk '{print $2, $1}' file

# switch the first 2 fields of every line
awk '{temp = $1; $1 = $2; $2 = temp}' file

# print every line, deleting the second field of that line
awk '{ $2 = ""; print }'

# print in reverse order the fields of every line
awk '{for (i=NF; i>0; i--) printf("%s ",$i);print ""}' file

# concatenate every 5 lines of input, using a comma separator between fields
awk 'ORS=NR%5?",":"\n"' file

SELECTIVE PRINTING OF CERTAIN LINES:

# print first 10 lines of file (emulates behavior of "head")
awk 'NR < 11'

# print first line of file (emulates "head -1")
awk 'NR>1{exit};1'

# print the last 2 lines of a file (emulates "tail -2")
awk '{y=x "\n" $0; x=$0};END{print y}'
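
# a sketch generalizing to the last N lines with a ring buffer (N=5 here,
# chosen arbitrarily):
awk -v N=5 '{a[NR%N]=$0} END{for (i=NR-N+1; i<=NR; i++) if (i>0) print a[i%N]}' file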

# print the last line of a file (emulates "tail -1")
awk 'END{print}'

# print only lines which match regular expression (emulates "grep")
awk '/regex/'

# print only lines which do NOT match regex (emulates "grep -v")
awk '!/regex/'

# print any line where field #5 is equal to "abc123"
awk '$5 == "abc123"'

# print only those lines where field #5 is NOT equal to "abc123"
# (this will also print lines which have less than 5 fields)
awk '$5 != "abc123"'
awk '!($5 == "abc123")'

# matching a field against a regular expression
awk '$7  ~ /^[a-f]/'    # print line if field #7 matches regex
awk '$7 !~ /^[a-f]/'    # print line if field #7 does NOT match regex

# print the line immediately before a regex, but not the line containing the regex
awk '/regex/{print x};{x=$0}'
awk '/regex/{print (NR==1 ? "match on line 1" : x)};{x=$0}'

# print the line immediately after a regex, but not the line containing the regex
awk '/regex/{getline;print}'

# grep for AAA and BBB and CCC (in any order on the same line)
awk '/AAA/ && /BBB/ && /CCC/'

# grep for AAA and BBB and CCC (in that order)
awk '/AAA.*BBB.*CCC/'

# print only lines of 65 characters or longer
awk 'length > 64'

# print only lines of less than 65 characters
awk 'length < 64'

# print section of file from regular expression to end of file
awk '/regex/,0'
awk '/regex/,EOF'

# print section of file based on line numbers (lines 8-12, inclusive)
awk 'NR==8,NR==12'

# print line number 52
awk 'NR==52'
awk 'NR==52 {print;exit}'          # more efficient on large files

# print section of file between two regular expressions (inclusive)
awk '/Iowa/,/Montana/'             # case sensitive

# SELECTIVE DELETION OF CERTAIN LINES:
#--------------------------------------#

# delete ALL blank lines from a file (same as "grep '.' ")
awk NF
awk '/./'

# remove duplicate, consecutive lines (emulates "uniq")
awk 'a !~ $0; {a=$0}'
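# note: the script above treats each line as a regex; a safer sketch that
# compares plain strings instead:
awk '$0 != prev; {prev = $0}' file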

# remove duplicate, nonconsecutive lines
awk '!a[$0]++'                     # most concise script
awk '!($0 in a){a[$0];print}'      # most efficient script

# SPECIAL APPLICATIONS:
#----------------------#
# NOTE: the scripts in this section are sed one-liners, not awk; they are
# kept here because they pair naturally with the awk recipes above.

 # remove nroff overstrikes (char, backspace) from man pages. The 'echo'
 # command may need an -e switch if you use Unix System V or bash shell.
 sed "s/.`echo \\\b`//g"    # double quotes required for Unix environment
 sed 's/.^H//g'             # in bash/tcsh, press Ctrl-V and then Ctrl-H
 sed 's/.\x08//g'           # hex expression for sed v1.5

 # get Usenet/e-mail message header
 sed '/^$/q'                # deletes everything after first blank line

 # get Usenet/e-mail message body
 sed '1,/^$/d'              # deletes everything up to first blank line

 # get Subject header, but remove initial "Subject: " portion
 sed '/^Subject: */!d; s///;q'

 # get return address header
 sed '/^Reply-To:/q; /^From:/h; /./d;g;q'

 # parse out the address proper. Pulls out the e-mail address by itself
 # from the 1-line return address header (see preceding script)
 sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'

 # add a leading angle bracket and space to each line (quote a message)
 sed 's/^/> /'

 # delete leading angle bracket & space from each line (unquote a message)
 sed 's/^> //'

 # remove most HTML tags (accommodates multiple-line tags)
 sed -e :a -e 's/<[^>]*>//g;/</N;//ba'

 # extract multi-part uuencoded binaries, removing extraneous header
 # info, so that only the uuencoded portion remains. Files passed to
 # sed must be passed in the proper order. Version 1 can be entered
 # from the command line; version 2 can be made into an executable
 # Unix shell script. (Modified from a script by Rahul Dhesi.)
 sed '/^end/,/^begin/d' file1 file2 ... fileX | uudecode   # vers. 1
 sed '/^end/,/^begin/d' "$@" | uudecode                    # vers. 2

 # zip up each .TXT file individually, deleting the source file and
 # setting the name of each .ZIP file to the basename of the .TXT file
 # (under DOS: the "dir /b" switch returns bare filenames in all caps).
 echo @echo off >zipup.bat
 dir /b *.txt | sed "s/^\(.*\)\.TXT/pkzip -mo \1 \1.TXT/" >>zipup.bat

# TYPICAL USE:
#-------------#
# (This note describes sed, which the examples just above rely on.)
# Sed takes one or more editing commands and applies all of them, in
# sequence, to each line of input. After all the commands have been applied
# to the first input line, that line is output and a second input line is
# taken for processing, and the cycle repeats. The preceding examples assume
# that input comes from the standard input device (i.e., the console;
# normally this will be piped input). One or more filenames can be appended
# to the command line if the input does not come from stdin. Output is sent
# to stdout (the screen). Thus:

 cat filename | sed '10q'        # uses piped input
 sed '10q' filename              # same effect, avoids a useless "cat"
 sed '10q' filename > newfile    # redirects output to disk

# awk one-liners (collected from various sources by Eric Pement;
# generally use gawk where possible)

 # print and sort the login names of all users
 awk 'BEGIN { FS = ":" }; { print $1 | "sort" }' /etc/passwd

 # print the number of fields in each line
 awk '{ print NF }'

 # print the last field of each line
 awk '{ print $NF }'

 # print the last field of the last line
 awk '{ field = $NF }; END{ print field }'

 # print every line with more than 4 fields
 awk 'NF > 4'

 # print every line where the value of the last field is > 4
 awk '$NF > 4'

 # print the total number of fields ("words") in all lines
 awk '{ total = total + NF }; END {print total}' file

 # print the total number of lines that contain "Beth"
 awk '/Beth/ { nlines++ }; END {print nlines+0}' file

 # print the largest first field and the line that contains it
 awk '$1 > max {max=$1; maxline=$0}; END{ print max, maxline}'
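 # forcing a numeric comparison is safer when field 1 is a number (a sketch):
 awk 'NR==1 || $1+0 > max {max=$1+0; maxline=$0}; END {print max, maxline}' file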

 # print the number of fields in each line, followed by the line
 awk '{ print NF ":" $0 } '

 # print the first 2 fields, in opposite order, of every line
 awk '{print $2, $1}' file

 # switch the first 2 fields of every line
 awk '{temp = $1; $1 = $2; $2 = temp}' file

 # print every line, deleting the second field of that line
 awk '{ $2 = ""; print }'

 # print in reverse order the fields of every line
 awk '{for (i=NF; i>0; i--) printf("%s ",$i); printf("\n")}' file

 # print the sums of the fields of every line
 awk '{sum=0; for (i=1; i<=NF; i++) sum = sum+$i; print sum}'

 # add all fields in all lines and print the sum
 awk '{for (i=1; i<=NF; i++) sum = sum+$i}; END {print sum}'

 # print every line after replacing each field with its absolute value
 awk '{for (i=1; i<=NF; i++) if ($i < 0) $i = -$i; print }'

 # remove duplicate, nonconsecutive lines from an unsorted file
 awk "! a[$0]++"                     # most concise script
 awk "!($0 in a) {a[$0];print}"      # most efficient script

#.----------------------------------------------------------------------------.#
#|                                                                            |#
#|                       _      __   _   _                                    |#
#|                      | |    /_ | | | (_)                                   |#
#|     __ _  __      __ | | __  | | | |  _   _ __     ___                     |#
#|    / _` | \ \ /\ / / | |/ /  | | | | | | | '_ \   / _ \                    |#
#|   | (_| |  \ V  V /  |   <   | | | | | | | | | | |  __/                    |#
#|    \__,_|   \_/\_/   |_|\_\  |_| |_| |_| |_| |_|  \___|                    |#
#|                                                                            |#
#|                                                                            |#
#|                                                                            |#
#|                 HowTo about using awk one liners                           |#
#|                                                                            |#
#'----------------------------------------------------------------------------'#
#| oTTo ([email protected]), 20170122                                          |#
#| http://www.php-faq.eu   -  good coders code, great reuse                   |#
#|                                                                            |#
#| Released under the GNU Free Document License                               |#
#'----------------------------------------------------------------------------'#
#
# http://patorjk.com/software/taag/

# Source: HANDY ONE-LINE SCRIPTS FOR AWK, version 0.27, 30 April 2008
# Compiled by Eric Pement - eric [at] pement.org
# Latest version of this file (in English) is usually at:
#    http://www.pement.org/awk/awk1line.txt

# Note that the DJGPP compilation (for DOS or Windows-32) permits an awk script to follow Unix quoting syntax '/like/ {"this"}'. HOWEVER, if the command interpreter is CMD.EXE or COMMAND.COM, single quotes will not protect the redirection arrows (<, >) nor do they protect pipes (|). These are special symbols which require "double quotes" to protect them from interpretation as operating system directives. If the command interpreter is bash, ksh or another Unix shell, then single and double quotes will follow the standard Unix usage.

# Users of MS-DOS or Microsoft Windows must remember that the percent sign (%) is used to indicate environment variables, so this symbol must be doubled (%%) to yield a single percent sign visible to awk.

# If a script will not need to be quoted in Unix, DOS, or CMD, then I normally omit the quote marks. If an example is peculiar to GNU awk, the command 'gawk' will be used. Please notify me if you find errors or new commands to add to this list (total length under 65 characters). I usually try to put the shortest script first. To conserve space, I normally use '1' instead of '{print}' to print each line. Either one will work.


#.----------------------------------------------------------------------------.#
#|                   _                                                        |#
#|                  | |                                                       |#
#|     __ ___      _| | __  _   _ ___  ___                                    |#
#|    / _` \ \ /\ / / |/ / | | | / __|/ _ \                                   |#
#|   | (_| |\ V  V /|   <  | |_| \__ \  __/                                   |#
#|    \__,_| \_/\_/ |_|\_\  \__,_|___/\___|                                   |#
#|                                                                            |#
#|            HowTo about using awk                                           |#
#|                                                                            |#
#'----------------------------------------------------------------------------'#
#| oTTo ([email protected]), 20170122                                          |#
#| http://www.php-faq.eu   -  good coders code, great reuse                   |#
#|                                                                            |#
#| Released under the GNU Free Document License                               |#
#'----------------------------------------------------------------------------'#

#> AWK syntax:
awk [-Fs] "program" [file1 file2...]   # commands come from DOS cmdline
awk 'program{print "foo"}' file1       # single quotes around double quotes
# NOTE: Don't use single quotes alone if the embedded info will contain the vertical bar or redirection arrows! Either use double quotes, or (if using 4DOS) use backticks around the single quotes:  `'NF>1'`

# NOTE: since awk will accept single quotes around arguments from the DOS command line, DOS filenames which contain a single quote cannot be found by awk, even though they are legal names under MS-DOS. To get awk to find a file named foo'bar, the name must be entered as foo"'"bar.

awk [-Fs] -f pgmfile [file1 file2...]   # commands come from DOS file

# If file1 is omitted, input comes from stdin (console). Option -Fz sets the field separator FS to letter "z".

# AWK notes:
# "pattern {action}"
#   if {action} is omitted, {print $0} is assumed
#   if "pattern" is omitted, each line is selected for {action}
#
# Fields are separated by 1 or more spaces or tabs: "field1   field2"
# If the commands come from a file, the quotes below can be omitted.

# Basic AWK commands:
# -------------------#
"NR == 5" file             # show rec. no. (line) 5.  NOTIZ: "==" is equals.
{FOO = 5}                  # single = assigns "5" to the variable FOO
"$2 == 0 {print $1}"       # if 2d field is 0, print 1st field
"$3 < 10"                  # if 3d field < 10, numeric comparison; print line
'$3 < "10" '               # use single quotes for string comparison!, or
-f pgmfile [$3 < "10"]     # use "-f pgmfile"  for string comparison
"$3 ~ /regexp/"            # if /regexp/ matches 3d field, print the line
'$3 ~ "regexp" '           # regexp can appear in double-quoted string*
                           # * If double-quoted, 2 backslashes for every 1 in regexps
                           # * Double-quoted strings require the match (~) character.
"NF > 4"                   # print all lines with 5 or more fields
"$NF > 4"                  # print lines where the last field is 5 or more
"{print NF}"               # tell us how many fields (words) are on each line
"{print $NF}"              # print last field of each line

"/regexp/"                 # Only print lines containing "regexp"
"/text|file/"              # Lines containing "text" or "file" (CASE SENSITIVE!)

"/foo/{print "za", NR}"    # FAILS on DOS/4DOS command line!!
'/foo/{print "za", NR}'    # WORKS on DOS/4DOS command line!!
                           # If a line matches "foo", print word and line no.
`"/foo/{print \"za\",NR}"` # WORKS on 4DOS cmd line: escape internal quotes
                           # with backslash and backticks; historical interest only.

"$3 ~ /B/ {print $2,$3}"   # If 3d field contains "B", print 2d + 3d fields
"$4 !~ /R/"                # Print lines where 4th field does NOT contain "R"

'$1=$1'                    # Del extra white space between fields & blank lines
'{$1=$1;print}'            # Del extra white space between fields, keep blanks
'NF'                       # Del all blank lines

# AND(&&), OR(||), NOT(!)
#--------------------------------------------------------------------------------------------#
"$2 >= 4 || $3 <= 20"      # lines where 2d field >= 4 .OR. 3d field <= 20
"NR > 5 && /with/"         # lines containing "with" for lines 6 or beyond
"/x/ && NF > 2"            # lines containing "x" with more than 2 fields

"$3/$2 != 5"               # not equal to "value" or "string"
"$3 !~ /regexp/"           # regexp does not match in 3d field
"!($3 == 2 && $1 ~ /foo/)" # print lines that do NOT match condition

"{print NF, $1, $NF}"      # print no. of fields, 1st field, last field
"{print NR, $0}"           # prefix a line number to each line
'{print NR ": " $0}'       # prefix a line number, colon, space to each line

"NR == 10, NR == 20"       # print records (lines) 10 - 20, inclusive
"/start/, /stop/"          # print lines between "start" and "stop"

"length($0) > 72"          # print all lines longer than 72 chars
"{print $2, $1}"           # invert first 2 fields, delete all others
"{print substr($0,index($0,$3))}"  # print field #3 to end of the line

# END{...} usage
#---------------#   # the END{} block runs only after all input has been read.

#(1.) END { print NR }                      # same output as "wc -l"

#(2.) { s = s + $1 }                        # print sum and average of
      END { print "sum is", s, "average is", s/NR }   # column 1

#(3.) { names = names $1 " " }              # collect field 1 of every line;
      END { print names }                   # e.g. an input file of:
                                            #    Beth   4.00   0
                                            #    Mary   3.75   0
                                            #    Kathy  4.00  10
                                            #    Mark   5.00  30
                                            # prints "Beth Mary Kathy Mark "
                                            # as one output line

#(4.) { field = $NF }                       # print the last field of
      END { print field }                   # the last line

#> PRINT, PRINTF:   print expressions, print formatted
print expr1, expr2, ..., exprn     # parens() needed if the expression
print(expr1, expr2, ..., exprn)    # contains any relational operator:
                                   # <, <=, ==, >, >=
print                              # an abbreviation for {print $0}
print ""                           # print only a blank line
printf("%s\n", expr1)              # printf needs an explicit newline (\n)

# FORMAT CONVERSION:
# ------------------#
BEGIN{ RS="";    FS="\n";        			# takes records sep. by blank lines, fields
  ORS="\n"; OFS="," }        				# sep. by newlines, and converts to records
{$1=$1; print }                 	 		# sep. by newlines, fields sep. by commas.
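
# a quick demo of the conversion above (the sample records are made up):
printf 'Beth\n4.00\n0\n\nMary\n3.75\n0\n' |
  awk 'BEGIN{RS=""; FS="\n"; ORS="\n"; OFS=","} {$1=$1; print}'
# prints:   Beth,4.00,0
#           Mary,3.75,0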

# PARAGRAPHS:
#-----------
'BEGIN{RS="";ORS="\n\n"};/foo/'  			# print paragraph if 'foo' is there.
'BEGIN{RS="";ORS="\n\n"};/foo/&&/bar/'  	# need both
'BEGIN{RS="";ORS="\n\n"};/foo|bar/'     	# need either

# PASSING VARIABLES:
# ------------------#
gawk -v var="/regexp/" 'var{print "Here it is"}'   # var is a regexp
gawk -v var="regexp" '$0~var{print "Here it is"}'  # var is a quoted string
gawk -v num=50 '$5 == num'                         # var is a numeric value
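
# several -v assignments can be combined (a sketch; the names are arbitrary):
gawk -v low=10 -v high=20 '$5 >= low && $5 <= high' datafile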

#> Built-in variables:
ARGC       # number of command-line arguments
ARGV       # array of command-line arguments (ARGV[0...ARVC-1])
FILENAME   # name of current input file
FNR        # input record number in current file
FS         # input field separator (default blank)
NF         # number of fields in current input record
NR         # input record number since beginning
OFMT       # output format for numbers (default "%.6g")
OFS        # output field separator (default blank)
ORS        # output record separator (default newline)
RLENGTH    # length of string matched by regular expression in match
RS         # input record separator (default newline)
RSTART     # beginning position of string matched by match
SUBSEP     # separator for array subscripts of form [i,j,...] (default ^\)

#> Escape sequences:
\b       						# backspace (^H)
\f       						# formfeed (^L)
\n       						# newline (DOS, CR/LF; Unix, LF)
\r       						# carriage return
\t       						# tab (^I)
\ddd     						# octal value `ddd', where `ddd' is 1-3 digits, from 0 to 7
\c       						# any other character is a literal, eg, \" for " and \\ for \

#> Awk string functions:
'r' is a regexp, 's' and 't' are strings, 'i' and 'n' are integers
'&' in the replacement string of sub or gsub is replaced by the matched string

gsub(r,s,t)       # globally replace regex r with string s, applied to data t;
                  # return no. of substitutions; if t is omitted, $0 is used.
gensub(r,s,h,t)   # replace regex r with string s, on match number h, applied
                  # to data t; if h is 'g', do globally; if t is omitted, $0
                  # is used. Returns the converted pattern, not the no. of
                  # changes.
index(s,t)        # return the index of t in s, or 0 if s does not contain t
length(s)         # return the length of s
match(s,r)        # return index of where s matches r, or 0 if there is no
                  # match; set RSTART and RLENGTH
split(s,a,fs)     # split s into array a on fs, return no. of fields; if fs
                  # is omitted, FS is used in its place
sprintf(fmt,expr-list)   # return expr-list formatted according to fmt
sub(r,s,t)        # like gsub but only the first matched substring is replaced
substr(s,i,n)     # return the n-character substring of s starting at i; if n
                  # is omitted, return the suffix of s starting at i

#> Arithmetic functions:
atan2(y,x)     # arctangent of y/x in radians, in the range -pi to pi
cos(x)         # cosine (angle in radians)
exp(n)         # exponential e^n (n need not be an integer)
int(x)         # truncate to integer
log(x)         # natural logarithm
rand()         # pseudo-random number r, 0 <= r < 1
sin(x)         # sine (angle in radians)
sqrt(x)        # square root
srand(x)       # set new seed for random number generator; uses time of day
               # if no x given
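
# e.g. a die roll: srand() seeds from the time of day, rand() gives 0 <= r < 1
awk 'BEGIN{srand(); print int(rand()*6)+1}'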


# How to slurp up whole files in awk
#====================================#

# To slurp up an entire file at once in awk, set the RS (record  separator) variable to an unused character that is not in the file. E.g.,

BEGIN { RS = "\f" }       # \f is formfeed or Ctrl-L

# However, if you plan to pass multiple files on the command-line to  awk, like this:

awk -f myscript.awk file*

awk 'BEGIN{RS="\f"}; {more-to-come...}' file*

# you may also need a way to cycle through each file one at a time. It is NOT possible to use the END{...} block in awk, since the END{...} block is invoked only once, after all the files have been read. Therefore, a different technique must be used.

# If reading whole files, use the FNR variable, which is like NR (number of the current record) relative to the current file. Thus, when FNR==1 in slurp mode, you know that the entire file has been read. Here is an example:

BEGIN{ RS = "\f" }
FNR==1 && /foo.*bar/ { print "Both terms occur in " FILENAME }

# which can be invoked like this:

gawk -f myscript.awk file1 file2 ...

# awktip2.msg - In a textfile where paragraphs are separated by a blank line, locate all instances of one-line paragraphs.
#------------------------------------------------------------------------------------------------------------------#

gawk -f script.awk myfile

# filename:  script.awk
BEGIN { RS=""; FS="\n"}
NF==1 { print NR ": " $0 }
#---end of script---

   or, in 4dos:
 gawk `"BEGIN{RS=\"\";FS=\"\\n\"}; NF==1{print NR \": \" $0}"` myfile
 
 #--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--#
# filename: awktail.txt - author: Eric Pement - date: 2002 Oct 13
#--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--=--#

# Beware of making unsafe assumptions about the "tail" or end of a line. I often needed to refer to the end of a line from the 3rd field onward, and I have used a variable like this:

      tail = substr($0,index($0,$3))

# Note that I was attempting to create a tail variable which begins at $3 and goes to the EOL. However, in a line like this:

      aa aabbcc bb dd ee

# that technique will assign "bbcc bb dd ee" to the 'tail' variable, and not "bb dd ee", as I often expected. Here are some potential solutions to the 'tail problem':

# SOLUTION 1:
#-------------------------------------------------///
    # Works if you don't mind deleting $1 and $2; emptying them leaves
    # 2 superfluous OFS delimiters at the start of the line:
    $1 = ""
    $2 = ""
    tail = substr($0, 3)    # skip the 2 leftover delimiters

# SOLUTION 2:
#-------------------------------------------------///
    # Does not require deleting $1 and $2
    # Still assumes there is only 1 space between words.
    # Get the tail, starting at field $3
    tail = substr($0,length($1 $2) + 3)

    # One more example: Get the tail, starting at field $4
    tail = substr($0,length($1 $2 $3) +4)

# SOLUTION 3:
#-------------------------------------------------///
    # Works if there are variable spaces between words
    # get tail beginning at $4
    tail=$0; for (i=1;i<4;i++) sub($i,"",tail); sub(/ */,"",tail)
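
    # a variant that avoids treating field contents as regexps: strip 3
    # whitespace-delimited words, so the tail starts at $4 (a sketch)
    tail = $0; for (i=1; i<4; i++) sub(/^[^ \t]+[ \t]+/, "", tail)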
    
# awksys - Using awk system() command to process part of a file
#------------------------------------------------------

# Example: I want to use awk to select PART of a file for processing, but I want to use a different command (like sed or fmt) to do the work. Here is how to do it against myfile.

                                        
awk '$6 > 4 {system("echo " $0 "| sed -f script.sed");next};1' myfile

# or more succinctly:

awk -f myscript.awk myfile

# where my myscript.awk is:

     #---begin  myscript.awk ---
     $6 > 4 {
       system ("echo " $0 "| sed -f script.sed")
       next
     }
     1
     #--- end of script ---

# Every line will be printed, but each line in which the value of the 6th field ($6) is greater than 4 will be passed to sed for processing. Note that the space after 'echo' is significant: "echo ".
# The '1' just before the end of the script tells awk to print all the lines normally that were not matched by the pattern. Technically, '1' is a pattern itself that resolves to TRUE, and if a pattern is specified in awk without an action, the line will be printed when the pattern matches or is true. Thus, '1' works to print every line.
# Finally, note that this is a line-oriented script. To pass a group of lines to an external program, collect them in a variable and echo the variable, piping the results to the program you specify.
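
# A sketch of that idea using awk's own print-to-pipe instead of system();
# "fmt" here is an arbitrary stand-in for the external command. (The piped
# group is emitted when the pipe is closed, not in place.)
awk '$6 > 4 {print | "fmt"; next} 1; END {close("fmt")}' myfile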


awk -F'","' '{print $3}' data.csv
# Use a multi-character field separator to get field 3 out of a CSV file that uses double quoted fields.
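
# Caveat: the first and last fields keep an outer quote; a sketch that
# strips the outer quotes first:
awk -F'","' '{gsub(/^"|"$/, ""); print $3}' data.csv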

awk '$9!~/^[23]/{print $4}' access_log | cut -c1-12 | uniq -c
# Show the number of UNsuccessful requests per day. (Not HTTP code 2XX or 3XX)

awk -F, '{print sqrt($4^2)}' data.csv
# Get the absolute value of the 4th column of numbers using the square root of the square trick.
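
# Same result without the math trick (a sketch): conditional negation.
awk -F, '{print ($4 < 0 ? -$4 : $4)}' data.csv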

awk -F, '((37.19 < $7 && $7 < 37.23) && (-115.81 < $8 && $8 < -115.73))' gpsdata.csv
# Print lines from file where GPS coords are in range.

awk '{for (i=2;i<=13;i++) {printf "%d-%02d,%d\n",$1,i-1,$i}}' data-table.txt
# Convert values from x=year,y=month table to linear CSV.
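# e.g. the row "1999 1 2 3 4 5 6 7 8 9 10 11 12" becomes 1999-01,1 then
# 1999-02,2 ... then 1999-12,12, one pair per output line.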

awk '{ print substr($0, index($0,$3)) }' mail.log
# Print all from 3rd field to end of line. Very useful for log parsing.

awk -F: '{print $1 ":" $2}' messages | uniq -c
# Count syslog hits per minute in your messages log file. Useful for doing quick stats.

awk '{print $4}' apache_log|sort -n|cut -c1-15|uniq -c|awk '{b="";for(i=0;i<$1/10;i++){b=b"#"}; print $0 " " b;}'
# Request by hour graph.

awk '$9 == "404" {print $7}' access.log |sort|uniq -c|sort -rn| head -n 50
# list top 50 404 in descending order.

awk 'length < 140' quotes.txt
# Print only the lines from quotes.txt that are shorter than 140 characters.

awk '$10==404 {print $7}' access_log
# Print out the file requested for CLF log entries with HTTP 404 status code.

awk '{a[$1] += $10} END {for (h in a) print h " " a[h]}' access_log | sort -k 2 -nr | head -10
# Display top bandwidth hogs on website.

awk '!(NR % 10)' file
# Take every 10th line from a file: 

awk NF
# Delete blank lines

awk 'length > 64'
# Print only the lines that are 65 characters in length or longer

awk '{ total = total + NF }; END { print total+0 }'
# Awk script to count the number of words in a file

a2p
# Perl has a utility a2p to convert awk scripts into perl scripts.


/foo/,/bar/
# Awk can use a pair of regular expressions as a range, just like sed. In both languages, /foo/,/bar/ matches the same lines.

# MAC address conversion using only awk. 
awk 'BEGIN{FS=""}{for(i=1;i<=NF;i+=2){ r=r $i $(i+1)":"}}END{sub(/:$/,"",r);print r}' file

awk '{if (t!=$0){print;t=$0}}' file
# Remove duplicate *consecutive* lines without sorting. Low-memory version:
# it stores only the previous line, whereas the a[$0] array in '!a[$0]++'
# can get big.

awk '!a[$0]++' file
#Remove duplicate lines without sorting 'file'. $0 means whole line in awk. 'a' is an array. So print if not in array.

awk 'length > max { max=length;maxline=$0 } END { print maxline; }' quotes.txt 
# Print the longest line in quotes.txt 

awk -F':' '!max{max=$2;}{r="";i=s=.025*$2/max;while(i-->0)r=r"-";printf "%40s | %4d | %s %s",$1,$2,r,"\n";}' 
# Histogram generator (bar from field 2, labels from field 1), by @dez_blanchfield

awk '{s+=$3} END {print s}' data.txt 
# Sum numbers in the third column of data.txt. 
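
# And the average of that column (a sketch; guards against empty input):
awk '{s+=$3; n++} END {if (n) print s/n}' data.txt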

awk '($3 == "64.39.106.131") || ($1 ~ /^#/)' conn.log
# Search 3rd field of conn.log for an IP and print the header line. 

awk '/^From:/{print;nextfile}' * 
# Print only the first From: line in each mail message file.
# (nextfile is a gawk extension; not every awk supports it)

awk '{print $7}' access_log | sort | uniq -c | sort -rn | head -100 
# Display top 100 files accessed on website. 

awk '{print $1}' data.txt 
# Print out just the first column (whitespace separated) of data.txt

awk '{ print $1}' access.log.2016-05-08 | sort | uniq -c | sort -nr | head -n 10
# Find Top 10 IP Addresses Accessing Your Apache Web Server
#    awk  - prints the first field (the client IP) of each line in access.log.2016-05-08.
#    sort - sorts the lines; in the final sort, the -n option compares lines numerically and -r reverses the order.
#    uniq - reports repeated lines; the -c option prefixes each line with its number of occurrences.


#####
# How to get started using awk
################################

# awk, sed, and grep are three of my favorite tools in the Linux or UNIX
# command line. They are all pretty powerful. Today we'll look at how to get
# cracking with awk to help you ease into using it. Then we'll look at some
# useful awk one-liners to make things a bit more fun for you.
#
# AWK is a programming language designed for processing text-based data,
# either in files or data streams. It was created at Bell Labs in the 1970s.
# Although it's quite old, don't get fooled by its age: it is extremely
# powerful and efficient at what it does.
#
# Let's get our hands dirty now. Before we delve into the complex workings
# and usage of awk, let's get you started on its basics. We'll create and
# use a dummy file for this exercise. You can use pretty much any text file,
# such as a log from your system. I will be using a sample output from one
# of my favorite system monitoring tools, Dstat. Here's the output:
#
# This is an ideal output for awk to handle. awk is great with comma- or
# tab-separated content. You'll see why soon. So either create some similar
# data or copy and paste my example above into a dummy file called something
# like test.txt. Launch a terminal window on your Linux computer. Almost all
# flavors of Linux ship with awk; in case you have found one that does not
# have it, please install it. On the terminal window, type the following
# from the directory where you have stored the test.txt file:

awk '{print}' test.txt
# The output should contain the entire contents of the text file. What's the fun in that?

# Now let's see how you can pick a column and print just that one. Execute the following command:
awk '{print $1}' test.txt

# Now we are asking awk to print just the first column of the text file. It will automatically figure out that the file is a tab separated one and print just the first column of the contents. You should see something like this in the output:
	----total-cpu-usage----
	usr
	5
	13
	8
	0
	1
	1
	1
	0
	1
	1

# You can do the same for any column you like. If you want awk to print the third column, change the above command to:
awk '{print $3}' test.txt
# You can also have awk print multiple columns. So if you want the first, third, and seventh columns printed, add them to the command separated by commas.

awk '{print $1, $3, $7}' test.txt
# would do the trick for you:

	----total-cpu-usage---- -net/total-
	usr idl read
	5 93 154k
	13 87 0
	8 92 0
	0 99 0
	1 97 0
	1 98 0
	1 99 0
	0 99 0
	1 99 0
	1 100 0

# If you have a trickier file, like /etc/passwd, where the data is separated by colons rather than spaces or tabs, awk doesn't pick that up automatically. In such cases you can feed awk the correct separator. Use a command like this to print the first column of the file:

awk -F':' '{print $1}' /etc/passwd

# This command will give you an output of the usernames of all the users on your system:

	apple
	mango
	banana
	watermelon
	kiwi
	orange

# You can do the same with any other type of separator. You can also use awk to parse your log files. For example, if you want to view all the IP addresses and the related web URLs that have been accessed on your web server, you can use awk to parse your web server's access log to get this information. Use the following command:

awk '$9 == 200 { print $1, $7}' access.log

	199.63.142.250 /2008/10/my-5-favourite-hangouts/
	220.180.94.221 /2009/02/querious-a-mysql-client-for-the-mac/
	67.190.114.46 /2009/05/
	173.234.43.110 /2009/01/bicycle-rental/
	173.234.38.110 /wp-comments-post.php

# Parsing like this lets you spot hosts that hit your website unusually often, which may indicate scraping or other abuse. You can also aggregate the results. Say you want to know how many times each IP address visited your website:

awk '$9 == 200 {print $1}' access.log | sort | uniq -c | sort -nr

	46 122.248.161.1
	35 122.248.161.2
	26 65.202.21.10
	24 67.195.111.46
	19 144.36.231.111
	18 59.183.121.71
# Addresses in sed specify which lines to act on. They can be line numbers or regular expressions.
# In awk, records are split into fields by a separator. By default fields are separated by white space.

# The -n option tells sed to only print output when told to by the p command.
# Perl has a utility a2p to convert awk scripts into perl scripts.

# The awk variable NR contains the current record number.
# The awk variable NF contains the number of fields in the current record.

# Awk can use a pair of regular expressions as a range, just like sed. In both languages, /foo/,/bar/ matches the same lines.
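
# A quick way to see records, fields, and range patterns in action (toy input):
printf 'foo\none two\nbar\n' | awk '{print NR, NF, $0}'   # -> "1 1 foo", "2 2 one two", "3 1 bar"
printf 'a\nfoo\nb\nbar\nc\n' | awk '/foo/,/bar/'          # -> foo, b, bar ("a" and "c" fall outside the range)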

awk '!(NR % 10)' file
# Take every 10th line from a file: 

awk NF
# Delete blank lines

awk 'length > 64'
# Print only the lines that are 65 characters in length or longer

awk '{ total = total + NF }; END { print total+0 }'
# Awk script to count the number of words in a file

awk '!a[$0]++' 
# Remove duplicate lines while keeping order: a line prints only the first time it is seen (a[$0] is 0 then, so !a[$0] is true, and the ++ marks it as seen)

#==============================##==============================#
# CMD AWK
#==============================##==============================#
awk -F, '((37.19 < $7 && $7 < 37.23) && (-115.81 < $8 && $8 < -115.73))' gpsdata.csv
#Print lines from file where GPS coords are in range.

awk '{for (i=2;i<=13;i++) {printf "%d-%02d,%d\n",$1,i-1,$i}}' data-table.txt
#Convert values from x=year,y=month table to linear CSV.
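
# For instance, if a (hypothetical) row of data-table.txt reads "2019 5 8 11 ..." -- a
# year followed by 12 monthly values -- the output for that row begins:
#   2019-01,5
#   2019-02,8
#   2019-03,11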

awk -F'","' '{print $3}' data.csv
#Use a multi-character field separator to get field 3 out of a CSV file that uses double quoted fields.

awk -F, '{print sqrt($4^2)}' data.csv
#Get the absolute value of the 4th column of numbers using the square root of the square trick.
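
# Same result without the trick, spelled out with a ternary:
awk -F, '{print ($4 < 0 ? -$4 : $4)}' data.csv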

awk '$9!~/^[23]/{print $4}' access_log | cut -c1-12 | uniq -c
# Show the number of UNsuccessful requests per day. (Not HTTP code 2XX or 3XX)

awk '{a[$1] += $10} END {for (h in a) print h " " a[h]}' access_log | sort -k 2 -nr | head -10
# Display top bandwidth hogs on website.

awk '$10==404 {print $7}' access_log
# Print out the file requested for CLF log entries with HTTP 404 status code.

awk -F: {'print $1 ":" $2'} messages |uniq -c
# Count syslog hits per minute in your messages log file. Useful for doing quick stats.

awk '{if (t!=$0){print;t=$0}}' file
# Remove *consecutive* duplicate lines (like uniq) without sorting. Low memory version: the !a[$0]++ construct can get big.

awk '{ print substr($0, index($0,$3)) }' mail.log
# Print all from 3rd field to end of line. Very useful for log parsing.

awk '{print $4}' apache_log|sort -n|cut -c1-15|uniq -c|awk '{b="";for(i=0;i<$1/10;i++){b=b"#"}; print $0 " " b;}'
# Request by hour graph.

# awk and sed are like the fuck and shit of the "unix language": very powerful, and usable nearly anywhere in a sentence.

awk '$9 == "404" {print $7}' access.log |sort|uniq -c|sort -rn| head -n 50
# List the top 50 404'd URLs in descending order.

|gawk '{printf("%s %s\n",strftime("%Y-%m-%d_%T", $1),$0)}' 
# Prefix each line with the local time based on epoch time in 1st column.
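
# For example (the epoch value here is arbitrary; strftime uses your local timezone,
# output shown for UTC):
echo '1600000000 login ok' | gawk '{printf("%s %s\n",strftime("%Y-%m-%d_%T", $1),$0)}'
# 2020-09-13_12:26:40 1600000000 login ok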

awk '{sum+=$1;n++;if (n==3){print sum/3 "\t" $0;n=0;sum=0}}' garage-temp.log 
# Print the average of each successive group of 3 temp values in front of every 3rd line

awk -F\\t '/^#fields/{for (i=2;i<=NF;i++){bro[$i]=i-1}} {print $bro["ts"] FS $bro["id.resp_h"]}' http.log 
# Bro_IDS cols by name in awk
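
# The trick above: Zeek/Bro logs start with a tab-separated "#fields name name ..." header
# line, so the first block maps each column name to its position, letting you pick fields
# by name. The same idea works for any file with a header row -- a toy sketch (made-up data):
printf 'name\tage\nalice\t30\n' | awk -F'\t' 'NR==1{for(i=1;i<=NF;i++)c[$i]=i;next}{print $c["age"]}'
# -> 30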

awk 'a[$1 $2]++ < 5' /var/log/syslog 
# Print the first 5 log lines from each day in syslog. 

awk 'length > max { max=length;maxline=$0 } END { print maxline; }' quotes.txt 
# Print the longest line in quotes.txt 

# awk-tips.
awk '{sub($1 FS, ""); print}'   # => remove 1st field and take the remaining
awk '{print $NF}'               # => return last field
awk '{print $(NF-1)}'           # => return 2nd-to-last field

awk '/itdance.gif/{sum+=$10} END { print sum }' access_log 
# Sum up the bandwidth consumed by requests for matched lines. BTW, it was 35GB.

awk '$6' scan.txt 
# Print the line if it has a 6th column of text in it.

awk 'NR%2' data.txt 
# Print the odd numbered lines of the file data. Lines 1, 3, 5, etc. if you do not know what odd means. ;-)

# Compare 2FA log's user and device columns to see what numbers users are using to see if there any suspicious additional numbers in use.
awk -F, '{print $2 "," $8}' 2FAlog.csv | tail -n+2 | sort | uniq -c | sort -k2 

# Take the score in the last column and copy it to the front so the lines can be sorted by rank (needed because the column count varies when names contain spaces), then number the ranks with nl. TIMTOWTDI
awk '{print $NF " " $0}' sci2019.txt | sort -nr | nl 

# Check log files for access from IPs for which there was NOT a successful login. This won't work on Apache httpd though b/c it logs unsuccessful too.
awk '$3!="-"{print $1}' access_log.2019-* |sort|uniq > ~/login-ips.txt ; grep -v -F -f ~/login-ips.txt access_log.2019-* |less -S 

#=======================================================================================================
# AWK File Parsing One Liners
#=======================================================================================================

gawk '{ print $1 }' tabfile.txt | sort | uniq > firstcolvals.txt
# Get a sorted list of values from the first column of a tab-separated file

gawk 'BEGIN { FS="--" } ; { print $1 }' data.txt | sort | uniq > firstcolvals.txt
# Get a sorted list of column values from a file with fields separated by a double dash ("--")

gawk 'BEGIN { FS="\t" } ; { print $2 "\t" $1 }' data.txt  | sort -nr > sorted.txt
# Reverse fields in a tab-separated file and sort it by the numeric field

gawk '{ freq[$0]++ } ; END { sort = "sort --key=2" ; for (word in freq) printf "%s\t%d\n", word, freq[word] | sort ; close(sort) }' allprops.txt
# Count how often each whole line occurs and output line/count pairs sorted by the count column

gawk 'BEGIN { FS="--" } ; { freq[$1]++ } ; END { sort = "sort" ; for (word in freq) printf "%s\t%d\n", word, freq[word] | sort ; close(sort) }' data.txt
# Extract the first field, count its occurrences, and output word/count pairs sorted by word

gawk 'BEGIN { FS="--" } ; { freq[$1]++ } ; END { sort = "sort --key=2 -nr" ; for (word in freq) printf "%s\t%d\n", word, freq[word] | sort ; close(sort) }' data.txt
# Extract the first field, count its occurrences, and output word/count pairs sorted by count, descending

> I'm currently doing a paper in HTML, which I've done before, using
> Emacs, html-helper-mode, and Netscape. However _this_ paper needs to
> have endnotes. I can do this also, via two-way anchors, though hints
> toward automating the process (especially the numbering) would
> certainly be accepted. The problem is that, if I rearrange the text, I
> need to renumber the endnotes, which gets messy. Accordingly I (for
> now) simply tag my endnote text, e.g.
>  
> [[Endnote text here.]]
>  
> and then, when I need to output, I number and embed the anchors, move
> the text, and nuke the tags as needed (in a copy of the file with the
> tagged notes).
>  
> 
> This appears suboptimal :-) However, it beats StarOffice 5.2: while it
> worked, it nicely handled the endnote renumbering, but then it began
> to GPF whenever I inserted a new endnote. So: is there a better way to
> do endnotes in HTML in Emacs? Or another tool one would recommend?
> (Please, no non-freeware M$ products: I don't have a license, and
> won't be able to get one until after the paper's due.)
>  
>Please reply directly to me as well as the list/group, and TIA,
>[email protected]

   Tom, here's an idea that works. The following paragraph will show
you how it's done.[##] First, select a string which will *never* occur
in your file. The string can be one character or longer; it will be
used to generate the sequential[##] endnote digits in the output file.
The string will be used two times: once for the note numbers in the
text, and once for the corresponding note numbers in the endnotes.
[[
##. I must give credit to this idea where it's due. It's taken
entirely from Eric Meyer (author of VDE, a DOS text editor). He also
wrote a program called WSNOTE using a system similar to the one I
describe here. WSNOTE, now issued as freeware, was designed for
WordStar files and has special features for that word processor.

##. This will be endnote reference 2. It corresponds to the word
"sequential" in the text paragraph above, whose in-text pound signs
will be replaced by the numeral 2.
]]

   Below each paragraph of body text (ideally, on the line following
the paragraph), add the endnote references, properly spelled out. Put
them in the same order as the notes for that paragraph. In editing the
source file, you no longer need to worry about moving the "[[Endnote
text]]" to the end of the file yourself. Awk will do it for you.

   You can rearrange paragraphs, cut-and-paste, and edit more quickly.
Your only concern is that the body text has the same number of note
symbols[##] in it as the "[[Endnote text block]]" has. The Endnote
block should be placed immediately below that paragraph. In my system,
the "[[" and "]]" stand alone on a line. It makes it easier to move
whole lines that way.
[[
##. By "note symbols", I mean the in-text placeholders, which will
later be converted to incrementing numbers. I've placed them in square
brackets, but brackets or braces aren't necessary.
]]

   Go ahead and edit your thesis. When done, save the file, and run
this awk script on it.[##] From a shell prompt (or a DOS command
prompt), use a command-line that looks something like this:
[[
##. Binaries for GNU awk v3.0.6 for Win32 are available here:

   ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/gwk306b.zip

]]

        awk -f endnote.awk infile >outfile

Here is the awk script "endnote.awk", written in haste for this
particular task. It outputs to plain text.

# Filename: endnote.awk
#  Version: 1.0
#   Author: Eric Pement - [email protected]
#     Date: 13 Dec. 2000
#  Purpose: To convert in-text notes and references to endnotes
# Requires: GNU awk; blank lines between paragraphs of input file
#   Output: Plain ASCII text. Can be modified for HTML.

BEGIN { a=b=0; str = "" }  # initialize some variables

/^\[\[$/,/^]]$/ {          # handles all endnote references
  if ( /^\[\[$/) next
  sub( /^]]$/, "")
  if (/\#\#/) gsub(/\#\#/, ++a)   # bump the counter only on lines with a ## marker
  str = str $0 "\n"
  next
}

{                          # handles all in-text note numbers
  if (/\#\#/) gsub(/\#\#/, ++b)   # likewise; keep each ## marker on its own line
  print
}

END {
  print "--------------------\nENDNOTES:\n"
  print str
  print "[end of file]"
  if ( a != b ) {          # error handling. "\a\a" beeps the console
    print "\a\a\n",
"Your in-text numbers don't match your endnote numbers!" > "/dev/stderr"
  }
}
#---end of script---
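
   To see it in action, here is a made-up two-note file (note that the
script expects each ## marker on a line of its own):

        $ cat sample.txt
        The fox jumped.[##]
        It landed well.[##]
        [[
        ##. A note about foxes.
        ##. A note about landing.
        ]]

        $ awk -f endnote.awk sample.txt
        The fox jumped.[1]
        It landed well.[2]
        --------------------
        ENDNOTES:

        1. A note about foxes.
        2. A note about landing.


        [end of file]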

   When the script is done running, the newly created outfile will
have incrementing in-text note numbers (1, 2, 3...) and at the bottom
of the file, you will see a section that looks like this:

     --------------------
     ENDNOTES:

but flush left, not indented as it is here. Following that will be the
references themselves, all neatly gathered together.

   As I said, this is a pretty basic awk script. The only
error-checking it does is to make sure that you have the same number
of references in the body of your paper as you have in the Endnotes
section of the paper. It does not check for mismatched '[[' or ']]'
brackets, so if you forgot to close the brackets, then you can expect
to see hunks of your paper in the Endnote section.

   I usually write my papers in plain ASCII with Emacs and then
convert to HTML later on. If you want to adapt this script to emit
HTML with hyperlinks, that's easily done (in two places).
Change:

   gsub(/\#\#/, ++a)

to

   gsub(/\#\#/, "<B><A HREF=\"#T" ++a "\" NAME=\"E" a "\">" a "</A></B>")

and also change

   gsub(/\#\#/, ++b)

to

   gsub(/\#\#/,
   "<SUP><SMALL><A HREF=\"#E" ++b "\" NAME=\"T" b "\">" b "</A></SMALL></SUP>")

   I know, it looks pretty hard to read, but the end result is to
create clickable note references, so that clicking on the in-text
reference (which appears as a raised superior figure) will take you to
the corresponding number in the Endnote section. And vice versa.
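
   For the first note, for example, the modified script would emit
markup along these lines (in-text reference first, then the endnote
number):

   <SUP><SMALL><A HREF="#E1" NAME="T1">1</A></SMALL></SUP>
   <B><A HREF="#T1" NAME="E1">1</A></B>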

   Truth be told, I did appreciate Kai's suggestion that a file
written in Texinfo can be converted to plaintext and also HTML, but I
find it a bit daunting to learn texinfo right now. I prefer something
that I can have more control over, and the awk script does that for
me. Please let me know if you have found another solution, or if the
script I'm sending you needs some tweaking.

Kind regards,

Eric Pement

             ENDNOTE.AWK and ENDNOTE.PL (version 1.3)

      A Practical Method for Handling Endnotes in Text Files
                          by Eric Pement

   Writers who work in plain ASCII text sometimes need footnotes
in their files. While editing, it's nice to move, add, or delete
paragraphs with notes without manually renumbering all the
footnotes. Commercial word processors like Microsoft Word or
Corel Word Perfect have an easy method of handling notes in which
deletions or rearranging text is no problem. The notes are
automatically renumbered as needed. Is such a system available to
users of vim, Emacs, Vedit, VDE, PFE, TDE, EditPad, Notepad,
NoteTabs, or other ASCII editors?

   Yes! If you have Perl or Awk available, you can use ENDNOTE.PL