2008/08/01

Awk and other text processing tips

awk is great for working with data arranged in whitespace-separated columns.

How to sum a column?

For example, to calculate the total tps (the second column) across all physical disks from iostat -d output:

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               3.73        30.38        67.80    4864950   10857636
sdb               3.82        30.39        71.74    4866793   11488576
sdc               0.00         0.05         0.00       7208          8


iostat -d | grep -E "sd. " | awk 'BEGIN {x=0} {x+=$2} END {print x}'

or, less elegantly, by building a "3.73+3.82+0.00+0" expression and handing it to bc:

iostat -d | grep -E "sd. " | awk 'BEGIN {ORS=""} {print $2"+"}' | (cat; echo 0) | bc
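
The filtering and the summing can probably be folded into a single awk call, since awk can match on the device name itself. The following is a minimal sketch, not the only way to do it. The helper name sum_tps is made up for illustration, the sample text is the iostat output shown above so that the sketch runs without iostat installed, and the pattern /^sd[a-z]+$/ is an assumption about how your disks are named (it also accepts names like sdaa and skips partitions like sda1).

```shell
# sum_tps is a hypothetical helper: it reads iostat -d style text on stdin
# and sums field 2 (tps) for every line whose first field looks like sdX.
sum_tps() {
    awk '$1 ~ /^sd[a-z]+$/ { total += $2 } END { print total + 0 }'
}

# Sample input taken from the iostat output above.
sample='Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 3.73 30.38 67.80 4864950 10857636
sdb 3.82 30.39 71.74 4866793 11488576
sdc 0.00 0.05 0.00 7208 8'

printf '%s\n' "$sample" | sum_tps
```

On a live system the same function would be fed with iostat -d | sum_tps. The "+ 0" in the END block makes awk print 0 rather than an empty line when nothing matches.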

How to grab just certain columns?

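How do I use awk to print the first column, and then the sixth through the end of the line, for example to grab just the fields I want from an Apache log file? index($0,$6) finds the position where the text of the sixth field first appears in the line, and substr prints everything from that position on.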

awk '{ print $1" " substr($0, index($0,$6)) }' /var/log/httpd/access_log*

gives us something like

10.95.10.20 "POST /license/associateproduct.php HTTP/1.1" 200 8 "-" "Java/1.6.0_17"
10.95.14.248 "POST /license/authorize.php HTTP/1.1" 200 84 "-" "PycURL/7.19.5"
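
One caveat with the index() trick: it looks for the first place the field's text occurs, so it can start too early when the same text appears in an earlier field (in an access log, fields 2 and 3 are often both "-"). A loop over the fields avoids that, at the cost of squeezing runs of whitespace down to single spaces. The sketch below is one way to write it. The helper name first_and_rest is made up for illustration, and the sample line is invented (including its timestamp) to match the combined log format of the output above.

```shell
# first_and_rest is a hypothetical helper: it prints field 1, then fields
# START through the last one, joined by single spaces. Usage: first_and_rest START
first_and_rest() {
    awk -v start="$1" '{
        out = $1
        for (i = start + 0; i <= NF; i++) out = out " " $i
        print out
    }'
}

# Invented sample line in Apache combined log format.
line='10.95.10.20 - - [01/Aug/2008:12:00:00 -0700] "POST /license/associateproduct.php HTTP/1.1" 200 8 "-" "Java/1.6.0_17"'

printf '%s\n' "$line" | first_and_rest 6
```

Against real logs this would be run as cat /var/log/httpd/access_log* | first_and_rest 6.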