Grepping by date

Q I've set up a Cron job that runs a Bash script to check a certain server's health - its disk space, load and so on - then email the results to me. The script uses the grep command to find a log file and cat to output lines containing a certain string. All of this is working well, but I only want grep to tell me about the last few weeks or days of the log file that contains my string, not everything from its creation. Can you tell me if there's any way of doing this?

A You can use awk to extract the date from the start of each line, convert it to a Unix timestamp with date, and compare that with your cutoff point. Running date once for every line is expensive in processing time, so filter the log with grep first and only check the dates of the lines that match:

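#!/bin/bash
# Report matching log lines from the last $DAYS days (needs GNU date for -d)
DAYS=7
FROM=$(($(date +%s) - 86400 * $DAYS))
grep whatever /var/log/messages | while read -r LINE; do
    DATE=$(date -d "$(echo "$LINE" | awk '{print $1, $2, $3}')" +%s)
    if [[ $DATE -gt $FROM ]]; then
        dowhatyouwantwith "$LINE"    # replace with your own command
    fi
done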

The script first sets a cutoff time, DAYS days before now (86,400 is the number of seconds in a day). It then reads each line of the grep output and uses awk to pick out the first three space-separated fields. The standard syslog line format is

Jun 26 12:30:37 zaphod dhcpcd[4037]: eth0: renewing lease of 192.168.1.1

so the awk command returns Jun 26 12:30:37. The echo | awk pipeline is enclosed in $(...), which runs the commands inside the brackets and substitutes their output into the surrounding command before that command runs. With this example line, that command would become:

DATE=$(date -d "Jun 26 12:30:37" +%s)

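The +%s format tells date to output the time as a standard Unix timestamp, the number of seconds since 00:00 UTC on 1 January 1970, and the outer $(...) assigns that value to DATE. Here it is 1246015837, although the exact figure depends on your time zone and the current year, because syslog timestamps include neither and date fills them in from the system. Now we simply compare this with FROM to see whether the log entry is more recent, and process it if it is.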
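To see the conversion and comparison with repeatable numbers, here is a small worked example. It is a sketch with two assumptions of our own: it passes -u so that date works in UTC regardless of your time zone, and it uses a fixed "now" of 1 July 2009 and a full date with the year, instead of the live clock and a bare syslog timestamp. Like the script above, it relies on GNU date's -d option.

```shell
#!/bin/bash
# A fixed "now" instead of $(date +%s), so the output is repeatable
NOW=$(date -u -d "2009-07-01 00:00:00" +%s)
DAYS=7
FROM=$((NOW - 86400 * DAYS))     # cutoff: 7 days before NOW

# The log entry's time, given in UTC with an explicit year
DATE=$(date -u -d "2009-06-26 11:30:37" +%s)
echo "$DATE"                     # prints 1246015837

if [[ $DATE -gt $FROM ]]; then
    echo "recent"                # 26 June is within a week of 1 July
else
    echo "old"
fi
```

Swapping the fixed NOW back to $(date +%s) gives exactly the cutoff calculation used in the script.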

You can also use backticks instead of $(...) for command substitution, but we use $(...) for two reasons. Firstly, it's more readable. Secondly, it nests cleanly, whereas nested backticks have to be escaped with backslashes, which quickly becomes hard to read.
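Strictly speaking, backticks can be nested if the inner pair is escaped, which is exactly what makes them awkward. Compare the two forms below, which both extract the name of the directory holding /var/log/messages by wrapping basename around dirname (the path is only an example; neither command reads the file).

```shell
#!/bin/bash
# $(...) nests without any escaping
NESTED=$(basename "$(dirname /var/log/messages)")

# With backticks, the inner pair must be escaped with backslashes
TICKS=`basename \`dirname /var/log/messages\``

echo "$NESTED $TICKS"    # prints: log log
```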

You could make this slightly more readable (at the expense of making it marginally slower) by separating the two commands like so:

DATESTR=$(echo "$LINE" | awk '{print $1, $2, $3}')
DATENUM=$(date -d "$DATESTR" +%s)
if [[ $DATENUM -gt $FROM ]]; then

Pick whichever suits you best.
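If you want to reuse the check in other health scripts, the two-step version can be wrapped in a function. This is only a sketch: the name is_recent is our own hypothetical choice, and it assumes GNU date and a standard syslog timestamp in the first three fields. Since syslog omits the year, date assumes the current one, so entries from late December checked in early January will look like future dates and always pass.

```shell
#!/bin/bash
# is_recent (hypothetical helper): succeeds if the syslog line in $1
# is newer than the Unix time in $2. Needs GNU date for -d.
is_recent() {
    local datestr datenum
    datestr=$(echo "$1" | awk '{print $1, $2, $3}')   # e.g. "Jun 26 12:30:37"
    datenum=$(date -d "$datestr" +%s) || return 1     # treat unparsable lines as old
    [[ $datenum -gt $2 ]]
}
```

Inside the while loop, the test then becomes if is_recent "$LINE" "$FROM"; then. Declaring the local variables on their own line matters here: it keeps the exit status of date visible to the || return 1 check.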
