Automate backups in crontab

Q Back up, back up, back up is the first rule of the trade. Besides my business, I run a small home web server and tend to back up my files on it as often as possible. I use rsync on my backup machine:

rsync -avz myserver:var/www /backup

What I would like to do is schedule the process so that it runs every four hours or so. Any advice is welcome - my data needs you!

A Well, we know that scheduling tasks in Linux is possible, because, as most of us have found to our cost, the /tmp directory automagically deletes files with tmpwatch at defined intervals - often before their usefulness has expired! So let's explore how that is implemented. First we'll find the tmpwatch configuration file:

[root@carve ~]# rpm -ql tmpwatch
/etc/cron.daily/tmpwatch
/usr/sbin/tmpwatch
/usr/share/man/man8/tmpwatch.8.gz

It looks like the only file in /etc is there under cron.daily. Let's see what's in there:

[root@carve ~]# cat /etc/cron.daily/tmpwatch
/usr/sbin/tmpwatch -x /tmp/.X11-unix -x /tmp/.XIM-unix -x /tmp/.font-unix \
    -x /tmp/.ICE-unix -x /tmp/.Test-unix 240 /tmp
/usr/sbin/tmpwatch 720 /var/tmp
for d in /var/{cache/man,catman}/{cat?,X11R6/cat?,local/cat?}; do
    if [ -d "$d" ]; then
        /usr/sbin/tmpwatch -f 720 $d
    fi
done

If you checked the man pages for tmpwatch, you'd find no switches for setting a time or an interval for running it, so obviously something else is driving this program every so often. Let's query the rpm database for the owner of the parent directory:

[root@carve ~]# rpm -qf /etc/cron.daily/
crontabs-1.10-7

A quick search on the web tells us that crontabs are actually a bunch of files that allow us to run commands at an hourly, daily or monthly interval. Ah, we're getting somewhere. On closer inspection of the files provided by the same package, we get:

[root@carve ~]# rpm -ql crontabs
/etc/cron.daily
/etc/cron.hourly
/etc/cron.monthly
/etc/cron.weekly
/etc/crontab
/usr/bin/run-parts

Looking at the code, /usr/bin/run-parts is just a script that runs every executable file in the directory it is given. That is what makes those intervals work: drop a script into one of the first four directories in the listing above and it will be run at the matching interval. run-parts itself does not do any scheduling, though, which leaves us with /etc/crontab.
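To make the role of run-parts concrete, here is a minimal sketch of the idea, not the actual script shipped in the crontabs package. The function name run_parts_sketch is made up for illustration. It simply loops over a directory and runs whatever is executable in it.

```shell
# Simplified illustration of a run-parts style loop (hypothetical name,
# not the real /usr/bin/run-parts).
run_parts_sketch() {
    dir=$1
    for script in "$dir"/*; do
        # Skip anything that is not a regular, executable file.
        [ -f "$script" ] && [ -x "$script" ] || continue
        "$script"
    done
}
```

Calling run_parts_sketch /etc/cron.daily would run each executable in that directory in turn. Nothing in the loop knows about time, which is why the schedule has to come from somewhere else.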

[root@carve ~]# cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly

The listing now makes it obvious that our answer is in those cryptic-looking lines just after the variable assignments. Let's inspect the first line: 01 * * * * root run-parts /etc/cron.hourly. A quick rummage in the manual pages provides us with an explanation. The first entry on the line is the minute at which a command should be run, the second entry is the hour, the third is the day of the month, the fourth is the month and the fifth is the day of the week. A * in a field tells cron (the running daemon that's configured via /etc/crontab) to run the command at every value of that field: in other words, every minute, every hour or every day. The sixth entry (root) is the user under which the command should be run. The seventh and last field is the command, which in the example above is run-parts /etc/cron.hourly. So this line runs run-parts /etc/cron.hourly at one minute past every hour.
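In crontab(5) the fields of an /etc/crontab entry are ordered minute, hour, day of month, month, day of week, user and command. As a rough illustration, the small helper below (describe_cron_line is a made-up name) splits an entry on whitespace and labels each field. It is a sketch for reading entries, not something cron itself uses.

```shell
# Label the seven fields of an /etc/crontab entry (hypothetical helper).
describe_cron_line() {
    printf '%s\n' "$1" | {
        # read splits on whitespace; the last variable gets the rest of the line.
        read -r minute hour dom month dow user command
        printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s user=%s command=%s\n' \
            "$minute" "$hour" "$dom" "$month" "$dow" "$user" "$command"
    }
}

describe_cron_line '01 * * * * root run-parts /etc/cron.hourly'
# prints: minute=01 hour=* day-of-month=* month=* day-of-week=* user=root command=run-parts /etc/cron.hourly
```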

Let's go through the rest of the crontab entries to further understand the format of /etc/crontab. The line 02 4 * * * root run-parts /etc/cron.daily tells crond that it should run run-parts /etc/cron.daily at minute 02 when the hour is 4, no matter what day of the month, month or day of the week it is, as the user root. So run-parts /etc/cron.daily runs daily at 4:02. Moving on to the next line, 22 4 * * 0 root run-parts /etc/cron.weekly, we can tell already that run-parts /etc/cron.weekly will run as the user root at 4:22 no matter what day of the month or month it is, on day 0 of the week. Or, translated, run-parts /etc/cron.weekly will run at 4:22 on Sunday. A description of the values you can use in each field is available in section five of the manual pages (man 5 crontab). You should read it, as some fields start with 0, others with 1.
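As a quick reference for which fields start at 0 and which at 1, the ranges documented in crontab(5) can be summarised as a lookup. The function name cron_field_range is hypothetical and exists only for this illustration.

```shell
# Allowed numeric range for each time field, as documented in crontab(5).
cron_field_range() {
    case $1 in
        minute)       echo "0-59" ;;
        hour)         echo "0-23" ;;
        day-of-month) echo "1-31" ;;
        month)        echo "1-12" ;;
        day-of-week)  echo "0-7" ;;   # 0 and 7 both mean Sunday
        *)            return 1 ;;
    esac
}

cron_field_range day-of-week   # prints 0-7
```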

Now, let's set up a cron job for you. You said you want the job to run every four hours, no matter what day or month it is. Fortunately, crontab's hour field accepts not only single values but also ranges and lists of values. Ranges are written as <start>-<end>; lists are comma-separated values, such as 'value1,value2,value3...'. Note that the minute field needs a fixed value such as 0: a * there would run the job every minute of each matching hour. So the command you want to run can be scheduled with the following line in /etc/crontab:

0 0,4,8,12,16,20 * * * root rsync -avz myserver:var/www /backup

Alternatively, you may use a step-type hour definition:

0 0-20/4 * * * root rsync -avz myserver:var/www /backup

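This is an instruction to go through the integers from 0 to 20, incrementing the counter by four each time, which equates to exactly the hour list above. Alternatively, you could use an unprivileged (non-root) crontab, which is managed with the userspace tool crontab: run crontab -e as the user who owns the backup. Note that entries in a per-user crontab omit the user field, so the command follows directly after the five time fields.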
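Returning to the step expression, the way 0-20/4 unrolls into a list of hours can be sketched in a few lines of shell. The helper expand_step is a made-up name, and it handles only the simple <start>-<end>/<step> form; cron's own parser accepts more variants.

```shell
# Expand a "<start>-<end>/<step>" expression into a comma-separated list
# (hypothetical helper, handles only this one form).
expand_step() {
    range=${1%/*}       # part before the slash, e.g. 0-20
    step=${1#*/}        # part after the slash, e.g. 4
    start=${range%-*}
    end=${range#*-}
    list=
    i=$start
    while [ "$i" -le "$end" ]; do
        list="${list:+$list,}$i"
        i=$((i + step))
    done
    printf '%s\n' "$list"
}

expand_step 0-20/4   # prints 0,4,8,12,16,20
```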
