There’s an old adage about backups:
There are two kinds of people: people who’ve never lost data, and people who’ll never lose data again.
If you’ve ever experienced data loss, you will instantly become passionate about backups. To prevent bad experiences with your data, you want backups that are comprehensive, manageable, versioned, automated, and secure. Let’s break that down:
- comprehensive: They should include everything by default. It’s certainly legitimate to exclude OS directories, temp files, etc. but you don’t want a system where you have to manually add directories as you add applications and data. Inevitably, you’ll forget something and not know it until it’s too late.
- manageable: If you have a 1TB server and take a full backup every day and retain them for a month, that’s 30TB in a month. You need a system that allows for regular pruning.
- versioned: A system that simply copies everything from system A to system B once a night is better than nothing, but if you trash a file on Monday and don’t notice until Thursday, you can’t recover it, because the good copy on system B has already been overwritten.
- automated: Because humans are lazy.
- secure: It’s annoying to be hacked. It’s heartbreaking to find the hacker also destroyed your backups.
In this tutorial, we’ll show you how to set up backups using rsnapshot. Quoting rsnapshot.org:
rsnapshot is a filesystem snapshot utility based on rsync. rsnapshot makes it easy to make periodic snapshots of local machines, and remote machines over ssh. The code makes extensive use of hard links whenever possible, to greatly reduce the disk space required.
This means if you have 500MB of files, you want to retain 30 days’ backups, and your change rate is 10% over that period, you don’t need 30 * 500 = 15,000MB but rather only 550MB. Beautifully, you still have point-in-time recovery (depending on your backup schedule) throughout that period.
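The arithmetic above can be checked with a few lines of shell. This is only a rough sketch: it assumes, as the example does, that the changed files add up to 10% of the data over the whole period and that unchanged files cost no extra space because they are hard-linked. The variable names are made up for illustration.

```shell
#!/bin/sh
# Rough space estimate for hard-linked snapshots (numbers from the example above).
full_mb=500      # size of one full copy, in MB
days=30          # number of daily snapshots retained
change_pct=10    # share of the data that changes over the whole period (assumption)

# 30 independent full copies
naive_mb=$((full_mb * days))
# one full copy plus the changed files; unchanged files are shared via hard links
linked_mb=$((full_mb + full_mb * change_pct / 100))

echo "naive: ${naive_mb}MB, hard-linked: ${linked_mb}MB"
```

Real-world savings depend on how your data changes: a single large file that is rewritten every day is stored again in full each time.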
In this tutorial, we’ll set up the following:
- server1.lowend.party has a directory called /data with lots of valuable files.
- We want to back it up to backup.lowend.party using a scheme of hourly/daily/weekly/monthly backups. These are stored in /backups/server1.lowend.party
- backup.lowend.party has other hosts it backs up as well.
- We’re using passwordless ssh keys for authentication so we can run everything out of cron.
Before we start, there’s one more key concept.
I’ve long advocated pull backups. In other words, the backup server comes along and backs up the client. In this scenario, backup.lowend.party initiates the backups and contacts server1.lowend.party to get the data. This is in contrast to push backups, where server1.lowend.party contacts backup.lowend.party and pushes the backups to it.
What’s the difference? Imagine server1 is hacked. If we’re using push backups, it would be trivial for the hacker to use the passwordless ssh keys to nuke the backups as well. In a pull-based model, backup.lowend.party can authenticate to server1, but not vice-versa, so the hacker is out of luck.
On Debian, it’s as easy as
apt install rsnapshot
rsnapshot’s config lives in /etc/rsnapshot.conf. I recommend making a backup of it before you start changing things:
mv /etc/rsnapshot.conf /etc/rsnapshot.conf.default
There are different philosophies about how to set up rsnapshot configs. I prefer to have a separate config file for each client (system being backed up). If you only have one system to back up, this is not necessary. You can back up multiple systems in one config file, but you lose some flexibility. Experiment and decide which you like. In my case, I do this:
cp /etc/rsnapshot.conf.default /etc/rsnapshot.conf.server1
Now modify as follows. Important Note: rsnapshot.conf requires TABs between elements. So “cmd_ssh /usr/bin/ssh” is “cmd_ssh<TAB>/usr/bin/ssh”.
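Since a stray space is easy to miss, here is a small sketch of how you might spot lines that have no TAB separator before handing the file to rsnapshot. The sample file and its two lines are hypothetical and written to a temp file; on a real system you would point grep at your own config instead.

```shell
#!/bin/sh
# Sketch: flag rsnapshot.conf-style lines that contain no TAB separator.
conf=$(mktemp)
# Hypothetical sample: the first line is TAB-separated (valid), the second uses a space (invalid).
printf 'cmd_ssh\t/usr/bin/ssh\n' >  "$conf"
printf 'cmd_cp /bin/cp\n'        >> "$conf"

# A literal TAB character to search for
tab=$(printf '\t')

# Drop comments and blank lines, then keep only lines with no TAB anywhere
bad_lines=$(grep -v '^#' "$conf" | grep -v '^[[:space:]]*$' | grep -v "$tab" || true)

echo "Lines missing a TAB:"
echo "$bad_lines"
rm -f "$conf"
```

This only catches lines with no TAB at all; the configtest command remains the authoritative check.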
Enable remote backups:
cmd_ssh /usr/bin/ssh
Add these backup intervals:
interval hourly 6
interval daily 7
interval weekly 4
interval monthly 3
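To get a feel for what these retention counts mean, the sketch below adds them up. It assumes each interval is run on the schedule its name suggests (hourly every hour, daily once a day, and so on); rsnapshot itself only counts runs and does not check the clock.

```shell
#!/bin/sh
# Retention counts taken from the interval lines above
hourly=6
daily=7
weekly=4
monthly=3

# Maximum number of snapshot directories kept on disk at once
total=$((hourly + daily + weekly + monthly))

echo "snapshot directories kept: $total"
echo "coverage: last $hourly hours, $daily days, $weekly weeks, $monthly months"
```

Because unchanged files are hard-linked between snapshots, keeping this many directories costs far less than that many full copies.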
I’m using a passwordless ssh key stored in /root/.ssh/backup. I also use a different ssh port. So make this change:
ssh_args -p 8989 -i ~/.ssh/backup
These two commands are for reporting (see below):
rsync_long_args --stats --delete --numeric-ids --relative --delete-excluded
Now I tell rsnapshot where to save backups:
snapshot_root /backups/server1.lowend.party/
Finally, I add the backup definition:
backup root@server1.lowend.party:/data/ .
This will keep files in /backups/server1.lowend.party/hourly.0, etc.
I want to exclude /data/cache on my backups:
exclude_file /etc/rsnapshot.server1.exclude
And in that file I put:
OK, we’re ready to go. Now because I’m not using the default /etc/rsnapshot.conf name, I need to use the -c parameter for all rsnapshot commands. Let’s start by testing the config:
root@backup:/etc# rsnapshot -c /etc/rsnapshot.conf.server1 configtest
Now we can run a simulation:
root@backup:/etc# rsnapshot -c /etc/rsnapshot.conf.server1 -t hourly
echo 9633 > /var/run/rsnapshot.pid
mkdir -m 0755 -p /backups/hourly.0/
/usr/bin/rsync -a --stats --delete --numeric-ids --relative --delete-excluded
--exclude-from=/etc/rsnapshot.server1.exclude --rsh=/usr/bin/ssh -p 8989
-i ~/.ssh/backup email@example.com:/data/
One more thing to do. I like to use rsnapshot’s reporting tool, so let’s enable it:
cp /usr/share/doc/rsnapshot/examples/utils/rsnapreport.pl /usr/local/bin
chmod 755 /usr/local/bin/rsnapreport.pl
We’re good to go!
On server1, I have 547MB in /data, and 30MB in /data/cache which will be excluded:
root@server1:~# du -sm /data
root@server1:~# du -sm /data/cache
Let’s run our first rsnapshot backup:
root@backup:/backups/server1.lowend.party# rsnapshot -c /etc/rsnapshot.conf.server1 hourly
Setting locale to POSIX "C"
echo 10012 > /var/run/rsnapshot.pid
mkdir -m 0755 -p /backups/server1.lowend.party/hourly.0/
/usr/bin/rsync -av --stats --delete --numeric-ids --relative
--rsh=/usr/bin/ssh -p 8989 -i ~/.ssh/backup
receiving incremental file list
Number of files: 10,982 (reg: 10,980, dir: 2)
Number of created files: 10,982 (reg: 10,980, dir: 2)
Number of deleted files: 0
Number of regular files transferred: 10,980
Total file size: 518,702,282 bytes
Total transferred file size: 518,702,282 bytes
Literal data: 518,702,282 bytes
Matched data: 0 bytes
File list size: 611,123
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 208,691
Total bytes received: 519,874,481
sent 208,691 bytes received 519,874,481 bytes 80,012,795.69 bytes/sec
total size is 518,702,282 speedup is 1.00
rm -f /var/run/rsnapshot.pid
/usr/bin/logger -p user.info -t rsnapshot(10012) /usr/bin/rsnapshot -c
/etc/rsnapshot.conf.server1 hourly: completed successfully
Now I can also run that using the rsnapreport.pl script we set up. If I do, the output will look like this (the TOTAL MB is a little different because I ran these at different times):
# rsnapshot -c /etc/rsnapshot.conf.server1 hourly | /usr/local/bin/rsnapreport.pl
SOURCE                       TOTAL FILES  FILES TRANS  TOTAL MB  MB TRANS  LIST GEN TIME  FILE XFER TIME
server1.lowend.party:/data/  11982        1            564.81    46.10     0.001 seconds  0.000 seconds
Now if I continue running hourly backups, I see new directories being created in /backups/server1.lowend.party:
drwxr-xr-x 3 root root 4096 Jul 12 16:03 hourly.0
drwxr-xr-x 3 root root 4096 Jul 12 16:01 hourly.1
drwxr-xr-x 3 root root 4096 Jul 12 15:58 hourly.2
Interestingly, du reports hourly.0 as 500-odd MB, while the rest are only 1MB each. Why? Because the unchanged files in hourly.1, hourly.2, etc. are hard links to the same data as the files in hourly.0, so that data is stored only once. This is a huge space savings.
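If hard links are new to you, the effect is easy to reproduce in a scratch directory. The sketch below is not what rsnapshot runs verbatim; it just imitates two snapshots sharing one file, using GNU coreutils (cp -al, stat -c) as found on Debian. The directory and file names are made up.

```shell
#!/bin/sh
# Sketch: two "snapshots" sharing file data through hard links.
work=$(mktemp -d)
mkdir "$work/hourly.0"
# 1 MiB of sample data standing in for backed-up files
dd if=/dev/zero of="$work/hourly.0/data.bin" bs=1024 count=1024 2>/dev/null

# cp -al recreates the directory tree but hard-links every file instead of copying it
cp -al "$work/hourly.0" "$work/hourly.1"

# Both names point at the same inode, so the link count is 2
links=$(stat -c %h "$work/hourly.0/data.bin")
inode0=$(stat -c %i "$work/hourly.0/data.bin")
inode1=$(stat -c %i "$work/hourly.1/data.bin")
echo "link count: $links, inodes: $inode0 $inode1"

# du charges the shared data to the first directory it visits only
du -sk "$work/hourly.0" "$work/hourly.1"
rm -rf "$work"
```

The second directory shows up as nearly empty in du, just like hourly.1 and hourly.2 in the listing above.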
If I nuke some files on server1’s /data and run another couple of backups, you’ll see this:
root@backup:/backups/server1.lowend.party# du -sm *
rsnapshot is retaining data in hourly.1 because it’s needed to reconstruct the backups for that hour.
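The same behaviour can be imitated by hand. In this sketch (scratch directories and a hypothetical file name, GNU coreutils assumed), a file is removed from the newest snapshot, yet the older snapshot still holds the data and a restore is an ordinary copy.

```shell
#!/bin/sh
# Sketch: a file deleted from the newest snapshot survives in the older one.
work=$(mktemp -d)
mkdir "$work/hourly.1"
printf 'important\n' > "$work/hourly.1/report.txt"

# The newer snapshot starts out as hard links to the older one
cp -al "$work/hourly.1" "$work/hourly.0"

# The file disappears from the client, so the next backup removes it from hourly.0 only
rm "$work/hourly.0/report.txt"

# hourly.1 still holds the data; its link count has dropped from 2 to 1
remaining=$(cat "$work/hourly.1/report.txt")
links=$(stat -c %h "$work/hourly.1/report.txt")
echo "hourly.1 still has: $remaining (links: $links)"

# Restoring is a plain copy out of the older snapshot
restore_dir=$(mktemp -d)
cp -a "$work/hourly.1/report.txt" "$restore_dir/"
restored=$(cat "$restore_dir/report.txt")
rm -rf "$work" "$restore_dir"
```

The data is only freed once the last snapshot that references it has been rotated away.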
Setting up automated backups is as easy as putting jobs in cron. rsnapshot’s lockfile allows only one run at a time, so schedule the larger intervals a little before the smaller ones rather than at the same minute. For example (system crontab format, with a user field):
0 * * * * root /usr/bin/rsnapshot -c /etc/rsnapshot.conf.server1 hourly 2>&1 | /usr/local/bin/rsnapreport.pl
50 2 * * * root /usr/bin/rsnapshot -c /etc/rsnapshot.conf.server1 daily 2>&1 | /usr/local/bin/rsnapreport.pl
40 2 * * 1 root /usr/bin/rsnapshot -c /etc/rsnapshot.conf.server1 weekly 2>&1 | /usr/local/bin/rsnapreport.pl
30 2 1 * * root /usr/bin/rsnapshot -c /etc/rsnapshot.conf.server1 monthly 2>&1 | /usr/local/bin/rsnapreport.pl