Automated backups using rdiff-backup

One day your blog, code, or pretty much anything else may crash, and your most valuable information could be irretrievably lost! Consider the consequences if this ever happens (touch wood!). Pictured them? Scary, right? Now imagine how relaxed you would be instead if only you’d bothered to make a backup.

Today I’m going to show you my personal backup method. I use the awesome rdiff-backup tool, which combines an incremental backup with a mirror.
You can read more about this tool on the official page.

What is it?
rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.

Installation

rdiff-backup is available in most major Linux distributions. In my case, I’m using an Arch Linux-based distribution (Manjaro) and yay (Yet Another Yogurt, an AUR helper written in Go) to install the tool.

yay rdiff-backup

If you use another distribution, this software can also be installed:

apt-get install rdiff-backup
yum install rdiff-backup

Using rdiff-backup

Making backups is very easy with rdiff-backup. You may picture this tool as similar to the cp command. In other words, rdiff-backup takes two arguments:

  • source directory.
  • target directory.

Both directories can be on a local or a remote disk. For example, if you want to use rdiff-backup with local directories, you would use the following command:
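A minimal sketch of a local backup, assuming the classic `rdiff-backup <source> <target>` syntax. The function name `backup_local` and the paths in the example call are hypothetical, not taken from my own setup:

```shell
# backup_local SRC DEST
# Mirrors SRC into DEST; reverse diffs end up in DEST/rdiff-backup-data.
backup_local() {
    rdiff-backup "$1" "$2"
}

# Example call (hypothetical paths):
# backup_local /home/user/projects /mnt/backup/projects
```

As far as I know, releases from 2.2 onward also accept an action-based form (`rdiff-backup backup <source> <target>`); `rdiff-backup --help` shows which syntax your version supports.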

In the same way, if either directory is on a remote server, you only need to indicate the path in the classic way: user@server::PATH. The following commands show how either a remote or a local machine can hold the source and target directories:
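The two directions could look roughly like this. Only the user names come from this post; the host names, the paths, and the function names are placeholders I made up for illustration:

```shell
# Hypothetical hosts; replace them with your own servers.
REMOTE_A="carloscaballero@server-a.example.com"
REMOTE_B="luisgarcia@server-b.example.com"

# Local source, remote target
push_backup() {
    rdiff-backup /home/user/projects "${REMOTE_A}::/backups/projects"
}

# Remote source, local target
pull_backup() {
    rdiff-backup "${REMOTE_B}::/var/www" /mnt/backup/www
}
```

Note that rdiff-backup has to be installed on the remote machine as well, since the local process talks to a remote rdiff-backup over SSH.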

When using these commands, the remote machine will probably request the user’s password (for the previous commands, carloscaballero and luisgarcia respectively). You can skip this step by configuring SSH key-based authentication on the server.
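One common way to set up key-based authentication uses the standard OpenSSH tools `ssh-keygen` and `ssh-copy-id`. This is a sketch rather than the exact procedure I followed; the function name and the default key path are assumptions:

```shell
# setup_ssh_key REMOTE [KEYFILE]
# REMOTE is user@server; KEYFILE defaults to an ed25519 key in ~/.ssh.
setup_ssh_key() {
    remote="$1"
    key="${2:-$HOME/.ssh/id_ed25519}"
    # Create the key pair only if it does not exist yet
    [ -f "$key" ] || ssh-keygen -t ed25519 -f "$key"
    # Append the public key to the remote account's authorized_keys
    ssh-copy-id -i "$key.pub" "$remote"
}

# Example call (hypothetical host):
# setup_ssh_key carloscaballero@server-a.example.com
```

For unattended runs the key must be usable without a prompt (an empty passphrase or an SSH agent), which is a security trade-off worth weighing.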

The real power of this tool becomes apparent when you need to restore information. If you list the contents of the directory in which you made your copy, you will see the contents that you’d previously copied and, furthermore, a directory named rdiff-backup-data. This directory is very important, since it stores the incremental backups of our data.

In other words, the target directory holds the latest version of our backup, plus the incremental copies, which are stored in the rdiff-backup-data/increments directory.

Now imagine that I’ve created a file called file1.txt which contains a single sentence. A copy is made using rdiff-backup and, a few minutes later, another copy is made. The list of files in our system is now the following:
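To reproduce this situation yourself, something along these lines should work. The function name, the directory layout under the working directory, and the second sentence are all hypothetical; the short sleep only ensures the two backups get different timestamps:

```shell
# demo_two_backups WORKDIR
# Backs up the same file twice, changing it in between, then lists the target.
demo_two_backups() {
    work="$1"
    mkdir -p "$work/src"
    echo "A single sentence." > "$work/src/file1.txt"
    rdiff-backup "$work/src" "$work/dest"      # first backup
    sleep 2
    echo "A second sentence." >> "$work/src/file1.txt"
    rdiff-backup "$work/src" "$work/dest"      # second backup stores a reverse diff
    ls -R "$work/dest"
}

# Example call:
# demo_two_backups "$(mktemp -d)"
```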

You may note that the file file1.txt has an incremental copy in the increments directory.

Restoring backups

We can restore a copy with the rdiff-backup command or simply with cp, since the mirror is neither compressed nor has any of its metadata altered: the files are in the same state as when they were copied. Still, cp only gives you the latest version, so rdiff-backup is the better choice because it makes restoring older versions far more flexible.

Restoring works much like backing up, with the added restore-as-of option (-r) followed by the time to restore to. The time format is very flexible: acceptable time strings are intervals, like "3D64s"; w3-datetime strings, like "2002-04-26T04:22:01-07:00" (strings like "2002-04-26T04:22:01" are also acceptable, in which case rdiff-backup will use the current time zone); or ordinary dates like 2/4/1997 or 2001-04-23 (various combinations are acceptable, bearing in mind that the month must always precede the day).

For example, the following command restores the copy made on 23 January 2010.
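A sketch of such a restore using the classic `-r` option. The function name and the paths in the example call are hypothetical:

```shell
# restore_as_of TIME BACKUP_DIR TARGET_DIR
# Restores BACKUP_DIR as it was at TIME into TARGET_DIR.
restore_as_of() {
    rdiff-backup -r "$1" "$2" "$3"
}

# Example call (hypothetical paths), restoring the state of 23 January 2010:
# restore_as_of 2010-01-23 /mnt/backup/projects /tmp/projects-restored
```

The target directory should not exist yet; rdiff-backup refuses to overwrite an existing one unless you add --force.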

As you already know, rdiff-backup keeps every increment, so the backup consumes more and more disk space over time. It is therefore highly recommended to remove old backups (as long as you have other, more recent backups, of course).

The rdiff-backup tool has the remove-older-than option, which removes any backups older than the date or interval given as its argument. A good example is removing any backups older than 1 year:
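With the classic syntax this could be written as follows, where `1Y` is the interval notation for one year. The function name and the path in the example call are hypothetical:

```shell
# prune_backups AGE BACKUP_DIR
# Deletes increments older than AGE (for example 1Y, 6M, 2W) from BACKUP_DIR.
prune_backups() {
    rdiff-backup --remove-older-than "$1" "$2"
}

# Example call (hypothetical path):
# prune_backups 1Y /mnt/backup/projects
```

If more than one increment would be deleted in a single run, rdiff-backup asks for --force as a safety measure.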

Filter Options

Most of the time, we need to include or exclude files from our backup. The most common options which can be used with rdiff-backup are:

  • include
  • include-filelist
  • exclude
  • exclude-filelist

As well as these, there are plenty more filter options to make our backups, such as:
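A sketch of a whole-system backup that excludes its own target directory. The function name is hypothetical, and the paths simply follow the /mnt/backup example discussed next:

```shell
# Back up the root filesystem to /mnt/backup, skipping the backup itself.
backup_root() {
    rdiff-backup --exclude /mnt/backup / /mnt/backup
}

# A real setup would add more exclusions, for instance:
#   rdiff-backup --exclude /mnt/backup --exclude /proc / /mnt/backup
```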

In this example we exclude /mnt/backup to avoid an infinite loop, even though rdiff-backup can automatically detect simple loops like the one above. This is just an example; in reality it would be important to exclude /proc as well.

There may be a time when we need information about the backup (metadata). rdiff-backup allows us to obtain this information. The most common options for this are the following:

  • list-increments
  • list-changed-since
  • list-at-time
  • compare
  • compare-at-time

Since they are quite descriptive, it isn’t hard to imagine what the goal of each of the different options is. Despite this, I will show several examples applying each of them:
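The calls below sketch one way to use each option with the classic syntax. The function name, the intervals (5D, 2W), and the paths in the example call are arbitrary choices for illustration:

```shell
# inspect_backup SRC_DIR BACKUP_DIR
inspect_backup() {
    src="$1"
    backup="$2"
    # Which increments exist, and when were they made?
    rdiff-backup --list-increments "$backup"
    # Which files have changed in the last 5 days?
    rdiff-backup --list-changed-since 5D "$backup"
    # Which files were present in the backup 5 days ago?
    rdiff-backup --list-at-time 5D "$backup"
    # How does the source differ from the latest backup?
    rdiff-backup --compare "$src" "$backup"
    # How does the source differ from the backup as of 2 weeks ago?
    rdiff-backup --compare-at-time 2W "$src" "$backup"
}

# Example call (hypothetical paths):
# inspect_backup /home/user/projects /mnt/backup/projects
```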

Using in cron

A good practice is automating the backups in our system. To do this, we may use the cron service.

Before using cron, make sure that the script it runs doesn’t output anything. Otherwise:

  • cron will assume there is an error
  • if there is any error, you will not be able to see it

The command which we used in our script is the following:
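A sketch of what such a script can look like, based on the behaviour described in this section. The target directory, the "/" source, and the location of the log file next to the script are assumptions, and `run_backup` is a name chosen for illustration:

```shell
#!/bin/sh
# Sketch of /root/rdiff-backup-configuration/rdiff-backup.sh
CONF_DIR="/root/rdiff-backup-configuration"
FILELIST="$CONF_DIR/files_backup.txt"   # include/exclude rules
LOGFILE="$CONF_DIR/rdiff-backup.log"    # assumed location of the log
DEST="/mnt/backup"                      # hypothetical target directory

run_backup() {
    {
        # Back up only what the globbing file list selects
        rdiff-backup --include-globbing-filelist "$FILELIST" / "$DEST"
        # Drop increments older than one year to save disk space
        rdiff-backup --remove-older-than 1Y "$DEST"
    } >> "$LOGFILE" 2>&1   # success and error messages share one log
}

# In the real script the last line would simply be:
# run_backup
```

Redirecting both stdout and stderr into the log keeps the script silent, which is what cron expects.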

The content of the files_backup.txt file is the following:

+ /root/ghost
- **

It is important to know that both success and error logs are saved in the same logfile, named rdiff-backup.log. Another interesting point is that I've used the filter option include-globbing-filelist, which takes a file as its argument. Each line of this file names a path prefixed with + or - to express whether it must be included or excluded. Note that backups older than 1 year are deleted to preserve disk space.

Finally, edit the cron file using the crontab -e command.

0 1 * * * sh /root/rdiff-backup-configuration/rdiff-backup.sh

Conclusions

In this post I’ve explained the rdiff-backup tool, which allows us to make incremental backups. I've also shown you the script I use to back up my projects, which is executed by cron once a day.

Originally published at www.carloscaballero.io on January 25, 2019.

Hi! My name is Carlos Caballero and I have a PhD in Computer Science from Málaga, Spain. I teach developers and computer science degree/master's students how to be experts!
