Cross-platform automated backups

Today marks the last day of our backups week. Hopefully by now you’ve realized how important it is to keep updated backups of your important files, and, as this weeklong feature has hopefully demonstrated, getting a working, efficient backup system into place is anything but difficult.

So far this week we’ve discussed how to set up and use rsync on Linux, Solaris, Windows, and Mac OS X to create a useful, automated backup system. We’ve focused on using a single Linux box as the rsync server, but that doesn’t have to be the case: any of these platforms (and others, as rsync has been ported to many different platforms) can be used as an rsync server with a setup nearly identical to the one we’ve already explored.


In addition, we didn’t spend much time on different storage methods (for instance, keeping files on a HD vs. tape), but the great thing about rsync is the storage system can be entirely left up to you – if you want to burn DVDs of your files, rsync can help you assemble all of your files into one place, making the burning process very simple.

Finally, keep in mind that no backup system is perfect. You should regularly check that your important files are indeed making it to the backup server and, if possible, set up some kind of monitoring so you know if backups stop working for some reason. We all know the adage: if it can break, it will. If an automated backup system such as this ever broke unnoticed, it could be very bad news.
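The monitoring mentioned above can start out very simple. Here is a minimal sketch, assuming the backups land in a directory on the server and that your find supports -mmin (GNU and BSD find both do). The function name backup_is_fresh and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# backup_is_fresh DIR MAX_AGE_MIN
# Succeeds only if DIR contains at least one regular file that was
# modified less than MAX_AGE_MIN minutes ago.
backup_is_fresh() {
    dir=$1
    max_age_min=$2
    # -mmin -N matches files modified less than N minutes ago.
    recent=$(find "$dir" -type f -mmin "-$max_age_min" 2>/dev/null | head -n 1)
    [ -n "$recent" ]
}

# Example use from cron on the backup server (hypothetical path, 24 hours):
#   backup_is_fresh /srv/backups/zeta 1440 || echo "backup looks stale"
```

A check like this only tells you that something was written recently, not that everything was, so it complements rather than replaces the occasional manual look at your backups.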

Today, we’re going to be wrapping up by taking a look at some of the remaining advanced features available with rsync. If you’re doing remote backups (i.e. sending files over the Internet) – and you should consider it so you don’t have all your eggs in one basket—er, building – then keeping things secure should be a priority.

Encrypting communications
Without too much trouble, rsync can use SSH to transfer data. This increases security by encrypting your transfers; by default rsync sends data unencrypted, potentially allowing unauthorized parties to view it. Another benefit of using SSH is that one fewer port needs to be accessible from outside the network.

To tell rsync to use SSH as its remote shell, just include “-e ssh” in the rsync command line. For example:

rsync -vrtpog -e ssh --password-file=/etc/rsync.secret --delete /home/costa costa@10.0.0.2::zeta

Just type in the user’s SSH password when prompted and rsync will connect using SSH. The problem with this is that if you’re automating rsync backups using cron (or a similar scheduler) you won’t be able to type your password. We can allow SSH connections without passwords by creating a pair of public/private keys. Although it’s somewhat less secure than requiring passwords, this method is better than including passwords in plaintext, and is still more secure than using rsync without SSH at all.

Generating public and private key files is very simple. On the machine that will be running rsync (the client machine) run the following command:

ssh-keygen -t rsa -b 2048 -f rsync-key

This will create two files, rsync-key (the private key) and rsync-key.pub (the public key). When prompted for a passphrase, just hit enter. Be sure to set restrictive permissions on the private key (for example, chmod 600 rsync-key) to keep it away from unauthorized users. The next step is to copy the public key file over to the rsync server. On the server machine, create a directory called “~/.ssh” if it doesn’t already exist, as well as a file called authorized_keys inside that directory. Then add the contents of the public key file to authorized_keys:

cat rsync-key.pub >> ~/.ssh/authorized_keys
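The server-side steps above can be wrapped up as a small helper. This is only a sketch: the function name install_pubkey is made up for illustration, and it assumes the public key file has already been copied to the server. It also tightens permissions, since OpenSSH with its default StrictModes setting ignores authorized_keys when the file or its directory is writable by other users.

```shell
#!/bin/sh
# install_pubkey PUBKEY_FILE [SSH_DIR]
# Appends the public key to authorized_keys, creating the directory and
# file with restrictive permissions if needed. SSH_DIR defaults to ~/.ssh.
install_pubkey() {
    pubkey=$1
    ssh_dir=${2:-"$HOME/.ssh"}
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    touch "$ssh_dir/authorized_keys" || return 1
    chmod 600 "$ssh_dir/authorized_keys" || return 1
    # Append only if this exact key line is not already present,
    # so running the helper twice does not duplicate the key.
    if ! grep -qxF -f "$pubkey" "$ssh_dir/authorized_keys"; then
        cat "$pubkey" >> "$ssh_dir/authorized_keys"
    fi
}

# Example: install_pubkey rsync-key.pub
```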

Whenever you connect to the backup server with that account via SSH using this key (for example, ssh -i rsync-key), you’ll notice you no longer need to enter a password. The same holds true for rsync connections, which can then run without any user intervention.
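Putting the pieces together, an unattended backup script might look roughly like the following. Treat it as a sketch under assumptions: the key location, the script name in the crontab comment, and the function name run_backup are hypothetical, while the rsync options, password file, and destination are the ones used earlier in this article. BatchMode=yes is a standard ssh option that makes ssh fail rather than wait for a password prompt, which is what you want under cron.

```shell
#!/bin/sh
# Hypothetical locations; adjust to your own setup.
KEY_FILE="$HOME/.ssh/rsync-key"       # private key generated with ssh-keygen
SOURCE_DIR="/home/costa"
DEST="costa@10.0.0.2::zeta"

run_backup() {
    # Note: a KEY_FILE path containing spaces would need extra quoting
    # inside the -e string.
    rsync -vrtpog --delete \
        -e "ssh -i $KEY_FILE -o BatchMode=yes" \
        --password-file=/etc/rsync.secret \
        "$SOURCE_DIR" "$DEST"
}

# When saving this as a script, end it with a call to run_backup.
# A crontab entry to run it nightly at 02:30 could then look like:
#   30 2 * * * /usr/local/bin/rsync-backup.sh >> /var/log/rsync-backup.log 2>&1
```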

Rsync Server Security
Rsyncd, the rsync daemon, provides a number of security options that can be set in its configuration file, rsyncd.conf. These options can help stop unauthorized users from gaining access to your files. All of the options discussed below are set per module (that is, per backup profile), which lets you apply different levels of security depending on the type of backup involved.

The first option is “hosts allow”. This lets you specify a list of hosts that can access the server. If a connecting machine doesn’t match the list it will be rejected. Each pattern in the list can be one of a few forms:

- An IP address (e.g. 192.168.0.1)
- An IP address with a bitmask (e.g. 192.168.0.1/24)
- An IP address with a netmask (e.g. 192.168.0.1/255.255.255.0)
- A hostname (e.g. mirror.example.com)
- A hostname pattern with wildcards (e.g. *.example.com)

A “hosts allow” line can also be combined with “hosts deny”, which lists hosts that should be rejected. If both are used, “hosts allow” is checked first and a match lets the client connect; “hosts deny” is checked next and a match rejects it. A host that matches neither list is allowed.

Another security feature in rsyncd is “auth users”. It lets you specify a list of users that are allowed to connect to a module. You should use this option in nearly all cases, since without it the module allows anonymous access and anyone who can reach the server could access your backups. Connecting clients must supply a password to authenticate. As we’ve seen before, these passwords are kept in the secrets file on the server, and on the client rsync can be given the location of a password file with “--password-file” on the command line.

You should also be aware of the “refuse options” parameter. It lets you list rsync command-line options that the daemon will refuse if clients attempt to use them. For example, “refuse options = delete*” prevents clients from deleting files on the server. (On the client end we’ve been using the --delete option to keep both sides of the backup in sync, but this also means that if all files were deleted on the client, the next run would delete all of the backed-up files on the server.)
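To see how these settings fit together, here is a sketch that writes an example module definition to a scratch file, so you can review it before merging anything into your real rsyncd.conf. The module name zeta is the one used in this article; the path, the secrets file location, the user name, and the addresses are placeholders.

```shell
#!/bin/sh
# Write an example module to a scratch file (not the live config).
conf_example="${TMPDIR:-/tmp}/rsyncd.conf.example"
cat > "$conf_example" <<'EOF'
[zeta]
    path = /srv/backups/zeta
    read only = no
    # Checked first: these hosts may connect.
    hosts allow = 192.168.0.0/24 mirror.example.com
    # Checked second: everyone else is rejected.
    hosts deny = *
    # Only this user may authenticate, using the password in the secrets file.
    auth users = costa
    secrets file = /etc/rsyncd.secrets
    # Refuse --delete and its variants from clients.
    refuse options = delete*
EOF
echo "Example module written to $conf_example"
```

Note that with the “refuse options” line in place, a client that passes --delete (as our earlier commands do) will get an error from the daemon, so you would choose one approach or the other for a given module.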

Other Useful Rsync Options
Beyond the basic options, rsync provides a number of other options that you might find useful depending on the situation.

-c, --checksum

With this option, rsync decides whether a file needs to be transferred by comparing checksums rather than relying on file size and modification time. If the file on the receiving end has the same size and checksum, it isn’t transferred. This can be very useful, but computing checksums for every file can be slow.

-a, --archive

This option is equivalent to -rlptgoD, and should be sufficient for most simple backups.

--max-delete=NUM

This specifies that rsync shouldn’t delete more than NUM files or directories. This can be useful to make sure you can’t accidentally delete all your backups!

-C, --cvs-exclude

This tells rsync to ignore files using the same algorithm as CVS. For example, in a development situation this option would cause rsync to ignore C object files, core dumps, and CVS directories. You can add additional exclude patterns to $HOME/.cvsignore or use the “--exclude” option. If you’re backing up your entire hard disk, you may want to use the --exclude option to cut out a lot of extraneous (and disk-wasting) directories; in Linux, for instance, you probably shouldn’t back up /dev, and you probably don’t need to keep /lib or other system-installed files.
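For a whole-disk backup it is often easier to keep the exclude patterns in a file and pass it with --exclude-from. The following is a sketch, not a recommended list: the file location, the function name full_backup, and the --max-delete value are arbitrary, and /proc is added here as an assumption since, like /dev, it is not made up of ordinary files on Linux.

```shell
#!/bin/sh
# Patterns starting with "/" are anchored at the root of the transfer,
# which is "/" in the function below.
exclude_file="${TMPDIR:-/tmp}/rsync-excludes.example"
cat > "$exclude_file" <<'EOF'
/dev/
/proc/
/lib/
EOF

# full_backup DEST
# Archive mode, mirror deletions but never more than 100 per run,
# and skip everything listed in the exclude file.
full_backup() {
    rsync -a --delete --max-delete=100 \
        --exclude-from="$exclude_file" \
        / "$1"
}

# Example: full_backup costa@10.0.0.2::zeta
```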

--bwlimit=KBPS

Personally I found this to be one of the most useful rsync options that I didn’t know existed. It allows you to specify the maximum transfer rate in kB/s. It’s especially useful if you have an upload limit in place, as large files can take some time to transfer, or if uploading at the maximum (when doing remote backups) causes other connections on your network to lag when accessing the Internet.
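When picking a limit, it helps to estimate how long a backup will take at that rate. Here is a small sketch of the arithmetic, assuming the --bwlimit value is in units of 1024 bytes per second (as the rsync man page describes when no suffix is given) and ignoring protocol overhead. The function name transfer_seconds is made up for illustration.

```shell
#!/bin/sh
# transfer_seconds SIZE_MB KBPS
# Prints the approximate number of seconds needed to send SIZE_MB
# megabytes (1024 KB each) at a --bwlimit of KBPS, rounded up.
transfer_seconds() {
    size_mb=$1
    kbps=$2
    echo $(( (size_mb * 1024 + kbps - 1) / kbps ))
}

# A 700 MB backup at --bwlimit=100:
transfer_seconds 700 100    # prints 7168 (about two hours)
```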

For a complete list of rsync options, check out the rsync manpage by typing “man rsync”.

Conclusion
Thanks for checking out our weeklong backups feature. Hopefully you found it informative, and you realize the importance of a regular backup system. With computer components continually falling in price, and the availability of cheap high-capacity hard drives, nearly anyone can afford to implement a backup system such as this – and anyone with important files should! Be sure to leave a comment or let us know if you found these articles useful.

  • kai

    Thanks for the writeup. I found it very usefull

  • Kendle

    Very helpful. Matches well with my situation.

    Thanks!!!


  • Blah

    Where is the -e ssh in the rsync example?