A while back I wrote about how to perform incremental backups via rsync to an Amazon EC2 instance. The script worked great when run manually from a Python interpreter, but I kept running into issues when trying to automate it via cron. I finally took some time to hammer out all of the automation issues and fixed some other bugs along the way. With the new fixes in place, my VPS is now backed up automatically every night via cron, all for about $1/month!
For the backup script including the latest updates, take a look at my Backup to AWS EBS via Rsync and Boto how-to. I’ve documented the problems encountered and how I fixed them below.
Preserve File Ownership in the Backup
I learned that the remote rsync process must run as root in order to preserve file ownership. This is accomplished by adding --rsync-path="sudo rsync" to the rsync command. However, EC2’s Amazon Linux AMI does not allow this by default because its sudo configuration requires a tty (terminal) to run a sudo command. The solution was to add a line to the script that SSHes into the EC2 instance, forces allocation of a pseudo-tty, and appends "Defaults !requiretty" to /etc/sudoers.
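As a rough illustration of those two commands, here is a minimal Python 3 sketch that only builds the argument lists; it is not the script from the how-to. The function names, the key file name backup-key.pem, and the example paths are hypothetical. ec2-user is the default login on Amazon Linux, and -tt is used because a single -t may not allocate a pseudo-tty when ssh is started from cron without a local terminal.

```python
import shlex

SUDOERS_LINE = "Defaults !requiretty"


def build_sudoers_fix_command(host, user="ec2-user", key_file="backup-key.pem"):
    """Return the ssh argv that appends SUDOERS_LINE to /etc/sudoers.

    key_file is a hypothetical name; substitute your own key pair file.
    """
    # The redirect must run as root, so wrap it in `sudo sh -c '...'`.
    inner = "echo {} >> /etc/sudoers".format(shlex.quote(SUDOERS_LINE))
    remote = "sudo sh -c {}".format(shlex.quote(inner))
    # -tt forces pseudo-tty allocation even when there is no local tty (cron).
    return ["ssh", "-tt", "-i", key_file, "{}@{}".format(user, host), remote]


def build_rsync_command(source, host, dest, user="ec2-user",
                        key_file="backup-key.pem"):
    """Return the rsync argv that runs the remote rsync under sudo."""
    return [
        "rsync",
        "-az",  # archive mode (keeps owner/group when root) plus compression
        "-e", "ssh -i {}".format(shlex.quote(key_file)),
        "--rsync-path=sudo rsync",  # no extra quoting needed in an argv list
        source,
        "{}@{}:{}".format(user, host, dest),
    ]
```

Either list can be handed to subprocess.check_call(). Since every run appends the line again, a longer-lived instance would want a check for an existing entry first; for an instance that is thrown away after each backup this matters less.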
Maintain Proper Directory Structure in the Backup
Another issue I discovered was that my backup was being created with directories like /home/home/username/ instead of /home/username/. The fix was simply to ensure that every one of my rsync destinations ends with a trailing slash.
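One way to make this hard to get wrong is to normalize paths in the script before they reach rsync. The helper below is a hypothetical sketch, not part of the original script. Note that rsync's own documentation describes a trailing slash on the source as meaning "copy the contents of this directory" rather than the directory itself, so it may be worth applying the helper to both source and destination.

```python
def with_trailing_slash(path):
    """Return path with exactly one trailing slash."""
    # rstrip removes any existing slashes so repeated calls are harmless.
    return path.rstrip("/") + "/"


# Hypothetical example paths: both sides end in "/", so the contents of
# /home/ land directly in /backup/home/ instead of /backup/home/home/.
source = with_trailing_slash("/home")
destination = with_trailing_slash("/backup/home")
```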
Connection Denied for SSH
This was the error that prevented me from running the script via cron. Putting the script to sleep for 60 seconds after attaching the volume fixed it, presumably because the freshly launched instance needs some time before it accepts SSH connections.
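A fixed 60 second sleep is what the script uses. As an alternative, which is a different technique from the one described above, the script could poll the SSH port until it accepts a connection. The sketch below uses only the Python 3 standard library; the function name and default values are assumptions.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=120.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, else False.

    Retries every `interval` seconds until `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up at the deadline, else retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An open port only shows that something is listening, not that logins already work, so a short extra pause or a retry around the first ssh call may still be needed.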
Terminate the EC2 Instance when Finished
The original script stopped the EC2 instance rather than terminating it. The EC2 instance is only needed for the rsync, after which it should be terminated in order to avoid extra fees from AWS! The backed up data will remain on the detached EBS volume even after the instance is terminated.
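To make sure the instance is terminated even when the backup step fails partway, the call can go in a finally block. This is a minimal sketch under assumptions: conn is expected to be a boto 2 EC2Connection (for example from boto.ec2.connect_to_region), whose terminate_instances method takes a list of instance IDs, and run_then_terminate and backup_step are hypothetical names.

```python
def run_then_terminate(conn, instance_id, backup_step):
    """Run backup_step(), then terminate the instance no matter what.

    conn is assumed to be a boto 2 EC2Connection; backup_step is any
    callable that performs the rsync.
    """
    try:
        backup_step()
    finally:
        # terminate_instances, not stop_instances: the instance is no
        # longer needed once the rsync has finished or failed.
        conn.terminate_instances(instance_ids=[instance_id])
```

Any exception raised by backup_step still propagates after the termination call, so cron's mail or logging will still report the failure.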