Automating Rsync Backups to Amazon EC2

A while back I wrote about how to perform incremental backups via rsync to an Amazon EC2 instance. The script worked great when run manually from a Python interpreter but I always ran into issues when trying to automate the script via cron. I finally took some time to hammer out all of the automation issues and fixed some other bugs along the way. With the new fixes in place, I now have my VPS automatically backed up nightly via cron, all for about $1/month!

For the backup script including the latest updates, take a look at my Backup to AWS EBS via Rsync and Boto how-to. I’ve documented the problems encountered and how I fixed them below.

Preserve File Ownership in the Backup

I learned that the remote rsync process must run as root in order to preserve file ownership. This is accomplished by adding --rsync-path="sudo rsync" to the rsync command. However, EC2’s Amazon Linux AMI does not allow this by default because its sudo configuration requires a tty (terminal). The solution was to add a line to the script that SSHes into the EC2 instance, forces allocation of a pseudo-tty, then writes “Defaults !requiretty” to a file under /etc/sudoers.d/.

Maintain Proper Directory Structure in the Backup

Another issue I discovered was that my backup was getting created with directories like /home/home/username/ instead of /home/username/. The solution to this was to simply ensure that all of my rsync destinations contained a final trailing slash.
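Here is a minimal sketch of how the trailing-slash rule and the sudo rsync option could be kept in one place. The helper name build_rsync_command and its mount_point parameter are hypothetical, not part of the backup script; the rsync flags are the ones the script already uses.

```python
def build_rsync_command(ssh_opts, host, backup_dir, mount_point="/mnt/data-store"):
    """Return the rsync argument list for one backup directory.

    Hypothetical helper for illustration; mount_point is assumed to be
    where the EBS volume is mounted on the EC2 instance.
    """
    # Normalise to exactly one trailing slash so rsync copies the directory's
    # contents instead of nesting the directory inside the destination
    # (which is what produced /home/home/username/).
    source = backup_dir.rstrip("/") + "/"
    destination = "ec2-user@{0}:{1}{2}".format(host, mount_point, source)
    return [
        "rsync",
        "-e", "ssh {0}".format(ssh_opts),
        "-avz", "--delete",
        "--rsync-path=sudo rsync",  # run the remote rsync as root
        source,
        destination,
    ]
```

Passing the list to subprocess.call() rather than a formatted string to os.system() would also avoid the nested shell quoting, though that is a design choice and not something the original script does.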

Connection Denied for SSH

This was the error I saw that was preventing me from running the script via cron. Putting the script to sleep for 60 seconds after attaching the volume fixed this.
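A fixed sleep works, but it waits the full minute even when the instance is ready sooner. As an alternative (not what the script does), one could poll the SSH port until it accepts connections. The sketch below assumes a hypothetical helper named wait_for_port; an open port does not guarantee that sshd is ready to authenticate, so a short extra pause may still be needed.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=120, interval=3):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            sock = socket.create_connection((host, port), timeout=interval)
            sock.close()
            return True
        except (socket.error, socket.timeout):
            # Connection refused or timed out: the instance is not ready yet.
            time.sleep(interval)
    return False
```

In the backup script this could be called with the instance's public DNS name before the first ssh command runs.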

Terminate the EC2 Instance when Finished

The original script stopped the EC2 instance rather than terminating it. The EC2 instance is only needed for the rsync after which it should be terminated in order to avoid extra fees from AWS! The backed up data will remain on the detached EBS volume even after the instance is terminated.
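Because a running instance keeps costing money, it may be worth making sure termination happens even when an earlier step fails. The following is only a sketch: the context manager name terminating is hypothetical, and it assumes an object with a terminate() method, such as the boto instance used in the script.

```python
from contextlib import contextmanager


@contextmanager
def terminating(instance):
    """Yield the instance and always call terminate() on the way out."""
    try:
        yield instance
    finally:
        # Runs whether the body finished normally or raised an exception.
        instance.terminate()
```

The attach, rsync and detach steps would then sit inside a "with terminating(instance):" block, so an error in any of them would not leave the instance running.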

Backup to AWS EBS via Rsync and Boto

Update 5/15/2013:

  • Preserve file ownership in the backup
  • Maintain proper directory structure in the backup
  • Added workaround for SSH “connection denied” message
  • Terminate the EC2 instance when finished
  • Added section on automation

Overview

Amazon Web Services Elastic Block Store (EBS) provides cheap, reliable storage that is perfect for backups. The idea is to temporarily spin up an EC2 instance, attach your EBS volume to it and upload your files. Transferring the data via rsync allows for incremental backups, which are very fast and reduce costs. Once the backup is complete, the EC2 instance is terminated. The whole process can be repeated as often as needed by attaching a new EC2 instance to the same EBS volume. I back up 8 GB from my own server weekly using this method. The backup takes about 3 minutes and my monthly bill from Amazon is around $1.

Setup

  1. If you don’t already have one, create an account with AWS.
  2. Take note of your access key. You will need to place it in the script in order to connect to the AWS EC2 API.
  3. Create an Amazon EC2 key pair. You need this to launch and connect to your EC2 instance. Download the private key and store it on your system. In my example, I have the private key stored at /home/takaitra/.ec2/takaitra-aws-key.pem
  4. Create an EBS volume in your preferred zone (location). Make sure it is large enough to store your backups.
  5. Create a security group called “rsync” that allows connections on two inbound TCP ports: 22 (for SSH) and 873 (for rsync).
  6. Ensure a recent version of Python and Boto are installed on your system. In Debian, this is accomplished by running the command ‘apt-get install python-boto’

The Script

The script below automates the entire backup process via boto (a Python interface to AWS). Make sure to configure the VOLUME_ID, ZONE and BACKUP_DIRS variables with your own values. Also update SSH_OPTS to point to the private key of your EC2 key pair. <aws access key> and <aws secret key> need to be filled in on line 19.

#!/usr/bin/env python

import os
from boto.ec2.connection import EC2Connection
import time

IMAGE           = 'ami-3275ee5b' # Basic 64-bit Amazon Linux AMI
KEY_NAME        = 'takaitra-key'
INSTANCE_TYPE   = 't1.micro'
VOLUME_ID       = 'vol-########'
ZONE            = 'us-east-1a' # Availability zone must match the volume's
SECURITY_GROUPS = ['rsync'] # Security group allows SSH
SSH_OPTS        = '-o StrictHostKeyChecking=no -i /home/takaitra/.ec2/takaitra-aws-key.pem'
BACKUP_DIRS     = ['/etc/', '/opt/', '/root/', '/home/', '/usr/local/', '/var/www/']
DEVICE          = '/dev/sdh'

# Create the EC2 instance
print 'Starting an EC2 instance of type {0} with image {1}'.format(INSTANCE_TYPE, IMAGE)
conn = EC2Connection('<aws access key>', '<aws secret key>')
reservation = conn.run_instances(IMAGE, instance_type=INSTANCE_TYPE, key_name=KEY_NAME, placement=ZONE, security_groups=SECURITY_GROUPS)
instance = reservation.instances[0]
time.sleep(10) # Sleep so Amazon recognizes the new instance
while not instance.update() == 'running':
    time.sleep(3) # Let the instance start up
time.sleep(10) # Still feeling sleepy
print 'Started the instance: {0}'.format(instance.dns_name)
# Get the updated instance
reservations = conn.get_all_instances(instance_ids=[instance.id]) # Only the instance we just started
reservation = reservations[0]
instance = reservation.instances[0]

# Attach and mount the backup volume
print 'Attaching volume {0} to device {1}'.format(VOLUME_ID, DEVICE)
volume = conn.get_all_volumes(volume_ids=[VOLUME_ID])[0]
volumestatus = volume.attach(instance.id, DEVICE)
while not volume.status == 'in-use':
    time.sleep(3) # Wait for the volume to attach
    volume.update()
time.sleep(60) # Still feeling sleepy
print 'Volume is attached'
os.system("ssh -t -t {0} ec2-user@{1} \"sudo mkdir /mnt/data-store && sudo mount {2} /mnt/data-store && echo 'Defaults !requiretty' | sudo tee /etc/sudoers.d/rsync > /dev/null\"".format(SSH_OPTS, instance.public_dns_name, DEVICE))

# Rsync
print 'Beginning rsync'
for backup_dir in BACKUP_DIRS:
    os.system("rsync -e \"ssh {0}\" -avz --delete --rsync-path=\"sudo rsync\" {2} ec2-user@{1}:/mnt/data-store{2}".format(SSH_OPTS, instance.dns_name, backup_dir))
print 'Rsync complete'

# Unmount and detach the volume, terminate the instance
print 'Unmounting and detaching volume'
os.system("ssh -t -t {0} ec2-user@{1} \"sudo umount /mnt/data-store\"".format(SSH_OPTS, instance.dns_name))
volume.detach()
while not volume.status == 'available':
    time.sleep(3) # Wait for the volume to detach
    volume.update()
print 'Volume is detached'
print 'Terminating instance'
instance.terminate()

Automation

Follow these steps in order to automate backups to Amazon EC2. The steps may vary slightly depending on which distro you are running.

  1. Save the script to a file without a file extension, such as “rsync_to_ec2”. Debian’s cron runs these directories through run-parts, which by default skips any file whose name contains a dot.
  2. Configure the script as explained above.
  3. Make the script executable (chmod +x rsync_to_ec2)
  4. Check that the script is working by running it manually (./rsync_to_ec2). This may take a long time if this is your initial backup.
  5. Copy the script to /etc/cron.daily/ or /etc/cron.weekly/ depending on how often you want the backup to run.
  6. Profit!
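The file name restriction in step 1 can be checked before copying the script into place. This is a rough sketch, assuming run-parts is using its default naming rule (ASCII letters, digits, underscores and hyphens only); the function name runs_under_run_parts is hypothetical.

```python
import re

# Assumption: run-parts is invoked without --lsbsysinit or --regex, so only
# names made of ASCII letters, digits, underscores and hyphens are executed.
RUN_PARTS_NAME = re.compile(r"^[A-Za-z0-9_-]+\Z")


def runs_under_run_parts(filename):
    """Return True if run-parts would accept this file name by default."""
    return RUN_PARTS_NAME.match(filename) is not None
```

With this rule, "rsync_to_ec2" is accepted while "rsync_to_ec2.py" is rejected because of the dot.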

VelocityEmail – Easy E-mail for Java using Templates

A common requirement for a Java web application is that it be able to send e-mail in response to certain events. A couple simple examples of this requirement are a web form that sends a confirmation e-mail to the user and sending automated messages when exceptions occur. These e-mails must often incorporate dynamic content. In the web form example, content from the form submission could be included in the confirmation e-mail.

VelocityEmail extends the Email class from Commons Email to provide an elegant solution to this problem. The simplicity of Commons Email makes constructing and sending the e-mail a breeze while Velocity templates allow the content to be completely flexible and dynamic. The most important benefit of this approach is that it allows separation of the e-mail content from the project code, greatly increasing maintainability.

Download

The latest release is VelocityEmail 1.0

Binary
Source
Javadoc

How to Use VelocityEmail

  1. Create HTML and/or plaintext Velocity templates for your email. By
    default, you can reference any fields of your soon-to-be merged javabean
    like $bean.fieldName. Place the templates in your project’s source folder or
    wherever they’ll be available on the classpath.
  2. Create an instance of VelocityEmail, passing the template filename to
    the constructor and initialize it as you would normally initialize an
    HtmlEmail. 

     VelocityEmail email = new VelocityEmail(templateName);
     email.setHostName(smtphost);
     email.setFrom(fromAddress);
     email.setTo(toAddress);
  3. Merge the javabean with the template by using the appropriate
    setHtmlMsg() and setTextMsg() methods. 

     email.setHtmlMsg(javabean);

    OR, if you have multiple javabeans, you may create your own VelocityContext and pass that in instead.

     VelocityContext context = new VelocityContext();
     context.put("bean1", javabean1);
     context.put("bean2", javabean2);
     email.setHtmlMsg(context);
  4. Send the email.
     email.send();

List Authors Version 2.0 Released

Today I released version 2.0 of the List Authors WordPress widget. The underlying code has in fact existed for quite some time in the form of a patch submitted to WordPress to enhance the abilities of the wp_list_authors template tag. My hope was that the patch would make it into version 3.0 of WP, after which I could update my widget to use it. That never happened, so I have instead added a custom version of the wp_list_authors function to the List Authors plugin.

After several months of tweaking and re-tweaking my patch in response to comments from various WP devs, I was confident that the patch was production-ready. The feature request has been open for over a year with a working patch for six months–at this point, I doubt it will ever be released. Maybe there are legitimate performance concerns with the patch (there aren’t any problems I’m aware of), or maybe the features are just too low in priority. The patch was my attempt at contributing back to an open-source project. After this experience, I have to say I’m a bit disappointed in the whole process. Maybe I’ll try my hand at it again, but it will be with a project other than WordPress.

In the meantime, I hope those using my List Authors widget enjoy the new features of version 2.0.

PHPhotoNotes

My plan to create a PHP/MySQL implementation of my PhotoNotes script was completed sooner than expected thanks to the requests and encouragement I’ve received on the project page. This was the perfect opportunity for me to teach myself some AJAX: creating, updating, and deleting notes all happens without a page reload.

Better error handling is something I will work on in a future release. Right now, the user will not see any errors occurring from the PHP script. This is an area that needs to mature for AJAX in general. I skimmed through a couple of AJAX books at the bookstore and one didn’t even have the word “error” or “exception” in the table of contents!

Life Changes: Engagement and New Job

It has been an exciting and hectic month to say the least. Talisha and I had our 3 year anniversary dinner on February 3 at La Belle Vie. I’ve known for some time that she is the perfect girl for me–Talisha truly understands me, has a great sense of humor (because it’s like mine) ;), is a loving and caring person, and is in every way gorgeous to boot. My proposal to her was long overdue but I wanted to make sure it was special and memorable! In my head, this meant a romantic moment during a trip somewhere exotic but, with our vacations being few and far between of late, I had to make do. On our anniversary night at a fancy restaurant, however, “make do” is the wrong term to use.

I did my best to not act nervous during dinner. There was no doubt that I was popping the question that night because I had planned ahead of time to have the waiter bring out the engagement ring with the dessert. When the moment came, I was a little too eager to drop down on my knee–Tally hadn’t even had time to notice the ring on the dessert plate! I did achieve my goal of completely surprising her, however.

Now, the fun of wedding planning begins. What’s the date, budget, where, who to invite, what food and beverages, what kind of invites, etc, etc, etc? It’s so overwhelming that I’m almost avoiding thinking about it until I finish planning our road trip (more to come on that) and begin my job at St. Thomas in a couple of weeks.

That’s right, after almost four years at Cargill I am moving on. It has truly been an awesome and valuable experience at my current company and I am sorry to be leaving. I will especially miss my coworkers but hope to stay in touch with them. A few weeks ago, I interviewed with the University of St. Thomas for a web developer position. Those who know me will also know that web development is a passion of mine that I’ve had at least since high school. Although I have kept current through side projects like developing a WordPress plugin, the range of web technologies that I could learn about and work with has been limited for the last couple of years. When I was offered the UST position, I couldn’t pass up the opportunity to get back into the field. Another huge consideration was the generous tuition remission for myself and Talisha because getting back into school is very important to both of us. My final day at Cargill is next Friday and I will start my new job the following Monday.

PhotoNotes

My next coding side-project is another WordPress plugin. After posting some photos of my breadboard, I decided it would be nice to have a way to add notes to those photos. The current solution is Mbedr which I’m not happy with because it requires that the photos be hosted on Flickr and makes a call to Mbedr’s web server.

Take a look at the PhotoNotes project page to see my progress so far. It’s not yet saving any changes to the database, but having some working display code means that I’m halfway there.

Enhance WordPress wp_list_authors Template Tag

As can be seen by some of the comments left on the site, a couple of frequently requested features cannot be added to the List Authors widget because they are not supported by the underlying wp_list_authors template tag. The WordPress code for wp_list_authors needs to be changed to enable these new features.

Feature 1: Allow an upper limit to the number of authors listed. Some WordPress sites have hundreds or even thousands of contributors. wp_list_authors currently would list all of them and, worse, execute an SQL statement for each author. The proposed fix could be a non-negative integer option named “count_limit.”

Feature 2: Allow sorting of the author list. This could be an option named “sort_by” with the values “name” and “post_count.”

Today I submitted a patch to WordPress Trac including both of the above features. It also changes the code to use a JOIN statement to get the author post count instead of running an SQL statement for every author. This was a necessity to prevent inconsistencies when applying the LIMIT and ORDER BY. One of the core developers is reviewing it now and, if I’m lucky, the change will be included in the next major release (version 3.0). If that happens, you can be sure I will be updating the List Authors widget to make use of the new options.
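To illustrate why a single JOIN matters here, the sketch below counts posts per author in one query and then applies ORDER BY and LIMIT to that result. It is not the patch itself: it uses Python's sqlite3 module with a simplified, hypothetical two-table schema (users, posts) rather than the real WordPress tables, and only borrows the option names sort_by and count_limit from the proposal above.

```python
import sqlite3

# Hypothetical, simplified schema standing in for the WordPress tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO posts (author_id) VALUES (1), (2), (2), (3), (3), (3);
""")


def list_authors(conn, sort_by="name", count_limit=None):
    """Return (display_name, post_count) rows using a single query."""
    # Whitelist the sort options so user input never reaches the SQL text.
    order = {
        "name": "u.display_name ASC",
        "post_count": "post_count DESC, u.display_name ASC",
    }[sort_by]
    sql = (
        "SELECT u.display_name, COUNT(p.id) AS post_count "
        "FROM users u LEFT JOIN posts p ON p.author_id = u.id "
        "GROUP BY u.id, u.display_name "
        "ORDER BY " + order
    )
    params = ()
    if count_limit is not None:
        sql += " LIMIT ?"
        params = (int(count_limit),)
    return conn.execute(sql, params).fetchall()


top_two = list_authors(conn, sort_by="post_count", count_limit=2)
```

Because the post count is computed inside the same statement that sorts and limits, the limit applies to the correctly ordered list, which would not be possible if the counts came from a separate query per author.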

A less frequent request but one I’ve seen is for an “exclude” option with the ability to filter out certain categories of authors. There is a separate ticket for this and I might consider submitting a patch for this next.

Breadboarding an Alarm Clock

LCD Clock on Breadboard
I actually started this a couple of months back, but now that I have something working to the point where I’m pretty excited about it, I’m ready to share my pet project. Back in college, one of my favorite classes was my small electronics class. Learning about half-adders and resistors was okay but the really fun part was using embedded programming to interface a micro-controller to the outside world. Ever since then, I’ve wanted to build some sort of embedded project of my own. I finally motivated myself to build something I’ve always wanted–an alarm clock that works the way I want it to.

Here are the features of my ideal alarm clock:

  • Accurate to within a minute per month. Even better would be to synchronize to some time source.
  • Shows date, time and day of the week (so after a rough night, I know for sure if it’s a workday or not =D )
  • Easily visible during the day and at night.
  • Time adjustment allows adding or subtracting hours and minutes. I hate missing the correct minute on my current alarm clock and having to hit the set button 59 more times.
  • Alarm can be enabled/disabled according to the day of the week. Do I need an alarm on weekends? No. Am I so lazy that I don’t want to turn the alarm on and off every weekend? Yes.

Here are the items needed for breadboarding. This will be slightly different than the final bill of materials for the finished clock due to the breadboard power supply.

Note on LCD display: I purchased my display off of Ebay for $5 and it works great. If you decide to use SparkFun’s display, keep in mind that it requires a resistor in series when you supply power to the backlight. The resistor is not included in my schematic because my display has the resistor built in.

LCD Clock Breadboard

The notes on the image should give you an idea of what the components look like and what they are for. You will need to study the schematic to hook everything up correctly. If you are feeling lost at this point, read through the first two SparkFun tutorials then stop back. They will walk you through setting up the breadboard’s power supply and loading code onto an ATmega168.

LCD Clock Schematic


If you’ve gotten this far, you’re probably interested in the code behind the alarm clock. It is available here. Keep in mind this is not a finished product yet so there are still some features missing. The RTC (real time clock) code is fairly solid though.

I already have a PCB layout ready in Eagle which I’m going to send out to get printed. I’m really looking forward to migrating from the breadboard to the finished product. I’ll make sure to post the result!