Cloud application backup and OpenStack

2014-08-21

At work a few months back, I was asked to look at backup for OpenStack, specifically backup of virtual machines running in an OpenStack cloud. My prior project (backup for VMware vCloud) was wrapping up, so the timing was good, and the topic familiar.

Some initial observations.

Before going further, we need to be clear on names…

What AWS calls “instances”, OpenStack calls “instances”, or “servers”, or “hosts”. VMware, VirtualBox and KVM use the term “virtual machines” for the same purpose. In the following, I am going to use “instance” to refer to a virtual machine.

We also need to be clear on the need…

Some argue that a built-for-the-cloud application does not need traditional backup. I agree … mostly. The notion is that we rely on a robust versioned and replicated object store to preserve state, and instances are spun up to handle load, but do not have state to preserve.

For some applications, this model can work. No need for traditional backup.

For more traditional sorts of applications, the cloud model may not apply. There are lots (and lots) of existing working useful applications, built to an older model, that we might choose to spin up in a cloud. Frankly, I suspect there are many rich applications where web-scale is not needed, and the “pure” cloud application model is more trouble than it is worth. For those (many) applications, a more traditional sort of backup is needed.

So there is still a need for the backup of applications (instances) in the cloud.

Note also that storage for backups calls for a different performance profile. Backups are write-mostly, read-seldom, and multiple backups can be enormously compressed by deduplication. This profile differs from other cloud storage services.
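To make the deduplication point concrete, here is a minimal sketch of a content-addressed block store. Everything in it (the `DedupStore` class, the 4 KiB block size, the `make_volume` helper) is made up for illustration and is not taken from any OpenStack component. It only shows why two nearly identical backups cost little more than one.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


class DedupStore:
    """Toy content-addressed store: each unique block is kept exactly once."""

    def __init__(self):
        self.blocks = {}   # sha256 hex digest -> block bytes
        self.backups = {}  # backup name -> ordered list of digests

    def put_backup(self, name, data):
        """Split data into fixed-size blocks and store only unseen blocks."""
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.backups[name] = digests

    def get_backup(self, name):
        """Reassemble a backup from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.backups[name])

    def stored_bytes(self):
        """Physical bytes held, after deduplication."""
        return sum(len(b) for b in self.blocks.values())


def make_volume(num_blocks):
    """Hypothetical helper: build a volume image with distinct block contents."""
    return b"".join(
        i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(num_blocks)
    )


if __name__ == "__main__":
    store = DedupStore()
    monday = make_volume(256)                    # 1 MiB volume image
    tuesday = bytearray(monday)
    tuesday[0:BLOCK_SIZE] = b"\xff" * BLOCK_SIZE  # one block changed overnight
    store.put_backup("monday", monday)
    store.put_backup("tuesday", bytes(tuesday))
    logical = len(monday) + len(tuesday)
    print(f"logical bytes: {logical}, stored bytes: {store.stored_bytes()}")
```

In this toy run, two 1 MiB backups that differ in a single block occupy 257 blocks of storage rather than 512. Real backup stores use smarter chunking and compression, but the write-mostly, read-seldom shape is the same.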

In outline, efficient cloud application backup is simple. Note that a cloud application may consist of one or more instances, with one or more volumes (virtual disks) attached to each instance.

Given a list of instances, and a list of associated volumes…

1. Collect the metadata for the instances.
2. Quiesce the instances (interrupts application activity).
3. Snapshot all (wanted) volumes in all instances.
4. Resume the instances (so as to minimize interruption of service).
5. Collect changed-block lists for all volumes in all instances.
6. Store the application/instance metadata and changed blocks into backup storage.
7. Notify of backup completion.

Note particularly that we want the quiesce/snapshot/resume sequence to be as fast as possible, to minimize any interruption of application service. Also, scanning and/or copying multi-gigabyte volumes on every backup is expensive and silly. Smart volume storage can keep track of changed blocks, and enormously accelerate backups.
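The outline above can be sketched in a few lines of Python. This is a toy in-memory model, not OpenStack code: the `Volume`, `Snapshot`, and `Instance` classes, the `backup_application` function, and the idea that a volume tracks its own dirty blocks are all assumptions made for illustration. The point is the ordering: everything slow happens after the instances have resumed.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    volume: str
    blocks: list   # frozen copy of the volume contents
    changed: set   # indices written since the previous snapshot


@dataclass
class Volume:
    name: str
    blocks: list                    # block index -> bytes
    dirty: set = field(init=False)  # hypothetical changed-block tracking

    def __post_init__(self):
        # Never backed up yet, so every block counts as changed.
        self.dirty = set(range(len(self.blocks)))

    def write(self, index, data):
        self.blocks[index] = data
        self.dirty.add(index)

    def snapshot(self):
        """Freeze the contents, hand over the dirty set, and reset tracking."""
        snap = Snapshot(self.name, list(self.blocks), set(self.dirty))
        self.dirty.clear()
        return snap


@dataclass
class Instance:
    name: str
    metadata: dict
    volumes: list
    running: bool = True

    def quiesce(self):
        self.running = False

    def resume(self):
        self.running = True


def backup_application(instances, backup_store, notify=print):
    """Back up all instances of one application as a single unit."""
    # 1. Collect the metadata for the instances.
    metadata = {inst.name: dict(inst.metadata) for inst in instances}
    # 2. Quiesce all instances together.
    for inst in instances:
        inst.quiesce()
    try:
        # 3. Snapshot all volumes in all instances.
        snapshots = [
            (inst.name, vol.snapshot())
            for inst in instances
            for vol in inst.volumes
        ]
    finally:
        # 4. Resume as early as possible, even if a snapshot failed.
        for inst in instances:
            inst.resume()
    # 5. Collect changed blocks from the snapshots, with service restored.
    changed = {
        (inst_name, snap.volume): {i: snap.blocks[i] for i in sorted(snap.changed)}
        for inst_name, snap in snapshots
    }
    # 6. Store the metadata and changed blocks.
    record = {"metadata": metadata, "changed_blocks": changed}
    backup_store.append(record)
    # 7. Notify of backup completion.
    total = sum(len(blocks) for blocks in changed.values())
    notify(f"backup complete: {len(instances)} instance(s), {total} changed block(s)")
    return record
```

In this model the first call copies every block, and later calls copy only what was written in between. A real implementation would also need to handle a snapshot that fails partway through, which this sketch ignores.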

Note also that this is not a one-instance-at-a-time backup!

Given the above outline, the existing OpenStack APIs for backup are … silly.

Cinder allows you to back up a volume … if the volume is not attached to an instance.

Nova allows you to back up a single instance … to Glance (the image service).

Add to this: in testing on the “stable/icehouse” branch, if I start more than one “nova backup” in succession, only the first succeeds.

The existing OpenStack APIs for backup are not adequate.

Poking around, it looks like all the pieces are present (or nearly) for efficient backup of applications in the cloud, at scale, but … not assembled as a working whole.

The next few months should prove interesting.