Virtualization, the dark side

The race to virtualize everything has created a host of unintended consequences, not least of which is how to meet the SLAs (service level agreements) for application backup. As we move to cloud alternatives, this problem will only grow, because your cloud provider will have to provide backup to you on an application-by-application basis. Every virtual machine is essentially a set of large files, such as VMDKs in a VMware context. These files are typically stored on storage arrays connected via iSCSI or Fibre Channel, or on NFS volumes. Traditional data protection techniques, such as VMware's VADP or VMware VCB, rely on an agent to protect the VMDK files associated with virtual servers. Typical steps are as follows:
  1. Pause the virtual servers to get a consistent set of VM files.
  2. Use the agent to read the VM image files from the data stores.
  3. Copy the image files to a backup disk target.
  4. Release the virtual servers for normal operations.
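The steps above can be modeled with simple arithmetic to show where the time goes. The following Python sketch is a hypothetical illustration and not any vendor's tool. The VM names, image sizes, and the single sequential throughput figure are assumptions. They are chosen only to show that, under the streaming method, each VM stays paused for as long as its image files take to copy.

```python
from dataclasses import dataclass


# Hypothetical model of the four-step streaming backup described above.
# VM names, image sizes, and throughput are illustrative assumptions only.
@dataclass
class VirtualMachine:
    name: str
    image_size_gb: float  # total size of the VM's image files (e.g. VMDKs)


def streaming_pause_hours(vms, throughput_gb_per_hour):
    """Return how long each VM stays paused under the streaming method.

    Step 1 pauses the VM, steps 2-3 read its image files and copy them
    to backup disk, and step 4 releases it, so the pause lasts exactly
    as long as the copy.
    """
    pauses = {}
    for vm in vms:
        # Steps 2-3: every byte of the image files must be read and copied.
        pauses[vm.name] = vm.image_size_gb / throughput_gb_per_hour
    return pauses


def fits_backup_window(vms, throughput_gb_per_hour, window_hours):
    """True if copying every VM one after another finishes within the window."""
    total_hours = sum(streaming_pause_hours(vms, throughput_gb_per_hour).values())
    return total_hours <= window_hours


if __name__ == "__main__":
    fleet = [VirtualMachine("prod-db", 500), VirtualMachine("prod-web", 100)]
    print(streaming_pause_hours(fleet, throughput_gb_per_hour=200))
    # {'prod-db': 2.5, 'prod-web': 0.5}
    print(fits_backup_window(fleet, throughput_gb_per_hour=200, window_hours=2))
    # False
```

With these assumed numbers, the 600 GB fleet needs three hours of copy time and overruns a two-hour window. In this linear model, doubling the data doubles the pause.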
Streaming methods that use these steps to move image files from the data store to backup disk are the problem. In environments with ever-shrinking backup windows, there is simply not enough time or bandwidth to move all the VM data. Even where the infrastructure can copy all this data, reading it places a tremendous burden on the application layer. A large part of the benefit of virtualization is to increase capability with less effort. We don't seem to be going in that direction: we are robbing Peter to pay Paul, spending the savings from virtualization on backup overhead.
The ease of deploying virtual machines, combined with automated application high-availability processes, is also creating virtual sprawl, which makes it time-consuming to keep track of the applications that need to be backed up. There is a real risk that VMs will be created and never backed up, or that H/A tools will move them and leave gaps in backup coverage. With so many virtual machines to keep track of, a manual method is simply not viable.
We need to rethink traditional data protection techniques, and Symantec has done much in this area. Meeting data protection SLAs requires minimal front-end impact and cannot rely on the legacy method of streaming data from production to the back end. An effective solution is to take a snapshot of the data and perform the backup from that snapshot. This capability is built into NetBackup's Enterprise Client, and it releases the virtual machines to focus on user demand with minimal interruption. It also provides granular restore, removing the need to restore an entire VMDK to get a particular file.
Now, what about the problem of finding the right VMs to back up for a particular SLA? For automatic virtual machine selection, NetBackup uses "query rules" to determine which VMware virtual machines to select for a particular backup policy. You create the rules in "Query Builder" on the "Clients" tab of a policy. A query rule consists of the following:
  • A keyword, such as Displayname or Datacenter (many keywords are available). For example, to automatically select virtual machines whose display names contain certain characters, the rule needs the Displayname keyword.
  • An operator, such as Contains, StartsWith, or Equal. The operator describes how NetBackup analyzes the keyword. For example, Displayname StartsWith tells NetBackup to look for display names that start with particular characters.
  • Values for the keyword. For the Displayname keyword, a value might be "prod". NetBackup then looks for virtual machines whose display names include the characters prod.
  • An optional joining element (AND, AND NOT, OR, OR NOT) to refine or expand the query.
The policy uses these elements to discover and select virtual machines for backup. Backup administrators can now create policies based on SLAs and identify, by naming convention, the VMs that each SLA covers. They can then be assured that the data for a given application will be backed up regardless of where the application is hosted within the virtual environment. If you are administering a cloud data center, this becomes even more valuable.
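The query-rule structure described above (keyword, operator, value, optional joining element) can be sketched in Python to show how such rules might pick VMs out of an inventory. This is a minimal illustration under stated assumptions, not NetBackup's implementation or API. The following are all hypothetical simplifications:

* the `QueryRule` class and the `select_vms` function;
* the dictionary-based inventory;
* case-sensitive matching;
* strict left-to-right evaluation of the joining elements.

NetBackup's real precedence and matching rules may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical matchers for the operators named in the text; matching is
# case-sensitive here, which may not reflect NetBackup's behavior.
OPERATORS = {
    "Contains": lambda field, value: value in field,
    "StartsWith": lambda field, value: field.startswith(value),
    "Equal": lambda field, value: field == value,
}


@dataclass
class QueryRule:
    keyword: str                  # e.g. "Displayname" or "Datacenter"
    operator: str                 # a key of OPERATORS
    value: str                    # e.g. "prod"
    joiner: Optional[str] = None  # "AND", "AND NOT", "OR", "OR NOT"; None on the first rule

    def matches(self, vm: dict) -> bool:
        # A missing attribute is treated as an empty string.
        field = vm.get(self.keyword, "")
        return OPERATORS[self.operator](field, self.value)


def select_vms(inventory, rules):
    """Return display names of VMs matching the rules, joined strictly left to right."""
    selected = []
    for vm in inventory:
        result = rules[0].matches(vm)
        for rule in rules[1:]:
            hit = rule.matches(vm)
            if rule.joiner == "AND":
                result = result and hit
            elif rule.joiner == "AND NOT":
                result = result and not hit
            elif rule.joiner == "OR":
                result = result or hit
            elif rule.joiner == "OR NOT":
                result = result or not hit
            else:
                raise ValueError(f"unsupported joiner: {rule.joiner!r}")
        if result:
            # Assumes every inventory entry carries a Displayname.
            selected.append(vm["Displayname"])
    return selected


if __name__ == "__main__":
    inventory = [
        {"Displayname": "prod-db01", "Datacenter": "east"},
        {"Displayname": "prod-web01", "Datacenter": "west"},
        {"Displayname": "test-db01", "Datacenter": "east"},
    ]
    # Select "prod" VMs, excluding those in the west data center.
    rules = [
        QueryRule("Displayname", "StartsWith", "prod"),
        QueryRule("Datacenter", "Equal", "west", joiner="AND NOT"),
    ]
    print(select_vms(inventory, rules))  # ['prod-db01']
```

In this sketch, selection is recomputed from the rules on every call. A newly created VM named with the "prod" prefix is therefore picked up without anyone editing a hand-maintained client list, which is the advantage of selecting by naming convention that the text describes.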