 

 Back up your nova-volume disks

While Diablo provides the snapshot functionality (using LVM snapshot), you can also back up your volumes. The advantage of this method is that it reduces the size of the backup; only existing data is backed up, instead of the entire volume. For this example, assume that a 100 GB nova-volume has been created for an instance, while only 4 gigabytes are used. This process backs up only those 4 gigabytes, using the following tools:

  1. lvm2 directly manipulates the volumes.

  2. kpartx discovers the partition table created inside the instance.

  3. tar creates a minimum-sized backup.

  4. sha1sum calculates the backup checksum, to check its consistency.

1- Create a snapshot of a used volume

  • In order to back up our volume, we first need to create a snapshot of it. An LVM snapshot is an exact copy of a logical volume, which contains data in a frozen state. This prevents data corruption, because the data cannot change while the backup is being created. Remember that the volumes created through nova volume-create exist in an LVM logical volume.

    Before creating the snapshot, ensure that you have enough space to save it. As a precaution, you should have at least twice as much space as the potential snapshot size. If insufficient space is available, there is a risk that the snapshot could become corrupted.
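
As a rough illustration of the precaution above, the free-space check could be scripted before any snapshot is taken. The helper name has_room_for_snapshot is hypothetical, and both sizes are assumed to be expressed in GiB; this is a sketch, not part of nova.

```shell
# Hypothetical helper: succeeds when the free space is at least twice the
# planned snapshot size. Both arguments are sizes in GiB.
has_room_for_snapshot() (
    snapshot_gib=$1
    # Drop any fractional part ("25.00" or "25,00" becomes "25").
    free_gib=${2%%[.,]*}
    [ "$free_gib" -ge $((snapshot_gib * 2)) ]
)

# Example use on the volume server (assumes the nova-volumes volume group):
#   free=$(vgs --noheadings --nosuffix --units g -o vg_free nova-volumes | tr -d ' ')
#   has_room_for_snapshot 10 "$free" || echo "Not enough free space" >&2
```

The fractional part is discarded rather than rounded, so the check errs on the side of refusing when the free space is borderline.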

    Use the following command to obtain a list of all volumes.

    $ lvdisplay
                            

    In this example, we will refer to a volume called volume-00000001, which is a 10 GB volume. This process can be applied to all volumes, no matter their size. At the end of the section, we will present a script that you could use to create scheduled backups. The script itself builds on what we discuss here.

    First, create the snapshot; this can be done while the volume is attached to an instance:

    $ lvcreate --size 10G --snapshot --name volume-00000001-snapshot /dev/nova-volumes/volume-00000001
                            

    The --snapshot option tells LVM that we want a snapshot of an already existing volume. The command includes the size of the space reserved for the snapshot volume, the name of the snapshot, and the path of an already existing volume (in most cases, the path will be /dev/nova-volumes/$volume_name).

    The size doesn't have to be the same as that of the original volume. The size parameter designates the space that LVM will reserve for the snapshot volume. As a precaution, the size should be the same as that of the original volume, even if we know the whole space is not currently used by the snapshot.

    We now have a full snapshot, and it only took a few seconds!

    Run lvdisplay again to verify the snapshot. You should now see your snapshot:

      --- Logical volume ---
      LV Name                /dev/nova-volumes/volume-00000001
      VG Name                nova-volumes
      LV UUID                gI8hta-p21U-IW2q-hRN1-nTzN-UC2G-dKbdKr
      LV Write Access        read/write
      LV snapshot status     source of
                             /dev/nova-volumes/volume-00000001-snapshot [active]
      LV Status              available
      # open                 1
      LV Size                15,00 GiB
      Current LE             3840
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     256
      Block device           251:13
    
      --- Logical volume ---
      LV Name                /dev/nova-volumes/volume-00000001-snapshot
      VG Name                nova-volumes
      LV UUID                HlW3Ep-g5I8-KGQb-IRvi-IRYU-lIKe-wE9zYr
      LV Write Access        read/write
      LV snapshot status     active destination for /dev/nova-volumes/volume-00000001
      LV Status              available
      # open                 0
      LV Size                15,00 GiB
      Current LE             3840
      COW-table size         10,00 GiB
      COW-table LE           2560
      Allocated to snapshot  0,00%
      Snapshot chunk size    4,00 KiB
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     256
      Block device           251:14
                

2- Partition table discovery

  • To read that snapshot with the tar program, we first need to mount its partition on the nova-volumes server.

    kpartx is a small utility which discovers partition tables and maps them. It can be used to view partitions created inside the instance. Without using the partitions created inside instances, we won't be able to see their content and create efficient backups.

    $ kpartx -av /dev/nova-volumes/volume-00000001-snapshot
                            

    If no errors are displayed, it means the tool was able to find the partition table and map it. Note that on a Debian-flavored distribution, you can install the tool with apt-get install kpartx.

    You can easily check the partition table map by running the following command:

    $ ls /dev/mapper/nova*
                        

    You should now see a partition called nova--volumes-volume--00000001--snapshot1.

    If you created more than one partition on that volume, you should see several partitions accordingly; for example, nova--volumes-volume--00000001--snapshot2, nova--volumes-volume--00000001--snapshot3 and so forth.
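
The doubled hyphens in these names come from device-mapper, which joins the volume group name and the logical volume name with a single hyphen and therefore doubles any hyphen inside either name. A minimal sketch of that rule follows; the helper name dm_name is hypothetical.

```shell
# Hypothetical helper: print the device-mapper name for a logical volume.
# Device-mapper joins the volume group and volume names with a single hyphen,
# so hyphens inside either name are doubled.
dm_name() (
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '%s-%s\n' "$vg" "$lv"
)

# First partition of the snapshot, following the naming shown above:
#   mount "/dev/mapper/$(dm_name nova-volumes volume-00000001-snapshot)1" /mnt
```

Computing the name this way avoids typing the long path by hand when the steps are repeated for several volumes.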

    We can now mount our partition:

    $ mount /dev/mapper/nova--volumes-volume--00000001--snapshot1 /mnt
                            

    If there are no errors, you have successfully mounted the partition.

    You should now be able to directly access the data that were created inside the instance. If you receive a message asking you to specify a partition, or if you are unable to mount it (despite a well-specified filesystem), there could be two causes:

    • You didn't allocate enough space for the snapshot.

    • kpartx was unable to discover the partition table.

    Allocate more space to the snapshot and try the process again.
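
One way to check the first cause is to read the "Allocated to snapshot" field from the lvdisplay output shown earlier; a value close to 100% suggests that the reserved space is running out. The helper name snapshot_usage is hypothetical, and the sketch only extracts the field without interpreting it.

```shell
# Hypothetical helper: read lvdisplay output on standard input and print the
# value of each "Allocated to snapshot" field, for example "0,00%".
snapshot_usage() {
    awk '/Allocated to snapshot/ { print $NF }'
}

# Example use on the volume server:
#   lvdisplay /dev/nova-volumes/volume-00000001-snapshot | snapshot_usage
```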

3- Use tar in order to create archives

  • Now that the volume has been mounted, you can create a backup of it:

    $ cd /backup/destination
    $ tar --exclude={"lost+found","some/data/to/exclude"} -czf volume-00000001.tar.gz -C /mnt/ .


    These commands create a tar.gz file in /backup/destination that contains the data, and the data only. This ensures that you do not waste space by backing up empty sectors.
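
The archiving step can be wrapped in a small function so that the mount point and the archive path are explicit. This is only a sketch: the name backup_mounted_volume is hypothetical, GNU tar is assumed, and only lost+found is excluded.

```shell
# Hypothetical helper: archive the contents of a mounted volume.
#   $1: mount point of the snapshot partition, for example /mnt
#   $2: path of the archive to create
backup_mounted_volume() {
    # -C switches to the mount point first, so member names are relative
    # ("./etc/...") and lost+found is left out of the archive.
    tar --exclude='lost+found' -czf "$2" -C "$1" .
}

# Example use:
#   backup_mounted_volume /mnt /backup/destination/volume-00000001.tar.gz
```

Because member names are stored relative to the mount point, the archive can later be extracted into any directory.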

4- Checksum calculation

  • You should always have the checksum for your backup files. A checksum is a short fingerprint computed from a file's content.

    When you transfer that same file over the network, you can run another checksum calculation. If the checksums are different, this indicates that the file is corrupted; thus, the checksum provides a method to ensure your file has not been corrupted during its transfer.

    The following command runs a checksum for our file, and saves the result to a file:

    $ sha1sum volume-00000001.tar.gz > volume-00000001.checksum
                            

    Be aware that sha1sum should be used carefully, since the time required for the calculation is directly proportional to the file's size.

    For files larger than ~4-6 gigabytes, and depending on your CPU, the process may take a long time.
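
After a transfer, the saved checksum file can be checked with sha1sum -c. The sketch below assumes that the archive and its checksum file sit in the same directory; the helper name verify_backup is hypothetical.

```shell
# Hypothetical helper: verify a backup against its saved checksum file.
# sha1sum records the file name as it was given on the command line, so the
# check runs from the directory that holds the checksum file.
verify_backup() (
    cd "$(dirname "$1")" || exit 1
    sha1sum -c "$(basename "$1")" >/dev/null 2>&1
)

# Example use:
#   verify_backup /backup/destination/volume-00000001.checksum && echo "backup intact"
```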

5- Clean up after the backup

  • Now that we have an efficient and consistent backup, the following commands will clean up the file system.

    1. Unmount the volume: umount /mnt

    2. Delete the partition mappings: kpartx -dv /dev/nova-volumes/volume-00000001-snapshot

    3. Remove the snapshot: lvremove -f /dev/nova-volumes/volume-00000001-snapshot

    And voila :) You can now repeat these steps for every volume you have.
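
The three cleanup commands can be chained so that a later step runs only if the earlier one succeeded. In this sketch the function name cleanup_snapshot and the RUN variable are hypothetical; setting RUN=echo prints the commands instead of executing them.

```shell
# Hypothetical helper: undo the mount, the partition mappings, and the snapshot.
# Setting the (assumed) RUN variable to "echo" prints the commands instead of
# running them, which is a convenient dry run.
cleanup_snapshot() (
    mount_point=$1
    snapshot=$2
    # Stop at the first failure so a still-mounted snapshot is never removed.
    ${RUN:-} umount "$mount_point" &&
    ${RUN:-} kpartx -dv "$snapshot" &&
    ${RUN:-} lvremove -f "$snapshot"
)

# Example use:
#   cleanup_snapshot /mnt /dev/nova-volumes/volume-00000001-snapshot
```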

6- Automate your backups

Because you can expect that more and more volumes will be allocated to your nova-volume service, you may want to automate your backups. This script here will assist you with this task. The script performs the operations from the previous example, but also provides a mail report and removes old backups according to the backups_retention_days setting. It is meant to be launched from the server which runs the nova-volumes component.

Here is an example of a mail report:

Backup Start Time - 07/10 at 01:00:01
Current retention - 7 days

The backup volume is mounted. Proceed...
Removing old backups...  : /BACKUPS/EBS-VOL/volume-00000019/volume-00000019_28_09_2011.tar.gz
     /BACKUPS/EBS-VOL/volume-00000019 - 0 h 1 m and 21 seconds. Size - 3,5G

The backup volume is mounted. Proceed...
Removing old backups...  : /BACKUPS/EBS-VOL/volume-0000001a/volume-0000001a_28_09_2011.tar.gz
     /BACKUPS/EBS-VOL/volume-0000001a - 0 h 4 m and 15 seconds. Size - 6,9G
---------------------------------------
Total backups size - 267G - Used space : 35%
Total execution time - 1 h 75 m and 35 seconds
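
The script itself is not reproduced here, so the following is only a sketch of how a retention setting such as backups_retention_days might be applied. It assumes that archives are named *.tar.gz, that find supports -delete, and the helper name prune_old_backups is hypothetical.

```shell
# Hypothetical helper: delete archives older than the retention period.
#   $1: backup directory, for example /BACKUPS/EBS-VOL/volume-00000019
#   $2: retention in days
prune_old_backups() {
    # -mtime +N matches files last modified more than N whole days ago
    # (that is, at least N+1 days old).
    find "$1" -type f -name '*.tar.gz' -mtime +"$2" -delete
}

# Example use:
#   prune_old_backups /BACKUPS/EBS-VOL/volume-00000019 7
```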
            

The script also provides the ability to SSH to your instances and run a mysqldump inside them. To make this work, ensure that connections using nova's project keys are enabled. If you don't want to run the mysqldumps, you can turn off this functionality by adding enable_mysql_dump=0 to the script.

