- Large VM disk size, resulting in a normal SolusVM migration taking many hours
- The VM cannot afford any long downtime due to its criticality
I. PREPARE DESTINATION NODE AND VM
1. On the destination node, create a VM identical to the source VM. The disk size is especially important and must exactly match the source disk size. All other specifications should also match the source VM as closely as possible, except the IP address, which must be a different, temporary IP address.
2. Boot the newly created destination VM to ensure that it is able to boot and no problems occur.
3. Shut down the newly created destination VM. Make sure it stays shut down throughout the rest of the process.
4. Make sure the source node is able to SSH to the destination node as root, preferably via SSH keys, as follows:-
- SSH to the source node and copy the contents of /root/.ssh/id_rsa.pub into the destination node's /root/.ssh/authorized_keys ( if the key does not exist, generate one using ssh-keygen -t rsa ). Example commands are shown after this step.
- Test by SSH from the source node to the destination node as root. In this example, the source node is "server03" and the destination node is "server06". From server03:-
# ssh -v -l root -p 22222 server06
( You should be logged in to server06 without having to key in any passwords ).
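- For reference, a minimal key-setup sketch run from the source node ( this assumes OpenSSH and the same custom SSH port 22222 used throughout this example; adjust to your environment ):-
# ssh-keygen -t rsa
( only needed if /root/.ssh/id_rsa does not already exist; accept the defaults )
# cat /root/.ssh/id_rsa.pub | ssh -p 22222 root@server06 "mkdir -p /root/.ssh && chmod 700 /root/.ssh && cat >> /root/.ssh/authorized_keys"
( enter the root password one last time; subsequent logins should be key-based )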
II. IMAGE SNAPSHOT OF SOURCE VM TO DESTINATION VM
1. On the destination node, view the destination VM configuration file to obtain the allocated LV. On a SolusVM node, perform the following:-
- Open the destination VM in SolusVM and take note of the VM ID ( in this example, ID: vm911 ).
- SSH to the destination node and open the VM configuration file corresponding to the VM ID. In this example:-
# cd /home/xen/vm911
# cat vm911.cfg
- Find the 'disk' configuration line, and locate the LV allocated for the destination VM, as shown in this example:-
disk = ['phy:/dev/vg_server06/vm911_img,xvda1,w', 'phy:/dev/vg_server06/vm911_swap,xvda2,w']
^^^^^^^^^^^^^^^^^^^^^^^^^^
- In this example, the allocated LV is /dev/vg_server06/vm911_img . Take note of this as the "Destination LV".
2. On the source node, view the source VM configuration file to obtain the allocated LV. On a SolusVM node, perform the following:-
- Open the source VM in SolusVM and take note of the VM ID ( in this example, ID: vm578 ).
- SSH to the source node and open the VM configuration file corresponding to the VM ID. In this example:-
# cd /home/xen/vm578
# cat vm578.cfg
- Find the 'disk' configuration line, and locate the LV allocated for the source VM, as shown in this example:-
disk = ['phy:/dev/VolGroup00/vm578_img,xvda1,w', 'phy:/dev/VolGroup00/vm578_swap,xvda2,w']
^^^^^^^^^^^^^^^^^^^^^^^^^
- In this example, the allocated LV is /dev/VolGroup00/vm578_img . Take note of this as the "Source LV".
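- Optionally, confirm that the Destination LV is not smaller than the Source LV before imaging, since the dd copy in the next steps would otherwise be truncated. A quick byte-exact check with standard LVM tools ( this check is a suggestion, not part of the original procedure ):-
On the source node:
# lvs --units b -o lv_name,lv_size /dev/VolGroup00/vm578_img
On the destination node:
# lvs --units b -o lv_name,lv_size /dev/vg_server06/vm911_img
( the two sizes should match exactly, as required in section I step 1; at minimum the destination must not be smaller than the source )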
3. On the source node, create an LV snapshot of the "Source LV":-
# lvcreate -s -n /dev/VolGroup00/vm578_snap -L 10G /dev/VolGroup00/vm578_img
# lvs
  LV         VG         Attr   LSize   Origin    Snap%  Move Log Copy%  Convert
  LogVol00   VolGroup00 -wi-ao  46.88G
  vm578_img  VolGroup00 owi-ao 320.00G
  vm578_snap VolGroup00 swi-a-  10.00G vm578_img  10.08        <--- The newly created snapshot
  vm578_swap VolGroup00 -wi-ao   9.78G
....
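- NOTE: The snapshot only has to hold blocks that change on the source VM while the imaging in the next step runs. If it fills up ( Snap% reaches 100 ), the snapshot becomes invalid and the imaging must be restarted. While the copy runs, it is worth checking the usage occasionally and growing the snapshot if needed; the size below is only a suggestion:-
# lvs VolGroup00/vm578_snap
( watch the Snap% column )
# lvextend -L +10G /dev/VolGroup00/vm578_snap
( grow the snapshot if usage climbs towards 100% )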
4. Now you are ready to image the source LV ( on the source node ) to the new LV ( on the destination node ) via SSH. Make sure to do this in a screen session, as it will take hours depending on the size of the LV. On the source node:-
# screen
# dd if=/dev/VolGroup00/vm578_snap | ssh -p 22222 server06 "dd of=/dev/vg_server06/vm911_img"
- NOTE: For 'server06' in the SSH parameters, it is recommended to use the destination node's internal interface IP address.
- This will run silently for hours; run 'top' on both nodes to ensure the 'dd' processes are running, or check Cacti graphs for active bandwidth use on both nodes ( see the optional progress-check commands after this step ). During the process, all operations on the source node and VM remain up and running as normal.
- After some hours, the imaging process will finally end as follows:-
671088640+0 records in
671088640+0 records out
343597383680 bytes (344 GB) copied, 37518.1 seconds, 9.2 MB/s
671088640+0 records in
671088640+0 records out
343597383680 bytes (344 GB) copied, 37518.6 s, 9.2 MB/s
- Ensure no errors occurred. If there were any, fix them and restart the imaging process.
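- Two optional tweaks, assuming GNU dd on both nodes ( suggestions only, not part of the original procedure ): a larger block size usually improves throughput, and sending SIGUSR1 to a running dd makes it print its progress so far:-
# dd if=/dev/VolGroup00/vm578_snap bs=4M | ssh -p 22222 server06 "dd of=/dev/vg_server06/vm911_img bs=4M"
( same copy as above, using a 4 MB block size instead of the 512-byte default )
# pkill -USR1 -x dd
( run on the source node from another shell; the local dd prints records in/out and throughput so far )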
5. Once done, verify that the imaging was successful ( and prepare for the next step ) by mounting the destination LV. On the destination node ( create /home/xen/vm911/mnt with mkdir -p first if it does not exist ):-
# mount /dev/vg_server06/vm911_img /home/xen/vm911/mnt
# df -h
.....
/dev/mapper/vg_server06-vm911_img  315G  218G   81G  73% /home/xen/vm911/mnt
# cd /home/xen/vm911/mnt
- Look around the mount point to ensure it contains roughly the same files as the source VM.
6. If the imaging is confirmed successful, remove the snapshot LV on the source node:-
# lvremove VolGroup00/vm578_snap
Do you really want to remove active logical volume vm578_snap? [y/n]: y
Logical volume "vm578_snap" successfully removed
Now you should have two identical VMs, one on each node, in the following situation:-
- The source VM is still running and operational with no downtime, with constant data changes as normal
- The destination VM is shut down, not operational, and holds a snapshot of the source VM data.
III. REGULAR SYNC OF SOURCE VM TO DESTINATION VM
1. Ensure the destination node is able to SSH into the source VM (not node!) smoothly with RSA key pairs.
- SSH to the destination node and copy the contents of /root/.ssh/id_rsa.pub into the source VM's /root/.ssh/authorized_keys ( if the key does not exist, generate one using ssh-keygen -t rsa ), the same way as in section I step 4.
- Test by SSH from the destination node to the source VM as root. In this example, the destination node is "server06" and the source VM is "vm2". From server06:-
# ssh -v -l root -p 22222 vm2
( You should be logged in to vm2 without having to key in any passwords ).
2. Prepare a simple shell script ( stored as /usr/local/scripts/sync-vm.sh on the destination node ) to regularly sync live data from the currently operational source VM to the dormant destination VM:-
----- begin script -----
#!/bin/bash
# Change this according to node type and VM config locations
node_type=xen # xen/kvm
vm_home=/home/$node_type
rsync=/usr/bin/rsync
ssh=/usr/bin/ssh
ssh_port=22222
# rsync over SSH on the custom port; skip pseudo/backup dirs, delete files removed on the source
rsync_args="-ave '$ssh -p $ssh_port' --exclude='backup/' --exclude='sys/' --exclude='proc/' --delete --force"

if [[ -z "$1" || -z "$2" ]]; then
    echo "Usage: $0 <Live Source VM IP Address (internal IP recommended)> <Dormant Destination VM ID>"
    echo
    exit 1
fi

src_vm_host=$1
dst_vm_id=$2
dst_vm_data=$vm_home/$dst_vm_id/mnt/

# The destination VM's LV must already be mounted under $dst_vm_data ( see section II step 5 )
if [ ! -d "$dst_vm_data/home" ]; then
    echo "ERROR: Destination VM data ( $dst_vm_data ) does NOT exist. Make sure it is typed and mounted properly."
    echo
    exit 1
fi

echo "-------------------------------------------------------------------"
echo "Begin $src_vm_host VM Sync to $dst_vm_data: "`date`
echo "-------------------------------------------------------------------"
# eval is needed so the quoted -e '$ssh -p $ssh_port' inside $rsync_args is parsed correctly
eval $rsync $rsync_args $src_vm_host:/ $dst_vm_data
echo "-------------------------------------------------------------------"
echo "End $src_vm_host VM Sync to $dst_vm_data: "`date`
echo "-------------------------------------------------------------------"
echo
----- end script -----
3. On the destination node, in a screen session, test run this script to ensure correct operation ( it should sync the currently running source VM on the source node to the dormant destination VM on the destination node ).
# screen
# /usr/local/scripts/sync-vm.sh
Usage: /usr/local/scripts/sync-vm.sh <Live Source VM IP Address (internal IP recommended)> <Dormant Destination VM ID>
# /usr/local/scripts/sync-vm.sh vm2 vm911
-------------------------------------------------------------------
Begin vm2 VM Sync to /home/xen/vm911/mnt/: Sun Sep 13 11:58:26 MYT 2015
-------------------------------------------------------------------
...... (some file transfers)
-------------------------------------------------------------------
End vm2 VM Sync to /home/xen/vm911/mnt/: Sun Sep 13 12:58:26 MYT 2015
-------------------------------------------------------------------
4. If the script completes successfully, schedule the script to run a few times during off hours in /etc/crontab :-
# vim /etc/crontab
0 19,0,5 * * * root /usr/local/scripts/sync-vm.sh vm2 vm911 >> /var/log/sync-vm2.log 2>&1
:wq
- In this example, sync-vm.sh is scheduled to run daily at 7pm, 12am, and 5am ( off hours for the VM ). Adjust according to the customer's off hours.
5. From time to time, you can monitor the sync operations with:-
# tail -f /var/log/sync-vm2.log
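- Optionally ( not part of the original procedure ), a quick way to spot failed runs is to grep the log for rsync's error and warning lines:-
# grep -iE "rsync (error|warning)" /var/log/sync-vm2.log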
IV. PREPARATION FOR SWITCHOVER TO DESTINATION VM
1. Ensure the destination node is able to SSH into the source node smoothly with RSA key pairs.
- SSH to the destination node and copy the contents of /root/.ssh/id_rsa.pub into the source node's /root/.ssh/authorized_keys ( if the key does not exist, generate one using ssh-keygen -t rsa ), the same way as in section I step 4.
- Test by SSH from the destination node to the source node as root. In this example, the destination node is "server06" and the source node is "server03". From server06:-
# ssh -v -l root -p 22222 server03
( You should be logged in to server03 without having to key in any passwords ).
2. Prepare the pre-switchover sync script below and save it as /usr/local/scripts/sync-vm-offline.sh on the destination node. This script is similar to the online sync script above; the difference is that both source and destination VMs are shut down during the sync, and the source data is read from the source VM's LV mounted on the source node rather than from the running VM.
----- begin script -----
#!/bin/bash
# Change this according to node type and VM config locations
node_type=xen # xen/kvm
vm_home=/home/$node_type
rsync=/usr/bin/rsync
ssh=/usr/bin/ssh
ssh_port=22222
# Same rsync arguments as the online sync script
rsync_args="-ave '$ssh -p $ssh_port' --exclude='backup/' --exclude='sys/' --exclude='proc/' --delete --force"

if [[ -z "$1" || -z "$2" || -z "$3" ]]; then
    echo "Usage: $0 <Source node IP Address (internal IP recommended)> <Dormant Source VM ID> <Dormant Destination VM ID>"
    echo
    exit 1
fi

src_node_host=$1
src_vm_id=$2
dst_vm_id=$3
src_vm_data=$vm_home/$src_vm_id/mnt/
dst_vm_data=$vm_home/$dst_vm_id/mnt/

# The source VM's LV must be mounted under $src_vm_data on the source node ( see section V step 4 )
if $ssh -p $ssh_port $src_node_host test -d $src_vm_data/home; then
    echo "OK: Source VM data ( $src_vm_data ) exists and mounted."
    echo
else
    echo "ERROR: Source VM data ( $src_vm_data ) does NOT exist. Make sure it is typed and mounted properly."
    echo
    exit 1
fi

# The destination VM's LV must still be mounted under $dst_vm_data on this node
if [ ! -d "$dst_vm_data/home" ]; then
    echo "ERROR: Destination VM data ( $dst_vm_data ) does NOT exist. Make sure it is typed and mounted properly."
    echo
    exit 1
fi

echo "-------------------------------------------------------------------"
echo "Begin $src_node_host:$src_vm_data VM Sync to $dst_vm_data: "`date`
echo "-------------------------------------------------------------------"
# Pull the final changes from the mounted source LV into the mounted destination LV
eval $rsync $rsync_args $src_node_host:$src_vm_data $dst_vm_data
echo "-------------------------------------------------------------------"
echo "End $src_node_host:$src_vm_data VM Sync to $dst_vm_data: "`date`
echo "-------------------------------------------------------------------"
echo
----- end script -----
3. On the destination node, test run the script to ensure it runs without errors:-
# /usr/local/scripts/sync-vm-offline.sh
Usage: /usr/local/scripts/sync-vm-offline.sh <Source node IP Address (internal IP recommended)> <Dormant Source VM ID> <Dormant Destination VM ID>
4. Repeat the test run, this time with the actual arguments. Since the source VM's LV is not mounted yet, it should throw an error, but this verifies that the script is ready to run:-
# /usr/local/scripts/sync-vm-offline.sh server03.nocser.net vm578 vm911
ERROR: Source VM data ( /home/xen/vm578/mnt/ ) does NOT exist. Make sure it is typed and mounted properly.
- If you get the above error, you are OK and ready to switch over.
V. SWITCHOVER TO DESTINATION VM
1. When you are ready for switchover, schedule a 1-hour downtime for the VM with the customer, and remove all scheduled synchronization from crontab:-
- Remove the sync-vm.sh line corresponding to the VM that will be switched over:-
# vim /etc/crontab
#0 19,0,5 * * * root /usr/local/scripts/sync-vm.sh vm2 vm911 >> /var/log/sync-vm2.log 2>&1 <---- REMOVE THIS LINE
:wq
2. At the start of the scheduled downtime, shut down the source VM in SolusVM. Make sure both the source and destination VMs are shut down.
3. On the source node, recall from your notes the LV allocated to the source VM ( the "Source LV" ).
- In this example, the allocated LV is /dev/VolGroup00/vm578_img .
4. After double-checking that the source VM is shut down and not running, mount the Source LV:-
# mount /dev/VolGroup00/vm578_img /home/xen/vm578/mnt
# df -h
.....
/dev/mapper/VolGroup00-vm578_img 315G 218G 81G 73% /home/xen/vm578/mnt
# cd /home/xen/vm578/mnt
- Look around the mount point to ensure it contains roughly the same files as the previously online source VM or the currently dormant destination VM.
5. Now you are ready for the final sync before the switchover. SSH to the destination node and run the sync-vm-offline.sh script you prepared earlier, as follows:-
# /usr/local/scripts/sync-vm-offline.sh server03.nocser.net vm578 vm911
- This time, the script will run and copy the latest changes from source VM to destination VM.
-------------------------------------------------------------------
Begin server03.nocser.net:/home/xen/vm578/mnt/ VM Sync to /home/xen/vm911/mnt/: Mon Sep 14 10:48:26 MYT 2015
-------------------------------------------------------------------
..... some file transfers
-------------------------------------------------------------------
End server03.nocser.net:/home/xen/vm578/mnt/ VM Sync to /home/xen/vm911/mnt/: Mon Sep 14 12:48:26 MYT 2015
-------------------------------------------------------------------
6. Once done, unmount both the source and destination LVs:-
- On the source node:-
# umount /home/xen/vm578/mnt
# df -h
- Ensure no more vm578 mounts
- On the destination node:-
# umount /home/xen/vm911/mnt
# df -h
- Ensure no more vm911 mounts
7. In SolusVM, open the source VM and take note of its current main IP and all assigned IPs, then remove them and replace them with a single temporary main IP.
8. Then, still in SolusVM, open the destination VM, remove all of its current IP allocations, and replace them with exactly the same IP allocations the source VM had previously. Boot the destination VM; it should now boot on the new node. Verify that the boot process completes, all allocated IP addresses are operational, and all services are running OK ( see the example checks after this step ).
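- A quick post-switchover verification sketch ( the IP address and service names here are placeholders; adapt them to the migrated VM ):-
# ping -c 3 <migrated VM main IP>
# ssh -p 22222 root@<migrated VM main IP> "uptime; df -h; service httpd status; service mysqld status"
( confirm the VM responds on its original IPs and that the customer's key services are up )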
9. Leave the source VM dormant until you are confident enough to remove it ( i.e. after the VM has been running stably on the new node for some time ).
DONE.
-----
References:-
http://www.serveradminz.com/blog/?p=862
https://github.com/janeczku/vps-musings/wiki/Backup-or-clone-KVM-guest
https://www.linode.com/docs/migrate-to-linode/disk-images/copying-a-disk-image-over-ssh/