Site-to-Site Replication & Disaster Recovery
If you are running mission-critical applications, you cannot afford to let your services go offline because of a hardware failure or a power outage at the building that houses your servers.
Vapor's Site-to-Site Asynchronous Replication allows you to automatically mirror virtual machines from a primary Data Center (DC) host to a secondary Disaster Recovery Center (DRC) host.
1. Backups vs. Replication: What is the difference?
- Backups (Historical Archives): Backups are like keeping a box of old photo albums in your closet. If you delete a database table or contract a virus, you dig into the box and restore a copy from 3 days ago.
- Replication (Disaster Recovery): Replication is like building an exact replica of your entire house in another town. If your primary house catches fire, you immediately pack up, go to the replica house, turn on the lights, and keep living.
2. Understanding Asynchronous Replication & RPO
Vapor uses Asynchronous Replication. To see how it works, imagine you keep a daily diary and want your friend in another city to have a copy of it.
- Synchronous would mean calling your friend on the phone and dictating every single word as you write it. This slows you down.
- Asynchronous means you write your diary normally at your own speed, and every evening (e.g. at 6:00 PM), you scan the new pages and email them to your friend.
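The diary analogy can be sketched as a small toy model. This is only an illustration of the asynchronous idea, not Vapor's implementation; the class and method names are hypothetical.

```python
class AsyncReplicatedDisk:
    """Toy model: writes land locally at once; the replica catches up on ship()."""

    def __init__(self):
        self.primary = []   # blocks written at the primary site
        self.replica = []   # blocks present at the DR site
        self.shipped = 0    # number of primary blocks already sent

    def write(self, block):
        # Asynchronous: the write returns without waiting for the remote site.
        self.primary.append(block)

    def ship(self):
        # Runs once per replication interval: send only the new blocks (the delta).
        delta = self.primary[self.shipped:]
        self.replica.extend(delta)
        self.shipped = len(self.primary)
        return len(delta)

    def unreplicated(self):
        # Blocks that would be lost if the primary failed right now.
        return len(self.primary) - self.shipped


disk = AsyncReplicatedDisk()
disk.write("page 1")
disk.write("page 2")
sent = disk.ship()      # the "evening email": two new pages are sent
disk.write("page 3")    # written after the sync, not yet at the replica
```

A synchronous design would instead copy each block to the replica inside `write()`, so every write would wait on the remote site.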
RPO (Recovery Point Objective)
The RPO defines "how much data you are willing to lose in a disaster." In Vapor, it is set by the replication interval (e.g., 15 minutes, 1 hour, or 12 hours). If you replicate every 15 minutes, with the last sync at 2:00 PM, and your server explodes at 2:10 PM, the replica on the secondary host will have all data up to 2:00 PM, so you lose 10 minutes of written data. In the worst case, a failure just before the next sync costs you almost the full 15-minute interval.
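The arithmetic behind the RPO example can be checked with a short sketch. It assumes syncs run exactly on schedule and complete instantly, which a real system will not guarantee; the function names and the date are illustrative only.

```python
from datetime import datetime, timedelta


def last_sync_before(failure: datetime, interval: timedelta, first_sync: datetime) -> datetime:
    """Return the most recent scheduled sync at or before the failure time."""
    if failure < first_sync:
        raise ValueError("failure occurs before the first sync")
    completed = (failure - first_sync) // interval  # whole intervals elapsed
    return first_sync + completed * interval


def data_loss(failure: datetime, interval: timedelta, first_sync: datetime) -> timedelta:
    """Window of writes that never reached the replica."""
    return failure - last_sync_before(failure, interval, first_sync)


interval = timedelta(minutes=15)
first_sync = datetime(2024, 1, 1, 12, 0)   # arbitrary start of the schedule
failure = datetime(2024, 1, 1, 14, 10)     # server fails at 2:10 PM

recovery_point = last_sync_before(failure, interval, first_sync)  # 2:00 PM
loss = data_loss(failure, interval, first_sync)                   # 10 minutes
```

Whatever the failure time, the loss in this model is always less than `interval`, which is why a shorter replication interval gives a tighter RPO.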
3. Disaster Recovery Operations in Vapor
Vapor provides four key operations to manage disaster recovery:
A. Site Pairing
Link your primary host with a remote host in a different network or physical location.
- In the Web UI, navigate to Replication > Sites and click Pair Site.
- Provide the destination server's API URL (e.g., https://192.168.122.129:7770/api/v1) and security Token.
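Before submitting the pairing form, it can help to sanity-check the values you are about to enter. The sketch below is not part of Vapor; it is a hypothetical pre-flight helper that uses only the Python standard library, and the https requirement is an assumption based on the example URL above.

```python
from urllib.parse import urlparse


def check_pairing_inputs(api_url: str, token: str) -> dict:
    """Validate the values for the Pair Site dialog (hypothetical helper)."""
    parsed = urlparse(api_url)
    if parsed.scheme != "https":
        # Assumption: the example URL uses https, so plain http is rejected here.
        raise ValueError("API URL should use https")
    if not parsed.hostname:
        raise ValueError("API URL has no host")
    if not token.strip():
        raise ValueError("security token is empty")
    return {"host": parsed.hostname, "port": parsed.port, "path": parsed.path}


# "example-token" is a placeholder, not a real credential.
info = check_pairing_inputs("https://192.168.122.129:7770/api/v1", "example-token")
```

The returned host and port are the values the primary host must be able to reach across the network for pairing to succeed.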
B. Planned Migration vs. Emergency Failover
When you need to activate the replica VM at the secondary site:
- Planned Migration: Used for scheduled maintenance (e.g., a power outage is scheduled at the primary site). Vapor gracefully shuts down the production VM, runs a final delta sync to ensure zero data loss, and boots the replica VM at the DR site.
- Emergency Failover (Disaster Recovery): Used when the primary server is completely offline (e.g., hardware failure). Vapor immediately boots the replica VM at the DR site using the latest successful synchronization point. Anything written at the primary site after that point is lost.
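The difference between the two paths comes down to whether a final delta sync is possible. The following toy model illustrates this; it is a sketch of the concept, not Vapor's code, and the function names are hypothetical.

```python
def planned_migration(primary_writes, replica_writes):
    """Graceful path: stop the VM, ship the final delta, then boot the replica."""
    steps = ["shut down production VM"]
    delta = primary_writes[len(replica_writes):]   # writes not yet replicated
    replica = replica_writes + delta               # final delta sync
    steps += ["final delta sync", "boot replica VM at DR site"]
    return replica, steps


def emergency_failover(replica_writes):
    """Primary is unreachable: boot from the latest successful sync point."""
    return list(replica_writes), ["boot replica VM at DR site"]


primary = ["w1", "w2", "w3"]   # all writes made at the primary site
replica = ["w1", "w2"]         # state at the last successful sync

migrated, migration_steps = planned_migration(primary, replica)
failed_over, failover_steps = emergency_failover(replica)
```

In this model `migrated` contains every write, while `failed_over` is missing `w3`, the write made after the last sync.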
C. Disaster Recovery (DR) Drills / Test Failovers
How do you know your replica VM will actually boot up and work when a real disaster strikes? You run a DR Drill.
- Triggering a Test Failover tells Vapor to spin up a temporary clone of the replica VM inside an isolated "network bubble" at the DR site.
- You can log in, check that the application works, and run tests.
- Active replication continues running in the background. Once you have verified the application, click Stop Drill to automatically delete the test clone and clean up the sandbox resources.
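The key property of a drill is that the test clone is independent of the replica, so testing never interferes with ongoing replication. A minimal sketch of that idea, using a hypothetical `DrSite` class rather than anything from Vapor itself:

```python
import copy


class DrSite:
    """Toy DR site holding one replica and, during a drill, one test clone."""

    def __init__(self):
        self.replica = {"disk": []}
        self.test_clone = None

    def receive_sync(self, blocks):
        # Replication keeps updating the replica, drill or not.
        self.replica["disk"].extend(blocks)

    def start_drill(self):
        # The clone is an independent copy, so tests cannot alter the replica.
        self.test_clone = copy.deepcopy(self.replica)
        self.test_clone["network"] = "isolated"
        return self.test_clone

    def stop_drill(self):
        # Discard the clone and its sandbox resources.
        self.test_clone = None


site = DrSite()
site.receive_sync(["b1"])
clone = site.start_drill()
clone["disk"].append("test-data")   # changes made while testing
site.receive_sync(["b2"])           # replication continues during the drill
replica_during_drill = list(site.replica["disk"])
site.stop_drill()
```

The test data written to the clone never appears in the replica, and the replica still receives `b2` while the drill is running.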
D. Reprotect (Reversing Replication)
Once the disaster is resolved and your primary server is repaired, you want to move back.
- The VM is now running at the DR site.
- Clicking Reprotect tells Vapor to reverse the mirror direction. It will now track changes on the DR site and replicate them back to the primary site. Once synchronized, you can perform a planned migration to return home.
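Conceptually, Reprotect swaps the roles of the two sites. The sketch below models only that role swap; the `ReplicationPair` type and the site labels are hypothetical and do not reflect Vapor's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPair:
    source: str   # site where the VM runs and changes are tracked
    target: str   # site that receives the replicated changes


def reprotect(pair: ReplicationPair) -> ReplicationPair:
    """Reverse the mirror direction: the old target becomes the new source."""
    return ReplicationPair(source=pair.target, target=pair.source)


original = ReplicationPair(source="DC", target="DRC")
reversed_pair = reprotect(original)   # changes now flow from DRC back to DC
```

Once the reversed pair has fully synchronized, the primary site holds an up-to-date copy and a planned migration can move the VM back without data loss.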