Thursday, November 13, 2008

Steps For Data Replication

We must take the following things into account before starting data replication:

1. The amount of data you need to copy
The amount of data that needs to be copied determines the network bandwidth required to move it. This is also the part people tend not to think about until they find out how much bandwidth they really need.

2. Available network bandwidth
If you only have a dial-up connection between sites, you may as well back up the Chevy truck and start loading tapes to be shipped to your disaster site. As a good rule of thumb, you will need about 10 Mbit/s of bandwidth for each megabyte of data you need to copy per second. As an example, a T3 link (about 45 Mbit/s) can handle almost 5 MB of data per second.
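The rule of thumb above is simple arithmetic, so here is a minimal sketch of it in Python. It assumes the 10 Mbit/s per MB/s ratio from the text (which presumably leaves some headroom over the raw 8 bits per byte). The function names and the 44.736 Mbit/s T3 line rate are my own additions for illustration, not part of any particular tool.

```python
# Rule of thumb from the text: about 10 Mbit/s of link per 1 MB/s of data.
MBIT_PER_MBYTE_RULE = 10.0

# Assumption: nominal T3 (DS3) line rate in Mbit/s.
T3_MBIT_PER_S = 44.736


def required_mbit_per_s(mbytes_per_s):
    """Link bandwidth (Mbit/s) needed to copy `mbytes_per_s` MB every second."""
    return mbytes_per_s * MBIT_PER_MBYTE_RULE


def usable_mbytes_per_s(link_mbit_per_s):
    """Data rate (MB/s) a link of `link_mbit_per_s` Mbit/s can carry."""
    return link_mbit_per_s / MBIT_PER_MBYTE_RULE


def hours_to_copy(gbytes, link_mbit_per_s):
    """Hours needed to push `gbytes` GB (1 GB = 1000 MB) through the link."""
    seconds = (gbytes * 1000.0) / usable_mbytes_per_s(link_mbit_per_s)
    return seconds / 3600.0


if __name__ == "__main__":
    print("5 MB/s needs %.0f Mbit/s" % required_mbit_per_s(5))
    print("A T3 carries about %.2f MB/s" % usable_mbytes_per_s(T3_MBIT_PER_S))
    print("1 TB over a T3 takes about %.1f hours" % hours_to_copy(1000, T3_MBIT_PER_S))
```

Running the same functions with your own daily change rate instead of the sample numbers gives a quick first guess at the link size you need.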
3. Distance between locations
The distance will determine what kind of remote copy solution you can use, synchronous or asynchronous. Under sync replication an I/O is not complete until it is written to both sides. This is a good thing because your transactions stay consistent. Every write written to the primary side is written "in-order" to the remote side before the application sees an "I/O complete" message.
The problem here is that the Fibre Channel protocol requires four round trips to complete every I/O under sync replication. Even using dark fiber between sites, the speed of light becomes your limiting factor because of the four round trips -- you lose about a millisecond for every 25 miles. Sync is limited in distance to about 100 kilometers (roughly 60 miles). After that, application performance goes in the toilet. Async can go around the planet. So the farther you go, the more you need async remote copy.
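To get a feel for the distance penalty, here is a rough sketch in Python that uses the figure quoted above (about 1 ms lost per 25 miles under sync replication). The function names and the 1 ms local write time in the example are hypothetical; plug in your own measurements.

```python
# Figure from the text: sync replication costs about 1 ms per 25 miles.
MS_PER_25_MILES = 1.0
KM_PER_MILE = 1.609344


def km_to_miles(km):
    """Convert kilometers to miles."""
    return km / KM_PER_MILE


def added_latency_ms(distance_miles):
    """Extra time (ms) each synchronous write waits because of distance."""
    return distance_miles / 25.0 * MS_PER_25_MILES


def max_serial_writes_per_s(local_write_ms, distance_miles):
    """Upper bound on back-to-back writes per second for a single writer
    that must wait for each write to complete before issuing the next."""
    total_ms = local_write_ms + added_latency_ms(distance_miles)
    return 1000.0 / total_ms


if __name__ == "__main__":
    local_ms = 1.0  # hypothetical local write service time
    for km in (0, 40, 100, 500):
        miles = km_to_miles(km)
        print("%4d km: +%.2f ms per write, at most %.0f serial writes/s"
              % (km, added_latency_ms(miles),
                 max_serial_writes_per_s(local_ms, miles)))
```

The point of the sketch is the shape of the curve, not the exact numbers: a writer that waits on every I/O slows down in direct proportion to the distance.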
4. Type of operating systems and how many servers involved
Software-based replication products work great. The problem arises when you have hundreds of servers to copy data from. Buying a software license for 200 servers at the primary location and another 200 licenses for the servers that need to be at the remote site can get very expensive. Also, I don't know of a software package yet that can be used with every operating system. If you have AIX, Solaris, NetWare, NT, Windows 2000 and VMS, you may need several separate software solutions. For a homogeneous NT or Unix environment, though, software works great and can save you money.
5. Whether or not clustering is being used
Most cluster solutions require real-time connectivity for heartbeat and locking for quorum resources. If you use clustering software like MSCS and want to stretch the cluster between locations so that all your applications transparently fail over, you will need to be within sync replication distances.
6. Availability of storage, servers and floor space at the remote site
If you have your own data center for your remote site, you're fine. If you need to lease space from a provider, you want to make sure your solution is as compact as possible. Server and storage consolidation must be considered prior to introducing hosted disaster recovery solutions. Hey, when you're paying by the foot you want to have very small feet!
7. And last but not least, your available budget
This is a no-brainer. Many companies, when faced with the real-world costs of disaster recovery, tend to get shell-shocked. Consider the costs:
• Floor space
• Servers for the recovery site
• Staff for the recovery site
• Storage hardware and licenses
• Software licenses
• Services to implement the solution
• Services to determine what needs to be copied, and why
• Network links (this is usually the most expensive part)
• Network-based SAN extension gear
The costs can add up quickly. This sometimes makes the CTAM method look like a wonderful idea. (CTAM = Chevy Truck Access Method: dump your backup tapes in the back of a truck and drive your data to the remote site.)
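Before laughing off CTAM, it can be worth running the numbers. The sketch below, in Python, estimates the effective throughput of a truckload of tapes and the link you would need to match it, using the same 10 Mbit/s per MB/s rule of thumb from the bandwidth section. The function names and the sample load and drive time are made up for illustration, and the estimate ignores the time spent writing and reading the tapes.

```python
# Rule of thumb from the text: about 10 Mbit/s of link per 1 MB/s of data.
MBIT_PER_MBYTE_RULE = 10.0


def truck_mbytes_per_s(gbytes_on_truck, trip_hours):
    """Effective data rate (MB/s) of driving `gbytes_on_truck` GB
    (1 GB = 1000 MB) to the remote site in `trip_hours` hours."""
    return (gbytes_on_truck * 1000.0) / (trip_hours * 3600.0)


def equivalent_link_mbit_per_s(gbytes_on_truck, trip_hours):
    """Link bandwidth (Mbit/s) needed to move the same data in the same time."""
    return truck_mbytes_per_s(gbytes_on_truck, trip_hours) * MBIT_PER_MBYTE_RULE


if __name__ == "__main__":
    # Hypothetical load: 2000 GB of tapes, 4 hour drive.
    gb, hours = 2000, 4
    print("Truck moves about %.0f MB/s" % truck_mbytes_per_s(gb, hours))
    print("Matching link: about %.0f Mbit/s" % equivalent_link_mbit_per_s(gb, hours))
```

The truck wins on raw throughput surprisingly often; what it cannot give you is low latency or a recovery point measured in seconds.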