Lack of Bandwidth does not have to be a Replication Killer
As disk-based backup and deduplication become more popular in the backup process, moving data off-site is a natural next step. Whether the motivation is disaster recovery requirements or centralizing backup data, replicating data from one disk-based subsystem to another is growing in popularity. As companies look to implement replication as part of their disk-based backup solution, especially when replicating data from remote and branch offices to central offices, concerns about bandwidth availability inevitably arise.
While the price of bandwidth over WAN links has certainly dropped in recent years and there are more options available than ever before, it is still not free, and not every site has the same type of WAN connection. If anything, companies that replicate backup data are looking to spend less money, not more, on managing it, so the last thing they want to do is add WAN links or bandwidth just for replication. This makes it all the more important that companies not only deduplicate their backup data before it is replicated but also replicate only the backup data that is not already at the target site.
One option that some disk-based libraries use to replicate data is moving blocks of deduplicated data in bursts. With data bursts, the system sends blocks of data to a target site without knowing whether those blocks are needed there. Because the data was deduplicated at the source, it is quite possible that the blocks sent are not unique to the target system, in which case the target must deduplicate the data again. This is inefficient because it creates overhead on the target system and, more importantly, because the replication consumes precious network bandwidth. The result can be a need for more expensive WAN links, incomplete replication jobs due to network congestion, or disruptions in other network transmissions.
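To make the cost of burst replication concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's implementation: the function names, the use of SHA-256 as a block fingerprint, and the 4 KB block size are all assumptions chosen for the example.

```python
import hashlib


def burst_fingerprint(block: bytes) -> str:
    # Hypothetical block fingerprint; SHA-256 is an assumption for this sketch.
    return hashlib.sha256(block).hexdigest()


def burst_replicate(source_blocks, target_store):
    """Send every source block without asking the target what it already holds.

    Returns (bytes_sent, bytes_wasted), where bytes_wasted counts blocks that
    crossed the WAN only to be discarded when the target deduplicated them.
    """
    bytes_sent = 0
    bytes_wasted = 0
    for block in source_blocks:
        bytes_sent += len(block)          # the block crosses the WAN regardless
        fp = burst_fingerprint(block)
        if fp in target_store:
            bytes_wasted += len(block)    # target already had it: wasted transfer
        else:
            target_store[fp] = block
    return bytes_sent, bytes_wasted


# The target already holds block A (for example, from another branch office).
block_a = b"A" * 4096
block_b = b"B" * 4096
burst_target = {burst_fingerprint(block_a): block_a}

burst_sent, burst_wasted = burst_replicate([block_a, block_b], burst_target)
```

In this toy run, half of the transferred bytes carry a block the target already stored, which is the overhead the paragraph above describes.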
In a check and forward data replication scenario, such as the one used by disk libraries like the Quantum DXi-Series, replication functions as follows:
- Unique blocks are sent from a source to a target, along with all the index entries for the data.
- The target system builds its own index that includes all the unique blocks of all data it has received from all sources, including local backups.
- Before a source system sends the target the replicated blocks, it first sends the index entries for the blocks it plans to replicate.
- The target system sends back a list of blocks it does not already have stored.
- The source system then sends the blocks of deduplicated data specified in this list to the target system. All of these blocks will be unique to the target system.
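The exchange in the steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the DXi-Series implementation: the `Target` class, the `missing` and `receive` methods, the `replicate` function, and the choice of SHA-256 for index entries are all hypothetical names and choices made for illustration.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    # Hypothetical index entry for a block; SHA-256 is an assumption.
    return hashlib.sha256(block).hexdigest()


class Target:
    """Target system with one index covering blocks from all sources."""

    def __init__(self):
        self.store = {}  # index entry -> block

    def missing(self, fingerprints):
        # Check step: report which index entries the target does not hold.
        return [fp for fp in fingerprints if fp not in self.store]

    def receive(self, blocks):
        for block in blocks:
            self.store[fingerprint(block)] = block


def replicate(source_blocks, target):
    """Check and forward: send index entries first, then only the needed blocks."""
    index = {fingerprint(b): b for b in source_blocks}
    wanted = target.missing(list(index))       # small index exchange
    payload = [index[fp] for fp in wanted]     # forward step: needed blocks only
    target.receive(payload)
    return len(payload)


target = Target()
target.receive([b"alpha", b"beta"])            # e.g. local backups or other sources

first_run = replicate([b"alpha", b"beta", b"gamma"], target)
second_run = replicate([b"alpha", b"beta", b"gamma"], target)
```

Only the one block the target lacks is transferred on the first run, and a repeat run transfers nothing. The design trades a small exchange of index entries for the much larger cost of shipping blocks the target would discard.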
With a check and forward data replication system such as the one found in the Quantum DXi-Series, data that is deduplicated at the source site is replicated only if the block of deduplicated data is also unique to the target system. This saves valuable bandwidth and avoids the need for the target system to deduplicate the data again, which enables companies to dedicate more of these resources to other operational tasks.
Replication technology is rapidly coming to be viewed as a requirement when implementing deduplication, but network bandwidth can become a stumbling block when looking to replicate data off-site. However, by identifying disk-based systems that use a check and forward data replication methodology, you increase the odds that you can use your available bandwidth without impacting your current environment or other applications that use these WAN links.