When I started working for my current employer, we had six remote sites, each with a Windows file server backing up to a set of DAT72 tapes that were stored on-site. The remote sites would not pay for off-site backup, so I set out to develop a way to back them up to our headquarters in Minnesota.
In our setup, each remote site connects via MPLS VPN over a T1 line to our main site in Minnesota. The Minnesota site has two bundled T1 lines to the VPN and two more bundled T1s to the Internet. With bandwidth limited to 1.5 Mbps at each site, I needed an efficient solution.
I elected to use rsync because it transfers only the changed parts of files, rather than copying each file in its entirety. Rsync is a very robust file copy program that can both push and pull files. I settled on DeltaCopy, a Windows-based rsync wrapper.
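To illustrate why sending only the changed parts saves so much bandwidth, here is a deliberately simplified sketch in Python. It is not rsync's actual algorithm: real rsync uses a rolling checksum so it can match blocks at any offset, while this version only compares fixed-position blocks. The function names and the tiny block size are my own choices for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen so the example is easy to follow


def block_digests(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of a byte string."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return indexes of blocks in `new` that differ from, or do not exist in, `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or old_digests[i] != digest
    ]


if __name__ == "__main__":
    old = b"aaaabbbbcccc"
    new = b"aaaaXbbbcccc"
    # Only the second block (index 1) would need to cross the wire.
    print(changed_blocks(old, new))
```

Only the blocks reported by changed_blocks would have to be sent, which is the property that makes nightly runs over a T1 practical.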
For a backup target, I purchased a Buffalo TeraStation II Pro. It has four 500 GB drives in a RAID 5 array, giving me 1.5 TB of fault-tolerant storage. I hacked the Buffalo to get command line access and then configured rsync daemons to run on the device.
I configured each Windows server to run the DeltaCopy client to synchronize its shares to the Buffalo drive. DeltaCopy then emails me the results of each backup. The DeltaCopy client ignores file permissions and catches changes in each file in the share. Total storage used was around 300 GB.
After getting it all working, I realized it was backing up files I didn't want included, such as tmp files. I discovered that you can pass most rsync parameters to DeltaCopy through the Options tab, so I set up an exclude file to skip the unwanted files: I added the line --exclude-from "/cygdrive/C/tools/excludes.txt" and put the exclude patterns in that file. Then I went through the existing backup and removed the unwanted files using the Linux find command.
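The cleanup step can be sketched in Python as an alternative to find. This is only a rough illustration under a few assumptions: the function names are hypothetical, and the matching handles only simple file-name globs such as *.tmp, not the full rsync pattern syntax (anchored paths, ** and so on). It defaults to a dry run so you can review the list before anything is deleted.

```python
import fnmatch
import os


def load_patterns(exclude_file):
    """Read one pattern per line, skipping blank lines and ';' or '#' comments."""
    with open(exclude_file, encoding="utf-8") as fh:
        lines = (line.strip() for line in fh)
        return [p for p in lines if p and not p.startswith(("#", ";"))]


def find_unwanted(root, patterns):
    """Return paths under root whose file name matches any pattern (case-insensitive)."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatchcase(name.lower(), p.lower()) for p in patterns):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def prune(root, patterns, dry_run=True):
    """List unwanted files and delete them only when dry_run is False."""
    unwanted = find_unwanted(root, patterns)
    if not dry_run:
        for path in unwanted:
            os.remove(path)
    return unwanted
```

A cautious way to use it is to call prune(root, load_patterns(path)) first, read through the returned list, and only then repeat the call with dry_run=False.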
After getting backups down to a manageable size, I decided that it would be nice to keep the old versions of changed files. After reading up on rsync's command line options, I tried to use environment variables to store the replaced files by date. I tried --backup --backup-dir="date" with both Linux and Windows date commands, but neither would work. I settled on --backup --backup-dir="XXXX-XX-XX" and a script on the backup server that renames every XXXX-XX-XX directory to the date of the backup. It works very well.
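A rename script along those lines might look something like the following Python sketch. It is not the script running on the Buffalo; the function name, the root argument and the choice to skip a directory when the dated name already exists are all assumptions made for illustration.

```python
import datetime
import os

PLACEHOLDER = "XXXX-XX-XX"  # the literal name passed to --backup-dir


def rename_placeholders(root, day=None):
    """Rename every PLACEHOLDER directory under root to an ISO date such as 2009-01-02."""
    stamp = (day or datetime.date.today()).isoformat()
    renamed = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if PLACEHOLDER not in dirnames:
            continue
        # Do not descend into the placeholder directory, renamed or not.
        dirnames.remove(PLACEHOLDER)
        src = os.path.join(dirpath, PLACEHOLDER)
        dst = os.path.join(dirpath, stamp)
        if os.path.exists(dst):
            # Assumption: leave it alone rather than merge two runs from the same day.
            continue
        os.rename(src, dst)
        renamed.append(dst)
    return renamed
```

Run once after each night's backups finish, so that the next run starts with a fresh XXXX-XX-XX directory and each dated directory holds only the files replaced on that day.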
Backup times have been reduced greatly. With the old tape-based system, nightly differential backups ran at each location at 10 p.m. and took between 1 and 2 hours. Full backups ran on Thursday nights and took between 4 and 6 hours. Now each site produces a full backup every night in about 15 minutes. The new system also keeps old versions of files and provides off-site protection.
Restoring files is as simple as copying them back to their original location. In the event of a catastrophic loss, the data can be copied to external hard drives attached to the backup server and sent overnight to the affected site. FedEx is still faster than pushing large amounts of data over a T1 line.
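A quick back-of-the-envelope calculation shows why. The sketch below assumes the worst case of restoring the entire 300 GB set over a single 1.5 Mbps link with no protocol overhead and no compression; the function name and the efficiency parameter are illustrative only.

```python
def transfer_days(size_gb, link_mbps, efficiency=1.0):
    """Days needed to move size_gb (decimal gigabytes) over a link of link_mbps."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400


if __name__ == "__main__":
    # 300 GB over a 1.5 Mbps T1, assuming the link is fully available
    print(f"{transfer_days(300, 1.5):.1f} days")
```

Even under those generous assumptions the transfer takes roughly two and a half weeks, and a single site's share of the data would still take days, so an overnight courier wins easily.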
This project took about a week to research, a week to configure and two months to fine-tune. Since its launch, it has eliminated most of the user error that caused failed backups.
If you have questions or suggestions, I’d like to hear them.