In some of my environments I have to deal with large data sets that are well over 300GB. In this test I have a master, a read only slave and a backup server. In order to rebuild one of the slaves I need to transfer data over the network. Many know doing this is time consuming and expensive, so I have done some benchmark tests to find the fastest possible way. I have tested 3 methods at this time to find a good solution, I will test a forth method this week and update this article to give you the results.
Our system runs on Linux and has a gig network connection so times may vary depending on your setup.
First test was using rsync. Using this method is great if you can recover with existing data as it can make the changes using incremental differences. If you are building from newer hardware or cannot use the existing data then the time is drastically longer. The first rsync will have to move the files over by using the scp method(later explained). You can use this method if you want to minimize downtime. You will need to set up ssh keys in order to proceed with rsync which I remove once the data transfer is complete. The step below are the tests I timed using this method.
1.Create a ssh key from the server running rsync
ssh-keygen -t rsa
2.cat the newly created key
cat ~/.ssh/id_rsa_pub
3.Copy the text and paste on the remote server.
Vi /root/.ssh/authorized_keys
4.Execute rsync on the server to be rebuilt
/usr/bin/rsync -avzpogt –exclude=”*any files*” –delete -e ssh root@:/some/path* /some/path
This test was conducted while MySQL is stopped(you can lock the tables dont forget to set wait_timout and interactive timeout to 28000).
Timed Results first run
real 677m28.259s
user 228m48.514s
sys 47m40.737s
Timed Results first (Final)
real 2m21.259
user 0m38.259s
sys 1m00.252s
This test was conducted while MySQL was running, but you will need to stop MySQL on the final sync. I also uses an expect script to continue to run the rsync until I was ready to do the final transfer.
Timed Results first run
real 877m28.845s
user 248m54.214s
sys 30m22.231s
expect average time
real 48m32.156s
user 22m59.781s
sys 35m43.376s
expect script ran until we locked table and ran the final
real 22m45.101s
user 8m09.563s
sys 6m15.785s
The second method was introduced to me by A friend of mine and his blog is Koopman.me There are 2 methods and I chose the first one. With this method You will need to sop MySQL so transfer the files. This method will compress the files over the network and decompress on the remote host. Yes I could have used the lets tar up the data files on sending server, but this is expensive and will take time to do. Not to mention that you can run out of disk space quickly.
This is the basic commands:
A. tar cf – data files(*) | ssh user@remotehost ” ( cd /some/path ; tar xf – ) ”
B. ssh user@remotehost “( cd /some/path ; tar cf – data file(*) ) ” | tar xf -
Timed Results:
real 177m41.929s
user 108m52.797s
sys 47m40.737s
The last method was using straight scp. This method is just like the first step of rsync, but simplified. Not a lot of explaining on this one.
Command from the sending server.
scp data files(*) user@somehost:/some/path
Please not that you will need to have ownership or a tmp directory on the remote server.
Timed Results:
real 599m48.729s
user 203m52.992s
sys 68m40.569s
The next test to be conducted will be using scp with the -C flag for compression some state this will be the fastest. At this time the tar over ssh seems to be the fastest thanks koopman.me

Posted in
Tags: 

[...] Move duplicate data from master server to slave server using one of the methods here. [...]