Startseite > English > bidirectional syncing of MySQL tables with percona pt-table-sync

bidirectional syncing of MySQL tables with percona pt-table-sync

What need to be done:

For an current project in the organization I work (INF) I want try the bidirectional syncing of a MySQL database. We will run the same application in different offices. Between the servers we have a VPN connections. We don’t need live syncing – once a day should be enough.

Possibilities:

The possibilities that I found for syncing are these:

MySQL own replication.
seems to be hard to implement and nobody recommends a master-master replication. But maybe its not recommended to have a read-write access to master-master replication with any solution. But that doesn’t matter – I need a solution.

SQLyog Job Agent (SJA)
from http://www.webyog.com/en/ what is part of SQLyog Enterprise and SQLyog Ultimate
This is the solution suggested by the developer of our application
The Linux version should be free
I cannot find the download for Linux, maybe its included in the 30 days trial version.
And the bidirectional sync seems just to be two times a one-way-sync. So first we sync from serverA to serverB and then back.
So this doesn’t help us much. If somebody updates serverB but we syncing first from serverA to serverB, we will overwrite the change again.

pt-table-sync made by percona
the bidirectional syncing is marked as experimental but till now it seems to be the best way – so I will test it first.

user permissions

the sync user needs quite much permissions. I tried to give as less as possible and run several times into problems. pt-table-sync tells you every time it has not enough privileges. I will have to find out what are the minimal privileges are and then (maybe :-} ) update this post. Now I ended up with SUPER and REPLICATION CLIENT as global permissions and SELECT, INSERT, UPDATE, DELETE, REFERENCES, LOCK TABLES for the table I like to sync.

primary key conflict:

In the MySQL default configuration the AUTO_INCREMENT fields are just incremented by 1 if a new row is added to the table. If two servers add a row in a table where the primary key is set to AUTO_INCREMENT (what happens really very often), we would have a conflict that is not resolvable because we have the same PRIMARY KEY double.
So we set two variables in the MySQL configuration file (my.cnf)
auto-increment-increment = 100
auto-increment-offset = 1

auto-increment-increment is on every server set to 100 and auto-increment-offset is set different on every server (1,2,3 etc.)
So now new AUTO_INCREMENT values will incremented by 100 and every server has an own offset, so new rows will get on server1 the ids 101, 201, 301 and on server2 102, 202, 302

timestamps:

Our database structure has a timestamp column in every table, what is very useful so we can use the pt-table-sync options: „–conflict-column timestamp –conflict-comparison newest“ for conflicts.
So in case an item is updated on both servers, the one with the newest timestamp will win.
Just in the case there is a change on the same item, saved in the same seconds we would have an unresolvable conflict, but that is not very likely.
By default pt-table-sync just creates a warning when running in a unresolvable conflict but with the option „–conflict-error die“ it stops the execution in this case.

tables with combined primary keys:

The documentation says that one of the limitations of the bidirectional sync is:
* requires that the table(s) are chunkable with the Chunk algorithm
I didn’t understand that limitation first, but now I think it means that we need a single-column-primary-key. The one way syncing works fine with combined primary key tables, but for bidirectional syncing of tables with combined primary keys I got the message that „No plugin can sync <database.table>„.
This forum post gave me the tip.

We have some tables that have combined primary keys and we need to sync them too.
So I changed the PRIMARY KEY indexes to UNIQUE and add an „id“ column and made it PRIMARY KEY.
This should not affect the application.

not found changes:

In every table our database has a column called „deleted“. This column can be set to 0 or 1. When I changed this value pt-table-sync didn’t recognize the change. I think this must be because of the checksum function, by default it is CRC32. When I set it to MD5 it works fine ( –function MD5 ). SHA1 is probably even more save but also slower.

automatic script:

I’ve wrote a small perl script that has contains a list of tables that need to be sync and the timestamp column name. If the timestamp column names would be the same everywhere we maybe would not need the list – but I cannot change the database design. The list makes also sure that we update / insert data in the correct order, so we don’t run into foreign key problems.
So the script runs through the list of tables and sync them with pt-table-sync.
Conflicts (changes between two sync on different servers) will be recognized with the help of the „change_log“ table. In a conflict the later change wins but both parties gets an email with the information about the conflict.

The script collects some information and prints them. It’s designed to run as cron job and send the syncing information by email to the admin, that’s why the short information is in the head and the verbose at the bottom.
You can find the script here.

But don’t use it as an out-of-the-box solution. It changes your database, so use it just if you understand what it does! I don’t give any guaranty it works. It might destroy your database, server, office. So be careful!

Additionally the script uses unison to sync files on the filesystem

process of the sync:

  1. every server runs a MySQL backup locally at a specific time e.g. every day at 6.00PM
  2. a little bit later e.g. at 6:30PM the main server in the central office starts the sync script
    1. The sync script first sets his own and the remote software to „Maintenance Mode“ (a feature we have in our software, to prevent user to use it while the admin works :-] )
      For this matter I use the preExecutionSQLcommand in the script.
    2. then it runs trough every server the first time and syncs the database and the files
    3. after syncing every server once the script runs a second time to make sure every server gets also the changes from later sync servers.  If I sync server 1,2,3,4 the first time I need to sync 1,2,3 again to make sure the changes from 4 are everywhere.
    4. if the synchronization works the server disable the „Maintenance Mode“ again (postExecutionSQLcommand). If the synchronization fails the „Maintenance Mode“ will be still enabled, so the administrator would have the chance to fix the problem before somebody else have the chance to edit data and make the problem worse.
    5. the results are send by email to the administrators

simulate slow and bad connection:

I’m testing this solution between two virtual machines and for simulating bad and slow connection we can use the Linux „tc“ tool:

http://www.linuxfoundation.org/collaborate/workgroups/networking/netem

For bad or slow connections it might be helpful to increase the net_read_timeout and net_write_timeout settings of MySQL.

Kategorien:English Schlagwörter: , ,
  1. Es gibt noch keine Kommentare.
  1. No trackbacks yet.

Schreibe einen Kommentar

Trage deine Daten unten ein oder klicke ein Icon um dich einzuloggen:

WordPress.com-Logo

Du kommentierst mit Deinem WordPress.com-Konto. Abmelden / Ändern )

Twitter-Bild

Du kommentierst mit Deinem Twitter-Konto. Abmelden / Ändern )

Facebook-Foto

Du kommentierst mit Deinem Facebook-Konto. Abmelden / Ändern )

Google+ Foto

Du kommentierst mit Deinem Google+-Konto. Abmelden / Ändern )

Verbinde mit %s

%d Bloggern gefällt das: