Practical advice for MySQL/MariaDB live migrations

Last Updated on

Modifying table structures is sometimes necessary, or desirable. Modifying them online can be a pain, especially with big tables. Things are easier if we can just schedule a migration and make the database unreachable until the end of the operation, but usually we can’t, because that would imply a revenue loss for the company.

This article provides some practical advice for live migrations, aka ALTER TABLE in production.

This article does not cover migrations on Galera Cluster or Group Replication, and some of its contents don’t apply in those environments. These topics may be discussed on different articles, if there is interest (so consider dropping a comment if you want me to write these articles).

A scene from The Birds, by HItchcock. Seagulls are sitting on a fence, in front of the sea.
These birds were not migrating.
From The Birds, Hitchcock.

pt-online-schema-change and gh-ost

Both these tools serve the purpose of modifying a table without putting locks for a long time, and they do this in similar ways. The general flow is the following:

  1. Create a new table identical to the old one.
  2. Modify the new table’s structure.
  3. Start to keep data in sync.
  4. Copy the data from the old table to the new.
  5. Once they’re in sync, rename the tables so that the clients start to use the new one.

Only the last operation implies a short-living lock. It is an atomic operation, so in every point in time the queries will be able to read a table. Essentially, the clients work will not be affected in any way.

The difference between the tools is the way they keep data in sync: pt-online-schema-changes create triggers on the old table, while pt-ghost consumes the binary log and applies the changes to the new table.

As a general rule, pt-online-schema-change keeps up more easily with intensive workloads, while gh-ost adds less work to the database.

pt-online-schema-change is produced by Percona and part of the Percona Toolkit. gh-ost is produced by GitHub.

Even if these tools are based on a great idea, it is important to understand that often their use should be avoided. They move a lot of data, because all existing rows must be read and rewritten. This means more work for the server, longer execution time, more disk space needed. This is unnecessary for certain operations. In fact, both MySQL and MariaDB support some online DDL, for example to add an index. Some operations are instantaneous – at the time of this writing, MySQL supports instantaneous ADD COLUMN and MariaDB supports instantaneous ADD COLUMN and DROP COLUMN.

Some notes about MariaDB and MySQL:

  • When MariaDB completes MDEV-16329, using external tools should not be necessary anymore (note that should != will; let’s wait and see).
  • MySQL 8 supports atomic DDL, currently MariaDB does not. This means that a crash should never leave a corrupted index, or orphan data structures on disk.

pt-online-schema-change and gh-ost should not be used in combination with the following techniques unless you know exactly what you are doing and have a bright idea. With some techniques it would be useless, with others there would likely be replication outages.


If we don’t have enough disk space to contain a copy of the table, we should run our migration with ALTER TABLE ALGORITHM=INPLACE. But it is important to understand that a query executed in this way is always locking – in the best case it will only imply a shared lock, meaning that the table will still be readable.

Otherwise, we can run ALTER TABLE LOCK=NONE. This means what it seems to mean: locks on the table during the migration will be avoided.

These options are not available for all operations. Some operations cannot be performed without a table copy or without locking the table. The list of these operations varies depending on the DBMS flavor and version.

These options however are still useful. If the options we specified are not supported for the migration we are attempting to run, the ALTER TABLE will fail with an error, and we will at least be aware of the problem, so we can plan our migration in a different way.

My advice is to try specifying ALGORITHM and LOCK clauses in this order (if a statement fails, try the next one):


If the priority is to save disk space, directly try ALGORITHM=INPLACE. Except, with MariaDB, if you are changing or adding a table primary key, try ALGORITHM=NOCOPY first (it could be faster).


MariaDB allows to specify how many seconds an ALTER TABLE should wait to acquire a lock before failing with an error. For example, ALTER TABLE WAIT 60 will wait for 60 seconds and ALTER TABLE NOWAIT will not wait at all.

This is especially useful on slaves, imagine the following scenario:

  1. A slave is running a long SELECT on table users, issued by some client.
  2. The slave should now replicate an ALTER TABLE on table users.

If parallel replication is not used, replication will completely stuck until the SELECT finishes. Even if parallel replication is used, at least one thread will be waiting – probably all threads will be blocked at some point.

Replication and IF NOT EXISTS (mostly MariaDB)

There is another way to avoid having replication stuck on an ALTER TABLE.

The migration can be executed on slaves first. When all slaves that we don’t want to get stuck have ran the migration, we can run an ALTER TABLE on the master, with the IF NOT EXISTS syntax. For example:

    ADD INDEX IF NOT EXISTS idx_phone (phone);

Thanks to IF NOT EXISTS, replication will not crash on slaves where the index already exists.

Unfortunately, MySQL does not support IF NOT EXISTS in ALTER TABLE.


pt-online-schema-change and gh-ost don’t handle foreign keys properly: gh-ost does not support them, pt-online-schema-change handles them in a risky way. To be clear, it’s not their fault: MySQL and MariaDB don’t provide a safe way to handle them.

Other problems with these tools is that they could be too slow in certain cases, and the disk could not have enough space.

So sometimes, even though it would be desirable to use them, it is simply not an option.

The switchover technique is useful in these cases. It is similar to the previous one, but it’s a bit more delicate. Here are the step – let’s assume we have a master called srv1 and a slave called srv2, plus other slaves whose names don’t matter:

  1. Run the migration on a srv2.
  2. Configure srv1 to replicate from srv2 (it’s a dual master configuration).
  3. Wait until Seconds_behind_master=0 on srv2.
  4. Point clients read-write connections to srv2.
  5. Wait until all read-write connections to srv1 are closed.
  6. Stop replicating from srv1 to srv2.
  7. Run the migration on all the slaves, including srv1.
A high-level diagram of the steps above. While the application uses srv1, ALTER TABLE runs on srv2. When the application uses srv2, ALTER TABLE runs on srv1.

The idea is that the migration runs on all production servers, but now while they are used by applications.

Related contents

Related articles

Related courses

The articles on share a lot of technical knowledge for free, in the hope that they can help solving specific problems and raise the level of technical culture. However, if you work with database technologies, as an administrator or as a developer, you may need to build more organic knowledge and skills. Here are some courses related to the subject of this article.

Related services

Monthly time: Outsource delicate migrations that may take your servers down or slow down your applications. Avoid useless risks, take advantage of my expertise!


How do you run migrations in production? Do you have experiences or thoughts to share? Did you find anything wrong in the article? Please comment!

Comments to my articles make this website not just one person’s texts, but also a more collective knowledge base.

Toodle pip,

Photo credit

Comments (4)

  1. Great article. Though I think “If we don’t have enough disk space to contain a copy of the table, we should run our migration with ALTER TABLE ALGORITHM=INPLACE.” is not absolutely correct.
    Even with ALGORITHM=INPLACE, a copy table is created. It is just that concurrent DML is allowed.

    1. Hi Tusbar!
      Thanks for the feedback. Check MySQL documentation though: “INPLACE: Operations avoid copying table data but may rebuild the table in place. An exclusive metadata lock on the table may be taken briefly during preparation and execution phases of the operation. Typically, concurrent DML is supported.”
      So, the table data is not copied, and typically concurrent DML is allowed, but to be sure about this you should use LOCK = NONE.

  2. Thank you for the article! I am really interested in migrations with Galera cluster in production. I don’t know if you have already written it or not.

    1. Hi Ana! It is indeed an interesting topic. I didn’t write an article about it, but it is a good idea. I will write it for the website of the company I started very recently. If you follow me there, you will see it soon. Thanks for a great suggestion.

Leave a Reply

Your email address will not be published. Required fields are marked *