Practical advice for MySQL/MariaDB live migrations

Last Updated on

Modifying table structures is sometimes necessary, or desirable. Modifying them online can be a pain, especially with big tables. Things are easier if we can just schedule a migration and make the database unreachable until the end of the operation, but usually we can’t, because that would imply a revenue loss for the company.

This article provides some practical advice for live migrations, aka ALTER TABLE in production.

This article does not cover migrations on Galera Cluster or Group Replication, and some of its contents don’t apply in those environments. These topics may be discussed on different articles, if there is interest (so consider dropping a comment if you want me to write these articles).

A scene from The Birds, by HItchcock. Seagulls are sitting on a fence, in front of the sea.
These birds were not migrating.
From The Birds, Hitchcock.

pt-online-schema-change and gh-ost

Both these tools serve the purpose of modifying a table without putting locks for a long time, and they do this in similar ways. The general flow is the following:

  1. Create a new table identical to the old one.
  2. Modify the new table’s structure.
  3. Start to keep data in sync.
  4. Copy the data from the old table to the new.
  5. Once they’re in sync, rename the tables so that the clients start to use the new one.

Only the last operation implies a short-living lock. It is an atomic operation, so in every point in time the queries will be able to read a table. Essentially, the clients work will not be affected in any way.

The difference between the tools is the way they keep data in sync: pt-online-schema-changes create triggers on the old table, while pt-ghost consumes the binary log and applies the changes to the new table.

As a general rule, pt-online-schema-change keeps up more easily with intensive workloads, while gh-ost adds less work to the database.

pt-online-schema-change is produced by Percona and part of the Percona Toolkit. gh-ost is produced by GitHub.

Even if these tools are based on a great idea, it is important to understand that often their use should be avoided. They move a lot of data, because all existing rows must be read and rewritten. This means more work for the server, longer execution time, more disk space needed. This is unnecessary for certain operations. In fact, both MySQL and MariaDB support some online DDL, for example to add an index. Some operations are instantaneous – at the time of this writing, MySQL supports instantaneous ADD COLUMN and MariaDB supports instantaneous ADD COLUMN and DROP COLUMN.

Some notes about MariaDB and MySQL:

  • When MariaDB completes MDEV-16329, using external tools should not be necessary anymore (note that should != will; let’s wait and see).
  • MySQL 8 supports atomic DDL, currently MariaDB does not. This means that a crash should never leave a corrupted index, or orphan data structures on disk.

pt-online-schema-change and gh-ost should not be used in combination with the following techniques unless you know exactly what you are doing and have a bright idea. With some techniques it would be useless, with others there would likely be replication outages.

LOCK=NONE and ALGORITHM=INPLACE

If we don’t have enough disk space to contain a copy of the table, we should run our migration with ALTER TABLE ALGORITHM=INPLACE. But it is important to understand that a query executed in this way is always locking – in the best case it will only imply a shared lock, meaning that the table will still be readable.

Otherwise, we can run ALTER TABLE LOCK=NONE. This means what it seems to mean: locks on the table during the migration will be avoided.

These options are not available for all operations. Some operations cannot be performed without a table copy or without locking the table. The list of these operations varies depending on the DBMS flavor and version.

These options however are still useful. If the options we specified are not supported for the migration we are attempting to run, the ALTER TABLE will fail with an error, and we will at least be aware of the problem, so we can plan our migration in a different way.

My advice is to try specifying ALGORITHM and LOCK clauses in this order (if a statement fails, try the next one):

  1. ALGORITHM=INSTANT LOCK=NONE
  2. ALGORITHM=INSTANT LOCK=SHARED
  3. LOCK=NONE
  4. LOCK=SHARED

If the priority is to save disk space, directly try ALGORITHM=INPLACE. Except, with MariaDB, if you are changing or adding a table primary key, try ALGORITHM=NOCOPY first (it could be faster).

WAIT and NOWAIT (MariaDB)

MariaDB allows to specify how many seconds an ALTER TABLE should wait to acquire a lock before failing with an error. For example, ALTER TABLE WAIT 60 will wait for 60 seconds and ALTER TABLE NOWAIT will not wait at all.

This is especially useful on slaves, imagine the following scenario:

  1. A slave is running a long SELECT on table users, issued by some client.
  2. The slave should now replicate an ALTER TABLE on table users.

If parallel replication is not used, replication will completely stuck until the SELECT finishes. Even if parallel replication is used, at least one thread will be waiting – probably all threads will be blocked at some point.

Replication and IF NOT EXISTS (mostly MariaDB)

There is another way to avoid having replication stuck on an ALTER TABLE.

The migration can be executed on slaves first. When all slaves that we don’t want to get stuck have ran the migration, we can run an ALTER TABLE on the master, with the IF NOT EXISTS syntax. For example:

ALTER TABLE users
    LOCK=NONE,
    ADD INDEX IF NOT EXISTS idx_phone (phone);

Thanks to IF NOT EXISTS, replication will not crash on slaves where the index already exists.

Unfortunately, MySQL does not support IF NOT EXISTS in ALTER TABLE.

Switchover

pt-online-schema-change and gh-ost don’t handle foreign keys properly: gh-ost does not support them, pt-online-schema-change handles them in a risky way. To be clear, it’s not their fault: MySQL and MariaDB don’t provide a safe way to handle them.

Other problems with these tools is that they could be too slow in certain cases, and the disk could not have enough space.

So sometimes, even though it would be desirable to use them, it is simply not an option.

The switchover technique is useful in these cases. It is similar to the previous one, but it’s a bit more delicate. Here are the step – let’s assume we have a master called srv1 and a slave called srv2, plus other slaves whose names don’t matter:

  1. Run the migration on a srv2.
  2. Configure srv1 to replicate from srv2 (it’s a dual master configuration).
  3. Wait until Seconds_behind_master=0 on srv2.
  4. Point clients read-write connections to srv2.
  5. Wait until all read-write connections to srv1 are closed.
  6. Stop replicating from srv1 to srv2.
  7. Run the migration on all the slaves, including srv1.
A high-level diagram of the steps above. While the application uses srv1, ALTER TABLE runs on srv2. When the application uses srv2, ALTER TABLE runs on srv1.

The idea is that the migration runs on all production servers, but now while they are used by applications.

Related contents

Related articles

Related courses

The articles on Federico-Razzoli.com share a lot of technical knowledge for free, in the hope that they can help solving specific problems and raise the level of technical culture. However, if you work with database technologies, as an administrator or as a developer, you may need to build more organic knowledge and skills. Here are some courses related to the subject of this article.

Related services

Monthly time: Outsource delicate migrations that may take your servers down or slow down your applications. Avoid useless risks, take advantage of my expertise!

Conclusions

How do you run migrations in production? Do you have experiences or thoughts to share? Did you find anything wrong in the article? Please comment!

Comments to my articles make this website not just one person’s texts, but also a more collective knowledge base.

Toodle pip,
Federico

Photo credit

Comments (12)

  1. Great article. Though I think “If we don’t have enough disk space to contain a copy of the table, we should run our migration with ALTER TABLE ALGORITHM=INPLACE.” is not absolutely correct.
    Even with ALGORITHM=INPLACE, a copy table is created. It is just that concurrent DML is allowed.

    1. Hi Tusbar!
      Thanks for the feedback. Check MySQL documentation though: “INPLACE: Operations avoid copying table data but may rebuild the table in place. An exclusive metadata lock on the table may be taken briefly during preparation and execution phases of the operation. Typically, concurrent DML is supported.”
      So, the table data is not copied, and typically concurrent DML is allowed, but to be sure about this you should use LOCK = NONE.
      https://dev.mysql.com/doc/refman/8.0/en/alter-table.html

  2. Thank you for the article! I am really interested in migrations with Galera cluster in production. I don’t know if you have already written it or not.
    Regards

    1. Hi Ana! It is indeed an interesting topic. I didn’t write an article about it, but it is a good idea. I will write it for vettabase.com the website of the company I started very recently. If you follow me there, you will see it soon. Thanks for a great suggestion.
      Cheers

  3. MariaDB has bright start, looks promising, but according to db-engines site, has no significant growth since 2019 year. Lack of compatibility with MySQL, no “explain format=tree option” and horrible documentation, compairing with excellent MySQL docs. No dedicated GUI, also. MySQL is well supported by proffesionals, MariaDB is created by enthusiasts with uncertain future. Migrating to MariaDB is risky if someone is thinking about a stable future of its software.

    1. Hi Krzys,
      I’m not sure how this relates to an article about ALTER TABLE, it looks like you wanted to write your opinion somewhere and couldn’t find a better place. The topic itself however is interesting, and I’m happy to reply.
      I disagree. Sometimes upgrading a MySQL minor version is more problematic than switching to MariaDB. An example of this is when a release of MySQL 8 (which was already GA) broke compatibility with Percona Xtrabackup. Note that non-working backups are arguably the worse problem an IT department can have. This happened because at some point Oracle decided it is ok to introduce untested features or remove features (not necessarily after obsoleting them) from a GA branch, actually managing them as if they were beta branches.
      On the other hand, MariaDB didn’t behave very differently when they introduced heavy InnoDB changes in the first GA release of MariaDB 10.5.
      You are definitely right about MariaDB KnowledgeBase. Apparently they counted a bit too much on community help, and aren’t really catching up with MySQL manual’s quality.
      MariaDB developers are as professional as MySQL developers – actually many of them are former MySQL developers who prefer to work for MariaDB.
      MariaDB and MySQL are both great products, with different pro’s and con’s. None of them is always better than the other, and the best choice depends from case to case.

      1. Yes, you are absolutely right, my reply had nothing with ALTER command, but was related to the title of that post (migration from MySQL to MariaDB). Your thoughts about problems with upgrades are very interested. Thanks a lot for opinion.

        1. I see where the confusion comes from, but ALTER TABLEs in production are often called migrations.
          Thanks for commenting, though I’m afraid the comments will disappear when the post is moved to vettabase.com website. Sorry for the inconvenience, but please feel free to repropose your thoughts.

  4. How do you think query optimization in MariaDB looks like? Because this database does not support EXPLAIN = tree, so you cannot view the plan as a tree. And yet most major bases are equipped with such capabilities. Additionally, MariaDB does not provide the cost of executing the query. This makes it harder for MySQL and MariaDB to compare query plans, and makes it harder to optimize queries. I am curious what is your opinion on this?

    1. As I explained after your first comment, in this same page, I’d prefer comments to be related to the article.

      I’ll answer once more, mainly to be nice. But actually I don’t know what to say, because I don’t use the tree view implemented in MysQL 8.0 and (as far as I remember) I never minded about the estimated cost in MySQL.

  5. Thanks for reply. If you don’t like my comments/questions, please be so kind and do delete them. I will not post them any more.

    1. I’m sorry to read that. On the other side, a comment form someone that didn’t read the article is actually inappropriate. If you read my articles and have something to say about them, I’ll be happy to see your comments.

      Cheers,
      Federico

Leave a Reply

Your email address will not be published. Required fields are marked *