Last Updated on
Modifying table structures is sometimes necessary, or desirable. Modifying them online can be a pain, especially with big tables. Things are easier if we can just schedule a migration and make the database unreachable until the end of the operation, but usually we can’t, because that would imply a revenue loss for the company.
This article provides some practical advice for live migrations, aka
ALTER TABLE in production.
This article does not cover migrations on Galera Cluster or Group Replication, and some of its contents don’t apply in those environments. These topics may be discussed on different articles, if there is interest (so consider dropping a comment if you want me to write these articles).
pt-online-schema-change and gh-ost
Both these tools serve the purpose of modifying a table without putting locks for a long time, and they do this in similar ways. The general flow is the following:
- Create a new table identical to the old one.
- Modify the new table’s structure.
- Start to keep data in sync.
- Copy the data from the old table to the new.
- Once they’re in sync, rename the tables so that the clients start to use the new one.
Only the last operation implies a short-living lock. It is an atomic operation, so in every point in time the queries will be able to read a table. Essentially, the clients work will not be affected in any way.
The difference between the tools is the way they keep data in sync: pt-online-schema-changes create triggers on the old table, while pt-ghost consumes the binary log and applies the changes to the new table.
As a general rule, pt-online-schema-change keeps up more easily with intensive workloads, while gh-ost adds less work to the database.
Even if these tools are based on a great idea, it is important to understand that often their use should be avoided. They move a lot of data, because all existing rows must be read and rewritten. This means more work for the server, longer execution time, more disk space needed. This is unnecessary for certain operations. In fact, both MySQL and MariaDB support some online DDL, for example to add an index. Some operations are instantaneous – at the time of this writing, MySQL supports instantaneous
ADD COLUMN and MariaDB supports instantaneous
ADD COLUMN and
Some notes about MariaDB and MySQL:
- When MariaDB completes MDEV-16329, using external tools should not be necessary anymore (note that should != will; let’s wait and see).
- MySQL 8 supports atomic DDL, currently MariaDB does not. This means that a crash should never leave a corrupted index, or orphan data structures on disk.
pt-online-schema-change and gh-ost should not be used in combination with the following techniques unless you know exactly what you are doing and have a bright idea. With some techniques it would be useless, with others there would likely be replication outages.
LOCK=NONE and ALGORITHM=INPLACE
If we don’t have enough disk space to contain a copy of the table, we should run our migration with
ALTER TABLE ALGORITHM=INPLACE. But it is important to understand that a query executed in this way is always locking – in the best case it will only imply a shared lock, meaning that the table will still be readable.
Otherwise, we can run
ALTER TABLE LOCK=NONE. This means what it seems to mean: locks on the table during the migration will be avoided.
These options are not available for all operations. Some operations cannot be performed without a table copy or without locking the table. The list of these operations varies depending on the DBMS flavor and version.
These options however are still useful. If the options we specified are not supported for the migration we are attempting to run, the
ALTER TABLE will fail with an error, and we will at least be aware of the problem, so we can plan our migration in a different way.
My advice is to try specifying
LOCK clauses in this order (if a statement fails, try the next one):
If the priority is to save disk space, directly try
ALGORITHM=INPLACE. Except, with MariaDB, if you are changing or adding a table primary key, try
ALGORITHM=NOCOPY first (it could be faster).
WAIT and NOWAIT (MariaDB)
MariaDB allows to specify how many seconds an
ALTER TABLE should wait to acquire a lock before failing with an error. For example,
ALTER TABLE WAIT 60 will wait for 60 seconds and
ALTER TABLE NOWAIT will not wait at all.
This is especially useful on slaves, imagine the following scenario:
- A slave is running a long
users, issued by some client.
- The slave should now replicate an
ALTER TABLEon table
If parallel replication is not used, replication will completely stuck until the
SELECT finishes. Even if parallel replication is used, at least one thread will be waiting – probably all threads will be blocked at some point.
Replication and IF NOT EXISTS (mostly MariaDB)
There is another way to avoid having replication stuck on an
The migration can be executed on slaves first. When all slaves that we don’t want to get stuck have ran the migration, we can run an
ALTER TABLE on the master, with the
IF NOT EXISTS syntax. For example:
ALTER TABLE users LOCK=NONE, ADD INDEX IF NOT EXISTS idx_phone (phone);
IF NOT EXISTS, replication will not crash on slaves where the index already exists.
Unfortunately, MySQL does not support
IF NOT EXISTS in
pt-online-schema-change and gh-ost don’t handle foreign keys properly: gh-ost does not support them, pt-online-schema-change handles them in a risky way. To be clear, it’s not their fault: MySQL and MariaDB don’t provide a safe way to handle them.
Other problems with these tools is that they could be too slow in certain cases, and the disk could not have enough space.
So sometimes, even though it would be desirable to use them, it is simply not an option.
The switchover technique is useful in these cases. It is similar to the previous one, but it’s a bit more delicate. Here are the step – let’s assume we have a master called srv1 and a slave called srv2, plus other slaves whose names don’t matter:
- Run the migration on a srv2.
- Configure srv1 to replicate from srv2 (it’s a dual master configuration).
- Wait until
- Point clients read-write connections to srv2.
- Wait until all read-write connections to srv1 are closed.
- Stop replicating from srv1 to srv2.
- Run the migration on all the slaves, including srv1.
The idea is that the migration runs on all production servers, but now while they are used by applications.
- The perils of ALTER TABLE in MySQL/MariaDB – Avoiding locks is not everything, please be aware of all the problems that may arise.
- Dropping database objects in a safe way – In case you reached this article because you need to drop columns or indexes.
- If you use MariaDB, you may also check Use cases for MariaDB Invisible Columns to find out how to better test adding and dropping columns.
The articles on Federico-Razzoli.com share a lot of technical knowledge for free, in the hope that they can help solving specific problems and raise the level of technical culture. However, if you work with database technologies, as an administrator or as a developer, you may need to build more organic knowledge and skills. Here are some courses related to the subject of this article.
Monthly time: Outsource delicate migrations that may take your servers down or slow down your applications. Avoid useless risks, take advantage of my expertise!
How do you run migrations in production? Do you have experiences or thoughts to share? Did you find anything wrong in the article? Please comment!
Comments to my articles make this website not just one person’s texts, but also a more collective knowledge base.