ALTER TABLE is often necessary, but it can be risky in a production environment for many reasons. It is a destructive command, meaning that it can delete some data and cannot be rolled back. It acquires locks that can cause problems, especially if you have some form of high availability. It can obviously be a quite heavy operation. Also, some MySQL and MariaDB features are not designed to allow safely change table structures.
There are solutions for these problems, and good practices to follow when altering tables. For example, it is often a good idea to use a tool like pt-online-schema-change or gh-ost to modify tables (and sometimes it is a good idea not to use them). Specifying options like
LOCK also helps us to gain more control. But these topics will probably be discussed in a future article.
For now, let’s focus on the risks instead. It’s important to understand them, before deciding what to do or how.
This is just a quick reference. It could be incomplete and all details are missing. If you would like to know more about some of these points, or if you have some more risks to add to this list, please write a comment below.
EDIT: In MariaDB the use of external tools for ALTER TABLEs may be obsolete after MDEV-16329 Cross-engine ALTER ONLINE TABLE is completed and released. Thanks to Marko Makela for reporting this in the comments.
ALTER TABLE operations require a table copy. Before starting, we need to be sure that we have enough space on disk. If we have slaves, we should be sure that our slaves also have enough disk space.
When you add a column or index, a table will permanently grow. When you add a column, an index or a table, new rows will require space permanently. These are good reasons to use the correct data types. While for each single column using the correct type and size could be irrelevant, it will probably be relevant to choose the correct type and size every time we make a change, as an organisation rule.
At the same time, don’t be too paranoid about space. If you create columns that are too small, a migration will be necessary in the future to fix the problem. And if your application uses a non-restrictive
SQL_MODE, it is very possible that your data will be truncated and no one will note that.
ALTER TABLE types are online. The exact set of online operations depends on your MySQL version, and they are not the same between MySQL and MariaDB. Some operations are even instantaneous.
But others will lock DML statements. Imagine you write to a table several times a second, and you need to alter the table… but the operation will last for hours. This is a potential disaster, so the migration needs to be planned.
Galera has two methods to run
ALTER TABLE: RSU and TOI. RSU can potentially avoid a cluster downtime, but it is risky. For example, it allows to write on node1 while you add a new column without a default value on node2. But the in this case, the
INSERTs will fail on node2.
Even a simple
SELECT acquires metadata locks. This is expected, because we don’t want a column to disappear in the middle of a query. But imagine we have a master and a slave. We run long-running queries on the slave. At some point, we run an
ALTER TABLE on the master and, after it completes, it is propagated to the slave. But long-running queries will block the
ALTER TABLE. And replication will lag for… how much? Hours? Maybe it’s more than we can tolerate.
A possible solution is to run the
ALTER TABLE on the master and the slave separately, setting
sql_bin_log=0 at session level so that the operation will not be replicated. Notice that this is risky if you don’t know what you are doing: adding or dropping a column on the slave can make some queries fail even if they succeeded on the master.
EDIT: MDEV-11675 will allow true lock-free
ALTER TABLE in MariaDB slaves. Thanks to Marko Makela for reporting this in the comments.
Replicate to different tables
A slave can have more columns than the master for a certain table, as long as those columns are added at the end of the table. The master can have more column than the slaves, if they are added at the end of the table. Replication will not take these columns into account.
Probably the astute reader already guessed what the problem is. If we use this feature, we must be sure that non-common columns remain at the end of the table. By default
ALTER TABLE adds new columns at the end of the table, so we need to be very careful to use the
AFTER clause when needed. And we must be sure that extra columns are in the master or in a slave, not both.
Of course the simplest solution is not to use this feature.
A source of many problems for IT people is that there is no way to deploy code in the exact moment an
ALTER TABLE finishes. If you add a column,
INSERTs should be rewritten accordingly. But old
INSERTs will fail on the new table, new
INSERTs will fail on the old table, and the code change cannot be coordinated with the DBMS.
A clean solution is to add columns with
DEFAULT value. In this way old
INSERTs will not fail.
INSERTs can be adjusted later.
For the same reason, dropping columns can also be problematic. If we drop the column too early, all statements that mention the column will fail. We can drop it later, but only if the column has a
DEFAULT value, otherwise
INSERTs will fail. If the column doesn’t have a
DEFAULT value, we should add it, then deploy the new code, then drop the column.
MySQL supports the
IGNORE INDEX syntax. We basically list some indexes that should be used or ignored, no matter what MySQL thinks about it. But if we drop an index that is named by queries in this way, the queries will fail. This is true even in the case of
I think that MySQL is quite good at choosing the right index to use. But sometimes it can do horrible mistakes, and a query can be unnecessarily slow. For that reason, adding an index sometimes implies some risks. Unless we need to quickly add an index to fix a performance problem, it is a good idea to add indexes in staging first.
Changing the storage engine
ALTER TABLE allows to change a table’s storage engine, for example from InnoDB to BLACKHOLE. But we should keep in mind that some features, like foreign keys and virtual columns, are only supported by some storage engines. When we try to change the storage engine and those features are present, the
ALTER TABLE may fail; and that’s the best case. But it could also succeed. If we have foreign keys, for example, they will silently disappear.
Unfortunate code deploys can be reverted. Schema changes cannot. If a migration contains a mistake, the problem can only be fixed with another migration. Developers should be careful with schema changes, and DBAs should always review these changes.
This aspect is particularly scared when it comes to run
DROP TABLE or
ALTER TABLE ... DROP COLUMN statements, because they destroy data. This is the reason why I wrote the article Dropping database objects in a safe way.
MariaDB temporal tables (or system-versioned tables) keep the history of each row. But they don’t keep the history of table structures. If it is modified, history is somehow changed accordingly. If a new column is added, it will be added to all old rows. If a column is dropped, its value will disappear from the history. By default,
ALTER TABLE is not allowed on temporal tables.
If you encountered other types of problems with migrations, please let me know with a comment, I am curious.
In later articles we will see what can be done to mitigate the risks.
A personal note
Do you have big tables? Write-intensive tables? You don’t feel confident enough to run heavy
ALTER TABLEs on them?
Don’t get stuck with the current version of your schema, putting a hard limit to innovation and development. I can help you with migrations, and other delicate tasks. Please check my services.