Last Updated on
Updating triggers online means that we are able to do this in production, while our applications are running, without causing a disruption.
To achieve this, we need to run
CREATE TRIGGER and
DROP TRIGGER inside a transaction, but this is not supported by MySQL and MariaDB. If we try to do so, the current transaction is implicitly committed. MariaDB has a variable called
in_transaction that allows us to demonstrate this clearly:
MariaDB [test]> START TRANSACTION; Query OK, 0 rows affected (0.000 sec) MariaDB [test]> SELECT @@in_transaction; +------------------+ | @@in_transaction | +------------------+ | 1 | +------------------+ 1 row in set (0.000 sec) MariaDB [test]> DROP TRIGGER bi_t; Query OK, 0 rows affected (0.001 sec) MariaDB [test]> SELECT @@in_transaction; +------------------+ | @@in_transaction | +------------------+ | 0 | +------------------+ 1 row in set (0.000 sec)
The problem is actually more generic: MySQL and MariaDB don’t support transactional DDL. But let’s focus on triggers. I’ll explain what problems this causes, and then I will propose a solution.
The problems are:
- Phantom trigger – it’s about every single updated trigger;
- Inconsistent triggers – it’s about a table’s triggers as a whole;
- Atomicity – it’s about failures.
If I want to replace one trigger with another, as mentioned, I will need to run
DROP TRIGGER followed by
CREATE TRIGGER. This implies that, in the (hopefully brief) interval between these expressions, the trigger will simply not exist.
If I run these statements in a transaction, this is not a problem. In fact, for other users, both the statements will take effect together when I
COMMIT the transaction. But again, this cannot happen in MySQL and MariaDB. On a heavily used server, we should definitely expect some users to write the table without activating a trigger.
However, MariaDB has a solution for this problem:
CREATE OR REPLACE TRIGGER. In this way the existing trigger (if any) will be destroyed and the new one will be created, but this operation is atomic. No other user can see the table without a trigger.
Even with the first problem solved, in most cases we have a second problem: we have multiple triggers associated to a table, and we always want them to be consistent each other.
This is better explained with an example. Suppose that we have triggers
BEFORE INSERT and
UPDATE. These triggers perform some data validation, and return an error if some data in not valid. If you want to know how to do this, take a look at an article I wrote some years ago in my personal blog: Validating data using a TRIGGER.
Anyway, the problem here is that, if we change the validation rules in one trigger, we want to change them in the other trigger also. In no point in time we want the triggers to be inconsistent.
Our triggers operations should completely succeed or completely fail. If, for example, we lose our connection in the middle of these changes, all the changes we’ve done should be undone, to avoid inconsistencies.
If you used MySQL for centuries like me, you may remember the
LOCK TABLES statement. And probably you remember it with horror. It was developed in ancient times as an alternative to transactions, when MyISAM (or probably its ancestor, ISAM) was the default storage engine and InnoDB simply didn’t exist. Basically, since you could not use a real transaction, you locked a whole table with a read lock or a write lock. Then you did some stuff, and then you ran
UNLOCK TABLES. Operations executed inside the
LOCK block were not atomic, but at least other users could not interfere with them. Nowadays, I only see that command in legacy code – but yes, I still see it.
LOCK TABLES help in this case? Because the lock survives statements like
DROP TRIGGER and
So basically what you would do is:
LOCK TABLE something WRITE;
- the command is queued until all existing metadata locks (transactions or DDL in progress) are released; this would happen even if we directly run
What can go wrong
The above solution seems to me good for most cases, but it is not perfect. I encourage you to read this section to understand which problems you may encounter.
Lost connection or server crash
We are emulating an atomic series of operations, but the logic that makes it atomic is implemented remotely. Without a built-in solution, some possible problems cannot be addressed:
- Lost connections: If the connection is lost, the table will be unlocked, and the triggers already created will be used by all users. Some triggers however are possibly needed but not created yet.
- mysqld crash: When mysqld crashes, of course we restart it. But on restart, tables will be unlocked, and possibly some of the triggers are already already created, while others are not.
Both these risks cannot be avoided. What we can do to mitigate them is to be fast and, if something bad happens, re-establish the connection and complete the migration as soon as possible.
LOCK TABLES and Galera
In the comments, a concern was raised about LOCK TABLES not working as expected as Galera.
LOCK TABLES seems to work perfectly on the node where it is executed. The problem is that the lock is not propagated to other nodes.
So I would advise it, if one of these best practices is followed:
- All writes go to the same node;
- Writes to each set of databases (or at least tables) go to the same node.
If so, you are reasonably safe. If you have automated failover and you want to be extra-safe, run
LOCK TABLES and
UNLOCK TABLES on all nodes. If one of them cannot be reached from your host but it seems to be up, you may prefer to postpone the migration until the problem is solved.
Waiting for a metadata lock
As mentioned, any DDL is subject to wait for a metadata lock to be acquired for a long time. Even a long-running SELECT may lead to this problem. Such wait may block replication.
There is no way to avoid this. When it happens, you may want to kill the connection and postpone the migration; or you can kill the long running transaction/query to unblock the migration.
What are the drawbacks of this solution?
- If the connection is lost at some point, no rollback happens. In other word, the Atomicity problem mentioned above is not solved in any way.
- If a long transaction is running, our
LOCK TABLEstatement will be queued for a long time. And it will cause several other statements (including
SELECTs) to be queue. But this also happens with a simple
- For a brief moment, the table will be completely write-locked.
What are the pro’s?
- No disruption.
- Isolation between concurrent sessions.
- Once the lock is acquired, the operations on triggers will be fast.
Other methods are possible, but they are more complicated and not necessarily less risky. For example, if you use row-based replication, you could update the triggers on a slave, promote it to master, and then update the triggers on the master. But then you have a risk of conflicts during the failover.
All in all, I believe that this method is good enough for most situations.
If you found this article useful,
you may consider to contact me for the training:
MySQL Stored Procedures
Photo credit: Stefano Trucco