READ ONLY transactions in MySQL

Duke Humfrey’s Library – Bodleian Library, Oxford, Europe

Transactions are the least known RDBMS features. Everyone knows they exist, but few know how they work. For this reason, it is not obvious that they can be read only. This article explains what read only transactions are and why we should use them when appropriate.

What is a read only transaction?

An objection I’ve heard more than once is: transactions are for writes, SELECTs don’t even happen inside a transaction! But this is wrong. Transactions are not only about ROLLBACK and COMMIT. They are also about isolation.

As a general rule, this is what happens. When you start a transaction we acquire a view, or snapshot, on data. Until the end of the transaction we will see that snapshot, not modifications made by other connections.

Reads are generally non-locking. After you’ve read a row, other users can still modify it, even if your transaction is still in progress. Internally, MySQL will keep the information that allow to “rebuild” data as they were when you acquired the snapshot, so you can still see them. This information is preserved in the transaction logs.

To be clear, we actually always use transactions. By default, we run queries in autocommit mode, and we don’t start or end transactions explicitly. But even in this case, every SQL statement we run is a transaction.

There is no such thing as reading objective data. We always read from (or write to) snapshots acquired at a certain point of time, that include all the changes made by committed transactions.

Why should we use read only transactions?

The answer should be obvious now. When we want to run more than one SELECT, and we want the results to be consistent, we should do it in a read only transaction. For example, we could do something like this:

SET @date := CURRENT_DATE();
SELECT COUNT(DISTINCT user_id)
    FROM product WHERE date = @date;
SELECT COUNT(DISTINCT product_id)
    FROM product WHERE date = @date;

If we want the two results to reflect the reality at a certain point in time, we need to do this in a transaction:

START TRANSACTION READ ONLY;
SET @date := CURRENT_DATE();
SELECT COUNT(DISTINCT user_id)
    FROM product WHERE date = @date;
SELECT COUNT(DISTINCT product_id)
    FROM product WHERE date = @date;
COMMIT;

The question implies something: why should we use a read only transaction, when we can use a regular read write transaction? If for some reason we try to write something, the statement will fail. So it’s simpler to just always use read write transactions.

The reason for using read only transactions is mentioned in the MySQL manual:

InnoDB can avoid the overhead associated with setting up the transaction ID (TRX_ID field) for transactions that are known to be read-only. A transaction ID is only needed for a transaction that might perform write operations or locking reads such as SELECT ... FOR UPDATE. Eliminating unnecessary transaction IDs reduces the size of internal data structures that are consulted each time a query or data change statement constructs a read view.

8.5.3 Optimizing InnoDB Read-Only Transactions – 25 August 2019

Isolation levels

Isolation level is a setting that defines how much the current transaction is isolated from concurrent transactions. It can be changed at session level, before starting the next transaction. There is also a global setting, which takes effect for sessions that don’t change their isolation level. By default it is REPEATABLE READ.

Isolation levels are:

  • READ UNCOMMITTED;
  • READ COMMITTED;
  • REPEATABLE READ.

(We’re skipping SERIALIZABLE here, as it’s just a variation of REPEATABLE READ and it does not make sense with read only transactions)

This list is only valid for MySQL. Different DBMSs support different isolation levels.

The behavior described above refers to REPEATABLE READ. To recap, it acquires a snapshot of the data, which is used for the whole duration of the transaction.

Another isolation level is READ COMMITTED. It acquires a new snapshot for each query in the transaction. But this does not make sense for a read only transaction.

A usually underestimated isolation level is READ UNCOMMITTED. The reason why it is underestimated is that it reads data that are being written by other transactions, but are not yet committed – and possibly, will never be. For this reason, highly inconsistent data may be returned.

However, this can be an important optimisation. In my experience, it can make some queries much faster and greatly reduce CPU usage, especially when you are running expensive analytics SELECTs on write-intensive servers.

There are cases when results inconsistencies don’t matter. A typical example is running aggregations on many rows: an average on one million values will not change sensibly. Another example is when reading session-level data, that no one else is modifying; or old data, that no one is supposed to modify anymore.

If you want to use READ UNCOMMITTED, there is no reason to wrap multiple queries in a single transaction. For each query, you can run something like this:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;
SELECT COUNT(*) FROM product;
COMMIT;

MetaData Lock and the rollback trick

When accessing a table using a non-transactional engine (like MyISAM) inside a transaction, MySQL will return a warning. They seem to assume that there is no reason to access such tables from a transaction, which is false. MariaDB doesn’t return such warning.

In reality, transactions acquire metadata locks (MDL). This means that other connections will not be able to run an ALTER TABLE on those tables until the transaction finishes. Preventing such problems is very important and should be done every time it’s useful.

mysqldump rolls back

mysqldump --single-transaction uses default read write transactions, not read only ones. My guess is that they don’t consider it important for a single long transaction, but only for many concurrent transactions.

However, the MariaDB version of mysqldump does something interesting with rollbacks:

ROLLBACK TO SAVEPOINT in --single-transaction mode to release metadata lock on table which was already dumped. This allows to avoid blocking concurrent DDL on this table without sacrificing correctness, as we won't access table second time and dumps created by --single-transaction mode have validity point at the start of transaction anyway.

Note that this doesn't make --single-transaction mode with concurrent DDL safe in general case. It just improves situation for people for whom it might be working.

mysqldump in MariaDB repo, taken from commit 624dd71b9419555eca8baadc695e3376de72286f

In other words:

  • An MDL blocks some operations, so it can be a problem just like any other lock, especially in a long transaction;
  • An MDL is only useful for data that we are going to access again, just like any other lock;
  • Transactions support savepoints in the middle of a transaction, and we can rollback to a savepoint instead of rolling the whole transaction back;
  • Doing so releases the MDLs on objects that were only accessed after the savepoint.

See also

Related courses:

Conclusions

Here I wanted to advertise a feature that, in my opinion, should be used much more often. Actually, if you know from the beginning that a transaction is read only, there is no reason not to inform MySQL about that.

We discussed which isolation levels make sense for read only transactions, and the nice ROLLBACK TO SAVEPOINT trick used by mysqldump for MariaDB.

Did you notice any mistake? Do you have ideas to contribute?
Please comment!

As usual, I’ll be happy to fix errors and discuss your ideas.

Toodle pip,
Federico

Photo source: Wikipedia

Comments (9)

  1. I have encountered some bad practice with the READ UNCOMMITTED isolation in a past job. Apparently in 2010 someone wrote this article:
    http://itecsoftware.com/with-nolock-table-hint-equivalent-for-mysql

    In MS SQL the NOLOCK hint exists because in READ COMMITTED transaction isolation on MS SQL reads can lock out writes and vice versa. In some cases our developers wished to don’t place a (very short lived) lock to get a snapshot while reading data, so they used the NOLOCK hint in MS SQL. When redesigning the application for MySQL they came across this blogpost they picked up the MySQL “equivalent”. While it is true that the effect of NOLOCK is similar to the MySQL READ UNCOMMITTED isolation mode, the origin of NOLOCK does not apply to MySQL in 99% of the cases.

    Not understanding this difference the developers implemented the advice from the blogpost anyway. This meant they changed the transaction isolation level of the session to READ UNCOMMITTED, like in your example, selected the data and didn’t change it afterwards (because the connection gets “cleaned up” by GC or PHP). Our application was connected to the database via ProxySQL, and as this was multiplexing connections the change in transaction isolation applied to all connections. This meant the entire application ran in READ UNCOMMITTED mode (causing all sorts of havoc) because developers were afraid small read queries could potentially lock writes.

    1. Hi Art!
      That’s probably an overoptimisation. But anyway, it is very unfortunate that ProxySQL is not aware of transaction isolation levels:
      https://github.com/sysown/proxysql/issues/508
      In my understanding, it is very possible that:
      – Client1 changes transaction isolation level to READ UNCOMMITTED, and the query is sent to Server1
      – Client2 starts a connection, and the query is sent to Server1, so the transaction is READ UNCOMMITTED
      – Client1 runs queries that go to Server2, so it actually uses REPEATABLE READ

      1. It wasn’t an overoptimization as they were suffering with read-locks on the MS SQL code base. So when they rewrote all that stuff from .net/MS SQL to PHP/MySQL they simply migrated this NOLIMIT as well.
        You are entirely correct on that workflow and that happened to us as well. I only noticed this after looking at the query log. I found out that SET TRANSACTION ISOLATION READ UNCOMMITTED was one of the most occurring ones and started asking questions. Luckily their new stack was only live for one (very large) customer and we were able to fix this before it actually ended up biting us in the ass. 😉

  2. Just a nitpick, but: in your sql example, you should not be using two separate calls to the `date()` function. Even when used inside a transaction, there is a risk that, when run just before midnight, the 2nd query will operate on the successive day of the 1st query

    1. Hi gggeek,
      You are correct. Thanks for reporting.
      I fixed the bug and, in doing so, I’ve noticed that DATE() requires an argument, so I replaced it with CURRENT_DATE().

  3. Hi,
    I just wonder how is the REPEATABLE READ isolation level implemented in MySQL?
    Normally I would say that the sentence “…SERIALIZABLE is just a variation of REPEATABLE READ and it does not make sense with read only transactions” is wrong.
    The REPEATABLE READ does not address the PHANTOM READ phenomena.
    Which could basically mean when a parallel transaction creates a new row in the table ‘product’ then the subsequent reads from the table may return different results even they are run under the same transaction.

    1. Hi Ondra,
      Great question. MySQL is a bit strange here: with REPEATABLE READ, InnoDB uses next-key locking. This prevents phantom reads.
      SERIALIZABLE is just REPEATABLE READ, except that every SELECT behaves like SELECT … LOCK IN SHARE MODE.

Leave a Reply

Your email address will not be published. Required fields are marked *