MySQL/MariaDB: use SQL properly to run less queries

Popeye and Talky
Does your application talk too much
to the database?

Reducing the number of queries is important with any DBMS. There are two reasons for this: reducing the latency of applications; and reducing the workload for the database servers.

It should be obvious that reducing the number of queries reduces the latency. A web page is not completely loaded until all queries are executed. And a query is not just a statement like a string concatenation; it involves sending a text over a network, and waiting for the results. I have already mentioned the cost of slow web pages in the article Setting performance and availability objectives.

But also remember that the DBMS is usually your application’s bottleneck, and we shouldn’t increase its workload without a reason. While simple queries are fast, they always involve a certain amount of work that we tend to forget: listening to data sent by clients, parsing the query, mapping the table/column names to actual data structures, checking permissions, optimising the query… and more. I already suggested a couple of ways to avoid queries that do nothing in the article MySQL/MariaDB: run less queries! (thanks to performance_schema).

Here we will see how to use SQL in a more expressive way to reduce the number of queries we run. This principle applies to any database, but this article focuses on MySQL and MariaDB syntax.

LAST_INSERT_ID()

In relational databases, the following situation occurs quite often: we insert a new row into a table, we let the DBMS generate the id automatically, and then we need to know that id to insert another row into a linked table. For example we create a row in the order table, and than we create a row in ordered_product – which will contain the order_id.

MySQL has the LAST_INSERT_ID() function, which returns the last id generated by the current session. This allows to insert rows in this way:

INSERT INTO order (id, user_id, date)
    VALUES (DEFAULT, 24, DATE());
INSERT INTO ordered_product (order_id, product_id)
    VALUES (LAST_INSERT_ID(), 12);

Inserting multiple rows

When we insert several rows into the same table, there is no need to run a separate INSERT statement for each row. With MySQL, it is possible to insert many rows with a single statement:

INSERT INTO usert (first_name, last_name) VALUES
    ('Emma', 'Goldman'),
    ('Ada', 'Lovelace'),
    ('John', 'Doe');

Duplicate rows

Sometimes we want to insert a row that could be already there. In this context, “already there” means that trying to insert it will violate a UNIQUE constraint or the primary key uniqueness. If the row is already there, depending on the situation, we may want to:

  1. Receive an error/warning and insert nothing;
  2. Replace the old row or some of the columns.

The most commonly used solution is to run a SELECT to check if the row is present, and then run an INSERT if the row is not found; otherwise, run nothing or an UPDATE, or even a DELETE + INSERT. For all these cases, I propose a better solution.

If you don’t want to insert anything in case the row is already there, you can just run the INSERT and check if it returned an error. If that’s the case, the transaction will not roll back automatically.

If you are running a multi-row INSERT, you may want to insert the rows that are not duplicate. To do so, use INSERT IGNORE. For each row that cannot be inserted, you will receive a warning, not an error, and non-duplicate rows will still be inserted. If you receive an error, you know that the whole statement failed for some other reason and nothing was written.

To entirely replace a row, the simplest solution is to use REPLACE. I don’t suggest to do so. If you want to, be sure to understand what happens when you have foreign keys or triggers – and if you don’t have them now, ask yourself what will happen if they are added later. Even AUTO_INCREMENT behaviour with REPLACE is buggy (pardon, I mean: is interesting).

Instead, we can use INSERT ... ON DUPLICATE KEY UPDATE. This allows to replace all columns (maybe setting some of them to DEFAULT()) or some of them.

Modifying a value

Sometimes we know that a value is part of a certain row, and we need to modify it. Common examples are increasing a counter by one when a page is visited, or decreasing the quantity of products in stack. A less common example is modifying a string, by adding a suffix.

In such cases, the most common solution is to read a value from the database, modifying it in the application, and then writing it. So we have a SELECT followed by an UPDATE.

Normally, you can achieve the same result in this way:

UPDATE product SET qty = qty - 1 WHERE id = 24;

(For string concatenation, check an older article: How to concatenate strings in MySQL and MariaDB)

But what if we also need to know the value? Well, we can use variables:

UPDATE product
    SET qty = (@qty := qty - 1)
    WHERE id = 24;

Unfortunately, while this still works in MySQL 8.0, we will get this error:

mysql> SHOW WARNINGS \G
*************************** 1. row ***************************
  Level: Warning
   Code: 1287
Message: Setting user variables within expressions is deprecated and will be removed in a future release. Consider alternatives: 'SET variable=expression, ...', or 'SELECT expression(s) INTO variables(s)'.
1 row in set (0.00 sec)

They did this change because variables are sometimes set in a SELECT as a dirty workaround because window functions were historically not supported in MySQL. However MySQL supports them, and this hack is not needed anymore.

Except that this particular case has nothing to do with windowing functions. The solution proposed in the error message doesn’t help here.

Multi-table DELETE and UPDATE

Suppose you want to delete or modify some posts, and all comments associated to them. You don’t need to run DELETE or UPDATE on two tables separately. MySQL supports multi-table DELETE and multi-table UPDATE instead, which are somewhat similar to JOINs.

I recommend to check the documentation pages I linked. Using subqueries (UPDATE ... WHERE if IN (SELECT ...)) may be more intuitive, but the optimiser will not be able to choose a proper query plan.

Autocommit

In InnoDB (MySQL storage engine) nothing happens outside a transaction. It doesn’t really matter if we start and commit a transaction implicitly or not. Therefore the following pieces of code are equivalent:

SET autocommit := 1;
INSERT INTO t (a, b, c) VALUES (1, 2, 3);

START TRANSACTION;
INSERT INTO t (a, b, c) VALUES (1, 2, 3);
COMMIT;

Autocommit is ON by default. So, unless we changed it explicitly, we can just run:

INSERT INTO t (a, b, c) VALUES (1, 2, 3);

However, I saw ORMs enclosing single-statement transactions in START TRANSACTION and COMMIT statements. Try to avoid that.

AND CHAIN, AND RELEASE

When we complete a transaction, successfully or not, we probably want to start another or to disconnect from MySQL.

To start another transaction, we could use this syntax:

COMMIT AND CHAIN;
ROLLBACK AND CHAIN;

No lock or snapshot is taken until we write or read some data, so there is no drawback in using this feature. We simply avoid to run one more START TRANSACTION.

To close the connection when the transaction ends:

COMMIT AND RELEASE;
ROLLBACK AND RELEASE;

This doesn’t avoid to run another SQL instruction. But it shortens the time we stay connected to MySQL or the proxy.

If this is a web page or application that will reconnect soon to MySQL or a proxy, it may be better not to disconnect at all. In any case, if we use a proxy, it probably shouldn’t disconnect from MySQL.

You may want to make the or the default behavior, for the current session or globally, using the completion_type variable. This would be ok for a database abstraction library which handles the transactions, but I don’t suggest to use it in other cases, because this is not a well-known feature, and it would confuse developers.

Stored procedures and functions

I’ve already mentioned why stored programs can potentially be a huge performance optimisation, in the article 3 good reasons to use Stored Procedures. Let me summarise those reasons again:

  • Less statements sent from the client to the server (one CALL statement instead of several queries);
  • Less results sent from the server to the client (intermediate results are handled inside the stored procedure);
  • Less locks – the key here is to understand that locking problems remain, but at least they should be held for a short time.

I want to remind you about a particular type of stored programs: stored aggregate functions, currently supported in MariaDB but not MySQL. See my article MariaDB Stored Aggregate Functions by examples. With aggregate functions, you may loop on a huge amount of rows and return a single value, saving a lot of client-server traffic.

DO

The DO statement is an almost unknown MySQL syntax. It’s purpose is to invoke a function without returning any result to the client. I believe that it is worth using it in some situations:

  • When invoking a stored function which writes some data and returns a potentially big resultset, but sometimes we have no use for the resultset. If we are doing this, however, we are probably using a function to do something that should be done by a procedure.
  • When we run functions or queries because of their side effects inside a stored procedure, and we don’t want to send multiple resultsets to the client.

INSERT SELECT and CREATE TABLE SELECT

We have two useful statements to copy rows between tables:

  • INSERT SELECT copies the result of a SELECT to an existing table;
  • CREATE TABLE This will reduce server resource usageSELECT copies it to a new table.

Typically, these are one time operations, so using these statements is convenient but it’s not a relevant performance optimisation. However sometimes they are used in cron jobs, for example to automate the archival of old rows, moving them from current tables into compressed or remote tables.

DELETE … RETURNING

Sometimes we want to read a row and then delete it. These operations can be combined into one in MariaDB, with the DELETE RETURNING statement:

DELETE FROM user
    WHERE id = 24
    RETURNING email;

However, if our only goal is to delete a row that could exist or not, we can just try to delete it. Then we check the number of affected rows, and if it is 1 we have deleted a row.

Unfortunately we cannot insert the returned values into another table, in other words there is no INSERT DELETE RETURNING. To do this, we need to run two separate queries.

SET and SHOW

This is not a hint about something you should do; it is more about what you should avoid. ORMs tend to abuse of statements like SET and SHOW.

SET is generally ran when a session starts, or even before each query. It’s used to set some session variables to be sure, for example, that MySQL will expect the correct character set to be used. However, this is only necessary if the MySQL configuration needs some corrections. If it is the case, it’s better to fix it. Then, the ORM should be instructed not to run SET statements, and just assume that the configuration is correct.

That said, if you want to use SET, keep in mind that you can set more than one variable with the same statement:

SET @my_var    := 1,
    @your_var  := 2,
    @her_var   := 3;

SHOW statements (SHOW TABLES, SHOW COLUMNS, …) are used by some ORMs to retrieve the remote table definitions from the database. If this is done for every new session, this operation consume resources. The ORM should already know the tables structures instead.

I remember that in a company where I have worked, PMM per-statement graphs shown that SHOW CREATE TABLE was ran up to 2K times per second when worker threads started. No one had idea that this happened, because it was done by the ORM transparently. However, this should be avoided.

See also

Related articles:

Related courses:

Conclusions

When it comes to optimising an application performance, what we need to do is to find the slow queries and fix them or build proper indexes. But sensibly reducing the number of executed queries and communication between the application and the database also helps. This will reduce server resource usage, in particular CPU and network.

The statements I’ve mentioned are part of the SQL language supported by MySQL and MariaDB, so there is no reason not to use them. ORM authors should implement them, and developers should consider running at least part of their queries without using an ORM.

Do you have more ideas? Did I forget something, or made some mistakes? Please let me know in the comments!

Toodle pip,
Federico

Picture credit

Leave a Reply

Your email address will not be published. Required fields are marked *