Last Updated on
Reducing the number of queries is important with any DBMS. There are two reasons for this: reducing the latency of applications; and reducing the workload for the database servers.
It should be obvious that reducing the number of queries reduces the latency. A web page is not completely loaded until all queries are executed. And a query is not just a statement like a string concatenation; it involves sending a text over a network, and waiting for the results. I have already mentioned the cost of slow web pages in the article Setting performance and availability objectives.
But also remember that the DBMS is usually your application’s bottleneck, and we shouldn’t increase its workload without a reason. While simple queries are fast, they always involve a certain amount of work that we tend to forget: listening to data sent by clients, parsing the query, mapping the table/column names to actual data structures, checking permissions, optimising the query… and more. I already suggested a couple of ways to avoid queries that do nothing in the article MySQL/MariaDB: run less queries! (thanks to performance_schema).
Here we will see how to use SQL in a more expressive way to reduce the number of queries we run. This principle applies to any database, but this article focuses on MySQL and MariaDB syntax.
In relational databases, the following situation occurs quite often: we insert a new row into a table, we let the DBMS generate the id automatically, and then we need to know that id to insert another row into a linked table. For example we create a row in the order table, and than we create a row in ordered_product – which will contain the order_id.
MySQL has the
LAST_INSERT_ID() function, which returns the last id generated by the current session. This allows to insert rows in this way:
INSERT INTO order (id, user_id, date) VALUES (DEFAULT, 24, DATE()); INSERT INTO ordered_product (order_id, product_id) VALUES (LAST_INSERT_ID(), 12);
Inserting multiple rows
When we insert several rows into the same table, there is no need to run a separate
INSERT statement for each row. With MySQL, it is possible to insert many rows with a single statement:
INSERT INTO usert (first_name, last_name) VALUES ('Emma', 'Goldman'), ('Ada', 'Lovelace'), ('John', 'Doe');
Sometimes we want to insert a row that could be already there. In this context, “already there” means that trying to insert it will violate a
UNIQUE constraint or the primary key uniqueness. If the row is already there, depending on the situation, we may want to:
- Receive an error/warning and insert nothing;
- Replace the old row or some of the columns.
The most commonly used solution is to run a
SELECT to check if the row is present, and then run an
INSERT if the row is not found; otherwise, run nothing or an
UPDATE, or even a
INSERT. For all these cases, I propose a better solution.
If you don’t want to insert anything in case the row is already there, you can just run the
INSERT and check if it returned an error. If that’s the case, the transaction will not roll back automatically.
If you are running a multi-row
INSERT, you may want to insert the rows that are not duplicate. To do so, use
INSERT IGNORE. For each row that cannot be inserted, you will receive a warning, not an error, and non-duplicate rows will still be inserted. If you receive an error, you know that the whole statement failed for some other reason and nothing was written.
To entirely replace a row, the simplest solution is to use
REPLACE. I don’t suggest to do so. If you want to, be sure to understand what happens when you have foreign keys or triggers – and if you don’t have them now, ask yourself what will happen if they are added later. Even
AUTO_INCREMENT behaviour with
REPLACE is buggy (pardon, I mean: is interesting).
Modifying a value
Sometimes we know that a value is part of a certain row, and we need to modify it. Common examples are increasing a counter by one when a page is visited, or decreasing the quantity of products in stack. A less common example is modifying a string, by adding a suffix.
In such cases, the most common solution is to read a value from the database, modifying it in the application, and then writing it. So we have a
SELECT followed by an
Normally, you can achieve the same result in this way:
UPDATE product SET qty = qty - 1 WHERE id = 24;
(For string concatenation, check an older article: How to concatenate strings in MySQL and MariaDB)
But what if we also need to know the value? Well, we can use variables:
UPDATE product SET qty = (@qty := qty - 1) WHERE id = 24;
Unfortunately, while this still works in MySQL 8.0, we will get this error:
mysql> SHOW WARNINGS \G *************************** 1. row *************************** Level: Warning Code: 1287 Message: Setting user variables within expressions is deprecated and will be removed in a future release. Consider alternatives: 'SET variable=expression, ...', or 'SELECT expression(s) INTO variables(s)'. 1 row in set (0.00 sec)
They did this change because variables are sometimes set in a
SELECT as a dirty workaround because window functions were historically not supported in MySQL. However MySQL supports them, and this hack is not needed anymore.
Except that this particular case has nothing to do with windowing functions. The solution proposed in the error message doesn’t help here.
Multi-table DELETE and UPDATE
Suppose you want to delete or modify some posts, and all comments associated to them. You don’t need to run
UPDATE on two tables separately. MySQL supports multi-table
DELETE and multi-table
UPDATE instead, which are somewhat similar to
I recommend to check the documentation pages I linked. Using subqueries (
UPDATE ... WHERE if IN (SELECT ...)) may be more intuitive, but the optimiser will not be able to choose a proper query plan.
In InnoDB (MySQL storage engine) nothing happens outside a transaction. It doesn’t really matter if we start and commit a transaction implicitly or not. Therefore the following pieces of code are equivalent:
SET autocommit := 1; INSERT INTO t (a, b, c) VALUES (1, 2, 3); START TRANSACTION; INSERT INTO t (a, b, c) VALUES (1, 2, 3); COMMIT;
ON by default. So, unless we changed it explicitly, we can just run:
INSERT INTO t (a, b, c) VALUES (1, 2, 3);
However, I saw ORMs enclosing single-statement transactions in
START TRANSACTION and
COMMIT statements. Try to avoid that.
AND CHAIN, AND RELEASE
When we complete a transaction, successfully or not, we probably want to start another or to disconnect from MySQL.
To start another transaction, we could use this syntax:
COMMIT AND CHAIN; ROLLBACK AND CHAIN;
No lock or snapshot is taken until we write or read some data, so there is no drawback in using this feature. We simply avoid to run one more
To close the connection when the transaction ends:
COMMIT AND RELEASE; ROLLBACK AND RELEASE;
This doesn’t avoid to run another SQL instruction. But it shortens the time we stay connected to MySQL or the proxy.
If this is a web page or application that will reconnect soon to MySQL or a proxy, it may be better not to disconnect at all. In any case, if we use a proxy, it probably shouldn’t disconnect from MySQL.
You may want to make the or the default behavior, for the current session or globally, using the
completion_type variable. This would be ok for a database abstraction library which handles the transactions, but I don’t suggest to use it in other cases, because this is not a well-known feature, and it would confuse developers.
Stored procedures and functions
I’ve already mentioned why stored programs can potentially be a huge performance optimisation, in the article 3 good reasons to use Stored Procedures. Let me summarise those reasons again:
- Less statements sent from the client to the server (one
CALLstatement instead of several queries);
- Less results sent from the server to the client (intermediate results are handled inside the stored procedure);
- Less locks – the key here is to understand that locking problems remain, but at least they should be held for a short time.
I want to remind you about a particular type of stored programs: stored aggregate functions, currently supported in MariaDB but not MySQL. See my article MariaDB Stored Aggregate Functions by examples. With aggregate functions, you may loop on a huge amount of rows and return a single value, saving a lot of client-server traffic.
- When invoking a stored function which writes some data and returns a potentially big resultset, but sometimes we have no use for the resultset. If we are doing this, however, we are probably using a function to do something that should be done by a procedure.
- When we run functions or queries because of their side effects inside a stored procedure, and we don’t want to send multiple resultsets to the client.
INSERT SELECT and CREATE TABLE SELECT
We have two useful statements to copy rows between tables:
INSERT SELECTcopies the result of a
SELECTto an existing table;
CREATE TABLEThis will reduce server resource usage
SELECTcopies it to a new table.
Typically, these are one time operations, so using these statements is convenient but it’s not a relevant performance optimisation. However sometimes they are used in cron jobs, for example to automate the archival of old rows, moving them from current tables into compressed or remote tables.
DELETE … RETURNING
Sometimes we want to read a row and then delete it. These operations can be combined into one in MariaDB, with the
DELETE RETURNING statement:
DELETE FROM user WHERE id = 24 RETURNING email;
However, if our only goal is to delete a row that could exist or not, we can just try to delete it. Then we check the number of affected rows, and if it is 1 we have deleted a row.
Unfortunately we cannot insert the returned values into another table, in other words there is no
INSERT DELETE RETURNING. To do this, we need to run two separate queries.
In a search page, the results are often paged (for example a website could display 10 results per page, just like Google). It is not an uncommon practice to show the total number of results found.
Let me say that, in most cases, I consider that as a bad practice – because the query to show that number can be horribly slow. Also, it is a common practice to use
ORDER BY and
LIMIT to page the results. This is also a bad practice.
But if you are doing it, there’s at least something that you can do to improve the search page performance without a big effort. Instead of running a separate query with
COUNT(*) to get the number of total results, just add
SQL_CALC_FOUND_ROWS to your query and then run a
FOUND_ROWS()are deprecated in MySQL 8 and will be removed in a future version of MySQL. As a consequence, I can only recommend them on MariaDB – keeping in mind that they are just a mitigation of a bad practice.
SET and SHOW
This is not a hint about something you should do; it is more about what you should avoid. ORMs tend to abuse of statements like
SET is generally ran when a session starts, or even before each query. It’s used to set some session variables to be sure, for example, that MySQL will expect the correct character set to be used. However, this is only necessary if the MySQL configuration needs some corrections. If it is the case, it’s better to fix it. Then, the ORM should be instructed not to run
SET statements, and just assume that the configuration is correct.
That said, if you want to use
SET, keep in mind that you can set more than one variable with the same statement:
SET @my_var := 1, @your_var := 2, @her_var := 3;
SHOW statements (
SHOW COLUMNS, …) are used by some ORMs to retrieve the remote table definitions from the database. If this is done for every new session, this operation consume resources. The ORM should already know the tables structures instead.
I remember that in a company where I have worked, PMM per-statement graphs shown that
SHOW CREATE TABLE was ran up to 2K times per second when worker threads started. No one had idea that this happened, because it was done by the ORM transparently. However, this should be avoided.
- MySQL/MariaDB: run less queries! (thanks to performance_schema) – Find (and eliminate) queries that always return an error, or never return any rows
When it comes to optimising an application performance, what we need to do is to find the slow queries and fix them or build proper indexes. But sensibly reducing the number of executed queries and communication between the application and the database also helps. This will reduce server resource usage, in particular CPU and network.
The statements I’ve mentioned are part of the SQL language supported by MySQL and MariaDB, so there is no reason not to use them. ORM authors should implement them, and developers should consider running at least part of their queries without using an ORM.
Do you have more ideas? Did I forget something, or made some mistakes? Please let me know in the comments!