Last Updated on
Many developers and database professionals don’t like stored procedures. They often have good reasons for that. For example, procedures add more work into the database server, that is often the application’s bottleneck. This means that more CPU time is used, so procedures can increase the costs. It is not a coincidence that Oracle developed an incredibly advanced stored procedure system and they make us pay a price per CPU. And so on.
In PostgreSQL the situation is completely different, because Postgres has a very advanced implementation. This is the reason why this article focuses on the MySQL ecosystem. But the benefits that I will list are valid for all database systems.Are stored procedures always slower and not scalable?
Are MySQL stored procedures evil?
In the MySQL world, there are even more reasons to avoid stored procedures. The language is limited, verbose and not flexible. There is no native debugging facility. And they are dead slow. Run a simple loop from 1 to 1M – you won’t believe your system clock. The problem is about with queries – their speed will not change if you move them into stored procedures. But stored procedures code is slow.
Oracle developed a mechanism to be able to compile stored procedures in any language in the GraalVM. Even if being bound to a particular VM sounds irritating, the news is still exciting, right? Except that in practice it is not true. They advertised something that most probably will never be released.
Anyway, all these objections to stored procedures can be defeated in some contexts. For example, one can get used to the language – I did. And sometimes you don’t care too much about performance. An administrative stored procedure that runs once a week, for example. Or procedures for analytics.
But of course in most cases speed matters. But even in those cases, there are rare situations when a procedure improves the performance.
When stored procedures are faster
Let me list my points in favor of stored procedures:
- Less queries;
- Less results;
I will show how these things play in favor of better performance and more scalability – in some cases.
1 – Less queries
A stored procedure may contain several queries. Grouping queries in a procedure avoids some of the work that the DBMS does when it receives a query from a client.
Unfortunately, in MySQL and MariaDB, only the work related to the network communication is avoided. Procedures are not compiled, so the queries still need to be parsed (unless prepared statements are used).
Still, avoiding some network communication relieves some work from the server and the clients. It also makes slowdowns and failures less likely, if the network subsystems or the network itself have problems.
2 – Less results
This is potentially much more important. If our resultsets are big, sending them to the clients could be slow, and our network may suffer. This is also the reason why database professionals often insist that using
SELECT * is not a good idea.
Usually, less results is also a motivation against stored procedures. If rows must be processed in a loop, that should happen in the clients, not in the database server – because, as we mentioned, the database server is often the bottleneck. Cursors are based on temporary tables, so they consume memory. If many users use temporary tables, the server’s memory could be insufficient.
But there are exceptions. For example, what if this elaboration can be done once a week at a very low-traffic time, and then cached into a table? Potentially, this is not a heavy operation. If elaborated data can be cached in a database, that is often a good idea.
I know people who will be horrified reading this. Many think that caching should only happen in proper cache technologies, like Redis pr Memcached. But there are many types of cache. If we need to join those cached data to other data stored in a relational database, we should store them in the DBMS itself – any other solution has an unjustified complexity.
3 – Shorter locks
When we access some data and afterwards we want to modify or delete them, there are two cases:
- We just do it, hoping that no one will touch that data in the middle of the process. Developers understand the theoretical risks of course. But in practice, for various reasons, most of the times they modify data in an insecure way.
- Use a transaction, and if necessary make sure that the
SELECTs are locking.
These cases remain the same if we access data from a stored procedure. But there is a difference: instead of sending data to the client and waiting for the network overhead and client elaborations, we just do the elaboration immediately – and if we use locks, we release them immediately.
In the first case, we reduce the risk. This is a pleasant side effect, but I won’t argue that procedures are a solution. The solution is to use transactions properly – sorry if this sounds irritating.
In the second case, locks are held for a smaller amount of time. Don’t get me wrong: we will still see locks and potentially deadlocks. But the problem is alleviated. This is the same reason why developers should do their best to avoid retrieving data via HTTP or reading files in the middle of a transaction (yes, I saw this multiple times, and if you analyse your code you may find out that you do the same- that said, there can be good reasons for this).
Don’t get me wrong: in most cases, writing stored procedures is not a good idea. Especially with MySQL and MariaDB.
As I wrote, there are exceptions. I wanted to show the reasons why sometimes it is a good idea.
You may wonder about practical cases. Usually they are quite complex and strictly bound to the context.
A generic example is about means. MySQL only supports a built-in function for the non-weighted arithmetic average. MariaDB has it, plus
PERCENTILE_CONT() and ita special case
MEDIAN(). The mode can be found with a
But suppose that we want to calculate a geometric mean over 10M rows. We don’t want to send all those rows to a client when we could send a single value. Instead, we can write a stored procedure.
In MySQL it will still be far from being optimal, because we will need a cursor – as mentioned, cursors are based on temporary tables. In MariaDB 10.3 we can use an aggregate stored function instead. I’ll show this in a future article.
Do you use stored procedures? Please tell me how.
Did you have some ideas after reading this post? Do you disagree with me? Please write a comment below, or a post in your blog with a link to this article.
I’ll be looking forward to read your opinions!