Last Updated on
In my previous article 3 good reasons to use Stored Procedures, I mentioned that MariaDB 10.3 supports stored aggregate functions. If you don’t know what they are, think about the difference between
MAX(). The former accepts a single value. If your query finds 10 rows, it will call
ABS() 10 times, and 10 rows will be sent to the client.
MAX() takes any number of 1-value rows as in input, and it will always return a single value
In this article I show how to write a stored aggregate function with an example.
We’ll write a function for geometric means. An arithmetic mean is the sum of all elements divided by the number of elements. A geometric mean is similar: it is the Nth root of the product of N elements.
We want to ignore
NULL values, just like the
Let’s go straight to the code:
CREATE OR REPLACE AGGREGATE FUNCTION geometric_mean( p_num INT UNSIGNED ) RETURNS FLOAT BEGIN -- partial result DECLARE v_product INT UNSIGNED DEFAULT 1; -- number of values DECLARE v_count INT UNSIGNED DEFAULT 0; DECLARE EXIT HANDLER FOR NOT FOUND BEGIN RETURN POW(v_product, 1 / v_count); END; agg_main: LOOP FETCH GROUP NEXT ROW; -- skip NULLs IF p_num IS NULL THEN ITERATE agg_main; END IF; -- update the running product and the values count SET v_product := v_product * p_num; SET v_count := v_count + 1; END LOOP; END; SELECT geometric_mean(price) FROM product;
The semantics should not be too obscure, if you know stored functions.
All the declarations are at the beginning of the function body, as usual.
Then there is a loop. This is the core of stored aggregate functions. The loop begins with the statement
FETCH GROUP NEXT ROW. This statement is the only syntax difference between regular and aggregate stored functions. It reads the next row, and copies its values into the corresponding parameters. In this case, it copies the next value of
As mentioned, we want to skip
NULLs. When we find a
NULL we use
ITERATE to jump back to the beginning of the loop, and iterate the next value.
After we’ve read the last row, MariaDB will attempt again to iterate the loop. Since there are no more rows to fetch,
FETCH GROUP NEXT ROW will raise a
NOT FOUND error. It will be handled by the handler, and the correct value will be returned.
Before going to production
The function shown here is an example, and it’s not ready for production. Before going to production:
- You may want to handle the case where no row is provided in input. I suggest returning
NULLto be consistent with
AVG()and other aggregate functions.
- Consider returning an error if a negative number is received in input.
All native aggregate functions (
STDDEV(), etc) accept only one argument in input. Therefore, it may not be obvious that stored aggregate functions can accept any number of parameters (including 0).
In the following example, the function has two parameters and the first non-
NULL value is used to calculate the mean.
CREATE OR REPLACE AGGREGATE FUNCTION coalesce_geometric_mean( p_num1 INT UNSIGNED, p_num2 INT UNSIGNED ) RETURNS FLOAT BEGIN -- first non-NULL element DECLARE v_coalesce INT UNSIGNED DEFAULT NULL; -- partial result DECLARE v_product INT UNSIGNED DEFAULT 1; -- number of values DECLARE v_count INT UNSIGNED DEFAULT 0; DECLARE EXIT HANDLER FOR NOT FOUND BEGIN RETURN POW(v_product, 1 / v_count); END; agg_main: LOOP FETCH GROUP NEXT ROW; SET v_coalesce := COALESCE(p_num1, p_num2); -- skip NULLs IF v_coalesce IS NULL THEN ITERATE agg_main; END IF; -- update the running product and the values count SET v_product := v_product * v_coalesce; SET v_count := v_count + 1; END LOOP; END;
Testing the function
MariaDB support the
VALUES statement. I don’t know any case where this statement makes something possible. But I know cases where it makes something easier, and testing a function is one of those cases.
Instead of creating and populating a table, and modifying it to make more tests, we can directly pass the values to the function. The only problem is that
VALUES doesn’t allow to specify column names. The column names are always equal to the first values we generate. This forces us to use the little trick ther you’ll see in this example:
SELECT coalesce_geometric_mean (`15`, `NULL`) FROM ( VALUES (15, NULL), (NULL, 12), (NULL, 13), (19, NULL), (10, NULL) ) v;
A personal note
My stored procedures training covers MariaDB stored aggregate functions. If you make an intensive use of stored procedures in your job, you may consider attending my training.
This is just an example of aggregate stored functions. It is a very specific case: a geometric mean on integer values. But we may want to have a similar function for decimal values. And there are several other types of mean. And for each of them, a standard deviation or a variance can be calculated.
But an aggregate stored function could also compose structured values. The funcion
GROUP_CONCAT() for example is good for creating lists. And MySQL has the
As mentioned, my favorite use cases for stored aggregate functions is when they read many rows and return a single value. The reason is that I don’t like to send huge result sets over a network, if this can be avoided.
Do you have use cases for stored aggregate functions? Ideas? Doubts?
Please tell me your opinion! I love the discussions that we have in the post comments.
Kudos to Lorenzo Monti for reporting that the function description was inaccurate, and various observations about the code!
Photo credit: this page about being obsessed about drawing a Lorenz attractor. Check it, it’s amazing.