MariaDB Stored Aggregate Functions by examples

Functions can be brilliant.
Meet Lorenz attractor.

In my previous article 3 good reasons to use Stored Procedures, I mentioned that MariaDB 10.3 supports stored aggregate functions. If you don’t know what they are, think about the difference between ABS() and MAX(). The former accepts a single value. If your query finds 10 rows, it will call ABS() 10 times, and 10 rows will be sent to the client. MAX() takes any number of 1-value rows as in input, and it will always return a single value

In this article I show how to write a stored aggregate function with an example.

geometric_mean()

We’ll write a function for geometric means. An arithmetic mean is the sum of all elements divided by the number of elements. A geometric mean is similar: it is the product of the elements divided by the number of elements.

We want to ignore NULL values, just like the AVG() function.

Let’s go straight to the code:

CREATE OR REPLACE AGGREGATE FUNCTION geometric_mean(
        p_num INT UNSIGNED
    )
    RETURNS FLOAT
BEGIN
    -- partial result
    DECLARE v_product INT UNSIGNED DEFAULT 1;
    -- number of values
    DECLARE v_count INT UNSIGNED DEFAULT 0;

    DECLARE EXIT HANDLER
        FOR NOT FOUND
    BEGIN
        RETURN POW(v_product, 1 / v_count);
    END;

    agg_main:
    LOOP
        FETCH GROUP NEXT ROW;

        -- skip NULLs
        IF p_num IS NULL THEN
            ITERATE agg_main;
        END IF;

        -- update the running product and the values count
        SET v_product := v_product * p_num;
        SET v_count := v_count + 1;
    END LOOP;
END;

SELECT geometric_mean(price) FROM product;

The semantics should not be too obscure, if you know stored functions.

All the declarations are at the beginning of the function body, as usual.

Then there is a loop. This is the core of stored aggregate functions. The loop begins with the statement FETCH GROUP NEXT ROW. This statement is the only syntax difference between regular and aggregate stored functions. It reads the next row, and copies its values into the corresponding parameters. In this case, it copies the next value of price into p_num.

As mentioned, we want to skip NULLs. When we find a NULL we use ITERATE to jump back to the beginning of the loop, and iterate the next value.

After we’ve read the last row, MariaDB will attempt again to iterate the loop. Since there are no more rows to fetch, FETCH GROUP NEXT ROW will raise a NOT FOUND error. It will be handled by the handler, and the correct value will be returned.

Multiple parameters

All native aggregate functions (MAX(), AVG(), STDDEV(), etc) accept only one argument in input. Therefore, it may not be obvious that stored aggregate functions can accept any number of parameters (including 0).

In the following example, the function has two parameters and the first non-NULL value is used to calculate the mean.

CREATE OR REPLACE AGGREGATE FUNCTION coalesce_geometric_mean(
        p_num1 INT UNSIGNED,
        p_num2 INT UNSIGNED
    )
    RETURNS FLOAT
BEGIN
    -- first non-NULL element
    DECLARE v_coalesce INT UNSIGNED DEFAULT NULL;
    -- partial result
    DECLARE v_product INT UNSIGNED DEFAULT 1;
    -- number of values
    DECLARE v_count INT UNSIGNED DEFAULT 0;

    DECLARE EXIT HANDLER
        FOR NOT FOUND
    BEGIN
        RETURN POW(v_product, 1 / v_count);
    END;

    agg_main:
    LOOP
        FETCH GROUP NEXT ROW;

        SET v_coalesce := COALESCE(p_num1, p_num2);

        -- skip NULLs
        IF v_coalesce IS NULL THEN
            ITERATE agg_main;
        END IF;

        -- update the running product and the values count
        SET v_product := v_product * v_coalesce;
        SET v_count := v_count + 1;
    END LOOP;
END;

Testing the function

MariaDB support the VALUES statement. I don’t know any case where this statement makes something possible. But I know cases where it makes something easier, and testing a function is one of those cases.

Instead of creating and populating a table, and modifying it to make more tests, we can directly pass the values to the function. The only problem is that VALUES doesn’t allow to specify column names. The column names are always equal to the first values we generate. This forces us to use the little trick ther you’ll see in this example:

SELECT coalesce_geometric_mean (`15`, `NULL`) FROM (
    VALUES
        (15, NULL),
        (NULL, 12),
        (NULL, 13),
        (19, NULL),
        (10, NULL)
) v;

A personal note

My stored procedures training covers MariaDB stored aggregate functions. If you make an intensive use of stored procedures in your job, you may consider attending my training.

Conclusions

This is just an example of aggregate stored functions. It is a very specific case: a geometric mean on integer values. But we may want to have a similar function for decimal values. And there are several other types of mean. And for each of them, a standard deviation or a variance can be calculated.

But an aggregate stored function could also compose structured values. The funcion GROUP_CONCAT() for example is good for creating lists. And MySQL has the JSON_ARRAYAGG() and JSON_OBJECTAGG() functions.

As mentioned, my favorite use cases for stored aggregate functions is when they read many rows and return a single value. The reason is that I don’t like to send huge result sets over a network, if this can be avoided.

And you?

Do you have use cases for stored aggregate functions? Ideas? Doubts?

Please tell me your opinion! I love the discussions that we have in the post comments.

Toodle pip,
Federico

Photo credit: this page about being obsessed about drawing a Lorenz attractor. Check it, it’s amazing.

Comments (2)

  1. I had forgotten I had read about this feature in the release notes. Very insightful, and well written – thank you!

    > And MySQL has the GROUP_CONCAT() and GROUP_CONCAT() functions.

    Maybe you meant JSON_ARRAYAGG() and JSON_OBJECTAGG()? These are MySQL aggregate functions not yet found in MariaDB. Though I suppose we can now write our own stored aggregate function versions for MariaDB 🙂

    1. God morgen Karl.
      Yes, I meant that… a copy/paste error, thank you for pointing out.
      And yes, we can easily write them in MariaDB – I’m curious to see the performance.

Leave a Reply

Your email address will not be published. Required fields are marked *