Understanding tables usage with User Statistics (Percona Server, MariaDB)

Last Updated on

User Statistics is a plugin for MySQL developed by Percona. It is distributed and enabled by default in Percona Server and in MariaDB. I suppose it is possible to compile it with MySQL, but I didn’t try.

In this article we’ll see a few simple, yet useful queries to analyse how tables are used, which of them are used most – therefore, on what we need to focus on to optimise the server’s workload.

An old photo of the Vajont Dam before the disaster, in the 1960
Vajont Dam, in Italy – a sad reminder of what can happen
if you fail to analyse your workload.
https://commons.wikimedia.org/wiki/File:Vajont_Dam_1960_panorama.jpg

Enabling and disabling User Statistics

To enable it, just run this SQL statement:

SET GLOBAL userstat := 1;

Set to 0 to disable. But its impact on performance is very low. And having more information about a server performance is generally more important than a small speed difference.

Use Cases

In this article, we want to investigate table usage. This is not necessarily about bad queries. For that purpose, we can use the slow log and tables from the information schema. Also, an older article shows how to find useless queries that are executed by a server.

Understanding tables usage, in this context, means:

  • Seeing which tables are read and written mostly. Then we can focus on optimising them and the queries that involve them.
  • Seeing which indexes are being used. Then we can check if the mostly used are as well-designed as they should be.
  • Find tables and indexes that are not used at all.

User Statistics creates some tables in the information_schema. For our purposes, we’ll use two of them:

  • TABLE_STATISTICS;
  • INDEX_STATISTICS.

These tables can also be monitored with some monitoring tool. PMM monitors TABLE_STATISTICS automatically, if User Statistics is enabled. This allows to:

  • Have a visual representation of how reads and writes are distributed among tables;
  • Easily find spikes in one table’s usage, that could match some server’s performance decrease;
  • Easily seeing if a table is being used a lot all of a sudden. When I was a DBA in companies where multiple microservices could access the same database servers, checking this graph after a performance degradation allowed me to immediately know which team did something wrong.
A measuring tape
Measure your tables
https://www.pxfuel.com/en/free-photo-xnote

Queries

See how the workload is distributed over engines and databases

Workload by storage engine (where workload means reads and writes):

SELECT
        t.ENGINE,
        COUNT(*) AS table_count,
        COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
        SUM(IFNULL(s.ROWS_READ, 0)) AS SUM_ROWS_READ,
        SUM(IFNULL(s.ROWS_CHANGED, 0)) AS SUM_ROWS_CHANGED,
        SUM(IFNULL(s.ROWS_CHANGED_X_INDEXES, 0)) AS SUM_ROWS_CHANGED_X_INDEXES
    FROM information_schema.TABLES t
    LEFT JOIN information_schema.TABLE_STATISTICS s
        ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    GROUP BY t.ENGINE
        WITH ROLLUP
;

Workload by database:

SELECT
        t.TABLE_SCHEMA,
        COUNT(*) AS table_count,
        COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
        SUM(IFNULL(s.ROWS_READ, 0)) AS SUM_ROWS_READ,
        SUM(IFNULL(s.ROWS_CHANGED, 0)) AS SUM_ROWS_CHANGED,
        SUM(IFNULL(s.ROWS_CHANGED_X_INDEXES, 0)) AS SUM_ROWS_CHANGED_X_INDEXES
    FROM information_schema.TABLES t
    LEFT JOIN information_schema.TABLE_STATISTICS s
        ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    GROUP BY t.TABLE_SCHEMA
        WITH ROLLUP
;

Workload by database and engine:

SELECT
        t.TABLE_SCHEMA, t.ENGINE,
        COUNT(*) AS table_count,
        COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
        SUM(s.ROWS_READ) AS SUM_ROWS_READ,
        SUM(s.ROWS_CHANGED) AS SUM_ROWS_CHANGED,
        SUM(s.ROWS_CHANGED_X_INDEXES) AS SUM_ROWS_CHANGED_X_INDEXES
    FROM information_schema.TABLES t
    LEFT JOIN information_schema.TABLE_STATISTICS s
        ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    GROUP BY t.TABLE_SCHEMA, t.ENGINE
        WITH ROLLUP
;

Workload by engine and database (same info, opposite approach):

SELECT
        t.ENGINE, t.TABLE_SCHEMA,
        COUNT(*) AS table_count,
        COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
        SUM(s.ROWS_READ) AS SUM_ROWS_READ,
        SUM(s.ROWS_CHANGED) AS SUM_ROWS_CHANGED,
        SUM(s.ROWS_CHANGED_X_INDEXES) AS SUM_ROWS_CHANGED_X_INDEXES
    FROM information_schema.TABLES t
    LEFT JOIN information_schema.TABLE_STATISTICS s
        ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    GROUP BY t.ENGINE, t.TABLE_SCHEMA
        WITH ROLLUP
;

See index usage

Rows read with an index vs. read with a tablescan:

SELECT
        t.TABLE_SCHEMA, t.TABLE_NAME,
        t.ROWS_READ AS TOTAL_ROWS_READ,
        SUM(i.ROWS_READ) AS INDEX_ROWS_READ,
        t.ROWS_READ - SUM(i.ROWS_READ) AS TABLESCAN_ROWS_READ
    FROM information_schema.TABLE_STATISTICS t
    LEFT JOIN information_schema.INDEX_STATISTICS i
        ON t.TABLE_SCHEMA = i.TABLE_SCHEMA AND t.TABLE_NAME = i.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    GROUP BY t.TABLE_SCHEMA, t.TABLE_NAME, t.ROWS_READ
;

Note: for some small tables it’s ok to have a tablescan (eg: application configuration). For others it’s not, but if you dump them you have tablescans. This query still helps you to find tables largely read with tablescans where this is not expected.

Index usage with index type added (this allows to see, for example, FULLTEXT indexes usage; this query also includes unused indexes, which are not included in INDEX_STATISTICS):

SELECT
        s.TABLE_SCHEMA, s.TABLE_NAME,
        s.INDEX_NAME, s.INDEX_TYPE,
        IFNULL(i.ROWS_READ, 0) AS ROWS_READ
    FROM information_schema.STATISTICS s
    LEFT JOIN information_schema.INDEX_STATISTICS i
        ON s.TABLE_SCHEMA = i.TABLE_SCHEMA
            AND s.TABLE_NAME = i.TABLE_NAME
            AND s.INDEX_NAME = i.INDEX_NAME
    WHERE s.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
;

Specifically list unused indexes:

SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
    FROM information_schema.STATISTICS st
    LEFT JOIN information_schema.INDEX_STATISTICS idx
        ON  idx.INDEX_NAME    = st.INDEX_NAME
        AND idx.TABLE_NAME    = st.TABLE_NAME
        AND idx.TABLE_SCHEMA  = st.TABLE_SCHEMA
    WHERE
        st.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
        AND idx.INDEX_NAME IS NULL
        AND st.NON_UNIQUE = 1
    ORDER BY st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
;

Compose the ALTER TABLEs to drop unused indexes:

SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
    FROM information_schema.STATISTICS st
    LEFT JOIN information_schema.INDEX_STATISTICS idx
        ON  idx.INDEX_NAME    = st.INDEX_NAME
        AND idx.TABLE_NAME    = st.TABLE_NAME
        AND idx.TABLE_SCHEMA  = st.TABLE_SCHEMA
    WHERE
        st.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
        AND idx.INDEX_NAME IS NULL
        AND st.NON_UNIQUE = 1
    ORDER BY st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
;

Miscellaneous

Find unused tables (since last server restart or User Statistics activation):

SELECT
        t.TABLE_SCHEMA, t.TABLE_NAME, t.ENGINE
    FROM information_schema.TABLES t
    LEFT JOIN information_schema.TABLE_STATISTICS s
        ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
        AND ROWS_READ IS NULL AND ROWS_CHANGED IS NULL
;

See also

Related articles

Related courses

The articles on Federico-Razzoli.com share a lot of technical knowledge for free, in the hope that they can help solving specific problems and raise the level of technical culture. However, if you work with database technologies, as an administrator or as a developer, you may need to build more organic knowledge and skills. Here are some courses related to the subject of this article.

Related services

  • A health check will provide you with a comprehensive understanding of your workload and your index usage. You will discover your current problems, and eliminate future problems before they hit you.
  • Use some of my monthly or weekly time for periodic quick checks of your production databases. We will promptly take action to deal with any problems that may arise.

Conclusions

Here are the queries I occasionally use to get information from the User Statistics plugin. As a general rule, the performance_schema gives more in-depth information, but not all these data are available in it. User Statistics is also simpler to use. And it can be enabled at runtime.

These queries only concern the TABLE_STATISTICS and INDEX_STATISTICS tables. Other tables can also provide insightful information, and this can be a topic for one or more future articles, if there is interest.

Part of this information is included in my MySQL Health Checks, if User Statistics are enabled. If it is not enabled, the report includes some similar (but less specific) information obtained by querying the information_schema and performance_schema databases. This is meant to both give you a better understanding on your workload, and find some problems about your indexes that need to be fixed.

Do you have more queries to share? Did you find any bug in these queries? Please comment! Comments to my articles make this website not just one person’s texts, but also a more collective knowledge base.

Toodle pip,
Federico

Leave a Reply

Your email address will not be published. Required fields are marked *