Dropping useless MySQL indexes

This railway viaduct near Liverpool Street (London)
seems to allow to reach some places quickly, but like many indexes, it is unused.

Before reading this article, I suggest to read Indexes bad practices, if you missed it. In that article I mentioned that useless indexes should be deleted, and in the comments Vipul asked how to find such indexes. Here I explain how to do it.

Finding useless indexes

First of all, let’s define what a useless index is. There are two types of them:

  • Duplicate indexes;
  • Unused indexes.

Duplicate indexes

If you don’t know what duplicate indexes are, you should really check the previous article. We’re not discussing the matter again here.

There is at least one tool to find such indexes: pt-duplicate-key-checker.

This great tool finds duplicate indexes, and outputs a list of them, as well as the ALTER TABLEs to drop them. It also outputs the index definitions, so you will easily find out why a certain index is a duplicate.

The tool is safe with respect to constraints. UNIQUE indexes are not considered duplicates, unless they duplicate other UNIQUE indexes. Foreign keys are considered duplicates only if they are identical.

Paranoid modes

pt-duplicate-key-checker is usually called without any arguments, and that is what I do. For completeness, I will mention here some additional checks that could be done – but probably no one does.

  • --all-structs: Ignored the index types. An index can duplicate another even if they are of different types – for example, a FULLTEXT index and a regular BTREE. This is normally wrong, but there is an exception: a BTREE could duplicate a HASH index. If your database has improper HASH indexes, you may want to try this parameter. But then, you will have to check manually the output.
  • --ignore-order: Normally, an index on (a, b) should not be considered as a duplicate of (b, a). The reason is that some queries can only be server by one of these indexes, as explained in the previous article. This parameter causes these indexes to be considered as duplicates. This is only true if we compare both these columns with = and don’t use GROUP BY and ORDER BY on them. If you have these non-common case you may use --ignore-order to spot duplicates, but you will have to check manually the output.

Unused indexes

If you use MariaDB or Percona Server, you have a plugin called user_statistics. It can also be installed in MySQL.

The user_statistics method

If you enable it, you will have a table called INDEX_STATISTICS, in theinformation_schema database. All indexes not present in that table were not used since the last restart of MySQL. To spot them you can use this query:

SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
    FROM information_schema.STATISTICS st
    LEFT JOIN information_schema.INDEX_STATISTICS idx
        ON  idx.INDEX_NAME    = st.INDEX_NAME
        AND idx.TABLE_NAME    = st.TABLE_NAME
        AND idx.TABLE_SCHEMA  = st.TABLE_SCHEMA
    WHERE
        (idx.INDEX_NAME IS NULL OR idx.ROWS_READ = 0)
        AND st.NON_UNIQUE = 1
    ORDER BY 1, 2, 3
;

The performance_schema method

There is a less safe alternative. For that, you need to have the performance_schema enabled. If it is disabled, you need to restart your server to enable it.

Unfortunately, the performance_schema is disabled by default in MariaDB since version 10.0. Let’s hope that they’ll change their mind, for a lot of reasons.

With the performance_schema enabled, you can run this query to find unused indexes:

SELECT object_schema, object_name, index_name
    FROM performance_schema.table_io_waits_summary_by_index_usage
    WHERE index_name IS NOT NULL AND count_star = 0
    ORDER BY object_schema, object_name, index_name;

Why is it less safe? Because it assumes that an index is only unused if it never caused an IO wait.

Risks

Duplicate indexes are not necessarily unused. This is not a problem if another index can be used in place of the duplicate. But there is an exception, that I mentioned in The perils of ALTER TABLE in MySQL/MariaDB. MySQL supports the IGNORE/USE/FORCE INDEX syntax, which mentions an index by name. If the index doesn’t exist, the query fails – even in the care of IGNORE INDEX.

Dropping an unused index is safe. As mentioned, indexes found with the user_statistics method are surely unused. But indexes found with the performance_schema method could theoretically be used.

To reduce these risks, MySQL 8 has a useful feature: invisible indexes. You can make an index invisible in this way:

ALTER TABLE t ALTER INDEX idx_a INVISIBLE;

If an index is invisible, MySQL will pretend that it doesn’t exist. Therefore, if it is mentioned by a query with IGNORE/USE/FORCE INDEX, the query will return an error. And if it is used by a query, the query will become slow – or it will use another index. The advantage is that making an index invisible or visible is an instantaneous operation. Therefore, even if the table is very big, if a problem appears, we can make the index visible again in a moment.

See also

Related articles:

Related courses:

Conclusions

We saw how to find duplicate indexes, including some edge cases that are not reported by default by pt-duplicate-key-checker. Then we saw how to find unused indexes: the performance_schema database should always be enabled, but for this particular task the user_statistics plugin is better. Finally, we discussed how to use invisible indexes to reduce risks.

As usual my conclusion is: what do you think? If you have questions, disagree with something or have more ideas, please comment.

Toodle pip,
Federico

Photo credit

Comments (6)

  1. It’s wise to use the unused indexes statistics to choose _potential_ unused indexes, and don’t take them as a guarantee. Use your judgment as well, to make sure the indexes aren’t needed for some query that is run infrequently. Because the index stats reset when mysqld restarts, or when certain FLUSH statements are run, you can get a false positive indication that a given index is unused, when it really means the query that needs the index was not run recently. This has happened to me on several occasions.

    1. Hi Bill. I agree that some human wisdom is needed. One problem I didn’t mention is that an index could be useful but not used because the optimiser makes a bad choice – so removing it is fine, but fixing the problem is better.

    1. Your comment made me very curious, and I made some quick tests. It still does make a difference – but only in edge cases, and I’m pretty sure that nowadays it makes much less difference than it did in the past. Anyway, this is done with MySQL 8.0.

      SELECT a FROM t FORCE INDEX (idx_a_last_name) WHERE last_name = ‘rossi’ AND a = 1;
      3079407 rows in set (1.87 sec)

      SELECT a FROM t FORCE INDEX (idx_last_name_a) WHERE last_name = ‘rossi’ AND a = 300;
      1615 rows in set (0.01 sec)

      mysql> SELECT COUNT(*), COUNT(DISTINCT a) FROM t WHERE last_name = ‘rossi’;
      +———-+——————-+
      | COUNT(*) | COUNT(DISTINCT a) |
      +———-+——————-+
      | 5550312 | 20804 |
      +———-+——————-+
      1 row in set (25.87 sec)

      mysql> SELECT COUNT(*), COUNT(DISTINCT last_name) FROM t WHERE a = 300;
      +———-+—————————+
      | COUNT(*) | COUNT(DISTINCT last_name) |
      +———-+—————————+
      | 2482 | 6 |
      +———-+—————————+
      1 row in set (0.02 sec)

  2. Be cautious about discovering an “unused index”. Consider a maintenance task that runs once a month, but you have collected statistics for only 3 weeks.

    1. Yes there may be several caveats. Another is that some indexes may be only used in slaves. I leave these things to the wisdom of DBA’s.

Leave a Reply

Your email address will not be published. Required fields are marked *