Dropping database objects in a safe way

A drop on a table
A rain DROP on a TABLE

When I worked as a DBA, I was reasonably scared by 💧DROP commands. Don’t get me wrong: despite this, I badly wanted to drop obsolete tables. After that, I could more easily ask developers to fix bad practices, like tables without primary keys or without indexes at all. Similarly, I wanted to drop obsolete columns, especially from very big tables, and to normalise big sparse matrices (which implies moving some columns to a new table). And I wanted to drop duplicate or unused indexes, as they add useless overhead to writes and to the optimiser.

But every time a developer said something like “it’s ok, this table is not used anymore”, I knew I could not trust that statement. And actually, from time to time, their optimistic opinions proved wrong. Not because I worked with stupid developers! I am sure that they always ran at least a git grep <tablename> on their repositories. But maybe the table was used in some hideous job or script, and they missed it.

The same happened with columns. And I already wrote how dropping an index (even an unused one) can break an application.

But don’t let this stop you. Here I’m discussing the process I used to safely drop stuff. Of course not all companies are the same, so the process could vary, and should probably vary in your case.

XKCD strip about SQL injections
Technically, this woman could be in good faith

Checking if objects are used

If you use MariaDB or Percona Server, you should definitely enable the userstat feature (the user statistics plugin). It can be useful in many ways. PMM uses it to draw graphs of table usage, and you can probably do the same with other technologies, like Zabbix templates. If a table is used, you will see it in a graph.
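Even without a graphing tool, userstat allows a quick ad-hoc check. Here is a sketch, assuming a MariaDB or Percona Server instance with the feature available; the schema and table names are placeholders:

```sql
-- Enable the feature (it is off by default and adds some overhead).
SET GLOBAL userstat = 1;

-- After letting it collect data for a while, check whether the table
-- shows any reads or writes. No row at all means no recorded usage.
SELECT TABLE_SCHEMA, TABLE_NAME, ROWS_READ, ROWS_CHANGED
    FROM information_schema.TABLE_STATISTICS
    WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'my_table';

-- The same idea works for indexes.
SELECT TABLE_NAME, INDEX_NAME, ROWS_READ
    FROM information_schema.INDEX_STATISTICS
    WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'my_table';
```

Keep in mind that these counters are kept in memory and start from zero at server startup, so a missing row only proves that the object was unused since the last restart.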

Check if the table or column is used in any stored routines, triggers or events. Some LIKE queries will help you, unless you have many procedures and you are looking for a table name like order:

SELECT ROUTINE_SCHEMA, ROUTINE_NAME
    FROM information_schema.ROUTINES
    WHERE ROUTINE_DEFINITION LIKE '%table_name%';
SELECT TRIGGER_SCHEMA, TRIGGER_NAME, EVENT_OBJECT_TABLE
    FROM information_schema.TRIGGERS
    WHERE ACTION_STATEMENT LIKE '%table_name%';
SELECT EVENT_SCHEMA, EVENT_NAME
    FROM information_schema.EVENTS
    WHERE EVENT_DEFINITION LIKE '%table_name%';

Of course, also check your scripts, including scripts for tasks like backups or anonymisation.

The rest is up to your developers. If they say that a table is not used, you will pretend you believe them. But be sure to check with all relevant teams. This typically includes analysts and data scientists – often, developers working on the operational side have no idea what analysts do or need. Don’t use private messages for this type of communication. The best way is to add tasks in their bug tracker (JIRA or whatever) asking them to check if a table or column is used. This will give more people the opportunity to report that an object shouldn’t be dropped, and will show that you did all the checks you were supposed to do in these cases. Also, they will probably take the task more seriously, as they can show their managers how they used their time.

Don’t forget that the same checks must be done for indexes, as they could be mentioned in optimiser hints.

Dropping objects

Whatever you are dropping – a table, a column or an index – the process is the same. Here’s a summary – below we’ll discuss every point:

  1. Drop or rename the object in staging.
  2. Rename the object in production.
  3. Wait and monitor.
  4. Take a backup.
  5. Set the backup retention time.
  6. Drop the object.

The first step must be dropping the table/column/index in staging. You already know this, and I won’t stress this obvious concept. Instead, I want to stress that in the real world this is not enough. I saw several cases where a table was never used in staging, but it was used in production, maybe by some job or by some forgotten feature.

In production, you should rename the object instead. Add a prefix, for example __old_. When I can establish a naming convention (a good topic for another time), I disallow names that start with a _ character. This makes __old_ and other special prefixes safer, so that you can reliably use them in your scripts (for example, excluding __old_ tables from dumps).
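Renaming each kind of object is a one-liner. A sketch, with hypothetical object names; note that ALTER TABLE … RENAME COLUMN needs MySQL 8.0+ or MariaDB 10.5+ (older versions can use CHANGE COLUMN), and RENAME INDEX exists since MySQL 5.7:

```sql
-- Table: a metadata-only change, essentially instant.
RENAME TABLE orders_legacy TO __old_orders_legacy;

-- Column (MySQL 8.0+ / MariaDB 10.5+).
ALTER TABLE orders RENAME COLUMN fax_number TO __old_fax_number;

-- Index (MySQL 5.7+ / MariaDB 10.5+).
ALTER TABLE orders RENAME INDEX idx_fax TO __old_idx_fax;
```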

You may want to rename the object (instead of dropping it) in staging, too.

The next step is simply waiting. You will have a standard time period to wait before the object is considered safe to drop, for example 2 days. What a reasonable period is depends on factors that are specific to your company, for example the frequency of job runs.

Application errors are hopefully monitored. If something goes wrong in that period, you can very quickly restore the old situation by renaming the table, column or index back. Since this operation only affects metadata, it is very fast (unless it is blocked by a long-running query or transaction).

If nothing went wrong, you should take a backup of the object, including its SQL definition. In the case of a column, a backup could be a dump of the primary key plus the column to delete. In the case of an index, its definition is all you need.
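As a sketch of such a backup, again with hypothetical names: save the definition first, then the data. SELECT … INTO OUTFILE writes on the server side and requires the FILE privilege and a permissive secure_file_priv; a client-side dump (mysqldump, mydumper…) is the more common choice for whole tables.

```sql
-- Always save the object's SQL definition.
SHOW CREATE TABLE __old_orders_legacy;

-- Backup of a whole table: a quick server-side copy.
SELECT * FROM __old_orders_legacy
    INTO OUTFILE '/var/tmp/old_orders_legacy.csv';

-- Backup of a single column: primary key plus the column to drop.
SELECT id, __old_fax_number FROM orders
    INTO OUTFILE '/var/tmp/orders_fax_number.csv';
```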

Decide how long the backup should be preserved. The wise answer could easily be forever. Why? Because you never know what can happen. For example, you may have a table with data about the translators of contents that disappeared from your website. Developers will probably tell you that the table can be deleted. And that’s wrong: your company could still have a legal dispute with those people in the future.

Finally, you can drop the object!

Invisible indexes

Indexes can be renamed, just like tables and columns. But this only affects queries that explicitly reference them in optimiser hints. Other queries may still use those indexes, simply because the MySQL optimiser thinks it’s a good idea. If the optimiser is right, and usually it is, dropping an index could make some queries slow.

MySQL 8.0 implements invisible indexes. An invisible index cannot be used in any way by a query: the optimiser will not take it into consideration. And if an optimiser hint suggests to use (or even ignore) that index, the query will fail.
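Toggling visibility is a metadata-only change, so it makes a safe dry run before the actual drop. A sketch, with a hypothetical index name:

```sql
-- Hide the index from the optimiser (MySQL 8.0+).
ALTER TABLE orders ALTER INDEX idx_fax INVISIBLE;

-- The index is still maintained on writes, so if queries slow down
-- you can bring it back instantly:
ALTER TABLE orders ALTER INDEX idx_fax VISIBLE;
```

MariaDB offers a similar feature since 10.6, under the name ignored indexes.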

A small overhead

Note that these extra steps don’t add a big overhead. Once a table is renamed, you can go back to your everyday work. And maybe once a week you can backup and drop renamed tables. This step can even be automated, and quite easily – just don’t forget that every automation needs to be properly monitored.
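The weekly cleanup can start from a simple catalogue query. A sketch, assuming the __old_ prefix convention described above:

```sql
-- Find every table that was parked for deletion.
-- Underscores are wildcards in LIKE, so they must be escaped.
SELECT TABLE_SCHEMA, TABLE_NAME
    FROM information_schema.TABLES
    WHERE TABLE_NAME LIKE '\_\_old\_%';
```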


A personal note

Do you find this type of advice useful? I can help your team establish good practices to reduce the risks without slowing down development and operations. Contact me to discover what I can do for you.

Toodle pip,
Federico

Meme: If a coffee in the morning doesn't wake you up, try deleting a table in production instead!

Comments (2)

  1. These tips are really helpful. Please keep posting such tips, and some more about MySQL DBA activities for high performance, availability and maintenance.

    1. Hi, Anand.
      I wrote this because I thought that operational tips for DBAs are as useful as technical contents, but it’s hard to find them on the web. Thanks for your feedback, I will write again on this topic.

      Federico
