Use cases for MariaDB Invisible Columns

Last Updated on

MariaDB 10.3 introduces support for invisible columns, implemented with the MDEV-10177. This is a feature present at least in IBM Db2, Oracle and MS SQL Server, but not in any open source database, as far as I know.

The documentation page explains concisely pretty much what you need to know to use them. The MDEV gives some more information about how invisible columns are used internally.

Here I want to give a couple of hints, and talk about the use cases, which are not obvious – at least not completely.

I admit it is a pity when beautiful columns are invisible

Working with Invisible Columns

The idea of Invisible columns is not hiding columns from the users. Instead, the idea is that we should use them when we really need to, and knowing what we are doing.

In practice this means that:

  • SELECT * queries don’t return virtual columns, unless these columns are explicitly requested;
  • INSERT, UPDATE and DELETE statements are not affected;
  • Yet, since those columns are likely to be omitted in INSERTs, they are required to have a DEFAULT value or be NULLable.

If SELECT * doesn’t return some information we need, we’ll need to see which virtual columns exist in a certain table. We can do that in these ways:

-- get a list of invisible columns names
SELECT COLUMN_NAME FROM information_schema.COLUMNS WHERE EXTRA LIKE '%INVISIBLE%'
    AND TABLE_SCHEMA  = 'test'
    AND TABLE_NAME    = 'customer'
;
-- or, get a CREATE TABLE statement.
-- then look for columns defined as INVISIBLE
SHOW CREATE TABLE customer \G

Use cases

You may be wondering why one would want to use this feature.

Adding columns

Most database drivers allow to retrieve values from a resultset based on their position, instead of using column names. I see two reasons for this (please let me know if there are others I am missing):

  • The developers think that this is an optimisation. If you agree, try to measure the performance difference. If you manage to measure it, I congratulate with you on the absolute perfection of the rest of your code, not to mention the quality of the technology stack you are using! 🙂
  • A portion of code needs to show all data from a query, but it doesn’t know in advance which columns the resultset will contain. This choice is reasonable, but not great for reasons that will soon be clear. It’s better to pass that code a list of the expected column names.

Whatever the reason, suppose that now we add a column to the table. Your query will start to show that column. Is that what you want? In most cases, surely not.

Theoretically, the SELECT * queries should be fixed at this point. But experience teaches me that developers hate to do this (especially when their queries return many columns), they feel frustrated, and they occasionally introduce bugs in the process (by forgetting or mistyping a column). Many of them don’t even know how to do it, in case they select an SQL expression (SELECT size_in_bytes / 1024 AS size_in_mega).

Adding the column and defining it as INVISIBLE is a decent workaround. At least, the existing code will not break. Queries that need to extract the new column will be modified to mention is explicitly. In the worst case, they will do something like this:

SELECT *, new_invisible_column FROM good_old_table;

Unmaintained applications

Your organisation may run applications that it doesn’t control: because they’re proprietary or because they are not properly maintained (eg: one guy maintains millions rows written in COBOL). And maybe those unmaintained applications use the same database as some maintained applications, that occasionally need the table structures to change.

We can discuss for ages the pro’s and con’s of such a situation, but it is what it is. And in such cases, Invisible Columns are the only easy way to make changes (yes, there are also difficult solutions, but hopefully you don’t need them).

Dropping columns

When we need to drop a column, we have the opposite problem: we need to delete all mentions of that column. I’m not suggesting to use SELECT *: most of the mentions are in the code itself.

For big applications, this requires a certain amount of testing, like unit testing and a manual testing of certain functionalities of the application. I also suggest to drop the column in the test databases for a certain amount of time, say a week, to give obscure parts of the code the time to be executed by someone and fail.

For more advice on how to drop columns safely from production, see Dropping database objects in a safe way. One of the suggestions you will find in that page is renaming the column for a certain time before dropping it. But think about the case I describe above: a portion of code accesses column values by offset, not by name. Renaming the column will change nothing.

In this case, it is far better to make the column invisible.

Forbidding SELECT *

We mentioned an important reason why SELECT * is a bad practice: its results will change after a table is modified. Another reason is that they tend to return more columns than needed, increasing the disk usage, memory usage and network usage. In particular, in MariaDB and MySQL BLOB columns are stored separately, and reading them when not needed causes unnecessary memory/disk access operations. See TEXT and BLOB good practices.

But convincing developers to list the needed columns in their SELECTs is hard or impossible. They not evil or lazy – they have periodic goals to reach, and those goals are typically only about implementing new features. If their managers included some goals related to quality and application performance, they would probably act differently (a good topic for a future article).

Nonetheless, listing columns is not as time-consuming and frustrating as they say. They have to mention those columns anyway, in the code. Every time they first mention a column in the code, they could very well add it to the query.

Invisible columns can make virtually impossible for developers to use SELECT *. I’m not suggesting to make columns invisible in existing tables – that would be a disaster and would cause a war between DBAs and developers. But for new table, all columns can be declared as invisible. The only exceptions would be the primary key columns, because they cannot be invisible.

If all non-primary key columns are invisible, developers can still use SELECT * to check if a row exists. But in practice they will only read the primary key columns, or a secondary UNIQUE index columns, which is fine.

Related contents

Related articles

Related courses

The articles on Federico-Razzoli.com share a lot of technical knowledge for free, in the hope that they can help solving specific problems and raise the level of technical culture. However, if you work with database technologies, as an administrator or as a developer, you may need to build more organic knowledge and skills. Here are some courses related to the subject of this article.

Conclusions

Let me be clear: Invisible Columns are far from being a clean feature that I’m happy to use in a database. But application code also does dirty things. Invisible Columns are the easiest and more realistic way to deal with the cases I described above.

Do you have experiences or thoughts to share, related to Invisible Columns? Do you have alternative solutions you want to share? Did you find anything wrong in the article? Please comment!

Comments to my articles make this website not just one person’s texts, but also a more collective knowledge base.

Toodle pip,
Federico

Leave a Reply

Your e-mail address will not be published. Required fields are marked *