Primary Key in InnoDB: how they work and best practices

Last Updated on

A vintage primary key

InnoDB is MySQL default storage engine – the one you use if you don’t know what a storage engine is, or you don’t have special needs. This foreword will surprise my readers from the MySQL community, but recently I’ve talked (about databases) with a lot of people who don’t know what InnoDB is.

This article shows what a proper primary key looks like for InnoDB, and why.

You may also want to check the next article: Indexes bad practices.

TL;DR: copy+paste this

It is a good thing to always use a primary key. For most tables, just use this:

id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY

Why is this important?

If you’re still reading, you are interested in why good practices are good, and how things work internally. Let’s see the reasons.

How the primary key and secondary indexes are stored

In InnoDB, under the hood, the primary key is actually the table. The first columns are the ones that the we define as part of the primary key (id if you follow the above suggestion). Rows are physically ordered by the primary key columns (id).

Secondary indexes are stored separately. Each index contains the columns that we define as part of the index.

Suppose for example that we have the following table:

CREATE TABLE t (
    a INTEGER UNSIGNED,
    b INTEGER UNSIGNED,
    c INTEGER UNSIGNED,
    d INTEGER UNSIGNED,
    PRIMARY KEY (a, b),
    INDEX idx_c (c)
) ENGINE InnoDB;

The physical table will look like this:

abcd
1013
1256
2110
3154
4520
4617

idx_c will look like this:

cab
110
121
146
245
512
531

Tables that follow these rules are called organised by primary key. Most databases are not organised by primary key.

This information leads us to a series of performance considerations.

Performance of searches by primary key

A search by primary key implies a search in a data structure – in this context we don’t really care about the used data structure. What really matters is that, when InnoDB finds the values it is looking for, it finds all the columns that it may have to return.

A secondary index is different. When InnoDB finds the entries it needs, it finds a reference to the corresponding primary key entries (a and b, in this case). Usually this is not enough, because more columns need to be read to answer the query. In this case, for each row found in the secondary index, InnoDB will have to read the corresponding rows in the table (primary key). Each of this search takes approximately the same time as the search by secondary index.

One big reason why each table should have a primary key is that queries by primary key are very fast.

There are, however, queries by secondary index that are as fast as a query by primary key. Consider the following queries:

SELECT c FROM t WHERE c = 1;
SELECT a, b FROM t WHERE c = 1;

Those queries can be satisfied by checking the index, without any additional operation. For these queries, idx_c is a covering index.

Replication

Another performance concern is related to MySQL replication and Galera. It is not limited to InnoDB: in a replication chain (but not in Galera) we can use any storage engine, but this will not mitigate the problem.

Replication works by sending the contents of the binary log from a master to the slaves. The recommended replication format is ROW (binlog_format = ROW), which means that a binary representation of the row and the corresponding primary key value is sent. For DELETEs and UPDATEs, this makes the search for the row to erase/modify fast.

What happens if there is no primary key? A NOT NULL UNIQUE index will be used instead. But what if there is no such index? In that case, the whole rows to modify are written in the binary log and sent to the slave. To replicate the change, the slave will have to find the rows with a full table scan, which is obviously much slower.

Avoid the mutex

If we create an InnoDB table without a primary key, it will have a so-called clustered index. It is an internal 6 bytes index, invisible to the user.

The problem with this index is that its write access is governed by a global mutex: dict_sys->mutex. The same mutex governs several, apparently unrelated operations in InnoDB. As a consequence, the use of clustered indexes sensibly reduces our databases scalability.

Jeremy Cole analyses this problem in his article: How does InnoDB behave without a primary key?. It is from 2013, but occasionally I still find myself pointing people to it to explain why a primary key is highly desirable.

Primary key size

Primary keys are contained by all secondary indexes, so a big primary key leads to big secondary indexes. Also a smaller key is faster, and the primary key is involved in most of the queries.

These are good reasons to have small primary keys. To achieve this, we want to use an integer, as strings take more space. INTEGER UNSIGNED is 4 bytes, and the maximum value is more than 4 billions. This is almost always enough – when it isn’t, we can use BIGINT UNSIGNED.

I don’t see a reason to use less than 4 bytes, even when this can be done safely. Often we have a column pointing to another table’s primary key – call it a referencing column. Primary keys and their referencing columns should be of the same type, otherwise JOINs will cause a lot of type conversions that will lower performance. It sounds like an easy task, but in practice I see this error quite often in production databases. Always using INTEGER UNSIGNED – except for the rare cases where BIGINT UNSIGNED is necessary – makes the task easier.

EDIT: It was pointed out in a comment on Facebook that migrating from INT to BIGINT can be a multi-months project, therefore it is better to create BIGINT primary keys from the start. Based on my experience, just a few primary keys per database (or none at all) need to use BIGINT – but every organisation is different so, if you are in doubt (and your doubt is realistic), then use BIGINT UNSIGNED.

Values insert order

A table rows are physically ordered by primary key. Therefore, for performance reasons, new values should be appended to the end of the key. If values are inserted in the middle, InnoDB will have to do additional operations to make room for them.

To guarantee that new values are inserted to the end of the key, we should always use the AUTO_INCREMENT attribute.

Using a TIMESTAMP column will also help to insert the values in the correct order. But nothing guarantees that the generated values are unique, which could results in error during insertions. It’s better to avoid TIMESTAMP primary keys.

Common exceptions

Let’s see a couple of common practical cases where the above general hint doesn’t apply, and why that’s the case.

If you have more exceptions, please explain them in a comment.

Many to many relationships

A many to many relationship between C and t2 is represented as a separate table, with a pointer to those tables:

t3 {t1_id, t2_id}

We could add an id column which is also the primary key. Then we should also ensure that the relationships are unique, by adding a UNIQUE index on t1_id, t2_id. And if we want to optimise searches by t2_id, we should also add an index on that column. But then the question is: will there be a real gain? Probably not. Writes could even be slower, because they will write much more stuff.

So it’s better to build the primary key on both the columns, but in which order? In most practical cases, putting them in the right order guarantees that the rows will usually be written in the primary key order. For example, suppose we need to link books with authors. It is an M2M relationship, no doubt about it. The correct order is (book_id, author_id). Why?

Because rows will typically be written when a new book appears. And most of the books are written by existing authors. We could even ask the developers to add rows for a new book ordered by the authors id’s – that shouldn’t be hard. But probably it will be useless, as most books are written by a single author.

These considerations ‘are driven by logic. In your organisation things could be different, so make your hypothesis and run some queries to check if they are correct. For new features, you may want to talk to developers to find out how they expect data to look like. If they don’t know, you can reach a product manager.

I gave similar advise in other posts, and I will do it again: I strongly believe that the communication between different parts of an organisation is essential for building high quality products and services.

Read-only key/value tables

In this context, by read only we mean that a table won’t change as a part of normal operations. For example, it may:

  • Never change (eg: chemical elements);
  • Rarely change (eg: our organisation’s offices);
  • Change in batch (eg: nightly, once a week).

This rules out the concern that we may have performance problems during insertions.

In a key/value table, normally we have these access patterns:

  • Get a single value by key – in a simple query or in a JOIN;
  • Check if a query exists;
  • Get a list of keys optionally ordered and paginated.

An id column will not help any of these queries, and will slightly slow them down for reasons we already discussed. Thus, a read only K/V table should look like this:

CREATE TABLE constant (
    constant_name VARCHAR(100) NOT NULL,
    constant_value VARCHAR(1000) NOT NULL,
    PRIMARY KEY (constant_name)
) ENGINE InnoDB;

The reality is usually a bit more complex and dirty than clean examples designed for technical writings. For example, a similar table could have a timestamp or it could be some sort of multivalue. Yet, if it’s read only, similar considerations should apply.

See also

Related articles:

Related courses:

Conclusions

Now we know why primary keys are so important in InnoDB and how they are implemented. We know what a key should look like in the general case, and in some special cases, namely: many to many relationships, and read-only K/V tables.

You may also want to check the next article: Indexes bad practices.

As usual, I’d like to have some discussions with you in the comments.

  • Do you already follow these practices?
  • Do you agree with them?
  • Do you have more special cases to add to the list?

Please let me know!

Toodle pip,
Federico

Image credit

Comments (5)

  1. Also, more and more, I go in the direction of using UUIDs for primary keys. This means the PK is a BINARY(16) (please, no VARCHAR(36)…). This takes a little more storage, but is can avoid storing a created_at column. I have a few reasons to recommend that and I might one day do a talk or a blog post about “The Dark Side of AUTO_INCREMENT”. In the meantime, one justification is in [1].

    [1]: https://www.slideshare.net/JeanFranoisGagn/autopsy-of-an-automation-disaster/27

    1. Hi JF!
      I agree that the id generation in the DBMS is a problem. But I have some questions.
      * Do you see a noticeable slowdown in queries by UUID, compared to an INT id?
      * Do you prefer a UUID generated by the application or by MySQL?
      * Which UUID version do you recommend?

      1. Hi Federico, sorry for the late reply, I was not notified of your comment by mail.

        > Do you see a noticeable slowdown in queries by UUID, compared to an INT id?

        I have not checked query speed, and IMHO this is not important. I would be ok with a 10% slowdown to avoid problems with UUIDs.

        > Do you prefer a UUID generated by the application or by MySQL?

        I prefer generated by the application: more flexibility, etc… Also, because you know which data you have INSERTed, it is easier to than use this application-generated UUID to INSERT another row in another table, or others… Also, by generating by the application, the application can choose its timestamp bits, which can avoid having a created_at column in the table.

        > Which UUID version do you recommend?

        I do not mind that much, as long as the timestamps bits are at the beginning of the UUID to INSERT in Primary Key order. This also gives an “un-explicit” index by created_at columns (SELECT […] FROM table WHERE UUID >= some_value AND uuid < another_value will use the PK as the index). A good way of doing that is documented on Percona's blog in [1].

        [1]: https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/

        1. No worries, thanks for replying.
          I really hope that the slowdown is lower than 10%!
          I remember that Percona post. I agree that using UUID has a lot of advantages, including preserving the chronological order in case of failover. I will take a look at this, including the differences between UUID versions.
          As a consultant, I can tell people about UUID benefits, but it may imply too much work for them. This feature should be implemented in ORMs.

Leave a Reply

Your email address will not be published. Required fields are marked *