PostgreSQL hstore: key/value in relational tables

Last Updated on

hstore is a PostgreSQL extension that adds support for a key/value data type, and a set of operator and functions that are useful to work with key/value logics. In practice, hstore makes it possible to store key/value pairs in relational tables. This article is a practical introduction to hstore.

Forth Bridge is very long, but it seems to have an incredibly solid structure
Forth Bridge, in Scotland, shows the importance of a sound columnar structure.

Use cases

If a table has an hstore column, every row can have a set of key/value pairs. You can use them as attributes that don’t have to be the same for all the rows. In other words, the schemaful relational data model becomes partially schemaless when we have hstore columns.

But we will see that key/value pair don’t have to be completely schemaless. Thanks to CHECK constraints and UNIQUE constraints, we can define precise rules about the keys and their values. If we do this, hstore becomes an alternative to table inheritance for storing records with some common columns in the same table. We may want to do this, for example, to store a catalogue of products with some common attributes (name, description, price…) plus a set of attributes that depend on the product type (a car and a telly have indeed different characteristics).

PostgreSQL hstore: key/value data in a column of a relational table. A collection in every row.
A key/value collection in every row

An hstore tutorial

Let’s explore hstore with an introductory tutorial.

Installing hstore

CREATE SCHEMA hstore_example;
SET search_path TO hstore_example;
CREATE EXTENSION hstore SCHEMA hstore_example;

As you can see, we installed hstore with CREATE EXTENSION and we bounded it to the hstore_example schema.

Creating hstore columns

We have now an hstore data type. We can create hstore columns just like we create any other columns, they have nothing special.

CREATE TABLE product (
    id SERIAL,
    name VARCHAR(50) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    type VARCHAR(20) NOT NULL,
    characteristics HSTORE NOT NULL,
    PRIMARY KEY (id)
);

Any number of hstore columns is allowed, an hstore column default value can be specified, hstore arrays are permitted, and so on.

Indexes on hstore

At this point we need to index hstore columns:

CREATE INDEX idx_characteristics
    ON product
    USING GIN
    (characteristics)
;

To index hstore columns we canhstore use GIN or GIST indexes. As a general rule, GIN is usually sensibly faster for reads, but depending on your workload sometimes it could even be slower. Modifying GIN index is slower than modifying a GIN index. If you are not satisfied with your performance, it may be worth testing your workload with both types of indexes.

More about indexes after we discuss the basic hstore operations.

Inserting hstore data

Let’s see the syntax we need to insert some test data:

INSERT INTO product
        (name, price, type, characteristics)
    VALUES
          ('Yellow Shirt', 15.00, 'shirt', 'size => "M", color => "yellow"')
        , ('White Shirt', 15.00, 'shirt', 'size => "S", color => "white"')
        , ('McPiper Biscuits', 4.50, 'shortbread', 'brand => "McPiper", flavor => "Original Highlander"')
        , ('Phone X', 65.00, 'phone', 'brand => "Acme", color => "white", weight => "200", os => "WarnerOS"')
        , ('Phone Y', 70.00, 'phone', 'brand => "HAL Phone", color => "black", weight => "150", os => "Odissey"')
        , ('Mysterious Object', 1.00, '?', '')
;

So the rules are simple:

  • hstore values are written as strings;
  • The string can be empty (we can avoid NULL here, see What does NULL mean in SQL?);
  • Each K/V pair is written as 'key => "value"' ;
  • Pairs are separated by commas;
  • Spaces outside of values don’t matter, use them if you like them (I do).

Selecting hstore data

We can filter the rows based on its properties:

tutorial=# SELECT *
tutorial-#     FROM product
tutorial-#     WHERE type = 'phone' AND characteristics->'os' = 'Odissey'
tutorial-# ;
 id |  name   | price | type  |                             characteristics                              
----+---------+-------+-------+--------------------------------------------------------------------------
  5 | Phone Y | 70.00 | phone | "os"=>"Odissey", "brand"=>"HAL Phone", "color"=>"black", "weight"=>"150"
(1 row)

All properties are stored in hstore as strings. If we want to test them using the correct data type, we need to convertthem:

tutorial=# SELECT *
tutorial-#     FROM product
tutorial-#     WHERE type = 'phone' AND (characteristics->'weight')::Integer < 200
tutorial-# ;
 id |  name   | price | type  |                             characteristics                              
----+---------+-------+-------+--------------------------------------------------------------------------
  5 | Phone Y | 70.00 | phone | "os"=>"Odissey", "brand"=>"HAL Phone", "color"=>"black", "weight"=>"150"
(1 row)

We can select the rows that have a certain key:

tutorial=# SELECT *
tutorial-#     FROM product
tutorial-#     WHERE characteristics ? 'weight'
tutorial-# ;
 id |  name   | price | type  |                             characteristics                              
----+---------+-------+-------+--------------------------------------------------------------------------
  4 | Phone X | 65.00 | phone | "os"=>"WarnerOS", "brand"=>"Acme", "color"=>"white", "weight"=>"200"
  5 | Phone Y | 70.00 | phone | "os"=>"Odissey", "brand"=>"HAL Phone", "color"=>"black", "weight"=>"150"
(2 rows)

Many more operations are supported, but here we are only covering the most common cases.

Modifying hstore properties

We can completely overwrite an hstore value, just like any other value (numbers, strings…). But in most cases we only want to change, add or delete a specific property. Let’s see how to perform these operations.

Adding a new property or modifying an existing property is done the same way:

UPDATE product
    SET characteristics = characteristics || 'color => "grey"'
    WHERE id = 5
;
UPDATE product
    SET characteristics = characteristics || 'os_version => "10.0"'
    WHERE id = 4
;

To delete a property:

UPDATE product
    SET characteristics = delete(characteristics, 'weight')
    WHERE type = 'phone'
;

Advanced indexes

Let’s see some more advanced indexes, taking advantage of hstore operators and PostgreSQL native features.

Partial indexes

Indexes take space on disk. Bigger indexes can be slower than smaller indexes. If we don’t need to include certain values in an index, it is often a good thing to avoid that.

To only index rows that match a condition:

CREATE INDEX idx_characteristics_with_brand
    ON product
    USING GIN
    (characteristics)
    WHERE characteristics ? 'brand'
;

Functional indexes

The term functional index seems to have disappeared from PostgreSQL documentation some years ago, but I find it very clear (and correct: an expression without side effects is a function).

Here’s how to index a single property:

CREATE INDEX idx_characteristics_color
    ON product
    ((characteristics -> 'color')
;

To index the list of keys present in an hstore value:

CREATE INDEX idx_characteristics_keys
    ON product
    USING GIN
    ((skeys(characteristics)))
;

We are still using GIN because we are indexing an ARRAY.

UNIQUE indexes

It could make sense to build a UNIQUE index on a specific value:

CREATE UNIQUE INDEX unq_characteristics_shirt_color
    ON product
    ((characteristics -> 'color'))
    WHERE type = 'Shirt' 
;

It is also possible (but weird) to build an index on the whole hstore column, just use a normal BTREE:

CREATE UNIQUE INDEX unq_characteristics
    ON product
    (characteristics)
;

CHECK constraints

Adding CHECK constraints to an hstore column:

ALTER TABLE product
    ADD CONSTRAINT chk_characteristics_keys
        CHECK (characteristics->'type' <> 'phone' OR characteristics ? 'size'),
    ADD CONSTRAINT chk_characteristics_weight
        CHECK ((characteristics->'weight')::Integer BETWEEN 50 AND 500)
;

In this case we decide that all objects except for phones must have a size property; and we set a valid range for weight. I believe that CHECK constraints are extremely useful for hstore data.

Foreign keys

We cannot build foreign keys on hstore properties. While indexes can be built on expressions, foreign keys cannot.

References

Conclusions

Do you have experiences or thoughts to share? Did you find anything wrong in the article? Please comment!

Comments to my articles make this website not just one person’s texts, but also a more collective knowledge base.

Toodle pip,
Federico

Photo credit

Leave a Reply

Your e-mail address will not be published. Required fields are marked *