What does NULL mean in SQL?

SQL NULL is not less consistent
than Escher’s cube

There have been endless discussions already if NULL should be implemented in relational databases, and if it should even be included in the relational theory. My favourite quote on this subject is from the person who helped Ted Codd to divulgate – and probably refine – his relational algebra:

I hate NULL.

Chris Date

I tend to hate it too, except that sometimes it is necessary. But my point is totally different from Date’s. I have nothing against the idea of NULL, I just think that it is wrong by design in SQL.

Because NULL has two incompatible meanings, and depending from the context, the SQL standard and implementations randomly choose one of them.

The theory

Ted Codd didn’t propose anything corresponding to NULL in his fist version of the relational algebra. In a later document, he proposed two markers. A marker is something that is placed where you’d expect a value, but is not a regular value. These markers were:

  • I-MARKER: Non-applicable, absent value;
  • A-MARKER: Applicable but unknown value.

This idea was later abandoned by Codd. Anyway, it has very little to do with SQL design choice of merging these markers into one.

Personally, I tend to believe that Codd’s idea would be a bit overcomplicated for most common practical cases. But it is correct. SQL should have chosen one of the following: implementing both these markers, implementing only one of them with consistent semantics, or implementing none.

SQL’s inconsistency

Again: my point is that SQL merged two ideas into one, and mixed their semantics in an inconsistent way, depending on the context. I will describe the behaviours of NULL in different contexts.


NULL = NULL returns NULL. In a previous article I explained how to properly compare values that could be NULL.

This makes perfectly sense if NULL is unknown. Is an unknown value equal to an unknown value? The answer is unknown: they could be the same value or not.

But if NULL is a missing value, this is puzzling. Is a missing value identical to another missing value? The answer in my opinion should be yes. But for sure, the answer shouldn’t be a missing value itself.

However, consider this expression:

null_column >= 100 OR null_columns < 100

It returns NULL. This defies the idea that comparisons make sense if NULL is an unknown value. Comparisons are simply inconsistent in all cases.


For NULL-safe comparisons, the IS NULL and IS NOT NULL operators can be used. They are synonyms for IS UNKNOWN and IS NOT UNKNOWN, which suggests that NULL is an unknown value.

Scalar expressions

Scalar expressions seem to consider NULL as an unknown value. Scalar subqueries are a different matter, see below. For example:


If NULL was a missing value, it would behave exactly as 0, so the above expression would return 1.

Aggregate functions

Aggregate functions are those that accept any number of 1-column rows and return exactly one value.

COUNT(*) returns the number of rows. If one row only contains NULLs, it still counts as one. So, in this context, NULL is considered as an unknown existing value.

COUNT(column) ignores NULLs. This means that, in this context, NULL is considered as a missing value.

For a funny fact about COUNT() and NULL, see this old article by Baron Schwartz.

Other aggregate functions consider NULL as a missing value – so it is ignored by MIN(), MAX(), AVG(), etc.


GROUP BY and DISTINCT treat NULL as a regular value.

This could be consistent with both interpretations of NULL.


Some DBMSs return NULLs before values, others return values first. I’d like to see more consistency, but in both cases it is just a practical choice that does not contradict any interpretation of NULL.

Outer joins

Left, right and full outer can return rows that have no match in the other table. When this happens, columns from the other table are populated with NULLs.

This behaviour treats NULLs as absent values.

It is interesting to note that, while in most cases this is just fine, sometimes this behaviour is ambiguous. Consider the following query:

SELECT t1.a, t2.b
    FROM t1 LEFT JOIN t2
    ON t1.a = t2.b OR t2.b IS NULL;

In the results, it is impossible to distinguish an absent match in t2 from a match in t2 where the b column is NULL.

This shows that, even if NULL represents an absent value, this is not enough to answer joins properly, as it doesn’t say anything about why the value is missing.

Scalar subqueries

Scalar subqueries are nested queries that return one row consisting of one column. In practice, they return a single value, or NULL in case no value was found. And this is the trap: normally, you don’t know if a subquery returned NULL because it found it in a table, or because it found nothing.

As mentioned for joins, at least one more marker would be needed to eliminate ambiguity.


SELECT DISTINCT always treats NULL as a normal value. This could be coherent with considering NULL as a missing value or not. It is incoherent with the idea of treating it as an unknown value.

Primary keys don’t allow NULL columns. Primary keys logically identify each row, this is one of the pillars of relational algebra. So this is consistent with treating NULL as an unknown value. But if the primary key consists of multiple columns, it should be allowed to store NULL markers – as long as no row only consists of NULLs and unicity is certain. Of course this is just a theoretical objection, since a practical implementation would probably be not desirable.

Some DBMSs allow to insert any number of NULLs in a UNIQUE index. This is consistent with treating NULL as a missing value: an absence doesn’t duplicate another absence.

Other DBMSs only allow one NULL per duplicate index. This is inconsistent for all NULL interpretations. Not with absences, as mentioned earlier. And not with unknown values: an unknown value could duplicate another unknown value, but it could also duplicate any known value – the chances are exactly the same. So NULL shouldn’t be allowed if the table contains more than one row.

Some DBMSs, like Db2, don’t allow any NULL value in UNIQUE indexes. This can be a good idea for any interpretation of NULL.

About specific implementations

Every SQL implementation is different, and none of them reflects a big portion of the standard. A couple of them deserve some words.

All of the below oddities is documented and should be considered as a design choice, not a bug.


PostgreSQL has several markers, or special values. NULL is special even between them.

SELECT FLOAT8 'Infinity' = FLOAT8 'Infinity';

This expression returns true, but this is wrong. Some infinities are bigger than others. PostgreSQL allows us to talk about infinity, but it treats is like a regular number.

If you consider NULL as an absence, the result is funny. The number of points in a plane is the number of points in a straight line, but two absences are not equal.


Oracle is fantastic in this respect. Sometimes it treats NULL as inconsistently as standard SQL. Sometimes it treats it as… an empty string! Despite this, the documentation says:

Do not use null to represent a value of zero, because they are not equivalent.

I’m not going to dig more into this absurdity, so for more info see Oracle documentation.


SQLite is kind enough to let us decide if NULL should be treated as one value in SELECT DISTINCT or not. By default it is. To change this behaviour, we can change the NULL_ALWAYS_DISTINCT macro and recompile the code.

See also

Photo credit

Leave a Reply

Your email address will not be published. Required fields are marked *