Subscribe to this thread
Home - General / All posts - Bug in evaluating CASE x expression with NULL
tjhb

8,410 post(s)
online
#13-Nov-18 22:04

Test setup (9.0.168.4):

CREATE TABLE [test]

    (

    [mfd_id] INT64,

    [v] INT32,

    INDEX [mfd_id_x] BTREE ([mfd_id])

    )

;

INSERT INTO [test] ([v])

VALUES (1), (NULL), (3), (NULL)

;

Test query:

SELECT

    [v],

    CASE [v]

        WHEN NULL THEN -1 -- returns NULL

        ELSE [v]

    END AS [a],

    CASE 

        WHEN [v] IS NULL THEN -1 -- returns -1

        ELSE [v]

    END AS [b],

    Coalesce([v], -1) AS [c]

FROM [test]

;

The results in [a] in [b] should both be -1, but result [a] incorrectly returns NULL.

Coalesce in [c] works as expected.

I'll also send in a report.

tjhb

8,410 post(s)
online
#14-Nov-18 02:11

I should have been more circumspect.

After thinking about it some more, I can see how it could be intentional.

The explanation would involve insisting that in the construct

CASE x WHEN y THEN z ELSE x END

the implied test is for equality between x and y.

The result of

x = NULL

is neither TRUE nor FALSE, for any x (including NULL), but rather unknown, represented again by NULL. In that case, evaluation should pass to ELSE x, which is in this case is also NULL.

Accepting that didactic logic, the current result would be correct, no bug.

I tend to think that that argument would be too didactic. At first sight it would seem OK (and more natural) for the implied comparator to be IS here, rather than =, where one argument is NULL (or both arguments, since NULL IS NULL).

On the other hand, how could argument values (not types) change the meaning of a comparator? That seems cart before horse. That is possibly a good objection.

So: if it is intended then I think I can see why. I should have asked.

tjhb

8,410 post(s)
online
#14-Nov-18 02:48

Alright, I am certain now that this is intended.

The reason is that Dan had already pointed out a very similar issue offline (almost exactly a year ago Dan...) and after much hair splitting we had come to the conclusion that while Manifold 9 might be being pedantic, it was correct.

But re-reading my comments from that discussion with Dan, I notice there is something wrong in my second post above. The second sentence here is wrong:

The result of

x = NULL

is neither TRUE nor FALSE, for any x (including NULL), but rather unknown, represented again by NULL. In that case, evaluation should pass to ELSE x, which is in this case is also NULL.

Taking the syntax from above again:

CASE x WHEN y THEN z ELSE x END

Let x (or y, or both) be NULL. So the result of the comparison x = y is also NULL, and THEN z is not evaluated (because the case is not TRUE). But is ELSE evaluated? What does ELSE mean in this context? (I know, this is getting tricky. This is where I got to with Dan last year, and couldn't put the question clearly enough to try for an answer.)

After comparing x and y (one or both of which is NULL), we have the result NULL. Now we hit ELSE... but not so fast. Is NULL TRUE? No. Is it FALSE? No. We know what FALSE ELSE TRUE means (it is TRUE). And we know what TRUE ELSE FALSE means (it is TRUE as well, short-circuited in this case). But what does NULL ELSE TRUE mean?

I strongly suspect that NULL ELSE TRUE (and NULL ELSE FALSE) must evaluate to NULL (unknown). Since, if the answer is unknown before the ELSE, how can adding a second possibility (whether TRUE or FALSE) make it any clearer? It can't. It is still unknown.

So in the example above, we never get THEN z (clearly), and we also never get ELSE x, regardless of the value of x (whether NULL or something determinate), because ELSE is never evaluated.

Instead, we get the result of the first comparison, short-circuited. Which is NULL.

A concrete example to make it clearer:

CASE 1 WHEN NULL THEN 2 ELSE 3 END

or equivalently

CASE NULL WHEN 1 THEN 2 ELSE 3 END

The result of this expression is not 3, but NULL (the result of 1 = NULL, unknown). ELSE doesn't arise.

tjhb

8,410 post(s)
online
#14-Nov-18 03:12

By the way, the equivalence of those last two examples makes it clear why the comparator here should be =, not IS.

That is because while we can evaluate

1 IS NULL

which is FALSE, we can't evaluate

NULL IS 1

because that is a syntax error.

For the relationship here to be symmetrical, it must be equality/inequality, not IS.


Last point: Manifold 9 behaves significantly diffferently (and if I have understood, more correctly) from Manifold 8 in this regard. It may take some getting used to!

tjhb

8,410 post(s)
online
#14-Nov-18 03:33

For the relationship here to be symmetrical, it must be equality/inequality, not IS.

OK you could extend IS to operate symmetrically (1 IS 1, etc)--though since IS is about values in SQL, not about objects as for related concepts in other languages, it would then be a mere synonym for '=', except in the specific case of x IS (NOT) NULL, and that would be bad. In any case I think the earlier

how could argument values (not types) change the meaning of a comparator?

was a stronger point.

adamw


8,259 post(s)
#14-Nov-18 09:02

Well.

You reasoned everything out correctly, however, ironically enough it looks like we do have a bug there, it is just different from what you thought it was in the first post.

Let's walk through CASE briefly.

CASE x WHEN y THEN z ... END does use equality, yes. That's the definition of the short form of CASE that everybody uses, with the long form being CASE WHEN x THEN y ... END.

Yes, if we compare NULL with NULL, we get NULL. With NULL meaning 'an unknown value', we cannot tell whether the first unknown value is equal to the second unknown value (they could be equal or they could be different), so we get 'we do not know whether the values are equal' = NULL as a result. This means that when we write CASE 1 WHEN NULL THEN y ... END, or CASE NULL WHEN 1 THEN y ... END, or CASE NULL WHEN NULL THEN y ... END, we won't get y, because neither of 1=NULL, NULL=1, NULL=NULL are true and we get y only when they are true.

However, while, as you say, we are following a similar logic for ELSE and so for CASE NULL WHEN 1 THEN y ELSE z END we return NULL, because we don't know what the value of NULL is, it is neither equal nor unequal to 1, so we tell 'we don't know which branch in CASE to even take' -- this is actually wrong and ELSE should work as a catch-all 'for when you don't know' branch, and the mentioned CASE should proceed into ELSE and return z. We should proceed into ELSE even if previous conditions are NULL and not FALSE because that's how CASE ... ELSE is defined and that's what other databases do. That's also consistent with proceeding into ELSE even if previous conditions are NULL and not FALSE in the long form of CASE - which we do handle correctly. This is a corner case scenario (no pun intended) in the short form of CASE which we got wrong. We will fix it.

Thanks for the thread!

tjhb

8,410 post(s)
online
#14-Nov-18 10:12

Excellent. Thank you!

So I did go astray in my third post.

ELSE does not mean "or if the preceding is FALSE" as I had tried to see it there, but "or if the preceding is not TRUE".

So NULL ELSE TRUE means TRUE, and NULL ELSE FALSE means FALSE. (Not NULL in either case. NULL should not short-circuit.)

This is more useful and, as you say, consistent with behaviour in the long form of CASE.

Yay (and thanks again for reading).

tjhb

8,410 post(s)
online
#14-Nov-18 21:54

A related observation.

Using the table setup in the first post above, what should this query return?

SELECT * FROM [test]

WHERE [v] IN (1, NULL);

It should only return the row where [v] = 1, not also the 2 rows where [v] IS NULL.

Like short-form CASE, the IN operator tests for equality. (It must do, for the same reasons.) For the rows where [v] IS NULL, the test [v] = NULL returns NULL (unknown).

To get the rows where [v] IS NULL as well, we must ask for them explicitly.

SELECT * FROM [test]

WHERE [v] = 1 OR [v] IS NULL;

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.