Subscribe to this thread
Home - General / All posts - Another Fehily question (grouping rows with GROUP BY) and the CASE operator
dyalsjas95 post(s)
#17-Feb-18 15:48

I'm having trouble understanding how to integrate a CASE operator into a GROUP BY query.

Here's the Fehily text listing (6.15):

SELECT

    CASE

      WHEN sales IS NULL

        THEN 'Unknown'

      WHEN sales <= 1000

        THEN 'Not more than 1,000'

      WHEN sales <= 10000

        THEN 'Between 1,001 and 10,000'

      WHEN sales <= 100000

        THEN 'Between 10,001 and 100,000'

      WHEN sales <= 1000000

        THEN 'Between 100,001 and 1,000,000'

      ELSE 'Over 1,000,000'

    END

      AS "Sales category",

    COUNT(*) AS "Num titles"

  FROM titles

  GROUP BY

    CASE

      WHEN sales IS NULL

        THEN 'Unknown'

      WHEN sales <= 1000

        THEN 'Not more than 1,000'

      WHEN sales <= 10000

        THEN 'Between 1,001 and 10,000'

      WHEN sales <= 100000

        THEN 'Between 10,001 and 100,000'

      WHEN sales <= 1000000

        THEN 'Between 100,001 and 1,000,000'

      ELSE 'Over 1,000,000'

    END

  ORDER BY MIN(sales) ASC;

I've gotten the query to work by replacing or deleting the CASE operators and inserting a field name.

SELECT

    COUNT(*) AS [Num_titles]

  FROM [titles]

  GROUP BY [sales]

  ORDER BY MIN([sales]ASC;

How can I get GROUP BY and CASE to work together?

tjhb
8,009 post(s)
#17-Feb-18 21:48

Another very good question!

Manifold 9 departs here from the standard SQL that Fehily uses throughout his book-and also from Manifold 8.

The departure is (a) partly because of the way Manifold 9 SQL is wired (which has its own good reasons), and (b) partly because Manifold 9 SQL does not need a construction like the above, which is, let's face it, artificial and convoluted.

Take point (b) first. (Point (a) will follow in a second post.) Why is there a CASE expression in the SELECT list that matches verbatim the CASE expression in the GROUP BY expression?

Because the general rule is, that in a query with GROUPing, each expression in the SELECT list either must have been used to define a GROUP, else must be enclosed within an aggregate function in the SELECT.

(The rule does not also run the other way. That is, an expression can be used for GROUPing, without necessarily appearing in the SELECT list. That's fine, though in practice, the match will run both ways, as in Fehily's query above.)

At first glance, this looks wasteful. In the query above, it looks as though the same CASE expression will be evaluated twice. In fact an SQL parser will normally detect the exact match, and "look through" the SELECT to the GROUP, evaluating any matching expressions only once. It doesn't look efficient, but it normally is.

We discussed this in a Radian Studio beta thread, not for CASE expressions but for expressions generally.

Adam's comments there were valuable and some are worth repeating here.

First, he pointed out that the requirement to match expressions exactly, if they appear in both SELECT and GROUP BY, is

a hack. It relies on the expressions in the SELECT list and the GROUP list coinciding from the point of view of the parser, not logically.

Example log from SQL Server:

...

select a, sum(b) from t group by a

-- works

select a+1, sum(b) from t group by a+1

-- works

select a+1, sum(b) from t group by 1+a

-- fails with an error, the expressions are logically the same but the parser thinks they are different (because it does not have the right tools to reason about them, they come in later phases)

select a+1, sum(b) from t group by a+1+0

-- fails for the same reason

You get the point. I hadn't thought about this before Adam pointed it out.

OK, a hack, but is it necessary? Not any more. Adam again (same post):

It does not feel right to *have* to repeat the same - possibly complex - expression in two places basically via copy and paste, because anything else might scare the parser. We think it is better to write the expression once and then refer to it by name. This whole thing with allowing expressions in GROUP as long as they coincide exactly with expressions in the SELECT list exists solely as a relic from those times when nobody could do nested SELECTs.

The part I have put in bold sums up the whole matter nicely, and also points to what we should do, to write a query that does not rely on a hack.

We should use a subquery (a nested SELECT).

So to rewrite Fehily's query for Manifold 9 SQL (this should equally work in Manifold 8, if single quotes are changed to double quotes):

SELECT

    [Sales category],

    COUNT(*) AS [Num titles]

FROM

    (

    SELECT

        [sales],

        CASE

            WHEN [sales] IS NULL

                THEN 'Unknown'

            WHEN [sales] <= 1000

                THEN 'Not more than 1,000'

            WHEN [sales] <= 10000 

                THEN 'Between 1,001 and 10,000'

            WHEN [sales] <= 100000

                THEN 'Between 10,001 and 100,000'

            WHEN [sales] <= 1000000

                THEN 'Between 100,001 and 1,000,000'

            ELSE 'Over 1,000,000'

        END AS [Sales category]

    FROM [titles]

    )

GROUP BY [Sales category]

ORDER BY MIN(sales) ASC

;

[For anyone following along, we are looking at listing 6.15 in Fehily, SQL Database Programming, p.175 in the print edition, location 3992 in Kindle.--@dyalsjas, it would be nice to the listing or section number with Fehily questions when you think it makes sense.]

tjhb
8,009 post(s)
#17-Feb-18 23:20

So that is what we can do in Manifold 9 SQL, to group by a complex expression (it could be CASE or something else) which we also want to return in the SELECT list. Produce the complex expression in a subquery, name it, and pass to the outer query for GROUPing.

Why do we have to do it that way in 9, while in Manifold 8 (and in standard SQL) we can use the clunky/hacky method of writing exactly the same expression twice, expecting the SQL compiler to "look through", notice the matching syntax, and just evaluate once?

That is about how Manifold 9 is wired (more accurately, SQL9, the language of the underlying Radian engine and its compiler).

To cut a long story short (and bearing in mind that everything here that is correct is due to Adam), logically distinct parts of a query are compiled and executed in Manifold 9 SQL with a much greater degree of separation than in Manifold 8 SQL or in other SQL dialects.

Those distinct parts, or nodes, can include FROM, WHERE, GROUP, ORDER, and SELECT--as well as any subqueries and FUNCTIONs. (I'm not sure to what extent JOINs or embedded COLLECT statements also constitute separate nodes. I suspect they are both more tightly coupled with one of the other types of node. Anyway you get the idea.)

As I understand it, this increased separation is tightly bound up with the ability of the Radian engine to schedule logical operations amongst many threads, possibly in different languages, executed all at once, and possibly on a combination of CPU and GPU.

So to enjoy the benefits of Radian parallelism, we sometimes must adapt SQL coding patterns to allow the Radian engine to do its work. In other cases we may have a free choice, but one style or pattern may prove significantly faster (perhaps many times faster) than another. This takes some extra learning and experimentation but the performance gains make it clearly worthwhile.

tjhb
8,009 post(s)
#18-Feb-18 07:07

In the second to last para, "executed all at once" should have read e.g. "despatched and completed asynchronously" or "with overlapped execution". The hard part is the scheduling, which is what Radian does.

dyalsjas95 post(s)
#18-Feb-18 15:17

Thank you for the comprehensive reply. It's helped me understand more about SQL9. I'll make sure to add more about where in the Fehily text to find context for the questions I'm asking. As a parallel point, I wasn't part of the Radian beta group, so I don't have access to those discussion threads. I'm not sure how open the Manifold team wants to make them

lionel

444 post(s)
#18-Feb-18 16:54

Hi

where i can find the data asset to test the SQL query ?

Thank's


join image "Because my dad promised me" interstellar from Manifold: Time by Stephen Baxter. power Math destruction

dyalsjas95 post(s)
#18-Feb-18 17:27

It’s available from the authors website at

http://www.fehily.com/books/sql.html

The MS Access .mdb has been imported into a Manifold .map file and is available on the Manifold products download website at

http://www.manifoldsoftwarelimited.mobi/updates/books2.map

lionel

444 post(s)
#18-Feb-18 21:35

thank's


join image "Because my dad promised me" interstellar from Manifold: Time by Stephen Baxter. power Math destruction

lionel

444 post(s)
#17-Apr-18 00:45

Attachments:
Frehhily_listing652_CASE.png
frehily_652_CASE.mxb


join image "Because my dad promised me" interstellar from Manifold: Time by Stephen Baxter. power Math destruction

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.