Subscribe to this thread
Home - General / All posts - cannot think in SQL
hugh
200 post(s)
#09-Sep-18 06:57

-- Why does this not work? Has to have obvious solution but I can't see it. Inner SELECT works but UPDATE fills in nulls.

UPDATE [xydata]

SET [z] =

( SELECT [xyzdata].[z]

FROM

[xyzdata] INNER JOIN [xydata]

ON [xydata].[id] = [xyzdata].[id]

);

-- would try

SELECT Coalesce([xyzdata].[z])

-- but inner SELECT already works

-- Same result in M8

Attachments:
xyzM9.mxb

tjhb
10,094 post(s)
#09-Sep-18 08:43

Because there is no relationship at all between rows in the outer table (addressed by UPDATE) and the inner table. No link whatsoever.

Look at SET [Z] =... from the perspective of a given row. What does it get? A whole table.

tjhb
10,094 post(s)
#09-Sep-18 09:11

What you need, within the query, is a table that can be updated.

So, it must have a source column (or expression) and a target column to be updated.

For Manifold 9 it must also have a column with a BTREE index so that target rows can be enumerated. mfd_id is usually good.

hugh
200 post(s)
#09-Sep-18 17:59

Thank you, that clarifies a lot but I am still being dense. Seems like both [xydata] and [xyzdata] are tables:

imported xydata.csv    imported xyzdata.csv 

"id","x","y"           "id","x","y","z" 

1,44,12                1,3,0,0 

2,64,34                2,71,32,2236 

3,42,16                3,71,40,2804 

4,53,28                4,25,14,362 

5,53,24                5,78,30,2340 

6,70,29                6,91,54,4960 

7,32,21                7,25,20,488 

8,30,8                 8,82,43,3526 

9,34,10                9,53,20,1086 

10,65,36               10,0,0,0

and both get indexed like so and column z added to [xydata]

--SQL9

ALTER TABLE [xydata] (ADD [mfd_id] int64);

ALTER TABLE [xydata] (ADD INDEX mfd_id_x BTREE (mfd_id));

ALTER TABLE [xyzdata] (ADD [mfd_id] int64);

ALTER TABLE [xyzdata] (ADD INDEX mfd_id_x BTREE (mfd_id));

ALTER TABLE [xydata] (ADD INDEX id_x BTREE (id));

ALTER TABLE [xyzdata] (ADD INDEX id_x BTREE (id));

ALTER TABLE [xydata] 

  (ADD [z] int32);

INTO works to create a new table

--SQL9

       SELECT [xydata].[id][xydata].[x][xydata].[y][xyzdata].[z]

        INTO [xyzdata2]

        FROM 

        [xyzdata] INNER JOIN [xydata] 

            ON [xydata].[id] = [xyzdata].[id]

but this does not work as UPDATE . . . SET because "Expression should return table with single field".

--SQL9

UPDATE [xydata] 

   SET [z] = 

     (

       SELECT [xydata].[id][xydata].[x][xydata].[y][xyzdata].[z]

        INTO [xyzdata2]

        FROM 

        [xyzdata] INNER JOIN [xydata] 

            ON [xydata].[id] = [xyzdata].[id]

     );

There is still something really basic I am not understanding. Thank you for helping me try

Attachments:
xyzM9.mxb

tjhb
10,094 post(s)
#09-Sep-18 23:52

You're not being dense at all, you're being extremely thorough, it's great--there's just something that isn't clicking yet. It will!

Try thinking about it like this.

Let's say you had not one field (a.k.a. column) to update, but several.

So after the SET command, you would list all of the target fields, each followed by an = sign, then an expression defining its new value. (The expression could be a another field, a literal, or something involving functions or conditions for example.)

The target fields are on the left of the = sign, the expressions are on the right. The = can also be thought of as ->. It's a mapping.

The important thing is that the mapping is one-to-one. Each field on the left is matched by exactly one expression on the right.

So for every row in the target table, each target field is mapped to its own expression.

Now look at what you've got above. The list of target fields has just one member, z. But on the right, you have a table with 4 fields and (potentially) many rows.

What you're literally asking is: for each row in the target table [xydata], map the field [z] to the whole table of 4 x N rows on the right.

That can't be done. That's what the error message "Expression should return table with single field" is telling you.

Pausing there.

tjhb
10,094 post(s)
#10-Sep-18 00:00

The next thing is, there's absolutely nothing wrong with an expression on the right being a table, provided that it is a table with just one column and just one row.

In fact in your example, the (...) around the SELECT statement tell the SQL compiler to expect a table that can be cast as a single value. It can't here, since it's 4 x N values, so again you get "Expression should return table with single field". One field/column would be fine.

There's also the problem that your SELECT statement is currently SELECT INTO, which is incorrect in the context of an UPDATE query. Like UPDATE, SELECT INTO is an action query; you don't want to perform an action inside the (...), you just want to create a virtual table. One action query cannot contain another. Let's put that aside.

Pausing again.

tjhb
10,094 post(s)
#10-Sep-18 00:42

Here though, the task is different.

You have two tables, and you want to match their rows in pairs on a shared value in the ID field, then for each matched pair, copy the value in field [z] from one table to the other.

The join is good, that does the matching.

Note that the field [z] must already exist in the target table [xydata], before you can update it. If it doesn't exist yet, create it.

I think the missing concept here might be like this. You are thinking "I need to update the [xydata] table using values from the [xyzdata] table". That's true in substance, but it's not how it's done.

Instead what you should think is "I need to create a virtual table that combines rows from [xydata] with matching rows from [xyzdata], then copy [z] values within each row".

If the virtual table is suitably constructed (there are caveats), then its records will consist of references to actual records in the source tables, meaning that changes to the virtual table will be reflected in the source (or sources--you can update more than one source table at the same time if that is ever useful).

That's the important concept.

The last thing to mention is in my second post in this thread. The SQL engine must be able to identity which row to update, and for Manifold 9, that means that the virtual table must include at least one source column which has a BTREE index defined on it. Usually this would be [mfd_id], but it can be something else.

To summarize: you need to build a virtual table, by joining [xydata] and [xyzdata] on matching [ID], which includes at least [xydata].[z], [xyzdata].[z] and [xydata].[mfd_id]. One of the two [z] columns will need to be given an alias--say, AS [z']-- to avoid naming ambiguity.

Then your SET statement can copy values from [z'] to [z] within each virtual row, and this will be reflected in the source table.

hugh
200 post(s)
#10-Sep-18 01:44

this is so helpful, thank you. I'm starting to understand. The ideas are somewhat clearer to me than the code however. After long misunderstanding I have somewhat come to grips with query results being not just console output but a table of sorts (virtual though it be) capable of further SQL manipulation.

This is key:

To summarize: you need to build a virtual table, by joining [xydata] and [xyzdata] on matching [ID], which includes at least [xydata].[z], [xyzdata].[z] and [xydata].[mfd_id]. One of the two [z] columns will need to be given an alias--say, AS [z']-- to avoid naming ambiguity.

For one thing you have clarified for me the status of aliases [AS] in SQL code now that I understand better the importance of correctly and uniquely naming columns as something the SQL engine needs. And some better understanding of what the SQL engine does with this kind of entity since it is more than just an object naming simplification.

At this point I need to see how this might be written in SQL but please don't send an example. Only way I can learn what code does with entities it calls is try it out. I will get back to you in a few days with what I come up with.

tjhb
10,094 post(s)
#10-Sep-18 01:58

but please don't send an example...

I wholeheartedly agree with your approach.

tjhb
10,094 post(s)
#10-Sep-18 02:41

The ideas are somewhat clearer to me than the code however.

But that is much better than the reverse! Much much much.

I have somewhat come to grips with query results being not just console output but a table of sorts (virtual though it be) capable of further SQL manipulation.

Well put. At some point you might want to at least skim the Editable Results Tables topic in the manual, to see how fundamentally the idea is implemented in SQL9.

In a real sense, an UPDATE query is just a special case of the manual examples there. The wiring underneath is the same.

tjhb
10,094 post(s)
#10-Sep-18 01:34

Two more comments. First, you have already added the [z] column in [xydata]. I missed that before.

Secondly--this might be stupid, but FWIW--if you get really stuck in SQL, stop looking at the screen and look at your hands.

In my experience, asking "what is on the left, what is on the right" solves most SQL problems. If I find that I don't immediately know the answer to those questions, then that is the problem--not the syntax.

SQL was clearly invented by people with two hands (and opposable thumbs).

hugh
200 post(s)
#10-Sep-18 01:52

just sent reply to your previous post. Good advice on left/right and I will be thinking about it while trying to figure SQL to figure out my problem

hugh
200 post(s)
#10-Sep-18 10:02

--SQL9

-- it is starting to click.  Doing this:

ALTER TABLE [xydata] (ADD [mfd_id] int64);

ALTER TABLE [xydata] (ADD INDEX mfd_id_x BTREE (mfd_id));

ALTER TABLE [xyzdata] (ADD [mfd_id] int64);

ALTER TABLE [xyzdata] (ADD INDEX mfd_id_x BTREE (mfd_id));

-- note that I did not make an index for [id] in either table

ALTER TABLE [xydata] 

  (ADD [z] int32);

-- ok

-- now understand why in the following the SET [z] can't write (edit) the new value for [z] to [xydata]

UPDATE [xydata] 

   SET [z] = 

     (

       SELECT [xyzdata].[z]

        FROM 

        [xyzdata] INNER JOIN [xydata] 

            ON [xydata].[id] = [xyzdata].[id]

     );

-- something syntactical throwing me off here was this I realized from your saying this (paraphrasing to this context): 

-- "UPDATE is an action query; you don't want to perform an action inside the (...), you just want to create a virtual table. 

-- One action query cannot contain another". 

-- so seems to me like "(...)" nests queries, does not just define scope like function x {function y {  } } 

-- in the more procedural way I usually interpret code

-- Now trying this:

SELECT [xydata].[z][xyzdata].[z] AS [Z2][xydata].[mfd_id][xyzdata].[mfd_id] 

 FROM 

  [xyzdata] INNER JOIN [xydata] 

     ON [xydata].[id] = [xyzdata].[id];

-- resulting table is not writeable since duplicate values make both indices on mfd_id btreedup

  -- so index [id]:

ALTER TABLE [xydata] (ADD INDEX id_x BTREE (id));

ALTER TABLE [xyzdata] (ADD INDEX id_x BTREE (id));

-- then run this query again:

SELECT [xydata].[z][xyzdata].[z] AS [Z2][xydata].[mfd_id][xyzdata].[mfd_id] 

 FROM 

  [xyzdata] INNER JOIN [xydata] 

     ON [xydata].[id] = [xyzdata].[id];

-- and the resulting table is now editable since indices on mfd_id stay btree

-- so I am making progress.  Question for me now is how I could have read 

-- the Join Statements and Editable Results Tables topics in the manual I ----- thought carefully, more than once, and not have gotten it.  Part of it -

-- might be the flexibility one has specifying columns for SELECT where 

-- ability to match uniquely is so important and aliases can help rather ---

-- obscure doing that--made it hard to for me to understand the code 

-- examples.  Also as old-time procedural programmer I tend to read code in much more mechanical way

adamw


10,447 post(s)
#22-Sep-18 09:59

I am late and I get that you probably solved the issue already (the SELECT in the last post above that creates an updatable join just needs an UPDATE on top of it), but here is a slightly different way to do the same thing that is pretty close to what you were doing in the beginning, in case you want it.

As said, the UPDATE in the first post won't work because when it tries to compute the Z value for a particular record in XYDATA, it goes to join XYDATA with XYZDATA *and returns the entire join, that is, the entire table*. As Tim shows, if you change your thinking a bit, you can first do the join, make sure it is updatable (1 to 1), then add the fields you want to update to the join, and then update them. But you can also do it more directly going from "I have a record in XYDATA, let me just reach into XYZDATA, take the Z value I want and put it into the record I have" - it's just that this reaching into a second table is not done using a join, it's done using a simpler WHERE:

--SQL9

UPDATE [xydata] -- this table...

   SET [z] = 

     (

       SELECT [xyzdata].[z]

       FROM [xyzdata]

       WHERE [xydata].[id] = [xyzdata].[id]

          -- ^^ ...provides this value

     );

The above is a modification of the UPDATE in the first post, it will run with or without indexes on ID, but having indexes on ID will make it run faster.

Now, which way is better - UPDATE ... SET ... WHERE ... or UPDATE (SELECT ... JOIN ...) - the second way is faster when updates concern multiple fields and it is more general, but the first way is not that bad either and it might be simpler to understand. Both ways can be adopted to cases when the relation is not 1 to 1 as well (insert an aggregate to choose the value you want out of multiple matching values).

Hope this helps.

hugh
200 post(s)
#22-Sep-18 22:23

I finally got it thanks to Tim's clear and patient explanation and this helps too. Question is why it was so hard for me get it--normally I eventually figure out SQL problems via the manual, Fehily examples, and repeatedly trying things out. In this case I had previously looked at the Editable Results Tables topic Tim mentioned a couple of times but it did not clarify.

My current theory about my mental block (just a theory, once you see the solution it is hard to go back) is my constant use of Table > Relations in M8 made me want to see a simpler multiple column UPDATE operation that made no sense. M8 Table > Relations has always been a favorite since it is 4 clicks to set up and one more click for each column to add. Then right click Flatten All and if needed right click column Rename (screen shot attached, though usually it is a drawing I am updating and if many columns at the end I batch a script of SQL ALTER TABLE .. RENAME COLUMN queries).

I guess something like M8 Table > Relations could be a wizard but I'm not sure where it would go in 9 -- maybe in Edit>Schema?

Attachments:
M8relations.png

hugh
200 post(s)
#22-Sep-18 22:46

no -- of course not Edit>Schema is just one table.

hugh
200 post(s)
#22-Sep-18 23:11

though maybe <new field> could allow opening a wizard to bring in a field from another table??

hugh
200 post(s)
#24-Sep-18 07:04

<new field> from another table would be sort of like M8 table>relations -- and thinking about what Tim said earlier in the thread:

In my experience, asking "what is on the left, what is on the right" solves most SQL problems. If I find that I don't immediately know the answer to those questions, then that is the problem--not the syntax.

Problem is that M8 table>relations gives an immediate answer (current table on the left, click to select fields from table on the right to add) which makes it hard to think in terms of SQL. For one thing if what is on the left is one <new field> then one SQL solution works (Adam's). But clicking in more <new field>s as M8 allows then Tim's SQL is needed.

adamw


10,447 post(s)
#24-Sep-18 10:59

This is not the first time the topic of a possible analog of Table - Relations from 8 pops up. Currently, our thinking is this:

Related fields can be done as computed fields now. But that requires knowledge of SQL - and not something too simple either since we are talking about computed fields - so that's quite a big barrier to jump. We could provide the UI to add such fields and generate necessary SQL automatically. What's stopping us is two things. First, we want to rework the Edit - Schema dialog (we accumulated quite a number of things to add / change there), so whatever we want to do with relations is going to wait until that rework. Second, we aren't sure we want related fields to be done as computed fields in the long run. We would have felt much better about generating a join query, if we can find the right UI metaphors for that - because the join query is something you fire up, use and then throw away, while computed fields are something that you define and then live with, they fire up when you add or change records, this produces delays (understandable, but do you really need to have the values of related fields the moment you add a record? why not get to know them when you will actually use them instead?), and computed fields also don't work for data sources other than MAP files - unlike queries.

Long and short, you can have related fields right now if you want, this requires some SQL work, we realize this is a big barrier that puts related fields out of reach for many users, we want to put them within reach for everyone but there are multiple ways to go about it, neither of them very simple, and we didn't decide which way to go with yet.

hugh
200 post(s)
#26-Sep-18 04:03

Understand and thanks for the update and insights on the complexities involved.

because the join query is something you fire up, use and then throw away

Seems essential to be like the clean finish of "flatten all" in 8.

In the meantime some of us SQL challenged folk might just need some examples to tide us over. Maybe a temporary page like Art Lembo might do "How do I do that in the current version of 9 (at the moment) that I could do easily in 8." With explanations of the underlying SQL differences like tjhb did so well at the beginning of this thread.

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.