Subscribe to this thread
Home - General / All posts - cannot think in SQL
hugh
153 post(s)
#09-Sep-18 06:57

-- Why does this not work? Has to have obvious solution but I can't see it. Inner SELECT works but UPDATE fills in nulls.

UPDATE [xydata]

SET [z] =

( SELECT [xyzdata].[z]

FROM

[xyzdata] INNER JOIN [xydata]

ON [xydata].[id] = [xyzdata].[id]

);

-- would try

SELECT Coalesce([xyzdata].[z])

-- but inner SELECT already works

-- Same result in M8

Attachments:
xyzM9.mxb

tjhb

8,219 post(s)
online
#09-Sep-18 08:43

Because there is no relationship at all between rows in the outer table (addressed by UPDATE) and the inner table. No link whatsoever.

Look at SET [Z] =... from the perspective of a given row. What does it get? A whole table.

tjhb

8,219 post(s)
online
#09-Sep-18 09:11

What you need, within the query, is a table that can be updated.

So, it must have a source column (or expression) and a target column to be updated.

For Manifold 9 it must also have a column with a BTREE index so that target rows can be enumerated. mfd_id is usually good.

hugh
153 post(s)
#09-Sep-18 17:59

Thank you, that clarifies a lot but I am still being dense. Seems like both [xydata] and [xyzdata] are tables:

imported xydata.csv    imported xyzdata.csv 

"id","x","y"           "id","x","y","z" 

1,44,12                1,3,0,0 

2,64,34                2,71,32,2236 

3,42,16                3,71,40,2804 

4,53,28                4,25,14,362 

5,53,24                5,78,30,2340 

6,70,29                6,91,54,4960 

7,32,21                7,25,20,488 

8,30,8                 8,82,43,3526 

9,34,10                9,53,20,1086 

10,65,36               10,0,0,0

and both get indexed like so and column z added to [xydata]

--SQL9

ALTER TABLE [xydata] (ADD [mfd_id] int64);

ALTER TABLE [xydata] (ADD INDEX mfd_id_x BTREE (mfd_id));

ALTER TABLE [xyzdata] (ADD [mfd_id] int64);

ALTER TABLE [xyzdata] (ADD INDEX mfd_id_x BTREE (mfd_id));

ALTER TABLE [xydata] (ADD INDEX id_x BTREE (id));

ALTER TABLE [xyzdata] (ADD INDEX id_x BTREE (id));

ALTER TABLE [xydata] 

  (ADD [z] int32);

INTO works to create a new table

--SQL9

       SELECT [xydata].[id][xydata].[x][xydata].[y][xyzdata].[z]

        INTO [xyzdata2]

        FROM 

        [xyzdata] INNER JOIN [xydata] 

            ON [xydata].[id] = [xyzdata].[id]

but this does not work as UPDATE . . . SET because "Expression should return table with single field".

--SQL9

UPDATE [xydata] 

   SET [z] = 

     (

       SELECT [xydata].[id][xydata].[x][xydata].[y][xyzdata].[z]

        INTO [xyzdata2]

        FROM 

        [xyzdata] INNER JOIN [xydata] 

            ON [xydata].[id] = [xyzdata].[id]

     );

There is still something really basic I am not understanding. Thank you for helping me try

Attachments:
xyzM9.mxb

tjhb

8,219 post(s)
online
#09-Sep-18 23:52

You're not being dense at all, you're being extremely thorough, it's great--there's just something that isn't clicking yet. It will!

Try thinking about it like this.

Let's say you had not one field (a.k.a. column) to update, but several.

So after the SET command, you would list all of the target fields, each followed by an = sign, then an expression defining its new value. (The expression could be a another field, a literal, or something involving functions or conditions for example.)

The target fields are on the left of the = sign, the expressions are on the right. The = can also be thought of as ->. It's a mapping.

The important thing is that the mapping is one-to-one. Each field on the left is matched by exactly one expression on the right.

So for every row in the target table, each target field is mapped to its own expression.

Now look at what you've got above. The list of target fields has just one member, z. But on the right, you have a table with 4 fields and (potentially) many rows.

What you're literally asking is: for each row in the target table [xydata], map the field [z] to the whole table of 4 x N rows on the right.

That can't be done. That's what the error message "Expression should return table with single field" is telling you.

Pausing there.

tjhb

8,219 post(s)
online
#10-Sep-18 00:00

The next thing is, there's absolutely nothing wrong with an expression on the right being a table, provided that it is a table with just one column and just one row.

In fact in your example, the (...) around the SELECT statement tell the SQL compiler to expect a table that can be cast as a single value. It can't here, since it's 4 x N values, so again you get "Expression should return table with single field". One field/column would be fine.

There's also the problem that your SELECT statement is currently SELECT INTO, which is incorrect in the context of an UPDATE query. Like UPDATE, SELECT INTO is an action query; you don't want to perform an action inside the (...), you just want to create a virtual table. One action query cannot contain another. Let's put that aside.

Pausing again.

tjhb

8,219 post(s)
online
#10-Sep-18 00:42

Here though, the task is different.

You have two tables, and you want to match their rows in pairs on a shared value in the ID field, then for each matched pair, copy the value in field [z] from one table to the other.

The join is good, that does the matching.

Note that the field [z] must already exist in the target table [xydata], before you can update it. If it doesn't exist yet, create it.

I think the missing concept here might be like this. You are thinking "I need to update the [xydata] table using values from the [xyzdata] table". That's true in substance, but it's not how it's done.

Instead what you should think is "I need to create a virtual table that combines rows from [xydata] with matching rows from [xyzdata], then copy [z] values within each row".

If the virtual table is suitably constructed (there are caveats), then its records will consist of references to actual records in the source tables, meaning that changes to the virtual table will be reflected in the source (or sources--you can update more than one source table at the same time if that is ever useful).

That's the important concept.

The last thing to mention is in my second post in this thread. The SQL engine must be able to identity which row to update, and for Manifold 9, that means that the virtual table must include at least one source column which has a BTREE index defined on it. Usually this would be [mfd_id], but it can be something else.

To summarize: you need to build a virtual table, by joining [xydata] and [xyzdata] on matching [ID], which includes at least [xydata].[z], [xyzdata].[z] and [xydata].[mfd_id]. One of the two [z] columns will need to be given an alias--say, AS [z']-- to avoid naming ambiguity.

Then your SET statement can copy values from [z'] to [z] within each virtual row, and this will be reflected in the source table.

hugh
153 post(s)
#10-Sep-18 01:44

this is so helpful, thank you. I'm starting to understand. The ideas are somewhat clearer to me than the code however. After long misunderstanding I have somewhat come to grips with query results being not just console output but a table of sorts (virtual though it be) capable of further SQL manipulation.

This is key:

To summarize: you need to build a virtual table, by joining [xydata] and [xyzdata] on matching [ID], which includes at least [xydata].[z], [xyzdata].[z] and [xydata].[mfd_id]. One of the two [z] columns will need to be given an alias--say, AS [z']-- to avoid naming ambiguity.

For one thing you have clarified for me the status of aliases [AS] in SQL code now that I understand better the importance of correctly and uniquely naming columns as something the SQL engine needs. And some better understanding of what the SQL engine does with this kind of entity since it is more than just an object naming simplification.

At this point I need to see how this might be written in SQL but please don't send an example. Only way I can learn what code does with entities it calls is try it out. I will get back to you in a few days with what I come up with.

tjhb

8,219 post(s)
online
#10-Sep-18 01:58

but please don't send an example...

I wholeheartedly agree with your approach.

tjhb

8,219 post(s)
online
#10-Sep-18 02:41

The ideas are somewhat clearer to me than the code however.

But that is much better than the reverse! Much much much.

I have somewhat come to grips with query results being not just console output but a table of sorts (virtual though it be) capable of further SQL manipulation.

Well put. At some point you might want to at least skim the Editable Results Tables topic in the manual, to see how fundamentally the idea is implemented in SQL9.

In a real sense, an UPDATE query is just a special case of the manual examples there. The wiring underneath is the same.

tjhb

8,219 post(s)
online
#10-Sep-18 01:34

Two more comments. First, you have already added the [z] column in [xydata]. I missed that before.

Secondly--this might be stupid, but FWIW--if you get really stuck in SQL, stop looking at the screen and look at your hands.

In my experience, asking "what is on the left, what is on the right" solves most SQL problems. If I find that I don't immediately know the answer to those questions, then that is the problem--not the syntax.

SQL was clearly invented by people with two hands (and opposable thumbs).

hugh
153 post(s)
#10-Sep-18 01:52

just sent reply to your previous post. Good advice on left/right and I will be thinking about it while trying to figure SQL to figure out my problem

hugh
153 post(s)
#10-Sep-18 10:02

--SQL9

-- it is starting to click.  Doing this:

ALTER TABLE [xydata] (ADD [mfd_id] int64);

ALTER TABLE [xydata] (ADD INDEX mfd_id_x BTREE (mfd_id));

ALTER TABLE [xyzdata] (ADD [mfd_id] int64);

ALTER TABLE [xyzdata] (ADD INDEX mfd_id_x BTREE (mfd_id));

-- note that I did not make an index for [id] in either table

ALTER TABLE [xydata] 

  (ADD [z] int32);

-- ok

-- now understand why in the following the SET [z] can't write (edit) the new value for [z] to [xydata]

UPDATE [xydata] 

   SET [z] = 

     (

       SELECT [xyzdata].[z]

        FROM 

        [xyzdata] INNER JOIN [xydata] 

            ON [xydata].[id] = [xyzdata].[id]

     );

-- something syntactical throwing me off here was this I realized from your saying this (paraphrasing to this context): 

-- "UPDATE is an action query; you don't want to perform an action inside the (...), you just want to create a virtual table. 

-- One action query cannot contain another". 

-- so seems to me like "(...)" nests queries, does not just define scope like function x {function y {  } } 

-- in the more procedural way I usually interpret code

-- Now trying this:

SELECT [xydata].[z][xyzdata].[z] AS [Z2][xydata].[mfd_id][xyzdata].[mfd_id] 

 FROM 

  [xyzdata] INNER JOIN [xydata] 

     ON [xydata].[id] = [xyzdata].[id];

-- resulting table is not writeable since duplicate values make both indices on mfd_id btreedup

  -- so index [id]:

ALTER TABLE [xydata] (ADD INDEX id_x BTREE (id));

ALTER TABLE [xyzdata] (ADD INDEX id_x BTREE (id));

-- then run this query again:

SELECT [xydata].[z][xyzdata].[z] AS [Z2][xydata].[mfd_id][xyzdata].[mfd_id] 

 FROM 

  [xyzdata] INNER JOIN [xydata] 

     ON [xydata].[id] = [xyzdata].[id];

-- and the resulting table is now editable since indices on mfd_id stay btree

-- so I am making progress.  Question for me now is how I could have read 

-- the Join Statements and Editable Results Tables topics in the manual I ----- thought carefully, more than once, and not have gotten it.  Part of it -

-- might be the flexibility one has specifying columns for SELECT where 

-- ability to match uniquely is so important and aliases can help rather ---

-- obscure doing that--made it hard to for me to understand the code 

-- examples.  Also as old-time procedural programmer I tend to read code in much more mechanical way

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.