Subscribe to this thread
Home - General / All posts - INNER JOIN
chris lang
130 post(s)
#24-Jan-18 15:00

Good Morning,

Please see the attached .mxb file, in which I have one query named "Query_Test1". The INNER JOIN is on two nvarchar fields from two different tables, Centr and VT_cNL. There are text values that are identical in the two fields (Centr.Text and VT_cNL.NO_LOT), but the query returns no results.

I wonder if spaces in the fields may be causing the issue.

Thanks in advance for any help,

Best Regards,

Chris

Attachments:
test1 _SQL.mxb

chris lang
130 post(s)
#24-Jan-18 20:30

I found the issue: one of the text fields has an extra space at the end of each value, so the texts in the two fields were never exactly the same. With the spaces removed, the query returns the correct result.
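(As a side note, this effect is easy to reproduce in any SQL engine. Below is a minimal sketch in Python with sqlite3, offered purely as an analogue, not Manifold syntax; the table and field names just mirror the ones above:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Centr (Text TEXT);
    CREATE TABLE VT_cNL (NO_LOT TEXT);
    INSERT INTO Centr VALUES ('LOT-1'), ('LOT-2');
    -- the second table carries a stray trailing space on every value
    INSERT INTO VT_cNL VALUES ('LOT-1 '), ('LOT-2 ');
""")

join = "SELECT * FROM Centr INNER JOIN VT_cNL ON Text = NO_LOT"

# The join on the raw values finds nothing: 'LOT-1' is not 'LOT-1 '
rows_before = con.execute(join).fetchall()
print(len(rows_before))  # 0

# Trimming the whitespace (the analogue of StringTrim) restores the matches
con.execute("UPDATE VT_cNL SET NO_LOT = TRIM(NO_LOT)")
rows_after = con.execute(join).fetchall()
print(len(rows_after))  # 2
```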

Chris

tjhb
10,094 post(s)
#24-Jan-18 22:19

How did you decide to fix it?

Did you correct the source data, or adjust the join condition? It is more efficient to correct the source data (if that does not create new problems).

You may also want to consider optimising indexes.

Starting from the .mxb project you posted:

First remove leading and trailing whitespace from the join column in both tables. (Only one might be necessary but it doesn't hurt to be sure.)

UPDATE [Centr] SET [Text] = StringTrim([Text], WHITESPACE);

UPDATE [VT_cNL] SET [NO_LOT] = StringTrim([NO_LOT], WHITESPACE);

Now add indexes to speed up the join. Assuming the worst here, i.e. that both join columns might contain duplicates and/or NULLs. (If you know that they don't, then you can add narrower indexes which is a bit better.)

ALTER TABLE [Centr] (
    ADD INDEX [Text_x] BTREEDUPNULL ([Text])
);

ALTER TABLE [VT_cNL] (
    ADD INDEX [NO_LOT_x] BTREEDUPNULL ([NO_LOT])
);

Then run your join query.

SELECT [mfd_id], [Geom], [Text], [o_mfd_id], [o_ID], [o_Color], [Id_No_Lot]
FROM [Centr] INNER JOIN [VT_cNL]
ON [Text] = [NO_LOT];

Afterwards you can get rid of the added indexes if you don't need them again.

ALTER TABLE [Centr] (
    DROP INDEX [Text_x]
);

ALTER TABLE [VT_cNL] (
    DROP INDEX [NO_LOT_x]
);

Mike Pelletier

2,122 post(s)
#15-Feb-18 23:47

This is helpful stuff Tim. I'm wondering if there is a good generic way to add indexes to a series of queries that pull data from the previous query. In my case, these queries are then all joined at the end and it's very slow, so I'm thinking indexes are in order. This is a scenario where I'd prefer not to create a bunch of temporary tables and then add indexes to them before running the final join query. I tried doing it as in the following example but get an invalid object reference.

ALTER TABLE
(Select * from [mfd_meta])
ADD INDEX [name_x] BTREEDUPNULL ([name])
    );

tjhb
10,094 post(s)
#16-Feb-18 03:26

This is a scenario where I'd prefer to not create a bunch of temporary tables and then add indexes to them before running the final join query.

Why is that?

In this situation, for the purpose you are describing (I think), a temporary database and one or more temporary tables are exactly what you want.

I think you have hit on the right solution, but by dismissing it!

tjhb
10,094 post(s)
#16-Feb-18 03:48

The Temporary Databases topic in the manual is currently very rudimentary and a bit misleading.

There are a couple of examples on the forum you can find by searching on "USE ROOT" (with quotes).

Or post data!

Mike Pelletier

2,122 post(s)
#16-Feb-18 04:12

I'll look into temporary databases tomorrow but whatever solution needs to be automated to allow easy tweaking and rerunning.

The data is coming from Oracle (faster than 8 by the way) and each individual query is pretty fast. I gave up on the final join query after 1.3 hours, thinking that indexes are mandatory in this case. Cannot post these tables unfortunately.

tjhb
10,094 post(s)
#16-Feb-18 04:17

Temporary databases are hugely powerful, fundamental in 9, a core bit of kit...

...and not widely known yet.

If there's a way we can help correct that, it would be very worthwhile. (I'm thinking mainly of forum examples, but I would also volunteer to redraft the manual page on this.)

adamw


10,447 post(s)
#16-Feb-18 07:24

An example:

Suppose we have a table of states with names and a table of roads where each road refers to a state by its name. Here is a join:

--SQL9
SELECT * INTO temp_1 FROM [States]
  INNER JOIN [Roads] ON [States].[Name] = [Roads].[StateName];

Since neither field in the ON has an index built on it, the join is slow.

We can add indexes to the source tables (one would be enough in this particular case), but suppose we cannot do this because they are readonly, or we don't want to do this because the tables aren't ours. We can also make a copy of one of the source tables and add an index to that copy, but suppose we cannot do this either because the data source is readonly, or we don't want to do this because the data source isn't ours.

We can use a temporary database:

--SQL9
CREATE ROOT x; -- create temporary database
USE ROOT x; -- switch to temporary database
CREATE DATASOURCE d AS ROOT; -- link MAP file as data source 'd'
SELECT * INTO [States] FROM d::[States];
  -- copy table from MAP file to temporary database
ALTER TABLE [States] (ADD INDEX [Name_x] BTREE ([Name]));
  -- add index on name in copied table
SELECT * INTO d::temp_2 FROM [States]
  INNER JOIN d::[Roads] AS r ON [States].[Name] = r.[StateName];
  -- perform join between copy of first table in temporary database
  --   and second table that remains in MAP file
  -- write result table into MAP file

I added comments to explain what does what.
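(If it helps to see the same pattern outside Manifold: sqlite's ATTACH plays a role loosely comparable to CREATE ROOT / CREATE DATASOURCE here, letting you copy a table into a scratch database, index the copy, and join against the untouched originals. A sketch in Python; all names are invented, and this is an analogue only:)

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stands in for the MAP file
con.executescript("""
    CREATE TABLE States (Name TEXT);
    CREATE TABLE Roads (StateName TEXT);
    INSERT INTO States VALUES ('Ohio'), ('Iowa');
    INSERT INTO Roads VALUES ('Ohio'), ('Ohio'), ('Iowa');
""")

# Attach a scratch database: the rough analogue of CREATE ROOT x / USE ROOT x
con.execute("ATTACH ':memory:' AS scratch")

# Copy one table into the scratch database and index the join column there,
# leaving the 'readonly' source tables untouched
con.execute("CREATE TABLE scratch.States AS SELECT * FROM main.States")
con.execute("CREATE INDEX scratch.Name_x ON States(Name)")

# Join the indexed copy against the table that stays in the main database
rows = con.execute("""
    SELECT * FROM scratch.States AS s
    INNER JOIN main.Roads AS r ON s.Name = r.StateName
""").fetchall()
print(len(rows))  # 3
```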

See attached MXB.

Attachments:
temp-databases.mxb

Mike Pelletier

2,122 post(s)
#16-Feb-18 22:22

Thanks for the detailed example Adam.

Well from an uneducated perspective, this sure seems to add lots of complexity for dealing with indexes. Some questions.

1. Is Mfd 9 more dependent on indexes than 8 (9 seems super slow without) or did 8 just create the indexes it needed on the fly?

2. Might this be a temporary way to deal with this index issue? The manual doesn't mention it as one of the purposes for temporary databases.

3. What actions beyond exiting the program close the temporary database?

4. Seems odd that we create root, use root, create datasource as root, and then run a query to put a table into the temporary database, but it then just shows up as another table in the project pane with a temp name that isn't the name in the temporary database. Seems confusing. Perhaps put them in a folder named for the temporary database, with some special marking.

5. Can there be more than one temporary database active?

6. If we run a series of queries that pull from the previous query, I'm assuming that goes into some internal tables. Perhaps explaining the pros/cons of this approach vs temporary databases vs a series of functions would be helpful. More "why" would be great.

tjhb
10,094 post(s)
#17-Feb-18 01:00

Good questions Mike. I'm not the best to answer them, but I'll have a try.

1.

Rather than saying that 9 is more dependent on indexes than 8, I think it's more accurate to say that it can make much better use of them--and it puts the power to control this in our hands. The power--and the obligation.

8 uses a narrower range of indexes--and handles them implicitly and invisibly.

2.

Yes, the ability to add custom indexes to interim result tables is (I think) the main reason to use a temporary database and a temporary table.

3. All temporary databases (and the tables and other components they contain) can be explicitly destroyed with DROP.

When a temporary database is implicitly dropped depends on the type of temporary database created. There are two different types. See the notes in the next section below.

4.

but it then just shows up as another table in the project pane with a temp name that isn't the name in the temporary database

That's not quite right. Yes it seems confusing.

Again, to clear it up we need to distinguish the two types of temporary database (below).

5.

Yes. As many as you like.

6.

More "why" would be great.

Absolutely. More "what" as well!


So, the two types of temporary database (or temporary datasource).

The manual is less than helpful on this point, because it starts off with one type (the less obvious one), then switches to the other type partway through, without explaining the difference.

Type 1

The first type (in the manual) is created like this.

CREATE DATASOURCE [Scratch] (PROPERTY 'Type' 'manifold');

Note the lack of any connection string--that is what makes it a temporary database. If we specified a connection to an existing source, then this would be exactly like using Create Data Source in the GUI. But here we're not specifying an existing source, so the new datasource is (a) local and (b) empty. It is like a new Manifold project, contained within the current project--and visible in the Project tree. (It is said to be backed by a temporary .map file--though I haven't been able to figure out where that is.)

A temporary database of type 1 persists until the project is saved (unless it is deleted or dropped). However, changes remain volatile (i.e. temporary) unless they are explicitly saved, either by invoking the Save command from the context menu for the database, or by saying yes to the "Save changes to child data source?" prompt when saving the project.

Type 2

The second type of temporary database is created like this.

CREATE ROOT [Scratch];

This new "root" is also a database/datasource, but different from type 1 in three main ways (perhaps there are others).

First, it does not appear in the Project pane--it is invisible to the user. That's quite nice, since it means we can just use it, without distracting the user (or ourselves) unnecessarily.

Second, it can only be accessed from within the Command window or Query component that created it. (And different windows and queries can have temporary databases with the same name.)

Third, it does not persist. Its lifetime is tied to the state of the Command window or Query component that "owns" it. Close the Command window, and the temporary database goes with it. Close the Query, and the temporary database is also gone--and must be created again next time the query is opened and run.

So if type 1 is a public temporary database for the project, Type 2 is a private temporary database for one window or component.

When would we want to use type 1--given that we also have type 2? It persists, is shared, and is visible. So maybe we want to write some temporary data with one batch of queries, then perform some GUI operations on the result, then address the data with another set of queries, and so on. Possibly with project saves in between.

But in that case, why not use ordinary fixed tables (and drawings, and so on), within the project root? I don't have an answer for that one. Perhaps the main thing type 1 adds is confusion.
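(If an analogy from another engine helps fix the distinction: in sqlite, attaching a file-backed database behaves a bit like type 1 — persistent, and visible to any connection that opens the file — while attaching ':memory:' behaves like type 2 — private to one connection and gone when it closes. A Python sketch; all names are invented, and this is an analogue only:)

```python
import os
import sqlite3
import tempfile

# Type-2-like: an in-memory scratch database, private to one connection
con = sqlite3.connect(":memory:")
con.execute("ATTACH ':memory:' AS scratch")
con.execute("CREATE TABLE scratch.t (v INTEGER)")
con.execute("INSERT INTO scratch.t VALUES (1)")
con.close()  # the scratch database and its table vanish with the connection

# Type-1-like: a file-backed database persists across connections
path = os.path.join(tempfile.mkdtemp(), "shared.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (v INTEGER)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()
con.close()

con = sqlite3.connect(path)  # a later connection still sees the data
v = con.execute("SELECT v FROM t").fetchone()[0]
print(v)  # 1
```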


For the mechanics of using temporary databases of type 2, as well as Adam's notes above, the examples in these threads might be useful (though some of the code is incomplete or flawed in other ways).

http://www.georeference.org/forum/t138060.50#138161

http://www.georeference.org/forum/t138305.18#138380

adamw


10,447 post(s)
#19-Feb-18 08:19

1. Tim said it well. Indexes are a bigger deal in 9 than in 8 because we use them to do a lot more than before and because they are managed explicitly rather than implicitly. 8 is mostly creating temporary indexes on the fly. 9 is mostly using indexes that the data owner / transform / import / a similar operation set up for it. We agree there is some room for creating temporary indexes automatically, particularly with the user's consent (perhaps a pragma that controls how wild the query engine can go); this is something we are considering doing.

2. Temporary databases are used to rearrange data / store temporary partial results outside of the main data source. Yes, creating temporary indexes on one or more tables is a good reason to use a temporary database. Another good reason is to create a temporary component, eg, create a filtered version of a drawing to perform spatial overlay with.

3. Temporary databases of type 2 (following Tim's classification, the type used in my query above) are destroyed automatically after the query that uses them completes. For an interactive query in the Command window, that means that they are destroyed automatically when the window is closed. As Tim says, you can destroy them earlier using DROP ROOT.

4. Not showing in the Project pane is a feature (of temporary databases of type 2), not a bug! :-) You use them *because* you don't want them to show up anywhere, not to step onto the names of other components, etc. You might not even have permissions to write to the data source. If you do want something that does show up in the Project pane, create a regular data source component. This is simple, either interactively or from a query.

5. Yes, you may use multiple temporary databases including in the same query.

6. What's pros and cons of something like:

--SQLhypothetical
ALTER TABLE
(Select * from [mfd_meta])
ADD INDEX [name_x] BTREEDUPNULL ([name])
    );

...compared to temporary database and what's the general picture?

The most important question is this: do you need a persistent index on persistent data or a temporary index on temporary data? (Sometimes there is also an additional angle: do you own the data and do you have permissions / are willing to alter it? If not, we are talking about a temporary index on temporary data.)

If you need a persistent index on persistent data, then things are very simple - just add it using the UI or using ALTER TABLE ...

If you need a temporary index on temporary data, there are plenty of choices:

  • You can create a regular temporary table which you will be able to see in the Project pane. You can copy the data there, create the indexes you want - you can do everything up to this point either interactively or using a query - and then use the indexed data for the analysis you wanted. Pros: simple and visual. Cons: temporary components take space, you have to care about the names of temporary components, perhaps adopting some prefix so that you can tell them from other components, and if there are too many temporary components, they might be getting in the way. All cons are solvable, of course.
  • You can create a data source for a regular MAP file which you will be able to see in the Project pane and host temporary components there. You can copy the data there, create the indexes you want, then use the indexed data for the analysis you wanted. Pros: as above, simple and visual, plus temporary components do not clash with non-temporary ones. Cons: queries become slightly more complex (SELECT * FROM t becomes SELECT * FROM d::t).
  • You can create a data source for a MAP file using a query, then create your temporary components and perform the analysis, then drop the data source. Pros: all space used by temporary components is automatically reclaimed at the end of the query and if something in the query goes wrong, you will have the data source hanging in the Project pane and be able to look into it and hopefully see what the problem was. Cons: as above, plus queries have CREATE DATASOURCE ..., DROP DATASOURCE ... .
  • You can create a temporary database that will persist for the duration of the query using CREATE ROOT. Pros: all space used by temporary components is automatically reclaimed at the end of the query, no name clashes whatsoever, main data source can be readonly. Cons: have to use CREATE ROOT / USE ROOT, etc.

I imagine there are other possibilities as well, but hope the above list illustrates it well enough.

Mike Pelletier

2,122 post(s)
#23-Feb-18 21:48

Thanks Adam and Tim for all your efforts in answering these questions, and sorry for taking so long to get back. I've attached a project with 2 folders, each with the same queries, with this exception: the folder ending in "local clip" uses a clip of the databases, and the one ending in "root" attempts to use temporary databases on the full dataset, which cannot be shared and is read only. Each folder has 4 queries that pull data and are then merged via joins by the query that contains "summary" in the name. The queries work with the local data, but the queries using temporary databases on the remote data fail with the error "cannot parse query". What am I doing wrong here?

To answer Adam's #6, the use of temporary databases for indexes adds some amount of complexity, which at the moment is a lot since I can't get it to work :-). The other small hangup with temporary databases, I think, is debugging them and getting errors from rerunning the commands that create the temporary databases. I know you can highlight portions of a query to run, but how best to use DROP at the beginning of the query to clear it for rerunning?

As an alternative, I'd be happy with importing the tables and using a local copy to add indexes, but what would be the best way to do that so that I could quickly update the imported table(s) with a fresh copy?

Appreciate the help.

Attachments:
tt.mxb

tjhb
10,094 post(s)
#24-Feb-18 01:14

To take a step back...

I think you may have read Adam's

6. What's pros and cons of something like:

--SQLhypothetical
ALTER TABLE
(Select * from [mfd_meta])
ADD INDEX [name_x] BTREEDUPNULL ([name])
    );

...compared to temporary database and what's the general picture?

without noticing the words "SQLhypothetical"?

This was an idea about a possible future direction. We can't add an index to the result of a view or in-line query (for now anyway).

Mike Pelletier

2,122 post(s)
#24-Feb-18 14:50

Actually, that I understand; I'm assuming it has to do with the way indexes are supposed to recalculate themselves as the data changes.

tjhb
10,094 post(s)
#25-Feb-18 21:25

It doesn't have to do with data.

It only has to do with if and when indexes are added to data.

We can make some worthwhile suggestions on this topic.

More in Klaus's thread.

adamw


10,447 post(s)
#26-Feb-18 16:39

See attached MXB.

Three of the four queries used by the summary query already return tables with indexes on the fields used in further joins. You only need to add an index to the result table of the remaining query. I created a query named 'Residential Imp Summary by Tax Year 3' which does just that, using a temporary database. I am first copying the entire result of 'Residential Imp Value by Tax Year 2' (the only query whose result table does not have an index) into the temporary database, then adding an index to it, then running the rest of the query exactly as it is written in 'Residential Imp Summary by Tax Year 2', taking three of the small queries directly from the MAP file and the remaining query from the temporary database.

You can run 'Residential Imp Summary by Tax Year 2' (your query that was working) and 'Residential Imp Summary by Tax Year 3' (my query) side by side and check their results - they are the same. There is next to no difference in the runtime, but that's because the number of records is small; with bigger numbers of records the query with the temporary database will likely be noticeably faster.

But generally, it might be simpler in this case to solve the issue from the other end: as I said, the result tables of the three queries are already just fine, and it is just one query which does not offer an index. We can make the result table of that one query also expose an index simply by creating explicit indexes on the fields it uses in joins. If you run this:

--SQL9
ALTER TABLE [ENCOMPASS.TBLIMPS] (
  ADD INDEX [ACCOUNT_x] BTREEDUP ([ACCOUNTNO])
);

ALTER TABLE [ENCOMPASS.TBLIMPSOCC] (
  ADD INDEX [ACCOUNT_x] BTREEDUP ([ACCOUNTNO])
);

...then the result table of 'Residential Imp Value by Tax Year 2' also contains an index on ACCOUNTNO and the only slow join in 'Residential Imp Summary by Tax Year 2' becomes fast, and you can just use 'Residential Imp Summary by Tax Year 2'.

We might actually have means to make 'Residential Imp Value by Tax Year 2' expose an index on ACCOUNTNO automatically from what the source tables already contain - that case lands onto an optimization which we don't currently have but may add. If and when we do this, you will be able to just use 'Residential Imp Summary by Tax Year 2' as is with all of its joins being fast (instead of 2 out of 3).

Attachments:
tt-mod.mxb

tjhb
10,094 post(s)
#26-Feb-18 17:14

My two cents on last para:

There may be some cases where it makes great sense to add indexes implicitly and either automatically or guided by a pragma.

Beside that, and perhaps more important, is the visibility and usability of indexes, for new and old hands alike.

I mean no criticism here. Indexes are perfectly visible now, already. Ctrl-E is right there.

But I would like to see, in addition, the logging of index use in all "manual" SQL ops. Lines like "LEFT JOIN using existing RTREE index Geom_x" or "Convert existing BTREE index ACCOUNT_x to BTREEDUPNULL"--that is very rough. (But for that matter, rough would be OK! The perfect should not be the enemy of the good.)

This would help users to learn, understand and optimize index use for Manifold. It would help to make a topic that can seem obscure and opaque seem (at least) translucent. (Oh, I did that! It listened--and now it is 15 times faster! Save that log.)

The logging of index use could perhaps be controlled by a 'verbose' pragma or !verbose directive?

adamw


10,447 post(s)
#26-Feb-18 17:25

That's a good idea, thanks.

It could be helpful even without elaborate suggestions - just issuing a "please check queries as they execute and tell me what you think" pragma / directive and then having a warning like "the JOIN on <expr> is unindexed" would be helpful.

tjhb
10,094 post(s)
#26-Feb-18 17:28

Yes that would be worth a lot, really helpful.

Anything that makes us pause and scratch our head... aha, but... OK yes of course. (11 minutes -> under a second. Fine, I didn't really want coffee.)

Mike Pelletier

2,122 post(s)
#27-Feb-18 23:52

Thanks again Adam. Unfortunately, it's still not working well on the full data. I imported the 2 tables used in the query that didn't automatically create indexes. The ALTER TABLE query would not work on them (the tables are grey), so I manually added the index via the schema dialogue. I then removed the temporary database references and ran the query, but cancelled it after 4 hours.

Why do 3 of those queries create indexes?

I'll try creating more tables from the initial queries tomorrow and run the join on those to see if that helps.

tjhb
10,094 post(s)
#28-Feb-18 01:59

I'm really interested in following along here Mike, but I'm losing track a bit of what is what.

Can you possibly give a step-by-step list of workflow elements, when you are working on the full dataset?

Usual stuff e.g.

I imported those 2 tables used in the query that didn't automatically create indexes

What are these tables' names, and what source or format are they imported from? And so on.

This seems mysterious:

The ALTER TABLE query would not work on them (tables are grey) so I manually added the index via schema dialogue.

Anything you can do via Edit > Schema you can also do with ALTER TABLE. (Exceptional bugs aside of course.) What exact SQL were you trying, and what exactly happened?

When you added the index using the GUI, what was the result?

Can you post the table schema(s) before and after, i.e. directly after the import, and then after you have made changes using SQL and/or Edit Schema?

The easiest way to post a schema for the forum is

  • open a fresh Command window
  • drag the table into the Tables panel
  • right-click on the table name in the Tables panel, and select "Insert definition"
  • copy the resulting CREATE TABLE query and either paste it to the forum as code, or attach as text file.

It's easy to remove anything confidential or proprietary first.

And then can you post exactly what query you are running on the adjusted table(s)? Or just name it, if there have been no changes to a query posted above, and the whole text is run.

Why do 3 of those queries create indexes?

Three of which queries? Again, hard to keep track--or rather, easy to assume and misunderstand.

adamw


10,447 post(s)
#28-Feb-18 06:26

An easier way to post a schema of any table: copy the table in the Project pane, paste into the forum post.

Example (created a code section, copied a table, pasted into the section):

--SQL9
CREATE TABLE [mfd_root] (
  [mfd_id] INT64,
  [Name] NVARCHAR,
  [Type] NVARCHAR,
  INDEX [mfd_id_x] BTREE ([mfd_id]),
  INDEX [Name_x] BTREE ([Name] NOCASE),
  PROPERTY 'Folder' 'System Data',
  PROPERTY 'Path.0.Description' 'System components used by Manifold.',
  PROPERTY 'Path.0.Value' 'System Data'
);

You can delete the properties you don't want after the paste.

tjhb
10,094 post(s)
#28-Feb-18 06:34

I should have thought of that! You guys are seriously ecumenical. Text is text.

adamw


10,447 post(s)
#28-Feb-18 06:23

The ALTER TABLE query would not work on them (tables are grey) so I manually added the index via schema dialogue.

I guess what might have happened is this: the ALTER TABLE queries were being run in the context of the wrong data source (if the tables are in the data source named D and you open a Command Window for the MAP file using View - New Command Window, then the queries should be ALTER TABLE D::T ... - and if you open a Command Window for D by right-clicking it in the Project pane and using New Command Window, the queries should be ALTER TABLE T ...).

3 out of 4 queries expose indexes in the result table and the 4th doesn't because the queries are structured differently. For each query, the query engine looks at what indexes are in the source tables and at what the query statements do, and then it determines what indexes it can keep (sometimes, add) in the result table. None of that is in the SQL standard. The standard merely talks about what values the result table should contain; it says nothing about having additional means to traverse the result table fast (which is what indexes are), so we do not have a hard set of rules for that written anywhere - in fact, that's a bit of a moving target as we are adding various enhancements to the query engine. We can still provide some rough guidelines and explanations, and we will try to do so in the upcoming API doc for queries.

Anyway, we can try this: export your data to an MXB, archive into a ZIP with encryption and send it to tech support (as a bug report and citing this thread - I don't think we have any bugs here, but the execution profile is interesting). We'll check what happens on full data and report here.

tjhb
10,094 post(s)
#28-Feb-18 06:33

This does not help anyone trying to follow! OK, you know what tables and queries Mike is talking about, Adam (or you think you do), but how does anyone else?

Please, at least name each table and query as you refer to it, even if you don't think it worthwhile to post the detail.

That's mainly on Mike.

(Actually, it doesn't help you either, Mike or Adam. Either of you might be making false assumptions, because you don't say what you are talking about.)

adamw


10,447 post(s)
#28-Feb-18 07:16

Apologies.

The 4 queries are those from TT.MXB from here. That MXB contains two folders, one with queries that work, another with queries that do not work (a failed attempt at temporary databases, corrected in my later post). The folder with queries that work is named 'Residential Value by Year local clip'. It contains 5 queries, the main 'summary' one and 4 smaller queries which it calls and joins together. 3 of these smaller queries expose good indexes in the result table, 1 does not (the 'imp value' one). Because of that last smaller query that does not expose indexes, the main 'summary' query works slowly.

tjhb
10,094 post(s)
#01-Mar-18 05:34

That helped a lot Adam, thank you.

adamw


10,447 post(s)
#28-Feb-18 07:30

OK, a follow up.

I thought that I could set up a little test on synthetic data, so I took the TT.MXB you sent and filled each of the four involved tables (ENCOMPASS.TBLACCT, ENCOMPASS.TBLIMPS, ENCOMPASS.TBLIMPSOCC, ENCOMPASS.TBLLANDABSTRACT) with copies of the existing records, altering SEQID and ACCOUNTNO to pretend that we have data for new accounts. I doubled the number of records in each table 10 times, growing TBLACCT to 16k records, TBLIMPS to 8k records, TBLIMPSOCC to 8k records, TBLLANDABSTRACT to 10k records.

Then I ran 'Residential Imp Summary by Tax Year 2' without making any modifications to the tables, to see what the 'slow' time is. 'Slow' ended up being pretty slow with the progress bar ticking at 3 records per second.

I canceled the query, deleted the table it created and partially filled, then added indexes to TBLIMPS and TBLIMPSOCC as shown here (two ALTER TABLE statements) to make the result table of 'Residential Imp Value by Tax Year 2' expose an index and make all joins in 'Residential Imp Summary by Tax Year 2' fast. I then re-ran 'Residential Imp Summary by Tax Year 2' and it blasted through the data, producing ~1200 records per second, and completed with a total run time of 0.549 sec.

So, either the real data (on which you say the process is slow) is noticeably different from the synthetic data I generated or there was some simple mistake somewhere.

Mike Pelletier

2,122 post(s)
#28-Feb-18 22:21

Adam you are spot on with your interpretation of my posts. Thanks for the explanation about why some queries generate their own indexes and others do not. My tables are 2 or 3 times larger than your synthetic data tables, but nothing that should make a difference. Perhaps there is more complication in my data with the number of 1-to-many joins, but that also seems unlikely to cause such a big slowdown.

As you said, there must be a simple mistake somewhere so I'll have to go back and find it. I'm surprised though that it ran, as usually simple mistakes keep the query from running. Thanks for creating the synthetic data as I would have to remove lots of stuff from my tables in order to share.

Regarding the ALTER TABLE problem. I exported the linked tables ENCOMPASS.TBLIMPS and ENCOMPASS.TBLIMPSOCC to CSV and then imported them into my project, so no temporary databases are involved. All of the columns are grey after import and these queries won't run.

--SQL9
ALTER TABLE [ENCOMPASS_TBLIMPS] (
  ADD INDEX [ACCOUNT_x] BTREEDUP ([ACCOUNTNO])
);

ALTER TABLE [ENCOMPASS_TBLIMPSOCC] (
  ADD INDEX [ACCOUNT_x] BTREEDUP ([ACCOUNTNO])
);

So I added indexes via the schema dialogue. What does a grey table mean? No indexes or read-only?

What is a better way to import tables linked from Oracle? It would be nice to have a quick way to re-import the data to update the table.

tjhb
10,094 post(s)
#01-Mar-18 05:27

Mike,

You've said a couple of times that ALTER TABLE queries "won't run".

I imagine that the queries do run, but when they do they log an error.

What is the error? Is it always the same? (Otherwise, what happens instead?)

What exactly do you then do in Edit > Schema?

I suggested before that you post the schema of each table directly after import, and again after you have made schema changes (or tried to--maybe you haven't). I still think that's a good idea--and Adam showed an easier way.

It really looks like that apparent anomaly is the real puzzle. It's fishy!

A grey table means that records can't be addressed or accessed individually... because there is no BTREE or BTREEDUP index on any field. (It doesn't mean read-only.)

tjhb
10,094 post(s)
#01-Mar-18 05:52

P.s. here's my guess.

The error you get when you try ALTER TABLE is "Can't add index", meaning that the values in the given field(s) don't satisfy the requirements for the index type. For BTREE, the failure would be that values are not unique. For BTREEDUP it would be that there are, perhaps, unexpected NULL values in the given field.

("Can't add index" can also arise if an index with the given name already exists, but that seems less likely.)

Then I would guess that when you use Edit > Schema, you use it incorrectly, and it has no effect. (But because there is no error message, you assume it has done what you planned.)

On that see Klaus's recent post. If you find the dialog counter-intuitive then you have good company. (FWIW I like it as it is.)

adamw


10,447 post(s)
#01-Mar-18 08:53

A grey table means that records can't be addressed or accessed individually... because there is no BTREE or BTREEDUP index on any field. (It doesn't mean read-only.)

Two minor corrections:

BTREEDUP indexes don't help, the table window currently requires BTREE. The query engine is slightly less strict and allows (updates for a table with) either BTREE or BTREENULL (NULLs are allowed and records with NULLs are uneditable, otherwise values are unique and records are editable). We will adjust the table window to allow BTREENULL as well, that's on our list.

If a table has a BTREE index, it might still be read-only because it is on a read-only data source (eg, due to permissions or due to the data source being linked with the 'Open as read-only' option on), or because the table is synthesized from some data and writing back does not make much sense (eg, SELECT ... GROUP ... - does not make much sense to edit sums and averages), or for some other reason. In this case, the table will also be grey and will disallow editing records (but will allow selecting them).

tjhb
10,094 post(s)
#01-Mar-18 08:58

Damn, I tried to get that right!

Not the first time I've got it wrong. I will pin it to my wall.

Thanks Adam.

adamw


10,447 post(s)
#01-Mar-18 08:38

Like Tim, I do not completely understand what you mean by queries not running. I suspect you run the queries and they report an error, and then since our error messages aren't terribly descriptive (agree we should fix that, this keeps sliding for various reasons, we have to just do it already), you just say that they fail to run. But I am not sure and it could be something else.

Apart from that:

A grey table means it is not editable. In your case this happens because the table has no BTREE index (with no duplicates or NULLs). You have no BTREE index because you imported the table from a CSV file. CSV files do not describe which of their fields contain unique values, etc, so linking or importing them creates tables with no indexes. To make the table editable, add a BTREE index on a unique field (SEQID). The name of the index could be anything.
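To spell that out, a minimal sketch in Manifold SQL, assuming SEQID really does hold unique, non-NULL values (the index name is arbitrary):

ALTER TABLE [ENCOMPASS_TBLIMPS] (

    ADD INDEX [SEQID_x] BTREE ([SEQID])

    );

After this the table window should allow editing records.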

Copying data from Oracle by exporting and re-importing a CSV is suboptimal: it loses indexes and data types. It is much better to just copy a table from an Oracle data source in the Project pane and paste it into the MAP file in the same Project pane (select a component or folder in the MAP file, then click Paste in the Project pane toolbar).

If you already have a table to paste into (which you perhaps somehow altered, eg, added some indexes), open it, select all records (Ctrl-A), delete them (Del), then copy the table from an Oracle data source in the Project pane and paste into the table window (Ctrl-V). The system will try to match fields and if it has any questions regarding which field from the original table should go where, it will display a dialog.

(Obviously, you can also automate delete / copy using a query, that's just DELETE FROM <copy> / INSERT INTO <copy> SELECT FROM <original>.)
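Spelled out, such a refresh query might look like the sketch below. The data source name [Oracle] and the field list are assumptions for illustration; Manifold addresses a table inside a linked data source as [data source]::[table]:

DELETE FROM [ENCOMPASS_TBLIMPS];

INSERT INTO [ENCOMPASS_TBLIMPS] (

    [ACCOUNTNO], [SEQID]

    ) SELECT [ACCOUNTNO], [SEQID]

FROM [Oracle]::[ENCOMPASS.TBLIMPS];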

Dimitri


7,413 post(s)
#01-Mar-18 08:53

Adamw is being generous as always, but I think in this case the polite use of the word "suboptimal" would be better replaced by a more explicit statement: exporting first to a terribly non-DBMS format, CSV, and then re-importing and attempting to rebuild in Manifold what the format lost is a big mistake, compared to just using copy/paste.

I'm not sure why the hesitancy to use copy/paste and why so often people go on long detours instead of just using copy/paste. There's talk about why that happens, what can be done to avoid it, and one idea is to add copy/paste to the context menus.

There is some resistance to that given the ctrl-c/ctrl-v and the toolbar icons already in place and the desire to not double-up everything in the context menus, making them very long. But still, I suspect people have been so accustomed to seeing Copy and Paste in context menus that perhaps what is going on is that when those are not seen in a context menu there might be a perception that copy/paste is not available in this instance.

So... perhaps we should add copy/paste to more context menus if that will help encourage use of copy/paste.

Mike Pelletier

2,122 post(s)
#01-Mar-18 15:38

Sorry for not being more specific, I was working off of memory and couldn't be sure. The error message is "invalid entity type".

Ah, I didn't realize the index had to be a BTREE. In general (beyond this project), I've been just adding BTREEDUPNULL to tables in case there are duplicates or nulls and because I'm less worried about speed. I see now that is bad practice and why I've had trouble with grey tables. As long as one can remember these rules, indexes are not awful, but of course anything Mfd can do to simplify or give better error (or suggestion) messages would be very welcome.
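For future reference, swapping the loose index for a strict one would be something like this (assuming ACCOUNTNO really is unique and non-NULL; the new index name is arbitrary):

ALTER TABLE [ENCOMPASS_TBLIMPS] (

    DROP INDEX [ACCOUNT_x],

    ADD INDEX [ACCOUNT_b] BTREE ([ACCOUNTNO])

    );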

Okay I definitely deserve a lashing for export/import to CSV. I just tried copy/paste and it is fast. The reason I didn't do that before is because when copying a screenful of records from the linked oracle table it is very slow (starts out at 1 kb/s and then goes down to 200 b/s) so I assumed (incorrectly) that copy/paste the whole table is a bad way to go. Why is it slow on selected records?

I'll set up a query for updating the tables I need in the future. Thanks everyone for your insights.

adamw


10,447 post(s)
#02-Mar-18 06:41

Copying selected records may have to read the entire table, and since the progress report shows the number / size of data in the copied records, it looks like the operation is slow. I.e., in the first second copying scans 5,000 records, but only 2 of them are selected and so the reported speed is 2 records/s (problem 1: the speed appears low while it isn't low). Then in the next second copying scans another 5,000 records, but none of them are selected and so the reported speed is now 1 record/s (problem 2: the speed appears to change when it doesn't).

We can obviously show the number of scanned records instead of the number of resulting records, but this is not a universal solution - for example, for a process that performs a join and re-reads some of the records again and again, the reported number then simply skyrockets and stops being relatable. The number of resulting records is typically much more useful, so we report that and live with the drawbacks. This tends to fix itself over time as we add improvements and optimizations, ie, when copying selected records does not have to read the entire table, this both performs faster and reports more accurate speeds.

Mike Pelletier

2,122 post(s)
#02-Mar-18 18:23

Thanks for the explanation Adam.

Still no luck finding a simple error with my queries so I just emailed tech with the data.

tjhb
10,094 post(s)
#24-Jan-18 22:26

Comparative timings (after trimming strings at source, fullfetch on):

Without BTREEDUPNULL indexes on join columns: 44s.

With BTREEDUPNULL indexes on join columns: 0.016s.

Time to add both indexes: 0.047s.

chris lang
130 post(s)
#24-Jan-18 22:44

Thank you tjhb for your reply!

I corrected the table field in Manifold using the Contents pane: in the Template tab I used "Replace Text, all" with the associated attributes.

My query worked afterward. But I have a new issue:

Is it possible to turn the query result into a drawing (I have a Geom field with points in the query)?

Thanks in advance,

Best Regards,

Chris

adamw


10,447 post(s)
#25-Jan-18 09:34

You can, for example, adjust the SELECT to do SELECT INTO to put the result into a persistent table, then create a drawing for that table.

adamw


10,447 post(s)
#25-Jan-18 09:49

Alternative to the above:

Create a drawing based on the SELECT query. The easiest way to do this now is to make a copy of an existing drawing, then edit the 'Table' property of the new drawing to point to the query and edit the 'FieldGeom' property to point to the geom field. You also have to set the coordinate system of the geom field.

See attached MXB. Open Map. If you open Query_Test1 Drawing directly, it will show nothing, because the map window cannot figure out the bounds (the table returned by the query does not report them, unlike a regular table would). We have a couple of ideas for how to fix this in the future, but right now the simplest workaround is to use a layer with a bigger bounding box than in the query.

Attachments:
test1 _SQL-mod.mxb

chris lang
130 post(s)
#25-Jan-18 15:17

Thank you adamw for your replies.

Indeed, opening the query drawing, even starting with a drawing with a bigger bounding box, still shows nothing.

So I'm trying your first alternative with SELECT INTO, but I'm getting an invalid object reference error with the following SQL statement:

SELECT (SELECT [mfd_id],[Geom],[Text],[o_mfd_id],[o_ID],[o_Color],[Id_No_Lot] FROM [Requete_ID]) INTO [Table_TEST];

Any idea?

Best Regards,

Chris

adamw


10,447 post(s)
#25-Jan-18 16:20

Does opening Map not show the drawing? It should.

The syntax for SELECT INTO is simply SELECT [mfd_id], ... INTO [table_test] FROM [...]; -- no need for two SELECTs.

chris lang
130 post(s)
#26-Jan-18 17:05

Thank you adamw for your Reply!

Working fine now with the following syntax :

SELECT [mfd_id],[Geom],[Text],[o_mfd_id],[o_ID],[o_Color],[Id_No_Lot] INTO [Table_TEST_4] FROM [Requete_ID];

The table I named [Table_TEST_4] is generated automatically by Manifold.

The resulting drawing is Table_TEST_4 Drawing.

The .mxb file is attached for your info.

Thank you for your help!

Best Regards,

Chris

Attachments:
test1(Copy).mxb

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.