Subscribe to this thread
Home - General / All posts - SQL vs. Transforms
artlembo


2,819 post(s)
#07-Aug-17 02:46

Just out of curiosity, I ran the TransformTopology Overlay, Intersect and also an SQL query with GeomClip:

SELECT GeomClip(production.[Geom (I)], soils.[Geom (I)]true, 1) as g

into newtable

FROM [Production],[Soils]

WHERE GeomTouches(production.[Geom (I)], soils.[Geom (I)],1)

I have found that the Transform runs way faster than the SQL. Now, the Topology overlay transform is actually writing all the columns, whereas the SQL is only writing the geometry. Is there something different that the topology overlay is doing that the GeomClip is not doing?

adamw


7,129 post(s)
#07-Aug-17 06:16

It's the difference between a join done in the query engine with GeomTouches and a join done inside a query function with GeomOverlayXxx. Plus threads, if you used them in the transform.

We might make joins done in the query engine do more or less what joins inside a function do in the future as an optimization step.

tjhb

7,387 post(s)
#07-Aug-17 08:44

This is really important (though I'm not 100% sure I understand).

When we configure a transform, we normally have a button named 'Edit Query' which will show in a Command window the SQL to reproduce exactly what the internal engine will do in this case. Right?

Exactly.

So there are zero optimation steps performed in a GUI transform, which are not also made explicit in the automatically-generated SQL from Edit Query.

Not ever. Right?

adamw


7,129 post(s)
#07-Aug-17 09:04

Yes. When you direct the dialog to apply the transform, it generates the query as if you pressed the Edit Query button and runs it without modifications.

tjhb

7,387 post(s)
#07-Aug-17 09:25

Whew. Thanks.

artlembo


2,819 post(s)
#07-Aug-17 13:50

ok, that's good. The reason I was asking is twofold:

1. I wanted to be more database centric for my dbms friends (no need to think about drawings, overlays, etc.). So, this is more terse SQL.

2. I might want to throw in a few WHERE clauses: (i.e. WHERE soils = 'sandy')

and yes, I can always throw the THREADS 4 at the end of the SQL.

artlembo


2,819 post(s)
#11-Aug-17 18:25

I can confirm that the GeomClip is many orders of magnitude slower than the transform for topology intersect. The Transform for Topology Geometry Intersection runs about as fast as the following SQL Server Query (a little over a minute):

SELECT ditches.Geometry.STIntersection(esparcels.geometry) AS g, 

   esparcels.acctid, esparcels.DESCLU

FROM esparcels, ditches

WHERE ditches.Geometry.STIntersects(esparcels.geometry) = 1

But of course the Transform forces you to bring in ALL the columns from the two drawings.

Further, if I want to only bring in a few columns, and then sum up the length of the ditches in each property (this I'm doing for the entire Eastern Shore of Maryland), the SQL Server (or Postgres) query would simply be:

SELECT acctid, sum(g.STLength())  as L

INTO newTable

FROM 

(SELECT ditches.Geometry.STIntersection(esparcels.geometry) AS g, 

   esparcels.acctid, esparcels.DESCLU

FROM esparcels, ditches

WHERE ditches.Geometry.STIntersects(esparcels.geometry) = 1) AS T

GROUP BY acctid

that is super easy but I have to say, it takes a long time to complete - 6 minutes.

Unfortunately, using GeomClip in the same way simply doesn't ever complete. The computer hangs for about 20 minutes, and then I just kill the process.

With Manifold 8, I issue:

SELECT sum(length(g,"ft")) as totallength, acctid

INTO newTable

FROM 

(

SELECT parcels.[ACCTID], ClipIntersect(ditches.[Geom (I)],parcels.[Geom (I)]as g

FROM parcels, ditches

WHERE touches(parcels.[Geom (I)],ditches.[Geom (I)])

)

and guess what - it completes in 4 minutes!

So, Manifold 8 is rivaling SQL Server. But, there are some problems in using GeomClip with Radian that I think need to be checked out.

NOTE: I uploaded the file joshditches.map to the FTP site.

Dimitri

4,118 post(s)
#11-Aug-17 19:47

the Transform forces you to bring in ALL the columns from the two drawings.

Could you not set the Transform Options to ignore the columns you don't want transferred? Then they would not be transferred.

Yes, the topology overlays are much faster. I'll let adamw discuss that.

artlembo


2,819 post(s)
#11-Aug-17 21:39

Could you not set the Transform Optionsto ignore the columns you don't want transferred? Then they would not be transferred.

Yes, that does certainly help things. I should not have just addressed it to the "ALL fields" only. Quickly typing in two fields in the SQL is rather nice, and, for some results I want to perhaps trim a string field, or multiply a field by x or take a square root, etc. Writing this with SQL and throwing in a spatial clause like GeomClip is really convenient for that, and I think lives better within the database ecosystem (this is how database guys think).

The Transform Dialog is fast, and really slick - I totally agree. And, I like it better than what other GIS products do, as it is one standard dialog where you choose what you want to do, rather than a zillion different "wizards" that a user gets presented with. So, I hope I didn't communicate that I am not a fan of Transforms. In fact, since the transforms are faster than GeomClip, etc., I often just use them instead of SQL -this is especially convenient when you are working and drinking a beer or eating a sandwich and you only have one hand free

I look forward to Adam's explanation to better understand what the thinking is behind this.

tjhb

7,387 post(s)
#12-Aug-17 02:56

Art,

I'm glad Dimitri's correction worked. There is no reason why all fields needed to be included.

Certainly not "of course"--it is just the default.

By the way, I would agree that taking all fields over is usually wasteful, especially since public data tends to show terrible "attribute bloat", often including really useless things like fixed area or length measurements, plus who knows how many indecipherable in-house fields that are now mainly kept because no one can quite remember what they were for, or what the field name means (that guy left)...

But I can't think of a better default than everything (all fields)--I think it has to be that way, unless the default were nothing (no fields) except geometry. Woudl that be less confusiong for new users, or more confusing?

Quickly typing in two fields in the SQL is rather nice, and, for some results I want to perhaps trim a string field, or multiply a field by x or take a square root, etc. Writing this with SQL and throwing in a spatial clause like GeomClip is really convenient for that, and I think lives better within the database ecosystem (this is how database guys think). ...

...I hope I didn't communicate that I am not a fan of Transforms. In fact, since the transforms are faster than GeomClip, etc., I often just use them instead of SQL...

It's not a choice between transforms like Topology Overlay, Intersect or GeomClip in SQL here.

All of the transforms have direct, exact equivalents in SQL, which we have access to using the Edit Query button. It's an education looking at what the engine does for the Topology Overlay functions, and very easy to edit the auto-generated SQL in light of our specific purpose and knowledge. E.g. to reduce the number of fields taken from overlay to result.

Here the performance difference is between the generic GeomClip function, which has to suit all sorts of purposes; and GeomOverlayTopologyIntersect[Par], which has just one purpose, and can therefore make helpful assumptions, skip some checks, take shortcuts.

Up to a point I think it's just that simple, though this doesn't explain the difference you are seeing between timings using GeomClip in Radian and ClipIntersect in Manifold 8.

One the other hand, you haven't said what query you used in Radian to compare with the query used in Manifold 8 (or the one for SQL Server).

Could you post it? So far it's not even apples against oranges here, but apples against... I don't know. Fire and fury.

Unfortunately, using GeomClip in the same way simply doesn't ever complete. The computer hangs for about 20 minutes, and then I just kill the process.

[No Radian query is given]

With Manifold 8, I issue: ...

Test data would be good too.

artlembo


2,819 post(s)
#12-Aug-17 03:04

I should have copied the Radian query before I hit execute - it froze up.

Headed to Asia for two weeks without my PC (daughter's wedding), so I don't know if I can put the query together - only have my phone now. Might try it from memory later.

Edit: maybe something like this:

SELECT parcels.acctid, 

GeomClip(ditches.geometry,parcels.geometry,true,0) AS g

INTO newTable

FROM parcels, ditches

WHERE GeomIntersects(ditches.geometry, parcels.geometry,0)

That's the best I can recall - either way, Adam has the .map file

tjhb

7,387 post(s)
#12-Aug-17 23:45

Thanks Art.

Really glad you noticed the timing difference, and that it has been fixed so quickly. This will make a noticeable difference to daily work.

A bit of a lesson here for me (perhaps others too): don't just gloss over timings that feel less than stellar.

Dimitri

4,118 post(s)
#12-Aug-17 05:49

It's not a choice between transforms like Topology Overlay, Intersect or GeomClip in SQL here.

All of the transforms have direct, exact equivalents in SQL, which we have access to using the Edit Query button

Very true. What also is interesting is a hybrid use of the Transform dialog's Expression tab, as seen in the example combining BIL bands topic.

In that example what could have been written as an update query instead is phrased as a quick use of a single SQL function within the Transform dialog's Expression tab, using the dialog to do the supporting work of manipulating the table. The hybrid approach can be very convenient when you want more than a template but don't feel like doing everything in SQL.

---

I don't like the topic cited above, by the way, because of the highly unsatisfactory application of too much manual work in the workflow, where the dialogs should be doing more for you automatically. The example even ends up incomplete because it does not go into how to build the index for the image, instead just saying to "click the message".

Overly-manual workflow like that needs to get obsoleted quickly. We will see that improved in the weeks ahead.

adamw


7,129 post(s)
#12-Aug-17 13:33

I look forward to Adam's explanation to better understand what the thinking is behind this.

This was a bug on our side. The query engine was refusing to use the spatial index for some table scans, including the most important one = the default scan, used in SELECT INTO in particular.

We fixed the bug.

Here is how it looks after the fix:

The original query in 8 - 213.397 sec on a test system:

--SQL

SELECT parcels.[ACCTID],

  ClipIntersect(ditches.[Geom (I)],parcels.[Geom (I)]as g

INTO ditchparcel

FROM parcels, ditches

WHERE touches(parcels.[Geom (I)],ditches.[Geom (I)])

The second query in 8 builds on this one by aggregating the resulting values by ACCTID, the execution time is about the same, because the number of records returned by the first query is small and aggregation is simple.

Radian / Viewer after the fix, direct port of the query from 8 - 215.001 sec (slightly higher than 8, but small overhead on a direct port is acceptable):

--SQL9

SELECT parcels.[ACCTID],

  GeomClip(ditches.[Geom (I)], parcels.[Geom (I)], TRUE, 0) AS g

INTO ditchparcel

FROM parcels, ditches

WHERE GeomTouches(parcels.[Geom (I)], ditches.[Geom (I)], 0)

Adjusting the query in Radian / Viewer to use 4 threads, a trivial change to the query - 69.133 sec:

--SQL9

SELECT parcels.[ACCTID],

  GeomClip(ditches.[Geom (I)], parcels.[Geom (I)], TRUE, 0) AS g

INTO ditchparcel

FROM parcels, ditches

WHERE GeomTouches(parcels.[Geom (I)], ditches.[Geom (I)], 0)

THREADS 4

...and altering the second query the same way produces the same results - execution time with 4 threads is about 69 seconds in Radian compared to 210+ in Manifold 8.

Thanks a lot for the detailed report and example data, and sorry for the issue.

We are going to deliver the fix in a few days in an update to Radian / Viewer.

artlembo


2,819 post(s)
#12-Aug-17 14:18

Thanks! Is this bug only in GeomClip, or all containment clauses (interest, etc.).

Any chance the bug gets fixed by Monday - I have a 14 hour flight, and this really changes things so I can explore a lot more things.

adamw


7,129 post(s)
#12-Aug-17 15:40

The bug is not related to GeomClip at all.

The query engine was looking at WHERE with a filter that should engage the spatial index and was refusing to use the spatial index (due to being too cautious for a very obscure technical reason). Anything that had SELECT ... INTO ... FROM ... WHERE <spatial filter> or something equivalent was affected. The exact circumstances are a bit hard to describe succinctly, ie, the issue was not hitting drawings linked from queries when they were rendering, but it was hitting simple queries like SELECT INTO using these drawings, etc.

We didn't detect the bug automatically when it was introduced because the test queries continued to be working properly, just slower than they should have worked - and it is very, very difficult to write useful tests for expected performance. After the fix we added other tests to protect from this issue and other potential issues like it (re)appearing.

We were planning to have the update early next week, possibly on Monday, yes.

adamw


7,129 post(s)
#15-Aug-17 07:21

Try 9.0.162.4. It should fix the issue (as well as make drag and drop between data sources in the Project pane - which is another thing you commented on - safer, etc).

artlembo


2,819 post(s)
#16-Aug-17 06:18

thanks, but two things:

1. Can you check the joshditches.map map that I had put on the FTP site? The coordinate systems should be different, and this query returns nothing:

-- $manifold$

SELECT GeomClip(a.[Geom (I)], b.[Geom (I)]true, 0) AS g 

FROM [ditches] AS A, [parcels] AS B

WHERE GeomIntersects(a.[Geom (I)], b.[Geom (I)],0)

but, the Transform does in fact return the result. So, does the GeomClip not work with mixed coordinate systems?

2. I'm on the road, and am only using my Surface tablet - even when I change the coordinate systems, the query takes a very long time (in fact, I haven't waiting long enough for it to complete).

It will probably be a day to two before I will have the ability to check back with you on this. Sorry, but I wont be able to follow up for a while.

adamw


7,129 post(s)
#16-Aug-17 07:29

I cannot reproduce this.

1 - The coordinate systems in the two drawings in the file that we took from the FTP are the same.

If the coordinate systems were different, then yes, you would have to make them the same first for GeomClip (and other geom functions) to work. This can be done by reprojecting one of the components statically, or reprojecting geoms dynamically inside a query.

2 - Need specifics.

The above query, as written, is instantaneous - because it is a SELECT and not a SELECT INTO and the table window computes it progressively.

If I add INTO, then, on the same test machine as above, the query completes in 223.284 sec and returns 4658 records on the JOSHDITCHES.MAP file I have. In Manifold 8, the query completes in 216.828 sec and returns the same 4658 records. (The times differ a bit from the ones in the previous test, because this query uses GeomIntersects instead of GeomTouches for the test.)

If I add THREADS, the time in Radian drops to ~76 sec for 4 threads, ~54 sec for 8 threads, etc.

artlembo


2,819 post(s)
#16-Aug-17 08:27

Thanks. I think I sent you an updated version on the ftp site. The original had different coordinate systems.

Thanks for confirming the need for reproject. This is like Postgis and SQLServer. Works for me, but 8 allowed different projections. Is that something you are considering?

adamw


7,129 post(s)
#16-Aug-17 08:34

Functions that work on entire drawings (GeomOverlayXxx / GeomOverlayTopologyXxx) are taking care of coordinate systems automatically. It is just functions that work on geoms that don't do that - and we think they should not, reprojection on the level of records should be explicit. We do add calls to perform reprojection in transforms that require it, they aren't terribly involved, but I agree we should (a) simplify them as much as we can jumping on any chance, and (b) have plenty of examples performing reprojection on the fly, including into temporary components.

Regarding (a), maybe we should have a simple function: GeomProject(<drawing>, <component or system to project to>) -> <table> of geoms.

artlembo


2,819 post(s)
#16-Aug-17 09:01

Yes, similar to ST_TRANSFORM.

Thanks.

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.