Home - General / All posts - Help rewriting a query in Radian please.
 bclement207 post(s) #30-Aug-17 20:21 I am in well over my head here, but usually if I swim hard enough I can get through. But now I seem to be swimming more like a rock and the surface seems unattainable. I have the query below downloaded from the forum (thanks tjhb) which draws a line from a point to the nearest line. I need this in order to detect the “side of street” of address points.--SQL -- For each point in [Drawing 1] not on a line in [Drawing 2] -- add a line to [Drawing 2] connecting the point -- to the nearest point on the nearest line  INSERT INTO [Drawing 2] ([Geom (I)]) SELECT  Project( NewLine([Point], [Intersection]), COORDSYS("Drawing 2" AS COMPONENT)) FROM -- <9> (SELECT [ID], [Point], CASE WHEN [DS] = 0 THEN -- The point is on the line [Point] WHEN [DS] = [DC] THEN -- The point is closest to a coord of the line, so use the coord [Coord] WHEN [S1] = 0 THEN -- The point is closest to a location along a horizontal segment AssignCoordSys(NewPoint( CentroidX([Point]), CentroidY([Segment])), COORDSYS("Drawing 1" AS COMPONENT)) WHEN [S2] = 0 THEN -- The point is closest to a location along a vertical segment AssignCoordSys(NewPoint( CentroidX([Segment]), CentroidY([Point])), COORDSYS("Drawing 1" AS COMPONENT)) ELSE -- The point is closest to a location along an oblique segment -- (the normal case) AssignCoordSys(NewPoint( ([K2] - [K1]) / ([S1] - [S2]), ([S1] * [K2] - [S2] * [K1]) / ([S1] - [S2])), COORDSYS("Drawing 1" AS COMPONENT)) END AS [Intersection] FROM -- <8> (SELECT [ID], [Point], [Segment], [DS], [Coord], [DC], [S1], [S2], CentroidY([Start]) - [S1] * CentroidX([Start]) AS [K1], -- The constant in the equation of the segment CentroidY([Point]) - [S2] * CentroidX([Point]) AS [K2] -- The constant in the equation of a normal to the segment  -- passing through the point FROM -- <7> (SELECT [ID], [Point], [Segment], [DS], [Coord], [DC], [Start], CASE WHEN RectHeight([Segment]) = 0 THEN -- Horizontal 0 WHEN RectWidth([Segment]) = 0 THEN -- Vertical NullIf(1, 1) ELSE  (CentroidY([End]) - CentroidY([Start])) /  (CentroidX([End]) - CentroidX([Start])) END AS [S1], -- The slope of the segment CASE WHEN RectHeight([Segment]) = 0 THEN -- Horizontal NullIf(1, 1) WHEN RectWidth([Segment]) = 0 THEN -- Vertical 0 ELSE  (CentroidX([Start]) - CentroidX([End])) /  (CentroidY([End]) - CentroidY([Start])) END AS [S2] -- The slope of a normal to the segment (equals -1 / [S1]) FROM -- <6> For each point, the closest line segment, -- the closest coord of the segment, and its start and end points (SELECT [ID], [Point],  First([Segment]) AS [Segment],  First([DS]) AS [DS], First([Coord]) AS [Coord],  First([DC]) AS [DC], -- Ranked by this column StartPoint(First([Segment])) AS [Start],  EndPoint(First([Segment])) AS [End] FROM -- <5> For each point, the closest line segment,  -- and the coords of that segment ranked by distance (SELECT [ID], [Point], [Segment], [DS], [Coord], Distance([Point], [Coord]) AS [DC] FROM -- <4> For each point, the closest line segment (SELECT [ID], [Point], First([Segment]) AS [Segment], First([DS]) AS [DS] FROM -- <3> For each point, the segments of the closest line, ranked by distance (SELECT [ID], [Point], [Segment], Distance([Point], [Segment]) AS [DS] FROM -- <2> For each point, the closest line (SELECT [ID], [Point], First([Line]) AS [Line] FROM  -- <1> For each point, all lines, ranked by distance  (SELECT  [D1].[ID], Geom([D1].[ID]) AS [Point], Project(Geom([D2].[ID]), COORDSYS("Drawing 1" AS COMPONENT)) AS [Line] FROM [Drawing 1] AS [D1] INNER JOIN [Drawing 2] AS [D2] ON IsPoint([D1].[ID]) AND IsLine([D2].[ID]) ORDER BY [D1].[ID] ASC, Distance([D1].[ID], [D2].[ID]) ASC ) --  GROUP BY [ID], [Point] ) --  SPLIT BY Branches(IntersectLine([Line], [Line])) AS [Segment] ORDER BY [ID] ASC, [DS] ASC ) --  GROUP BY [ID], [Point] )--  SPLIT BY Coords([Segment]) AS [Coord] ORDER BY [ID] ASC, [DC] ASC )--  GROUP BY [ID], [Point] )--  ) --  ) --  ) --  WHERE [Point] <> [Intersection];Below, I have made it (I think) to step three where I have run headlong into “Split.” Looking at the M8 version of the query, the split comes after the from clause. Reading about split in the M9 documentation though, all of the split examples show split before from. At any rate, in my rewrite efforts, if I put the split clause in the M8 version position, the alias is unknown. If I move it before the from clause, I get an invalid object reference.--SQL9-- \$manifold\$SELECT [ID], [Point], [Segment], GeomDistance([Point], [Segment], 0) AS [DS]FROM -- <2> For each point, the closest line (SELECT [ID], [Point], First([Line]) AS [Line] FROM -- <1> For each point, all lines, ranked by distance (SELECT [D1].[mfd_id] AS [ID], [D1].[geom] AS [Point], [D2].[geom] AS [Line] --Should reproject the second drawing to the first but haven't figured it out yet. FROM [POints] AS [D1] INNER JOIN [Lines] AS [D2] ON GeomIsPoint([D1].[geom]) AND GeomIsLine([D2].[geom]) ORDER BY [D1].[mfd_id] ASC , GeomDistance([D1].[geom], [D2].[geom], 0) ASC) --  GROUP BY [ID], [Point]) -- SPLIT BY GeomToBranches(GeomIntersectLinesPair([line], [line], 0)) AS [Segment]ORDER BY [ID] ASC, [DS] ASCAny pointers are much appreciated.
 artlembo2,937 post(s) #30-Aug-17 23:53 At the end of the day, what are you looking to do? Do you want to find if a point is to the left or right-hand side of the nearest line? Or, is there something else you're trying to do?
 bclement207 post(s) #31-Aug-17 00:47 At the end of the day, I want an automated addressing system that will marry my current x,y house number calculation to the road system with a connecting line segment (for routing) in a way that truly works (i.e., not geocoding). This query gets me the connecting line in a way that also allows me to find the “handedness” of the house number (left or right). It was also my hope that in rewriting this query, I would learn more spatial SQL taking up from where your fine book “How do I do that in Manifold SQL” left off and giving me a win-win. But alas, I seem to be suffering from a dielectric problem . . . too much air between the old synapses.
 artlembo2,937 post(s) #01-Sep-17 15:28 The only reason I ask is because a Script in 8 may be much more shorter. Of course, you aren't learning 9, but this script will take a set of points, find the closest lines, and then create a tangential line from each point to the closest line:Sub Main Set Qry = document.ComponentSet("Query") Set Qry2 = document.ComponentSet("InsertLat") Set P = document.ComponentSet("P") Set D = document.ComponentSet("D") Set L = document.ComponentSet("Lateral") Qry.Run Set tempTable = document.ComponentSet("tempTable") for each rec in tempTable.RecordSet    Set targetPoint = p.ObjectSet.Item(p.ObjectSet.ItemByID(rec.Data("p_id"))).Geom.Center    Set targetGeom = D.ObjectSet.Item(D.ObjectSet.ItemByID(rec.Data("d_id"))).Geom    Set endPoint = targetGeom.ProjectPoint(targetPoint)    Qry2.Text = "INSERT INTO Lateral ([Geom (I)]) VALUES(NewLine(NewPoint("&targetPoint.X&","&targetPoint.Y&"),NewPoint("&endPoint.X&","&endPoint.Y&")))"    Qry2.Run NextEnd SubNow, I have a query called Qry and I didn't want to embed the query in the VBScript, it finds the closest line to each point. The cool thing is, with another WHERE clause you can make sure the street names match. Anyway, here is the query:SELECT min(dist) as dist, p_id, first(d_id) AS d_idINTO tempTableFROM    (SELECT distance(p.[Geom (I)],d.[Geom (I)]) as dist, p.id as p_id, d.id as d_id   FROM P, D   ORDER BY dist ASC   )GROUP BY p_idNow, you can simply split the lines. I have class, so I have to run, otherwise, I would add the Analyzer function to do that. I've attached the .map file below.As you know, I'm a big fan of SQL, but I think the power of knowing how to write an actual script goes a real long way. In fact, I will probably reuse this code some day, so I am going to leave this little bread crumb for finding this post in the future:create perpendicular bisector from point to a lineAttachments: ben.map
 artlembo2,937 post(s) #01-Sep-17 15:46 BTW, if you want to do the split, just add these two lines at the end of the script:Set Analz = document.NewAnalyzer Analz.Split D, D, D.ObjectSet, L.ObjectSetthat's one of the reasons I like the scripting object - one line basically accomplishes a lot of SQL code (in some cases, in others, the opposite is true).
 bclement207 post(s) #01-Sep-17 19:45 Thanks Art. I will look into it. It is always good to spread your eggs over multiple baskets!
 artlembo2,937 post(s) #02-Sep-17 01:11 Also, it's so short of a script, we can likely recreate it line-by-line in 9, and learn how to start using 9.Adam: as you can see, this is a pretty short, but effective script. The .ProjectPoint is the key to all this, and saves an enormous amount of coding. I haven't found a similar function in either SQL or the coding API that would do the same thing. Am I missing something?Also, the ArcGIS Near_Analysis finds the nearest object to another object, and there is a toggle that will allow you to return the location on the line where the perpendicular intersection would exist (just like what ProjectPoint gives you).
 dale527 post(s) #02-Sep-17 05:15 I'm following this with interest. I acquire and use Full Motion Video to identify and locate animals along a flown transect in both infrared and RBG. My current workflow is Arc, with FMV pluggin for video management, replay and digitising. Output is a series of points. For subsequent analysis each point is attributed with distance from transect. ( the transect is the recorded track of the imager boresight) At present I use the Arc near_anaylsis to populate the offset field.Having the equivalent in Radian etc would be very useful.
 tjhb7,619 post(s)online #02-Sep-17 07:16 Also, it's so short of a script, we can likely recreate it line-by-line in 9, and learn how to start using 9.That is false Art, you are mistaken.Radian Studio/Manifold 9 have a completely different programming model from Manifold 8.It's not just that the 8 and 9 .NET object models are different, they are of very different kinds.It's all new!
 artlembo2,937 post(s) #02-Sep-17 13:32 Of course they are different. That's why we have to rewrite to VBScript from 8. Starting in this fashion is a great way to see what is available and what the corollaries are. Hence, my asking Adam about the ProjectTo object.
 tjhb7,619 post(s)online #02-Sep-17 14:33 I get it. You were not mistaken, you were being subtle!Missed that! Sorry Art.
 tjhb7,619 post(s)online #02-Sep-17 07:45 There's also a tiny bug in the query above. At most it affects performance, not logic. It might affect absolutely nothing.But the lineSELECT min(dist) as dist, p_id, first(d_id) AS d_idshould ideally beSELECT FIRST(dist) as dist, p_id, FIRST(d_id) AS d_idor something similar, so that we are not (logically) inviting the engine to sort on Distance() a second time in the outer query, but instead requiring use of the inner ORDER BY.I don't know whether the outer query would actually sort again in M8. It might see straight through.
 artlembo2,937 post(s) #02-Sep-17 13:35 Yes, good catch. I was already doing an order by so there's no need to grab min. I mighta give it a shot with a large data set to see if it makes a difference. By the way, this could be a very useful query/script for connecting houses to water, sewer, or gas mains.
 adamw7,446 post(s) #04-Sep-17 06:17 We don't yet have a function similar to ProjectPoint in 8. We have a batch of functions coming that includes it.I would note that if 8 had this function in queries, a query would perhaps be even simpler than a script (and if 8 did not have this function in queries but allowed calling script functions from queries like we do now, that would make a query pretty simple, too, although there is nothing wrong with simply using a script).
 tjhb7,619 post(s)online #31-Aug-17 00:30 Brian,I'm pleased this old query is still useful.I'll rewrite it for Radian/Manifold 9 SQL in the next day or two, with detailed notes. It might make a good example.For now, you are right to infer that SPLIT must be part of the SELECT clause in SQL9. (And BY is dropped.)I think it makes much more sense as part of SELECT. One subtle consequence is that if there are preceding fields in the SELECT list (there need not be, but that is the usual case), then they must be followed by a comma. Easy to miss that. In sum, SPLIT is no longer a separate clause, but part of a list.Perhaps the single major difference in SQL9 (for these purposes) is that we can now write subqueries as separate functions. It's simply an option--there is no semantic difference. From our point of view the only difference between calling a function and embedding a subquery are (1) better readability, (2) the possibility to re-use functions from a library. Both are great advances.Back soon.
 adamw7,446 post(s) #31-Aug-17 07:35 For the rewrite, apart from the functions that you mention, it would perhaps make sense to use temporary components. This will help split the 1-2-3-...-9 steps into manageable parts. There is a certain beauty in building a multi-story tower with tons of nested SELECTs, but there is a lot of pragmatic value in replacing it with multiple one/two-story buildings. That's just thinking out aloud, nothing too serious.
 tjhb7,619 post(s)online #31-Aug-17 08:30 I hadn't thought of using temporary tables, I don't use them nearly enough. So I will try that too, as well as functions, thanks very much!The (possible) beauty of nested(nested(nested...))) is largely gone now. You've given us better alternatives, better syntax that is also more readable.No need for backflips any more. (That applies to individual functions too.)I would absolutely love it if you could assess and improve on my translation Adam. (Coming tomorrow NZ time, sorry not sooner.)
 tjhb7,619 post(s)online #01-Sep-17 06:59 Really enjoying this rewrite. SQL9 so much better to use! Result likely tomorrow now though, sorry for extra delay.
 adamw7,446 post(s) #01-Sep-17 07:32 Whenever you are ready. I won't pass a chance to comment on a query this interesting. :-)
 tjhb7,619 post(s)online #02-Sep-17 07:30 I've done the main part, with only a few simple things to do.I thought anyone watching might appreciate this working update with comments (because of my delay).The final query will *not* be a set of nested queries as for this draft.What I'd most like to note is how much easier it is now, to find the nearest object of any kind, using COLLECT. That is fantastic, a huge difference, and very adaptable. Huge!There's a lot of baggage below. This is working code, with notes of things that confused me. So, just honest.--SQL9--!fullfetch -- applied--USE ROOT; -- reminderSELECT    [point id], [Point],    SPLIT        (COLLECT [XY], [XYNext] -- start, end of nearest segment        ORDER BY GeomDistance([Point], GeomMakeSegment([XY], [XYNext]), 0) ASC -- unindexed (slow)        FETCH 1        )-- **    TODO-- **    speed up this node using a temp table with an RTREE indexFROM    (    SELECT        [point id], [Point], --[Nearest line],        SPLIT CALL GeomToSegments([Nearest line])    FROM        (        SELECT            [D1].[mfd_id] AS [point id],             [D1].[Geom] AS [Point],        --    SPLIT        --         -- 'splitting' pro forma, since there is only one record per mfd_id        --        -- (because FETCH 1)        --        (COLLECT        --            -- can't alias fields inside COLLECT        --            --[D2].[mfd_id], -- can SPLIT multiple fields (not required here)        --            [D2].[Geom]        --        ORDER BY GeomDistance([D1].[Geom], [D2].[Geom], 0) ASC -- RTREE indexed        --        FETCH 1        --        )            ( -- parens to vectorize result table              -- which must have a single field, which must be scalar-- **            ( -- inner parens not required                COLLECT                    --([D2].[mfd_id], [D2].[Geom]) -- tuple, can't be vectorized                    [D2].[Geom] -- scalar                ORDER BY GeomDistance([D1].[Geom], [D2].[Geom], 0) ASC -- RTREE indexed                FETCH 1-- **            )            ) AS [Nearest line]        FROM            -- [Drawing 1] AS [D1]            (            SELECT * FROM [Drawing 1]            WHERE GeomIsPoint([Geom])            --THREADS 1            ) AS [D1]            CROSS JOIN            -- [Drawing 2] AS [D2]            (            SELECT * FROM [Drawing 2]            WHERE GeomIsLine([Geom])            --THREADS 1            ) AS [D2]        GROUP BY [D1].[mfd_id], [D1].[Geom]        THREADS 6        )    THREADS 1    )GROUP BY [point id], [Point]THREADS 6;The use of temporary databases is only slightly tricky, not as tricky as it sounds. The manual is not right on that yet and needs some help.More on that tomorrow.
 tjhb7,619 post(s)online #04-Sep-17 06:34 Got there. I need to simplify the code just a bit before posting it tomorrow.I think there will be a long version (with notes) and a short version (for simple use).It's been a great learning experience! Much room for improvement and tuning.Untuned, SQL9 is slightly better than twice as fast as M8 code on my test data.I have a couple of ideas to try to make it faster but will welcome more.
 adamw7,446 post(s) #04-Sep-17 06:37 It looks good. :-)It might seem big to an untrained eye, but two thirds are comments and spacing (which do help at this stage, thanks) and even with all that, it is already about half of what there was in 8. There are just 3 SELECTs if we don't include the two that filter Drawing 1 to just points and Drawing 2 to just lines.(And when we have a function like GeomProjectPoint it will all collapse into a completely trivial three-line query, but then some other project will call for something more elaborate - ie, to find not just one closest line for each point, but k closest lines - and there will be no such function at least for some time, but the query will still adopt to the new requirements with a bit of work.)
 tjhb7,619 post(s)online #04-Sep-17 06:55 The final version (a) is longer because I have used a temporary database for interim results and separated the queries and (b) will be shorter when I remove some comments (not all).I thought about wrapping most of the logic into a script function named, say, GeomProjectPoint. Or into an SQL function to do the same. But no need to change everything at once.I'm so pleased to have had the opportunity (or kick in the pants) to get comfortable using temporary databases (CREATE ROOT etc). Very powerful, very easy. Thanks for the hint in that direction Adam.One particular thing I noticed is how much it can help to add a composite BTREEDUP index over the fields that will later be used in GROUPing. I hadn't expected that. Great.[Added.]...it is already about half of what there was in 8...COLLECT saves so much space! And also some processing time. It's great logic, efficient.
 tjhb7,619 post(s)online #05-Sep-17 08:48 Update: I'm getting some very unexpected results.I expect they're my fault, but I need to partly solve them before posting code.I think it's possible that they are not my fault, but due to implicit rounding/decimal truncation of FLOAT64 values passed between tables.Really don't know so far.
 tjhb7,619 post(s)online #05-Sep-17 09:06 ...of FLOAT64 values passed between tables, or from query to function and back.Really don't know.
 tjhb7,619 post(s)online #05-Sep-17 22:59 Fixed! It was exactly my fault, being tired and making a silly error with parentheses. Couldn't see it last night.Whew, it works now. Back soon.
 tjhb7,619 post(s)online #06-Sep-17 00:23 Actually not yet. Damned =.Turns out that when working with raw coordinates, we need to implement tolerance, just like grownups.Same with Manifold 8 (also tested).When working with GEOM objects, this is all done for us. With plain numbers, not. Naturally enough.
 tjhb7,619 post(s)online #06-Sep-17 05:25 That's partly true, but more importantly my raw math was just wrong. (Not in the Manifold 8 version, which only uses geoms and built-in functions on geoms. The emperor's new clothes version mainly uses coord vectors for speed. And was wrong.)Back soon, with help and clarification from stackoverflow and stackexchange as well as Wolfram and Wikipedia.The new version is much more succinct. If I have time I'll rewrite it for M8, as well as a straight port of the old version to SQL9, for comparison.
 tjhb7,619 post(s)online #06-Sep-17 08:46 Rewritten for Manifold 8. A 24% speedup using better maths suggested on Stack Overflow.-- For each point in Drawing 1-- find the nearest point on the nearest line in Drawing 2-- Requires that Drawing 1 and Drawing 2 share a common coordinate systemOPTIONS CoordSys("Drawing 1" AS COMPONENT);SELECT    [point ID], --[px], [py],    --[vx], [vy], [wx], [wy],    --[t],    AssignCoordSys(        NewPoint(            [vx] + [t] * ([wx] - [vx]),            [vy] + [t] * ([wy] - [vy])            ),            -- projection of point along segment            -- v + t * (w - v)        CoordSys("Drawing 1" AS COMPONENT)        ) AS [Intersection]FROM    (SELECT        [point ID], [px], [py],        [vx], [vy], [wx], [wy],        Max(0, Min(1,            (([px] - [vx]) * ([wx] - [vx]) + ([py] - [vy]) * ([wy] - [vy]))                -- dot product(point - start, end - start)            / (Pow([wx] - [vx], 2) + Pow([wy] - [vy], 2))                -- square of distance(start, end)            )) AS [t]            -- parametric position of projection of point            -- along continuous line through (start, end) of segment,            -- clamped to range [0, 1] to restrict to segment            -- see https://stackoverflow.com/a/1501725            -- t = [(p-v) . (w-v)] / |w-v|^2    FROM        (SELECT            [point ID],            CentroidX([point ID]) AS [px], CentroidY([point ID]) AS [py],            CentroidX(Coord([VW], 0)) AS [vx], CentroidY(Coord([VW], 0)) AS [vy],            CentroidX(Coord([VW], 1)) AS [wx], CentroidY(Coord([VW], 1)) AS [wy]        FROM            (SELECT                [point ID],                FIRST([Segment]) AS [VW] -- nearest segment             FROM                -- for each point, all segments of the nearest line                -- ranked by distance                (SELECT [point ID], [Segment]                FROM                    (SELECT                         [D1].[ID] AS [point ID],                        FIRST([D2].[ID]) AS [line ID] -- nearest line                    FROM                         -- for each point, all lines                        -- ranked by distance                         (SELECT [D1].[ID], [D2].[ID]                        FROM                             (SELECT * FROM [Drawing 1] WHERE IsPoint([ID])                            ) AS [D1]                            CROSS JOIN                            (SELECT * FROM [Drawing 2] WHERE IsLine([ID])                            ) AS [D2]                        ORDER BY Distance([D1].[ID], [D2].[ID]) ASC -- native e                        )                    GROUP BY [D1].[ID]                    )                SPLIT BY Branches(IntersectLine([line ID], [line ID])) AS [Segment] -- native e                ORDER BY Distance([point ID], [Segment]) ASC -- native e                )            GROUP BY [point ID]            )        )    );SQL9 version will be even better.
 tjhb7,619 post(s)online #07-Sep-17 08:40 Here is a current SQL9 version.I have possibly over-emphasised using new infrastructure (functions and temporary databases).A current timing comparison, with matching fake test data (data made in Manifold 8 was converted to Radian format):Manifold 8.0.30 (revised query above): 41s.Radian Studio 9.0.163.0 (query below, without writing output data): 49s.Although SQL9 code is more convenient (it does more in one step), I would like to make it win on timing too.There are two queries (with some comments in the code): [Nearest point on nearest line SQL9], the executable query; and a function library [Vector2].Any and all suggestions for improving this code are hugely welcome. I can upload test data.[Nearest point on nearest line SQL9]-- \$include\$ [Vector2]---------------------------------------------------------------------------------- make output tables and drawings in project--------------------------------------------------------------------------------CREATE TABLE [Nearest point Table] (    [mfd_id] INT64,    [point id] INT64,    [Geom] GEOM,    INDEX [mfd_id_x] BTREE ([mfd_id]),    INDEX [point id_x] BTREE ([point id]),    INDEX [Geom_x] RTREE ([Geom]),    PROPERTY 'FieldCoordSystem.Geom' ComponentCoordSystem([Drawing 2])    );CREATE DRAWING [Nearest point] (    PROPERTY 'FieldGeom' 'Geom',    PROPERTY 'Table' '[Nearest point Table]'    );CREATE TABLE [Tie line Table] (    [mfd_id] INT64,    [point id] INT64,    [Geom] GEOM,    INDEX [mfd_id_x] BTREE ([mfd_id]),    INDEX [point id_x] BTREE ([point id]),    INDEX [Geom_x] RTREE ([Geom]),    PROPERTY 'FieldCoordSystem.Geom' ComponentCoordSystem([Drawing 2])    );CREATE DRAWING [Tie line] (    PROPERTY 'FieldGeom' 'Geom',    PROPERTY 'Table' '[Tie line Table]'    );---------------------------------------------------------------------------------- make temporary database and tables--------------------------------------------------------------------------------CREATE ROOT [Scratch];    -- temporary databaseUSE ROOT [Scratch];    -- now currentCREATE DATASOURCE [Project] AS ROOT;    -- reference to project root from temporary databaseCREATE TABLE [Segments] (    [point id] INT64,     [Point] GEOM,    [Segment] GEOM,--    INDEX [Point_x] RTREE ([Point]),--    INDEX [Segment_x] RTREE ([Segment]),        -- **faster without spatial indexes**        -- indicative speedup 57s -> 49s        -- RTREE not used by GeomDistance, finding nearest segment?    INDEX [point id.Point_x] BTREEDUP ([point id], [Point])        -- hash for grouping        -- indicative speedup 73s -> 57s    );CREATE TABLE [Nearest point] (    [point id] INT64,     [Point] GEOM,    [Projection] GEOM,    --INDEX [Point_x] RTREE ([Point]),    --INDEX [Projection_x] RTREE ([Projection])    INDEX [Point_x] BTREEDUP ([Point]), -- hash for filtering    INDEX [Projection_x] BTREEDUP ([Projection]) -- hash for filtering    );---------------------------------------------------------------------------------- for each point, each segment of the nearest line--------------------------------------------------------------------------------INSERT INTO [Segments] (    [point id], [Point],    [Segment]    )SELECT    [point id], [Point],     GeomMakeSegment([XY], [XYNext])FROM    (    SELECT        [point id], [Point],        SPLIT CALL GeomToSegments([Nearest line])    FROM        (        SELECT            [D1].[mfd_id] AS [point id], [D1].[Geom] AS [Point],            ( -- vectorize table (one scalar field)            COLLECT [D2].[Geom]            ORDER BY GeomDistance([D1].[Geom], [D2].[Geom], 0) ASC            FETCH 1            ) AS [Nearest line]        FROM            (            SELECT * FROM [Project]::[Drawing 1] WHERE GeomIsPoint([Geom])            --THREADS            ) AS [D1]            CROSS JOIN            (            SELECT * FROM [Project]::[Drawing 2] WHERE GeomIsLine([Geom])            --THREADS            ) AS [D2]        GROUP BY [D1].[mfd_id], [D1].[Geom]        THREADS 6        )    THREADS 2    )THREADS 6;---------------------------------------------------------------------------------- for each point, its projection onto nearest segment--------------------------------------------------------------------------------INSERT INTO [Nearest point] (    [point id], [Point],    [Projection]    )SELECT    [point id], [Point],    GeomMakePoint(        ProjectPointToSegmentB(            GeomCoordXY([Point], 0),            GeomCoordXY([Nearest segment], 0),            GeomCoordXY([Nearest segment], 1)            ) -- X2        ) -- geom (could leave as X2)FROM    (    SELECT        [point id], [Point],        ( -- vectorize table (one scalar field)        COLLECT [Segment]        ORDER BY GeomDistance([Point], [Segment], 0) ASC -- is RTREE used?--        ORDER BY GeomDistance([Segment], [Point], 0) ASC -- (either order)        FETCH 1        ) AS [Nearest segment]    FROM [Segments]    GROUP BY [point id], [Point] -- should be indexed    THREADS 4    )THREADS 4;---------------------------------------------------------------------------------- write output tables--------------------------------------------------------------------------------INSERT INTO [Project]::[Nearest point table] (    [point id], [Geom]    )SELECT    [point id], [Projection]FROM [Nearest point]--WHERE [Point] <> [Projection]--THREADS;INSERT INTO [Project]::[Tie line table] (    [point id], [Geom]    )SELECT    [point id],    GeomMakeSegment(        GeomCoordXY([Point], 0),        GeomCoordXY([Projection], 0)        )FROM [Nearest point]--WHERE [Point] <> [Projection]--THREADS;---------------------------------------------------------------------------------- clean up temporary tables and database--------------------------------------------------------------------------------DROP TABLE [Segments];DROP TABLE [Nearest point];--USE ROOT;-- switch back to project root-- without dropping temporary databaseDROP ROOT [Scratch];-- remove temporary database,-- implicitly revert to project root[vector2]---------------------------------------------------------------------------------- VectorX2 functions---------------------------------------------------------------------------------- Distance-- DistanceSquared-- Dot2-- Det2-- Add2-- Subtract2-- Scale2-- Hat2 (TODO)-- ProjectPointToSegmentA-- ProjectPointToSegmentB--   ProjectPointToSegmentA (no dependencies, unpacking all vectors once)--   is somewhat faster than --   ProjectPointToSegmentB (calling base functions with ad hoc unpacking)---------------------------------------------------------------------------------- base functions--------------------------------------------------------------------------------FUNCTION Distance(    [p] FLOAT64X2, [q] FLOAT64X2    ) FLOAT64    -- distance PQ    -- cf. GeomDistancePoint() (built in)    AS        Sqrt(            Pow(VectorValue([p], 0) - VectorValue([q], 0), 2)            +            Pow(VectorValue([p], 1) - VectorValue([q], 1), 2)            )    END;FUNCTION DistanceSquared(    [p] FLOAT64X2, [q] FLOAT64X2    ) FLOAT64    -- square of distance PQ    AS        Pow(VectorValue([p], 0) - VectorValue([q], 0), 2)        +        Pow(VectorValue([p], 1) - VectorValue([q], 1), 2)    END;FUNCTION Dot2(    [v] FLOAT64X2, [w] FLOAT64X2    ) FLOAT64    -- dot product V · W    AS        VectorValue([v], 0) * VectorValue([w], 0)        +        VectorValue([v], 1) * VectorValue([w], 1)    END;FUNCTION Det2(    [v] FLOAT64X2, [w] FLOAT64X2    ) FLOAT64    -- determinant |V W|    AS        VectorValue([v], 0) * VectorValue([w], 1)        -        VectorValue([v], 1) * VectorValue([w], 0)    END;FUNCTION Add2(    [v] FLOAT64X2, [w] FLOAT64X2    ) FLOAT64X2    -- sum V + W    AS        VectorMakeX2(            VectorValue([v], 0) + VectorValue([w], 0),            VectorValue([v], 1) + VectorValue([w], 1)            )    END;FUNCTION Subtract2(    [v] FLOAT64X2, [w] FLOAT64X2    ) FLOAT64X2    -- difference V - W    AS        VectorMakeX2(            VectorValue([v], 0) - VectorValue([w], 0),            VectorValue([v], 1) - VectorValue([w], 1)            )    END;FUNCTION Scale2(    [v] FLOAT64X2, [k] FLOAT64    ) FLOAT64X2    -- product V x k    AS        VectorMakeX2(            VectorValue([v], 0) * [k],            VectorValue([v], 1) * [k]            )    END;--FUNCTION Hat2 -- TODO----------------------------------------------------------------------------------------------------------------------------------------------------------------FUNCTION ProjectPointToSegmentA(    [p] FLOAT64X2, [v] FLOAT64X2, [w] FLOAT64X2    ) FLOAT64X2    -- point on segment VW nearest point P    -- (no dependencies)    AS (        SELECT            VectorMakeX2(                [vx] + [t] * ([wx] - [vx]),                [vy] + [t] * ([wy] - [vy])                )                -- projection of P along VW                -- v + t * (w - v)        FROM (            SELECT                --[px], [py],                 [vx], [vy], [wx], [wy],                ValueMax(0, ValueMin(1,                    (([px] - [vx]) * ([wx] - [vx]) + ([py] - [vy]) * ([wy] - [vy]))                        -- dot product(p - v, w - v)                    / (Pow([wx] - [vx], 2) + Pow([wy] - [vy], 2))                        -- square of distance(v, w)                    )) AS [t]                -- parametric position of projection of point P                -- along continuous line through segment VW                -- clamped to range [0, 1] to restrict to segment                -- see https://stackoverflow.com/a/1501725                -- t = [(p-v) · (w-v)] / |w-v|^2            FROM ( -- unpack vectors                VALUES(                    VectorValue([p], 0), VectorValue([p], 1),                    VectorValue([v], 0), VectorValue([v], 1),                    VectorValue([w], 0), VectorValue([w], 1)                    )                AS ([px], [py], [vx], [vy], [wx], [wy])                )            )        )    END;--------------------------------------------------------------------------------FUNCTION ProjectPointToSegmentB(    [p] FLOAT64X2, [v] FLOAT64X2, [w] FLOAT64X2    ) FLOAT64X2    -- point on segment VW nearest point P    -- (calling base functions)    AS (        SELECT             Add2([v], Scale2(Subtract2([w], [v]), [t]))            -- projection of P along VW            -- v + t * (w - v)        FROM (            VALUES                (ValueMax(0, ValueMin(1,                    Dot2(Subtract2([p], [v]), Subtract2([w], [v]))                    / DistanceSquared([v], [w])                    ))                    -- t = [(p-v)·(w-v)] / |w-v|^2                )                AS ([t])                -- parametric position of projection of point P                -- along continuous line through segment VW                -- clamped to range [0, 1] to restrict to segment                -- see https://stackoverflow.com/a/1501725            )        )    END;
 adamw7,446 post(s) #09-Sep-17 16:16 Wow, Tim, you went to write a query and wrote a library. :-)This isn't bad, far from it. All of this is useful for when the task changes a bit and canned solutions stop being sufficient. The entire reason we have this much code (and the entire reason this is interesting) is that we are missing a built-in function which happens to be fairly complex and instead of just waiting until Radian adds it, we can implement it using SQL. This is an important point (saying mostly to others reading): SQL does not have to be this long, we are simply looking specifically at doing something non-standard, the alternative would be to do nothing or go to another product.Queries like this also stress-test the engine well, although that's mostly of interest to us, the devs.The slight performance difference with 8 almost certainly occurs because you are using sanely named functions that operate on sane types like vectors instead of on disjoint pairs of values. This inevitably loses some performance. It can be recouped, but why bother, it isn't large. A better way to get performance would be to rewrite part of the query in a script.COLLECT ... ORDER BY GeomDistance(...) does not use RTREE. We could try using an RTREE here, but it would be better to add a function to get one or K nearest objects.All in all, a very interesting exercise.
 adamw7,446 post(s) #09-Sep-17 16:32 Regarding performance difference with 8 - one more place where there is likely a noticeable payment for abstractions is SELECT - SELECT - VALUES in ProjectPointToSegmentX. Setting up virtual tables around a couple of numbers takes some time - virtual tables are not terribly expensive, but the overhead from millions of such calls can indeed be significant (and even dominating the processing time). Does this get faster if you remove SELECTs and just pass Vectors? (Keeping disjoint values and replacing a function with big expressions in the calling query would perform even better, but the readability will surely suffer.)
 tjhb7,619 post(s)online #09-Sep-17 16:58 You are exactly right, yes. (I did some test timings, will add these a bit later.) Keeping vector types inline with no prior unpacking and just one SELECT node makes a big noticeable difference to speed, iff the function is called many times. And the readability is not too bad. (I would compare to Manifold 8, and say the readability and reusability are much better.)If a function is called only a few times overall then nested functions, and/or explicit unpacking, seem cheap. Maybe they are a good way to get code logic right, and for testing, before inlining everything again at the end for speed.As you say a really interesting exercise, at least for me. I hadn't done anything significant with temporary databases before for example. There are performance advantages here, but above all they make writing and testing much much easier than with a monolithic nested query. Especially given our best friend, Select plus Alt-Enter, to allow running code in stages (repeatedly after small changes, with easy inspection of interim results). It's great.I think I definitely erred on the side of trying out as much new stuff as I could. Thanks for reading! Also thanks for confirming no use of RTREE in COLLECT... ORDER BY Distance().
 tjhb7,619 post(s)online #10-Sep-17 08:47 (I will have to get back to this tomorrow.)One small surprise: the best version that uses a function is very slightly quicker than the best-matching version that does not a function (fully inline).We're only talking 49s versus 51s, for 2259894 sequential calls, but the difference seems consistent.I think that's good! Function infrastructure is very fast, we should not hesitate to use it.I will check again in the morning, and post code and timings.
 adamw7,446 post(s) #10-Sep-17 09:22 Well, we do inline functions but to inline them better than the already fully inlined code would be strange. :-)The only thing that springs to mind is less type conversions: a function forces numeric types to whatever you specify, but if you put the body of the function into a bigger expression, the types for the input values could be left as FLOAT64 or something similar - not sure this could make a noticeable difference for the query we are discussing, but who knows.I will try to take a look at what specifically happens when you post the code.
 tjhb7,619 post(s)online #10-Sep-17 09:43 Thanks Adam. Really leaving this till tomorrow now, but...Is it possible that when a function like AS (SELECT... FROM (VALUES... )) is implicitly inlined by the compiler, it translates as one node, while explicit inline SELECT... FROM (SELECT... ) must compile as two nodes?
 adamw7,446 post(s) #10-Sep-17 10:18 In short, no.
 tjhb7,619 post(s)online #11-Sep-17 00:55 Thanks.
 tjhb7,619 post(s)online #11-Sep-17 01:42 Some timings for interest.(1)I adjusted the function library, splitting it into two parts (attached).[Vector2] is a generic library of tiny vector functions, meant for reuse. [ProjectPoint] contains the specific function used here, ProjectPointToSegment*, in 4 versions (ordered fastest to slowest).Versions 1 and 2 are self-contained. Version 1 has everything fully in-line, with vector values left intact, accessed by VectorValue() on each use. Version 2 unpacks all vectors into atomic values once, before they are used/reused.Versions 3 and 4 call functions in [Vector2]. Both leave vectors to be unpacked by child functions. Version 3 pre-calculates two vector results that will be used more than once, while version 4 does not.The main query in this project, [Nearest point on nearest line SQL9 i], can call any of these functions on the sample data, once for each of the 5203 points, without any consistent timing difference (less than the variation between trials). I get a total time of 49s for the whole query, regardless which function is used.If we call the functions many more times (I tested 2259894 calls, once for each segment in the lines drawing) then timing differences become significant:-- indicative timings (2259894 calls, 6 threads)-- ProjectPointToSegment1: 200s (3mn 20s)-- ProjectPointToSegment2: 236s (3mn 56s)-- ProjectPointToSegment3: 254s (4mn 14s)-- ProjectPointToSegment4: 298s (4mn 58s)(The timings are also in the headnote to [ProjectPoint].)(2)So when using a function, it is fastest to inline everything, and leave vector values as they are. (But it may not matter at all, if the number of calls is fairly small. Calling child functions may be outright better, if that makes code easier to write, read, reuse.)Now to compare not using a function, instead inlining everything in the main query. This is attached as [Nearest point on nearest line SQL9 inline a], which is as close as possible to [ProjectPointToSegment1], the fastest function measured above.(The inlined code is in the statement beginning INSERT INTO [Nearest point]..., where the function was called before.)Testing the adjusted query as a whole, the total time is consistently 59s--versus the 49s for the version calling a function.(I haven't tested running this code 2259894 times (once per segment). I'll post again if I find a nice way to do that.)I've also attached a current test project. Query [Nearest point on nearest line SQL9 i] has changed from the earlier version, but only cosmetically. The test drawings have been slightly adjusted from the version posted earlier. Just to be sure of comparing apples with apples.
 tjhb7,619 post(s)online #11-Sep-17 07:18 (3)I messed up last night, must have been too tired, when I saidOne small surprise: the best version that uses a function is very slightly quicker than the best-matching version that does not a function (fully inline).We're only talking 49s versus 51s, for 2259894 sequential calls, but the difference seems consistent.The first sentence was correct (and possibly surprising), but the second was wrong. The timings were not for 2259894 calls, but for for 5203 calls (one per point). (4)I have now tested 2259894 sequential iterations of an inline version of the in function version 1, and the result is a bit astonishingly I think. Given many iterations, inline code in this case is about 13x faster.The testing code is attached. Note that it is meant to be run in parts, so that each part can be timed.So:2259894 calls to function [ProjectPointToSegment1]: 201s2259894 calls to equivalent inline code: 15sUm, striking.That is a different thing from testing [Nearest point on nearest line i], using function [ProjectPointToSegment1], against fully inline query [Nearest point on nearest line SQL9 inline a], with only 5203 calls. Redoing that again now to check, I once more get consistently 49s versus 51s. This is what I was looking at last night, wrongly described. I think the 59s I reported earlier today for the inline query must have been an anomaly.So at a modest scale, calling a function seems very slightly faster than doing everything inline, while with a large number of calls there is a huge difference in the opposite sense.
 adamw7,446 post(s) #11-Sep-17 10:11 The comparisons are not exactly one to one because of threads - in your test code, SQL with a function call uses 6 threads while SQL without a function call uses 4x4 threads but this happens not to matter much in this case.Anyway, the performance difference is related to virtual tables.I made several more modifications, see the attached file.First run (1) - Make Segments. This creates the Segments table which all other queries are going to be operating from. We are not measuring time for this.Run (2) - Find Nearest Point - SQL Function. This runs ProjectPointToSegment1 from your library. The function is the fastest of the 4 variants, but still has SELECT and VALUES inside. Time to run on a test system: 437.3 sec.Run (2b) - Find Nearest Point - SQL Inline. This runs the code from your test file which inlines the function into the main query. I forced the threads to be roughly the same as in 2. The inlined code still has SELECT, but not VALUES. The run time is: 32.9 sec. The difference comes from having less virtual tables on a critical path.Let's try augmenting the code to do less virtual tables. Maybe we will be able to keep a function call if we get rid of SELECT and VALUES.Run (2c) - Find Nearest Point - SQL Function Simplified. The code is in a function (so there is some abstraction cost from not having it all inlined) but instead of using SELECT / VALUES to compute 't' just once, it simply computes 't' using another function. The run time is: 37.4 sec. We managed to keep the code clean and split into functions without paying a hand and a leg in performance.Let's now try using a script. With a tiny function like this that only does something like 20+ arithmetic operations, not having to switch to script and back will likely keep (2b) faster, but let's just see what the numbers are.Run (2d) - Find Nearest Point - Script. The code for ProjectPointToSegment1 is moved into a script. (I would say it looks cleaner because you just use p.X instead of VectorValue(p, 0), etc.) Run time: 35.1 sec. Calling a script is actually not expensive at all and while inlining everything is still faster for this amount of operations, the point where the script will overtake in performance is very close - if the function was not "project point to segment" but rather "project point to geom", the script would have won significantly.
 tjhb7,619 post(s)online #11-Sep-17 11:08 Heavens! Let me see if I understand.So the way I had set up my functions was such that they incurred the setup and teardown costs of one or more virtual tables (to support nested SELECT and sometimes VALUES statements) every time they were called.That is very different from using subqueries (or a VALUES query) inside an actual query, where a virtual table is set up and torn down just once to process an entire set. (Or once per thread or, I suppose, at worst once per batch, if multiple threads are used.)That had not even occurred to me, what a great lesson Adam, thank you!In general, we should keep delegated structures (functions, whether SQL or script) as close as possible to the structure of an expression, even a complex one (e.g. through child functions), and avoid burdening them with their own additional expensive structures (notably virtual tables to support subqueries).I will understand better after trying out your alternatives in detail tomorrow. Many thanks again, this is really wonderful.
 adamw7,446 post(s) #11-Sep-17 13:40 Yes, exactly.The big difference between (2) and (2b) comes from (2) setting up / tearing down virtual tables for each record (doing millions of small SELECT / VALUES) and (2b) having to do this just for two virtual tables (two SELECTs) before threads. With a small function like the one we have, the overhead from small SELECT / VALUES ends up dominating the execution time. (2c) and (2d) rewrite the function so that it does not use SELECT / VALUES and this recovers the performance.
 tjhb7,619 post(s)online #12-Sep-17 02:43 Thanks so much again for a great explanation and all your examples.I translated your C# script to IronPython, just on principle.#IronPythonfrom Manifold import Point#def ProjectPointToSegment(p, v, w):    vw2 = (w.X - v.X) ** 2 + (w.Y - v.Y) ** 2        # square of norm(w - v)        # or dot product(w - v, w - v)    t = 0    if vw2 != 0:        # projection of P along line through VW with origin at V        t = (            (p.X - v.X) * (w.X - v.X) + (p.Y - v.Y) * (w.Y - v.Y)            # dot product(p - v, w - v)            ) / vw2        # clamp projection to segment VW        t = max(0, min(1, t))    (x, y) = (        v.X + t * (w.X - v.X),         v.Y + t * (w.Y - v.Y)        )    return Point[float](x, y)#def Main():    app = Manifold.Application    app.Log(        str(ProjectPointToSegment(            Point[float](5, 6),            Point[float](0, 0),            Point[float](10, 10)            ))        )    app.OpenLog()Now to give them a race! Back soon.
 tjhb7,619 post(s)online #12-Sep-17 03:16 Three consecutive runs for each, best underlined, as before for 2259894 calls, 6 threads.Your [(2c) - Find Nearest Point - SQL Function Simplified]: 14.016, 13.968, 14.047 sec.Your [(2d) - Find Nearest Point - Script], calling [Nearest Point Script], C#: 13.815, 14.134, 13.794 sec.New [(2e) - Find Nearest Point - Script ipy], calling [Nearest Point script ipy], IronPython: 14.694, 15.175, 15.161 sec.Versus pre-enlightenment [(2) - Find Nearest Point - SQL Function]: 204.875 sec (one trial is enough).I think you will be happy that IronPython is not quite as fast as C#--but I'm happy too with that difference!This is good.
 Dimitri4,430 post(s) #12-Sep-17 07:04 I think you will be happy that IronPython is not quiteas fast as C#--but I'm happy too with that difference!It would be interesting to see V8 results... as a personal matter it's all the same to me, but as an observer of changing fashions I'm mildly puzzled that despite Google being behind V8 it has not achieved the universality one might expect.
 tjhb7,619 post(s)online #07-Sep-17 08:59 Currently the main thing I am surprised by is that adding RTREE indexes for the GEOM fields in intermediate table [Segments] does not speed up the GeomDistance() tests when filtering the result into [Nearest point]. There are brief notes in line.Adding spatial indexes for those fields produces a slowdown overall. Perhaps just overhead, but that raises the question whether GeomDistance() can use spatial indexes from the interim table, and if not why.
 tjhb7,619 post(s)online #07-Sep-17 09:12 Here is a Radian file including the above query and library and the test data I used. (Compressed by 7-Zip to be accepted by forum.)Delete the existing result tables and drawings--namely [Nearest point], [Nearest point Table], [Tie line], [Tie line Table]--if you want to the query again.I'm not claiming that the test data is good! It's the same random data I used for Manifold 8, converted. Radian may need bigger data to shine.
 tjhb7,619 post(s)online #07-Sep-17 09:22 Lastly the revised Manifold 8 code posted up here, as a text attachment.Attachments: Nearest point on nearest line m8 b.txt
 bclement207 post(s) #07-Sep-17 15:50 Wow! What can I say but thanks. I admitted that I was in over my head, but I didn't realize that while I was in over my head, I was still in the shallow end of the pool. Thanks again. I never would have made it here on my own.Ben
Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.