Discussion Question 3 - On-the-Fly Coordinate Systems
artlembo


2,916 post(s)
#27-Nov-17 14:29

In assembling the soil data here, the coordinate system is in Albers. That is perfect for US Soils. However, I will also be using data from other sources such as State Plane parcels, UTM roads, etc. At the moment, MF cannot perform spatial operations on these layers without us having to write a coordinate reprojection.

In 8, reprojection was done on-the-fly for spatial operations. MF already has the "enforce XY" compromise for Latitude/Longitude, so I wonder if there should be another compromise to allow on-the-fly reprojection for spatial operations. That is, the engine doesn't care what coordinate system the data is in.

Also, related to that, lots of big data sources (like the NYC taxi cab data) are in latitude/longitude, so integrating them with other layers becomes a chore. Additionally, the point-to-point distances are important, but since the data is in decimal degrees, we can't return 'mi' or 'km' like we could in 8 using DistanceEarth. Even a toggle in the GeomDistance function that returns ellipsoidal distance would help.
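To make the distance point concrete, here is a minimal sketch of a great-circle (haversine) calculation on decimal-degree coordinates, written in Python. It is not how DistanceEarth or GeomDistance work internally. The function names and the mean Earth radius constant are assumptions for illustration, and the result is a spherical approximation rather than a true ellipsoidal distance.

```python
import math

# Assumption: mean Earth radius in kilometres (spherical model, not an ellipsoid).
EARTH_RADIUS_KM = 6371.0088
KM_PER_MILE = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def haversine_mi(lat1, lon1, lat2, lon2):
    """Same distance expressed in statute miles."""
    return haversine_km(lat1, lon1, lat2, lon2) / KM_PER_MILE
```

One degree of longitude along the equator comes out at roughly 111.2 km (about 69.1 mi) with this model. An ellipsoidal formula would differ slightly, which is why a toggle for ellipsoidal distance is a separate request.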

How important is it to have spatial operations work on mixed coordinate systems?

joebocop
297 post(s)
#27-Nov-17 16:59

Thank you for raising this point.

With projection information stored within the geometry in Manifold, why can't functions natively operate on objects of differing projection? I totally agree that, from a lay-GIS-person's perspective (my own), this makes intuitive sense; feed two geometries to the GeomTouches() function, and you expect that function to compare the geometries, including their projections.

Perhaps it is too dangerous to allow Manifold to make an election on the transformation algorithm to use internally when having to do the comparison?

In any case, I agree, being able to write a coordinate converter function is powerful, flexible, etc, just not as convenient or "clean" as being able to either a) project on the fly, or b) have the underlying functions handle the projections internally.

Dimitri


4,332 post(s)
#27-Nov-17 17:50

why can't functions natively operate on objects of differing projection?

I guess this week is analogies week for me. :-)

That's like asking when you go to your lawyer to ask, "Should I have a will?" why doesn't your lawyer without saying a word or asking permission, grab you, drag you down to the nearest big hospital, give you a full-body scan, take your blood for a total blood workup, put you on a treadmill for a stress test and then send off a swab from your inner cheek to get your full DNA genome sequenced and then send you a huge bill for all that. After all, why should the lawyer take it for granted that you are not in mortal peril of some previously unknown medical problem?

The reason your lawyer doesn't do any of that is because you want your services compartmentalized. If you ask your lawyer about creating a will, the assumption is you're not harboring some medical problem. It's just a generic question and you don't want to spend a fortune getting an answer. If you're worried about your health you'll go get whatever checkup you want, and you won't go to your lawyer for that.

Much of technology is like that as well. It makes sense to compartmentalize so different gadgets don't try to do the job of other gadgets. None of this having your refrigerator try to figure out what sort of music you want to hear right now through your headphones. Same with functions. It's part of the whole "software tools" philosophy of modularization, stringing together functions like pearls on a string to get exactly the strand you want.

When you feed functions numbers you want them working on those numbers. The assumption is you fed them the numbers you wanted them to use. Re-projection isn't all that expensive, but it ain't exactly free with a billion features either.

If you want to re-project on the fly, sure you can do that in SQL: create something temporary and have at it. Getting on the same page in terms of coordinate systems is usually step 1 in most big data work anyway.

feed two geometries to the GeomTouches() function, and you expect that function to compare the geometries, including their projections.

Everything seems trivially simple, and often is, if you only have two objects. But you never have just two objects so you don't compare only two. You compare all the permutations, and you use more effective algorithms than brute-force permutational comparisons. Complicated enough without calling upon one function to do some other function's job. :-)

artlembo


2,916 post(s)
#27-Nov-17 19:30

I guess this week is analogies week for me. :-)

to add to the analogy though:

then again, I could also create a will myself with make_a_will.com, bring it to my lawyer, and have him sign and notarize it.

But, why do all that prep work when my lawyer is capable of it :-)

If on-the-fly spatial operations present problems, that is fine. I just would like to know what those problems are. PostGIS does not do on-the-fly spatial operations (you get an error that you are working with mixed SRIDs). That is fine because they have a really easy function: ST_Transform(geometry, srid)

I can live with an intermediary step of transforming the data on-the-fly myself, but would like it to be easier than what is currently implemented. However, if the whole problem simply goes away, that would be great.

Just as a case in point: the NYC taxi data. It is in lat/lon. It is also volatile, and changes regularly. I don't want to have to change the entire database. If I put it into a planar system, then I can have it interact with other planar datasets (e.g. the NYC property centroids that are in State Plane). But, if I do that, and then get a boatload of addresses which I can geocode and get lat/lon, I'm back to the same place I was before - having to transform another dataset.

If you guys feel that checking for matching EPSG values adds time to the computation over millions of comparisons, that is ok with me. I am just curious as to what is the best approach. But, certainly, an easier implementation of the on-the-fly coordinate projection would be helpful (there are too many commas and parentheses to keep straight in the current implementation).

Dimitri


4,332 post(s)
#28-Nov-17 06:09

If on-the-fly spatial operations present problems, that is fine. I just would like to know what those problems are.

No problems at all, hence the current implementation. It is nontrivial to learn because it provides rich capabilities to deal with nontrivial requirements. But at the end there are many key things it does which avoid forcing the user to think about stuff that the system can do better, while not doing things inefficiently.

Consider the example of comparing two objects to see if one touches the other. OK, that's fine, but in real life the task is never as simple as comparing just two objects. In real life the task is comparing a set of many objects against possibly very, very many other objects. If you really want to write your own code that does pairwise comparisons you can do that. Not a good idea given that Radian provides much better options that do the whole thing at once.

The smart way of leveraging those big, ever more optimized, functions is to explicitly think about coordinate systems, which is why big data systems tend to do that. Operators of such systems generally are informed enough and have enough of their own strong opinions that they want to be calling the shots in that way and not having some individual function doing it for them automatically. They usually don't want individual functions to do automatic work that they strongly prefer to do explicitly.

But, if I do that, and then get a boatload of addresses which I can geocode and get lat/lon, I'm back to the same place I was before - having to transform another dataset.

Very true, but that's life in the GIS fast lane. One way to handle that is to have rich sets of controls that enable sophisticated transformations. But then you end up with commas and parentheses to learn.

Look, the fundamental issue is that you are working with complex matters where there are endless variations in detail. One size fits all tends not to be what most experts want in such matters. They want rich controls that enable them to apply their expertise to cut through very complicated situations.

You can do some simplifications, like reprojecting those property centroids also into lat/lon, but in general if you are trying to get your hands around a very complicated business with lots of hair you're going to need powerful, sophisticated controls with many options, and you're going to need to be in control of them, not having them do all sorts of automatic things by default.

When there is too much automation you lose the fine control, like a surgeon trying to do fine work while wearing mittens. In contrast, surgeons can do ultra-fine work using surgical robots to augment their skills, but those robots require a long learning curve.

tjhb

7,545 post(s)
#28-Nov-17 06:37

That is spin.

Let's address Art's actual question, which is about the user experience.

Everything works, but what could be made easier here?

Where it could be made easier, should it?

If it should, what should the syntax look like?

Dimitri


4,332 post(s)
#28-Nov-17 06:58

what can be made easier here?

Better documentation and examples?

You really think advocacy of the "software tools" approach is spin? If that is not what you meant, please do me the courtesy of quoting specifically from what I wrote which is "spin" and not pragmatic reality.

So let's talk about that reality. I say if you want to do sophisticated things with big data it is more effective to first cast the big data you are using into compatible coordinate system form, to do that as a conscious, mindful step, and not expect each function that might need to work with data to do that automatically on its own.

If you disagree with that, do you really want each function when dealing with big data to do a coordinate system transformation automatically on its own? Suppose you write a query that uses 20 such functions... do you really want the data transformed 20 times by each function acting automatically and independently?

Consider something as simple as understanding which vector objects touch other vector objects. That involves all objects in the data set, so all, in effect, must be converted into compatible coordinate systems.

Suppose you have a data set with a billion vector objects and now you write a sophisticated query, involving not just one but, say, two dozen or so functions. Do you really want a billion objects transformed two dozen times? Why not just do it once, explicitly, so all the other modules can do their work?
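The arithmetic behind this argument can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: `CountingProjector` stands in for a real coordinate transform (it just shifts coordinates and counts calls), and the two `run_*` helpers model "every function reprojects for itself" versus "reproject once, then run every function".

```python
class CountingProjector:
    """Stand-in for a real reprojection; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def project(self, xy):
        self.calls += 1
        x, y = xy
        # Placeholder shift, not a real coordinate transformation.
        return (x + 1000.0, y + 1000.0)


def run_per_function(points, functions, projector):
    """Each function reprojects its own input: len(points) * len(functions) calls."""
    for fn in functions:
        fn([projector.project(p) for p in points])
    return projector.calls


def run_project_once(points, functions, projector):
    """Reproject once up front, then reuse the result: len(points) calls."""
    projected = [projector.project(p) for p in points]
    for fn in functions:
        fn(projected)
    return projector.calls
```

With 100 points and 24 functions, the first strategy performs 2,400 projections and the second performs 100. Scaled to a billion objects, that is the difference between 24 billion transformations and one billion.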

tjhb

7,545 post(s)
#28-Nov-17 07:05

Da nyet, I agree with all those points Dimitri, which are well made.

I edited my post after you made yours I think--and made another one. I would still make the points in my second post.

I don't think reprojection should be automated (at least not within code), but I agree with Art that it could be made easier. And warnings would be helpful, in case of prima facie mismatches (sometimes intentional--warning can be safely ignored).

tjhb

7,545 post(s)
#28-Nov-17 07:19

Having said that, I would like the option to switch that warning off, and I would normally keep it switched off, because I would rather blame myself for failing to check projections.

But for a beginner or intermittent user I think it is different.

tjhb

7,545 post(s)
#28-Nov-17 06:51

I tend to think that nothing basic should be changed. Reprojection should always be left to the user/coder.

Except...

(1) I think there is room for making reprojection in SQL9 easier and more fluid. It's not hard now, but it could be made easier. (Actually I think that is more than half of what Art is getting at.)

(2) There is perhaps room for logging a warning, when a geom from drawing A is passed as an argument beside a geom from drawing B, where A and B do not share the same coordinate system. No automatic conversion is OK (in my opinion), but a warning about a possible mismatch could be helpful.
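A minimal sketch of the warning idea in (2), written in Python rather than SQL9. The function name, the `warn_on_mismatch` switch, and the use of plain string labels for coordinate systems are all assumptions for illustration. The check only reports a possible mismatch and never converts anything.

```python
import warnings


def check_same_coord_system(cs_a, cs_b, warn_on_mismatch=True):
    """Return True if both labels match; optionally warn when they do not.

    No conversion is performed. The caller stays responsible for reprojection.
    """
    same = cs_a == cs_b
    if not same and warn_on_mismatch:
        warnings.warn(
            f"Possible coordinate system mismatch: {cs_a!r} vs {cs_b!r}",
            stacklevel=2,
        )
    return same
```

Passing `warn_on_mismatch=False` models the option to switch the warning off for users who prefer to check projections themselves.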

Dimitri


4,332 post(s)
#28-Nov-17 07:33

I think there is room for making reprojection in SQL9 easier and more fluid.

I agree 100%, which is easy, because there is almost always room to make things easier and more fluid.

Next comes suggesting how, specifically, something should be made easier and more fluid, and whether as a priority it is right to do this thing and not something else. Take the latter first: if something already is not hard, then it probably should not be a priority. As for how, specifically, this should be made easier: I understood Art's suggestion to be that spatial operations should work on mixed coordinate systems.

I don't in principle disagree with that. What I disagree with is embedding such automatic behavior within specific functions, using big data as the example. Where I do agree is that for consumer-oriented templates it would be great if that were automatic, just like maps don't care what coordinate systems layers are in but reproject on the fly. There we have the duality of Manifold as a tool often used in casual settings and also used by maximum experts with big data. What is OK and necessary to do for consumers is violently opposed by experts in big data / DBMS settings.

A brute force way of dealing with this is to have some tools - options settings for a) confirmations and b) data sizes. Below some critical size the system just reprojects on the fly into temp data and does what is commanded without mentioning it. Above that data size, the system asks "Incompatible coordinate systems. Temporarily re-project?" So that way if the situation is harmless there is no interruption of tranquil workflow, but if the user has commanded something that might take noticeable time, he or she gets a say in what happens next.
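The size-threshold policy described above could look roughly like the following Python sketch. The function name, the `silent_limit` default of 100,000 objects, and the `confirm` callback are assumptions made for illustration and do not describe any existing Manifold setting.

```python
def reprojection_action(cs_a, cs_b, object_count, silent_limit=100_000, confirm=None):
    """Decide what to do when two inputs may differ in coordinate system.

    Returns one of: "none", "reproject", "ask", "cancel".
    """
    if cs_a == cs_b:
        return "none"        # compatible systems: nothing to do
    if object_count <= silent_limit:
        return "reproject"   # small data: reproject into temp data silently
    if confirm is None:
        return "ask"         # large data: the caller must prompt the user
    prompt = "Incompatible coordinate systems. Temporarily re-project?"
    return "reproject" if confirm(prompt) else "cancel"
```

Small jobs proceed without interruption, while anything above the limit hands the decision back to the user.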

StanNWT
36 post(s)
#28-Nov-17 08:57

Hi Dimitri,

My two cents' worth. I'm not a programmer. I appreciate the well written, efficient, highly parallelized code that the development team has been working on for more than a decade. I like buttons and GUIs and filling in the variables or options that I want tools to use.

That being said, would it be useful, if for no other reason than educating the Manifold user base and as promotional examples, to have the option, either through code you enter in the command window or through a check box like the one in Manifold 8 for using GPU acceleration, of:

- reprojecting all layers on-the-fly into the map's coordinate system for:

- viewing

- or spatial analysis - transformations

Or just for selected or specified layers that the user selects in the layers menu, applied to the three options above.

If this were done, the user could compare the results of reprojecting all the data to be processed into one coordinate system vs. the options above. Any user could then see the efficiency, performance, or even hangs or crashes when using reprojection on-the-fly vs. having all the data involved in a single coordinate system.

Yes, this may seem a wasteful allocation of development resources, but reinforcing the points the developers and product team are trying to make is easier with honey than with vinegar. Development resources have to be efficiently allocated to get the best and biggest bang for the buck, of course. But addressing things users find helpful or easier is important. I remember the Esri community in the 1990s and early 2000s making fun of MapInfo users and software by declaring reprojection on-the-fly (at the time vector only) irrelevant, mostly because ArcInfo and ArcView couldn't do it; even ArcGIS couldn't do it right away. What was an Esri limitation of the platform was considered and taught in schools as "best practice". That thinking is still around today in many quarters in IT.

I always take the user's perspective and not the developer's or programmer's perspective. By users I mean people who want tools that work without having to write tons of code. Being able to do it with the software you have is awesome and a required component, but it shouldn't be required just to get things done, if that makes sense. Perhaps I'm too button and GUI driven, but hey, GIS is a visual thing after all. If you only see scripts and never the visualization of your work, it's just programming. I think far too few people have geography and cartography training these days. Yes, this last paragraph / tangent is way off topic.

adamw


7,307 post(s)
#01-Dec-17 09:44

Not sure where best to reply, will reply here - to the entire thread.

The topic has been briefly discussed before, eg, here.

A recap:

1. Geometry values do NOT store coordinate system info. They did in Manifold 8, they no longer do in Radian / Future / Viewer. This is intentional. The main issue has always been encoding coordinate system info in a way which would be fast to compare for equality / fast to decode for use / compact / future-proof. Since the time the decision was made we made significant progress on the last two items, but there are still the first two items plus there are caveats, so we don't think storing coordinate system info is a good option.

(Example issue: we could store coordinate system info as a string in the universal format we have now, but a coordinate system definition could be, say, an SRID code specific to the database. If we keep the definition as the code, we cannot decode it, because it refers to the database and the database is not there when we decode. If we expand the code to the actual parameter names and values, we are making it much harder to compare coordinate systems of two geoms for equality - can no longer compare just the codes. Etc.)

2. Since geometry values do not store coordinate system info, it has to be handled outside of geometry functions. Typically, one of the geoms gets projected to the coordinate system of another geom.

Interactive operations do that automatically, the queries they generate include calls to project data as necessary for the operation.

Queries composed manually do NOT do that automatically. Whoever creates a query has to insert calls to project data by himself.

Where we could help:

2.1. Making calls to project data easier. Here, we are open to suggestions. Tell us what specific calls you'd like to see.

2.2. Making calls to project data unnecessary by making operations work on components instead of geoms and taking coordinate system info from these components. We are doing this already for overlays, etc. If there are some other operations between components that you are doing often (clipping?) and you want it to be done as a function that operates on components, tell us. Wrapping an operation into a function that operates on components has its drawbacks because you are losing a degree of flexibility, but we'll still have the original operation available, so that's perhaps fine.

2.3. Making calls to project data unnecessary by making the query engine smart enough to detect when a call to a geometry function combines data from different components and insert a projection call into the middle.

Eg, in the query: SELECT * FROM a INNER JOIN b ON GeomContainsAuto(a.geom, b.geom, ...), the query engine could detect that drawings 'a' and 'b' have different coordinate systems and make GeomContainsAuto do a reprojection. The suffix 'Auto' would distinguish a function that is allowed to reproject data (possibly stronger: a function that is required to recognize coordinate systems for its arguments) from the analogous function that does not care about projections.

This additional logic for the query engine is something we are thinking about doing, but there will perhaps be cases it will not cover.

adamw


7,307 post(s)
#01-Dec-17 10:04

OK, here is some imaginary syntax for 2.1:

--SQL9
SELECT *
FROM a INNER JOIN CALL ComponentProject(b, ComponentCoordSystem(a)) AS b_p
WHERE GeomTouches(a.geom, b_p.geom, 0);

The imaginary ComponentProject does nothing if the coordinate systems are the same / projects the component to the specified coordinate system if they are not. Let's skip the issues with image rects and such for now, let's just talk about drawings.
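The "does nothing if the coordinate systems are the same" behavior can be sketched outside SQL. The Python below is only an illustration of that contract: `Component`, `component_project`, and the `transform` callback are hypothetical names, and the coordinate system is a plain string label rather than a real definition.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Component:
    coord_system: str          # plain label in this sketch, e.g. "EPSG:2261"
    geoms: Tuple[Point, ...]   # points only, to keep the example small


def component_project(
    component: Component,
    target_cs: str,
    transform: Callable[[Point, str, str], Point],
) -> Component:
    """Return the component unchanged if already in target_cs, else a projected copy."""
    if component.coord_system == target_cs:
        return component  # same system: no work, no copy
    projected = tuple(
        transform(p, component.coord_system, target_cs) for p in component.geoms
    )
    return Component(target_cs, projected)
```

Because the same-system case returns the input untouched, wrapping every join input in such a call costs nothing when the data already matches.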

Does this help? Or is it still too long or too complex?

artlembo


2,916 post(s)
#04-Dec-17 00:34

I see where you are going with this, and sure, it works. Personally, I don't like the CALL command - but, that's just probably because I'm not used to it yet.

For the coordinate projection, I really like the ST_Transform(geometry, SRID) that PostGIS uses. That is just very simple. Now, as you know, Postgres doesn't really store projection information in the geometry either. That is why the geometry_columns exists. So, I guess you could argue that ST_Transform is really a placebo, since ultimately, the coordinate information is in the table definition.

But, to me the CALL function above seems like a detour, where something like:

GeomTouches(GeomTransform(a.geom,2261),b.geom) reads easier.

Of course, I've boxed myself in a bit, as the 2261 is an SRID, and you want to support more coordinate systems than that. Again, I wouldn't change things on my behalf; I can learn to adapt. But, hopefully, since I am a person who is constantly writing shotgun SQL, you'll see why the simplicity of ST_Transform is useful.

Dimitri


4,332 post(s)
#04-Dec-17 07:44

I really like the ST_Transform(geometry, SRID) that PostGIS uses.

Art, I don't know Post that well but I'm curious... when you use ST_Transform in a query does that transform the data so that after the query runs the data is now in the new coordinate system? Or is this just a virtual transform during the query that has no lingering effect on the data?

artlembo


2,916 post(s)
#04-Dec-17 12:22

Just a virtual transform for that query only. The data remains intact. That is why I like it.

otm_shank
48 post(s)
#03-Dec-17 22:48

Option 2.3 seems the most user friendly, especially if you want to appeal to more 'non-techie' GIS folk (i.e. more sales for Manifold!)

A few 'Auto' versions to cover the main SQL geospatial functions (Contains, Intersects etc) would likely be welcomed.

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.