
8,037 post(s)
#30-Jun-18 14:49

This is a temporary build, basically the previous build with an extended timelock that allows it to work past June. The real build is coming around Monday; we are finishing a big change.

There is a tiny added feature: .JSON files are automatically read as GeoJSON. This comes from Microsoft recently publishing vector footprints of all US buildings (!) as GeoJSON under the Open Data Commons Open Database License (ODbL); their files all use the .JSON extension. We cannot currently read the biggest files due to an internal limitation of our code, but we will make sure we can read them all.
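To give an idea of the kind of detection involved: since GeoJSON objects declare a "type" member up front, a reader can probe the start of a .json file before deciding how to import it. This is only an illustrative sketch in Python (the function name and probe size are made up here, not Manifold's actual implementation):

```python
import json


def looks_like_geojson(path, probe_bytes=4096):
    """Heuristically check whether a .json file is GeoJSON by looking
    for a GeoJSON "type" value near the start of the file."""
    geojson_types = {
        "FeatureCollection", "Feature", "Point", "MultiPoint",
        "LineString", "MultiLineString", "Polygon", "MultiPolygon",
        "GeometryCollection",
    }
    with open(path, "r", encoding="utf-8") as f:
        head = f.read(probe_bytes)
    # Cheap probe: a GeoJSON object names its type in its first members,
    # so we avoid parsing gigabytes just to classify the file.
    return '"type"' in head and any(t in head for t in geojson_types)
```

A probe like this lets the importer treat plain .JSON files (configuration, API dumps) as text while routing Microsoft's footprint files to the GeoJSON reader.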

SHA256: b0957ab08b79e406ae49857a3a877d3db00c82cf5c2f8efbe19c1e77989d6224

SHA256: f02ee38939414f98eadb12904aa9663fc7a513a21635d426a7bbc6393ccfacae
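The checksums above can be verified against the downloaded files before installing. A minimal sketch in Python (the filename below is hypothetical; substitute whichever installer you downloaded):

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading in chunks so
    even very large downloads do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Compare against the checksum published in the post, e.g.:
# sha256_of("manifold-build.zip") == "b0957ab08b79e406ae49857a3a877d3d..."
```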


8,037 post(s)
#08-Jul-18 17:25

Status update.

The next build is taking longer than we initially expected. We are adding a really big set of connected features to the query engine, which snowballs a bit. We want to do it the right way and leave no holes.

We are pushing to have the build as soon as possible; we should have it sometime in the coming week.


8,168 post(s)
#08-Jul-18 22:53

God's work.

148 post(s)
#09-Jul-18 01:30


Meanwhile, having fun with this dataset, doing tasks at a speed that might seem miraculous to ordinary mortals.

-- Manifold System Beta

-- Import: C:\M9work\data\BuildingFootprintsUS\California\California.json (0.004 sec)

*** (import) Invalid file type.

California at 3,240 MB is probably over the size limit mentioned by Adam; trying Florida at 1,960 MB.

-- Import: C:\M9work\data\BuildingFootprintsUS\Florida.json (172.269 sec)

-- Query: [(Index) Florida Building Footprints Table] (0.000 sec)

-- Query: (Transform - Center, Inner - Add Component) (43.476 sec)

-- Render: [Florida Building Footprints] (0.571 sec)

CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz; GPU: CUDA (GeForce GTX 1070 (6.1)); RAM: 32725 MB

Attachment: MS building generation missed a few houses under heavy tree cover, but nonetheless the dataset will be a great asset.
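For the state files that fail like California above, one workaround until the size limit is lifted is to split the file into smaller FeatureCollections before importing. A quick sketch, assuming the source is a standard GeoJSON FeatureCollection and that the machine has enough RAM to load it once (the function name and chunk size are illustrative):

```python
import json


def split_feature_collection(src, dst_prefix, features_per_file=500_000):
    """Split one GeoJSON FeatureCollection into several smaller ones.
    Loads the whole source file, so it needs enough RAM to hold it."""
    with open(src, "r", encoding="utf-8") as f:
        fc = json.load(f)
    feats = fc["features"]
    paths = []
    for i in range(0, len(feats), features_per_file):
        part = {"type": "FeatureCollection",
                "features": feats[i:i + features_per_file]}
        path = f"{dst_prefix}_{i // features_per_file:03d}.json"
        with open(path, "w", encoding="utf-8") as out:
            json.dump(part, out)
        paths.append(path)
    return paths
```

Each output file is itself valid GeoJSON, so the pieces can be imported separately and merged afterward.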



4,941 post(s)
#09-Jul-18 05:45

It is a very interesting data set and it is a great thing that Microsoft has issued it. The (temporary) limitations on Texas and California are mentioned in the GeoJSON / JSON topic and in the Example: Import GeoJSON / JSON File example topic. That latter topic also illustrates some of the anomalies. Hugh has also noted, with an illustration, a common class of anomalies: buildings hidden under trees.

There is a lot to think about in connection with this Microsoft gift to the community. From a "big picture" perspective it shows the current limits on the state of the art in terms of automatic vectorization, given significant effort by a knowledgeable and reasonably well-funded team to use very large resources, a huge library of images and very large AI neural networks. Those limits point to directions where effort might be invested in the future.

From Microsoft's description of their work it is clear this was not an "unlimited funding" effort of the kind sometimes mounted by Google. They did not attempt to leverage all possible inputs, such as LiDAR, street views, and multispectral satellite photography, as controls on their results. Microsoft's effort had limits, which is good, because those limits make the process and results more relevant to what might be done by ordinary mortals within a reasonable time span.

It is an open question how useful the result is. The main problem is that it is a huge dataset seeded with very many mistakes, including plenty of phantom buildings indicated in open fields (as seen in the example), numerous missed buildings plainly visible among other buildings, and routine errors in the constructed footprints. One could say "it is a step forward compared to other such automated vectorization of satellite photos," which is probably true, but that does not really change the conclusion, "sure, but the errors are why we use manually vectorized footprints in our jurisdiction," or "that's why we now focus on LiDAR...". It can be far more costly to find and fix errors seeded throughout than to simply digitize a particular area of interest manually, in assembly-line fashion.

That's OK. You could use this data to find areas where the buildings are likely to be, for such a process, for statistical purposes, as a check on LiDAR and so on.

It's also something to think about that Microsoft chose to publish the data as JSON. It seems likely to be a political decision given the spectacularly inefficient nature of JSON for gigabyte-scale data. The primary virtue of JSON is that it is human-readable, a value that is useful only for relatively small text. It's crazy to publish two gigabytes of text in human-readable form instead of in a fast binary format.

To get an idea of what 2 GB of human-readable text means, you can do a quick and dirty estimate: the Internet tells us the average "page" of single-spaced text contains about 3,000 characters. 2 gigabytes of text therefore requires about 666,666 pages, which at 11 inches per page (splitting the difference between A4 and Letter size) ends up being a set of pages that, placed end to end, would stretch 115 miles or 185 kilometers.
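The back-of-the-envelope figures above can be checked in a few lines. The 3,000 characters per page and 11 inches per page are the post's own assumptions:

```python
chars = 2 * 10**9               # 2 GB of single-byte text
pages = chars // 3000           # ~3,000 characters per single-spaced page
inches = pages * 11             # 11 inches per page, laid end to end
miles = inches / 63360          # 63,360 inches per mile
km = inches * 2.54 / 100 / 1000  # inches -> cm -> m -> km

print(f"{pages:,} pages, {miles:.1f} miles, {km:.1f} km")
# → 666,666 pages, 115.7 miles, 186.3 km
```

This lands on the same order as the post's rounded 115 miles / 185 kilometers.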

The current 2 GB limit on JSON imports is related to that spectacular inefficiency of JSON for bigger data, and to the belief that efficient efforts do not use human-readable formats for data that requires 185 kilometers of single-spaced text. JSON is a text format and is basically treated as one by Manifold, the sort of thing used to save queries, programming text, and commentary. 2 GB seemed far more than necessary for the length of a query - who writes queries that are 185 kilometers long as single-spaced text? - or a program, or comments. So Microsoft's puzzling choice of JSON for gigabyte-scale data comes as a surprise.

Expanding the maximum size of text items beyond 2 GB to handle Microsoft's choice is a straightforward task. One wishes they had used GPKG or something else. :-)

148 post(s)
#09-Jul-18 07:04

Good thoughts on the limitations of Microsoft's approach. However, I think it is mainly aimed at OpenStreetMap folks refining local maps, so these will be a help. Though it is hard to imagine anyone working on OpenStreetMap big data outside of M9.

Also, I am an anthropologist, and all data adds qualitatively to one's work in addition to specific analyses. This does too. After using Manifold as my GIS for so many years, it is wonderful in my old age to have it become a tool that lets me explore and relate so many very large geodatasets so fast.


6,194 post(s)
#09-Jul-18 21:49

German UI-file for build



4,941 post(s)
#10-Jul-18 08:07

Thanks Klaus!

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.