Australian building footprints (example, fun, data)
mdsumner


4,232 post(s)
#21-Oct-20 11:10

https://github.com/microsoft/AustraliaBuildingFootprints#faq

6 GB of unzipped GeoJSON goodness in an 845 MB download


https://github.com/mdsumner

ColinD


1,977 post(s)
#21-Oct-20 22:02

Have you opened this in M9, Mike? According to the help:

Current Manifold builds have a limit of 2 GB per JSON or GeoJSON file

and the building footprint file is 6 GB, so I get either nothing imported or a dead linked data source.
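The arithmetic here is simple (6 GB against a 2 GB ceiling), but when triaging a batch of downloads a quick pre-check can save a failed import. A minimal Python sketch: the 2 GB figure comes from the Manifold help quoted above, while reading it as 2 GiB (2**31 bytes) is an assumption, and `exceeds_json_limit` is a hypothetical helper name.

```python
import os

# Assumption: the "2 GB" in the Manifold help is interpreted as 2 GiB (2**31 bytes);
# the quoted help text does not say which definition it uses.
MANIFOLD_JSON_LIMIT_BYTES = 2 * 1024**3


def exceeds_json_limit(path, limit=MANIFOLD_JSON_LIMIT_BYTES):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

For the ~6 GB Australia file, `exceeds_json_limit("Australia.geojson")` would return True, which is the cue to convert it with another tool first.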


Aussie Nature Shots

mdsumner


4,232 post(s)
#21-Oct-20 22:19

Oh, good to know. No, I haven't tried it; I converted to TAB remotely so I could download that locally and use it. It may be possible to go via GDAL to avoid that limit (I'm not set up to try that, though).

In terms of size, TAB and SHP seem about the same; GPKG is a little bigger (a bit over 1 GB). With GDAL's /vsizip/ you can zip up the file(s) and get down to 500 MB. It's caused me to reflect on how Manifold's own format really is still the best, just a bit limited by operating system and sharing ;)
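A minimal Python sketch of the zip-then-read-through-/vsizip/ approach, using only the standard-library zipfile module. `zip_for_gdal` is a hypothetical helper, and it handles a single file, so a multi-file format like TAB would need each component (.tab, .dat, .map, .id) added to the same archive. The returned string uses GDAL's virtual file system syntax, `/vsizip/<archive>/<member>`, which lets GDAL-based tools read the member without unpacking it first.

```python
import pathlib
import zipfile


def zip_for_gdal(src, archive=None):
    """Deflate-compress one file into a .zip and return a GDAL /vsizip/ path to it.

    If `archive` is not given, the zip is written next to `src` as <name>.zip.
    """
    src = pathlib.Path(src)
    archive = pathlib.Path(archive) if archive else src.with_name(src.name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the root of the archive under its own name.
        zf.write(src, arcname=src.name)
    # GDAL virtual file system syntax: /vsizip/<archive path>/<member name>
    return f"/vsizip/{archive.as_posix()}/{src.name}"
```

For example, `zip_for_gdal("Australia.gpkg")` would write `Australia.gpkg.zip` and return `/vsizip/Australia.gpkg.zip/Australia.gpkg`, a path that can then be handed to ogrinfo or ogr2ogr as the source.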

FWIW, until recently my policy has been "I know text is a bad, inefficient format - but partial read is possible, conversion is problematic and introduces fidelity issues". I always resisted reformatting until I knew it would be reliable.

But, loud voices from the GDAL crowd have forced me to reconsider, and update. You really can just convert these days (my thinking is still based around ca. 2005) and get good fidelity, and that is the modern take: Get it out of GeoJSON and into binary, to TAB or GPKG (other candidates? I'm interested).

In time I think FlatGeobuf might be a better default. It is available in GDAL 3.1.0, which is still a bit early, and I'm not sure whether Manifold has it yet.



steveFitz

264 post(s)
#21-Oct-20 22:40

QGIS successfully opened the file for me so I guess there may be ways to export to a format that M9 can use.

ColinD


1,977 post(s)
#22-Oct-20 01:53

Exported as a shapefile (1.7 GB) from QGIS. Pretty good given the constraints, a few things missing or slightly out of place, but impressive.



steveFitz

264 post(s)
#22-Oct-20 02:55

Colin, I exported as a GeoPackage (2.3 GB). M9 seems to work well with this format. I wonder if the 254-character text field limit or the 2 GB shapefile size limit has limited your data somehow? I never choose to export to shapefiles anymore if I have the choice, because I sometimes find truncated data some time down the track. Funny thing is, I still use them more than any other format!

steveFitz

264 post(s)
#22-Oct-20 03:09

... well it won't be the 254 text limit as there is only one column and that is geometry!

ColinD


1,977 post(s)
#22-Oct-20 04:50

Exactly, property data would need to be transferred to make it more useful.



Dimitri


6,275 post(s)
#22-Oct-20 08:32

"QGIS successfully opened the file for me so I guess there may be ways to export to a format that M9 can use"

QGIS used GDAL to open the file (QGIS has no native ability to read GeoJSON). If you don't mind using GDAL to open the file, Manifold can do exactly the same thing as QGIS and use GDAL as well.

Both QGIS and Manifold can use GDAL. It's just that because QGIS has no native dataport capability you're stuck using the third party GDAL package, while with Manifold you get hundreds of native dataports built in, with no need to mess with GDAL in most work. But if you prefer GDAL for any reason, no problem, it's there, and it's nice to have GDAL to connect to niche formats or niche uses, like the politically correct but spectacularly inefficient notion of publishing gigabytes of vector data in a (you can't make this stuff up...) text format. It could have been worse, of course. They might have picked braille, or even worse, GML.

To my taste, the best part of GDAL is OGR, specifically the ogr2ogr mega-utility that is an easy way to mass convert files, if you don't mind using a command prompt and dealing with a program that has about 800 command line options. That's worth it just to scratch the nostalgia itch, if you miss command line work in CP/M, DOS, or UNIX.

A better format than GeoJSON to use overall for larger vector data sets would be GPKG, which you can easily create from the .geojson using ogr2ogr. Launch the following in a command prompt within the folder where the file is located:

Convert all .geojson files in a folder to .gpkg:

for %f in (*.geojson) do ogr2ogr -f GPKG "%~nf.gpkg" "%f"

(Inside a .bat file rather than at the prompt, double the percent signs: %%f and %%~nf.)

Convert one .geojson file:

ogr2ogr -f GPKG "Australia.gpkg" "Australia.geojson"

Still waiting for my Australia GeoJSON file to download, so I haven't tried the above, but it looks right (to a non-expert in OGR...).
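For anyone who would rather script the batch conversion than type the cmd loop, here is a minimal Python sketch that shells out to the same `ogr2ogr -f GPKG <dst> <src>` form shown above for every .geojson file in a folder. It assumes GDAL's ogr2ogr is installed and on the PATH; `gpkg_command` and `convert_folder` are hypothetical helper names.

```python
import pathlib
import shutil
import subprocess


def gpkg_command(src):
    """Build the ogr2ogr argument list that converts one GeoJSON file to GPKG.

    Mirrors: ogr2ogr -f GPKG <name>.gpkg <name>.geojson
    (the output is written next to the input, with the extension swapped).
    """
    src = pathlib.Path(src)
    dst = src.with_suffix(".gpkg")
    return ["ogr2ogr", "-f", "GPKG", str(dst), str(src)]


def convert_folder(folder="."):
    """Convert every *.geojson file in `folder` to GeoPackage via ogr2ogr."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found on PATH; install GDAL first")
    for src in sorted(pathlib.Path(folder).glob("*.geojson")):
        # check=True raises CalledProcessError if ogr2ogr reports a failure.
        subprocess.run(gpkg_command(src), check=True)
```

Calling `convert_folder(".")` from the folder holding the download is equivalent to the one-liner loop. Passing an argument list (rather than one command string) to subprocess avoids quoting problems with paths that contain spaces.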

gjsa

89 post(s)
#23-Oct-20 01:03

Also interesting to note that for large vector layers like this one that really need to be handled in a binary storage format, read operations in QGIS are considerably faster from GDB than GPKG.

The read-only OpenFileGDB driver that comes with QGIS is very fast (although the same can't be said of the FileGDB driver that is required to write to GDB using GDAL/ogr2ogr; see: Working with File Geodatabases (.gdb) using QGIS and GDAL | Geospatial @ UCLA).

If anyone can explain the difference in read speed between GPKG and GDB (in QGIS, at least when using OpenFileGDB), I'd be interested in the technical details.

My uneducated explanation goes like this: GDB and the drivers that read it are functionally superior to the GPKG equivalents, let down only by the fact that the storage format is proprietary and involves more than one hundred separate files in a .gdb folder structure.

Dimitri


6,275 post(s)
#23-Oct-20 09:36

Copying and adapting a bit from the GPKG topic...

GPKG is not a simple file format like .csv or .shp, but instead is data packaged together with a small DBMS within a file.

A GPKG file is basically an SQLite database file that's extended with SpatiaLite capabilities. Reading or writing that file requires launching SQLite software and SpatiaLite software that have been installed in the form of DLLs on the computer where the GPKG file is used.

So... when Manifold or QGIS or any other package reads a GPKG file, it's launching the SQLite database, with SpatiaLite extensions, to do so. The performance you see is the performance of the SQLite/SpatiaLite engine ensemble, which is pretty slow as DBMS engines go. It's not remotely as fast as, say, PostgreSQL, and way slower than Radian.
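Because a GPKG is an SQLite database file, part of this is easy to see directly: the GeoPackage specification defines a `gpkg_contents` table listing each layer, and Python's built-in sqlite3 module can read it without any GIS software installed. A minimal sketch (`list_gpkg_layers` is a hypothetical name); it only reads layer metadata, and decoding the geometry blobs still needs GPKG-aware software such as GDAL.

```python
import pathlib
import sqlite3


def list_gpkg_layers(path):
    """Return (table_name, data_type) rows from a GeoPackage's gpkg_contents table."""
    # Open read-only through an SQLite URI so the file is never modified.
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()
```

On a GPKG converted from the footprint GeoJSON, this would be expected to list a single layer with `data_type` equal to `'features'`.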

ESRI Geodatabase (GDB) as a database technology is pretty backward stuff, but it's not as slow as SQLite. That's true whether you connect to it using ESRI's own GDB drivers (which Manifold uses) or whether you use OpenFileGDB to do so.

I could be wrong about this, but I get the impression that the OpenFileGDB project got started before ESRI opened up free and zero hassle use of ESRI's own GDB API. But now that ESRI's own code is available, may as well use that. I don't see any technical or licensing reason to use OpenFileGDB instead of ESRI's own code to connect to GDB.

gjsa

89 post(s)
#25-Oct-20 21:58

Thanks Dimitri, interesting perspective. I will investigate using ESRI's own GDB driver for read/write within GDAL.

Reassuring to read that PostgreSQL is fast, as that is how most of my data is stored - and it's true when properly indexed. Not as fast as Radian/M9, but the second best option.

Still, it's disappointing that GPKG/SQLite is not as efficient as GDB, particularly as it has been adopted as the open source DB-in-a-file standard in relatively recent times.

steveFitz

264 post(s)
#21-Oct-20 22:09

That's great, mdsumner!

Thanks for sharing.

I do find GitHub a bit obscure sometimes, so for those searching for the actual download: click on the 'Australia' link in the table under FAQ.

Manifold User Community Use Agreement Copyright (C) 2007-2019 Manifold Software Limited. All rights reserved.