GDAL 2.2 /vsicurl
joebocop
514 post(s)
#02-Feb-19 23:51

Two clients I have worked with this year are storing drone imagery in AWS S3 buckets in "cloud optimized geotiff" format (https://www.cogeo.org/). This is essentially a tiled GeoTiff with built-in overviews, and with the header data moved to the beginning of the file for HTTP GET-RANGE reading.
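To illustrate what "HTTP GET-RANGE reading" means in practice, here is a minimal Python sketch, not GDAL's actual implementation. It builds a ranged request for the first bytes of a file and decodes the TIFF header found there. The function names and the 16 KB figure are assumptions made for the example, and the URL is the placeholder used elsewhere in this post.

```python
import struct
import urllib.request

def range_request(url, start, length):
    """Build (but do not send) an HTTP GET request for bytes [start, start+length)."""
    end = start + length - 1  # HTTP byte ranges are inclusive at both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})

def parse_tiff_header(first_bytes):
    """Return (byte_order, is_bigtiff, first_ifd_offset) from the start of a TIFF file."""
    order = {b"II": "<", b"MM": ">"}.get(first_bytes[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(order + "H", first_bytes[2:4])
    if magic == 42:  # classic TIFF: 4-byte offset to the first image directory
        (offset,) = struct.unpack(order + "I", first_bytes[4:8])
        return order, False, offset
    if magic == 43:  # BigTIFF: 8-byte offset stored at byte 8
        (offset,) = struct.unpack(order + "Q", first_bytes[8:16])
        return order, True, offset
    raise ValueError("unknown TIFF magic number")

# Hypothetical usage against a server that honours Range requests:
# req = range_request("https://www.server.com/my_co_tiff.tif", 0, 16384)
# with urllib.request.urlopen(req) as resp:
#     print(parse_tiff_header(resp.read()))
```

Because the header and image directories sit at the front of a cloud-optimized file, a reader can learn the layout from one small request and then ask only for the byte ranges it needs.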

GDAL 2.2, usable with Release 9 (http://www.manifold.net/doc/mfd9/index.htm#gdal___ogr.htm), can read these files over http using vsicurl, and even directly from non-public S3 buckets using vsis3, without having to first download the entire file.

To read the metadata of a cloud-optimized geotiff using gdalinfo, for example:

gdalinfo /vsicurl/https://www.server.com/my_co_tiff.tif

A few other desktop GIS applications support these datasets:

  • QGIS 3.4.4 (via GDAL 2.4),
  • ArcGIS Desktop 10.6 (via the OptimizeRasters toolset's "Raster Proxies": https://github.com/Esri/OptimizeRasters/blob/master/Documentation/OptimizeRasters_UserDoc.pdf) and
  • CADCorp SIS Desktop 9 (via GDAL 2.3.1)

In Release 9.0.168.8, I have tried creating a gdal data source, and importing a gdal file, by specifying the URL of a cloud-optimized GeoTiff with /vsicurl/ prepended, but in both cases the only result is "Cannot open file" written to the log pane.

Has anyone else seen success in loading a cloud-optimized geotiff from a GDAL data source over http in Release 9?

mdsumner


4,259 post(s)
#05-Feb-19 12:59

Try

gdal_translate /vsicurl/https://www.server.com/my_co_tiff.tif  vsi.vrt -of VRT

and then see if 9.0 will open the vsi.vrt - this is a virtual GDAL format, it just wraps the specific source into a local file.
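If the wrapping step needs to be scripted, the same command can be assembled in Python. This is only a sketch: the helper name is made up for the example, and it assumes the gdal_translate executable is on the PATH when the command is actually run.

```python
import shlex

def vrt_wrapper_command(url, vrt_path="vsi.vrt"):
    """Argument list that wraps a remote raster in a small local VRT file."""
    return ["gdal_translate", "/vsicurl/" + url, vrt_path, "-of", "VRT"]

cmd = vrt_wrapper_command("https://www.server.com/my_co_tiff.tif")
print(shlex.join(cmd))
# To execute it (requires GDAL to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

The resulting vsi.vrt is a small XML file that points back at the /vsicurl/ source, so no pixel data is copied when it is created.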


https://github.com/mdsumner

joebocop
514 post(s)
#06-Feb-19 17:37

Oh man, classy, thank you for that tip!

That technique extends the power of those GDAL virtual file systems (vsizip, vsis3, vsicurl, etc) into Arc and MapInfo as well.

Thank you very much.

Attachments:
vrt_cogtiff.PNG

joebocop
514 post(s)
#07-Feb-19 00:05

I haven't been able to "link" (Create Datasource) successfully (using the Release 9 built-in File:TIFF data port) to a tiled TIFF file on disk.

The Data Source "expands" without error, and I can open the image component listed therein, but what is displayed is just an empty white background. Neither Zoom to Fit nor any attempt to style shows me the data, so perhaps the Manifold TIFF dataport does not recognize tiled tiffs, only non-tiled?

mdsumner


4,259 post(s)
#07-Feb-19 05:54

Try a subset so it's faster to stream down, ullr arg to gdal_translate. Got a link I could try?
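One way to read the "ullr" suggestion is as an upper-left / lower-right window. In gdal_translate the subsetting option that takes those four georeferenced coordinates is -projwin (ulx uly lrx lry); -a_ullr, by contrast, only overrides the output bounds and does not subset. A hedged Python sketch of building such a command follows. The helper name, output filename and the coordinates in the usage line are placeholders, not values from this thread.

```python
def subset_vrt_command(url, ulx, uly, lrx, lry, vrt_path="subset.vrt"):
    """Argument list for a VRT that exposes only a window of a remote raster.

    The window is given in the raster's own georeferenced units as
    upper-left x/y followed by lower-right x/y.
    """
    return [
        "gdal_translate", "-of", "VRT",
        "-projwin", str(ulx), str(uly), str(lrx), str(lry),
        "/vsicurl/" + url, vrt_path,
    ]

# Placeholder window; substitute coordinates that fall inside the image.
print(subset_vrt_command("https://www.server.com/my_co_tiff.tif", 0, 100, 100, 0))
```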


https://github.com/mdsumner

joebocop
514 post(s)
#14-Feb-19 19:19

Ok, here's an example:

https://gte-rexcfnagudxgnuuhpqphnpuq.s3-us-west-2.amazonaws.com/cc_cogt_jpeg-ycbcr_mask.tif

Two interesting aspects of this tiff file are that

  • it was created from an "original" un-tiled RGBA tiff, using gdal 2.4, by running these two commands:

gdal_translate -mask 4 -b 1 -b 2 -b 3 -co PHOTOMETRIC=YCBCR -co TILED=YES -co COMPRESS=JPEG -co JPEG_QUALITY=50 --config GDAL_TIFF_INTERNAL_MASK YES cc_ortho.tif cc_cogt_jpeg-ycbcr_mask.tif

gdaladdo --config COMPRESS_OVERVIEW JPEG --config PHOTOMETRIC_OVERVIEW YCBCR --config INTERLEAVE_OVERVIEW PIXEL -r average cc_cogt_jpeg-ycbcr_mask.tif

  • it is, therefore, a 3-channel YCBCR colourspace raster, having a 1-bit "mask band". It's not a 4th band, or an alpha band; it is a "mask", created in that first command from the original 4th band alpha channel (I understand this might be complicating matters, but the space savings are significant)

The Radiant.Earth marblecutter correctly detects the mask, as you can see the on-the-fly tiling preview here:

http://tiles.rdnt.io/preview?url=https%3A%2F%2Fgte-rexcfnagudxgnuuhpqphnpuq.s3-us-west-2.amazonaws.com%2Fcc_cogt_jpeg-ycbcr_mask.tif&rgb=1%2C2%2C3&nodata=&resample=average#15/64.8425/-139.8759

Thank you for having a look. Would be interested to know your experience with it.

Dimitri


7,411 post(s)
#15-Feb-19 11:40

so perhaps the Manifold TIFF dataport does not recognize tiled tiffs, only non-tiled?

The Manifold TIFF dataport works fine with tiled TIFFs. Here are examples of the tiled test files from libTIFF, which all import perfectly:

  • cramps-tile.tif: 256x256 tiled version of cramps.tif (no compression)
  • quad-tile.tif: 512x384 tiled version of quad-lzw.tif (lzw)
  • zackthecat.tif: 234x213 8-bit YCbCr (OLD jpeg) tiled "ZackTheCat"

A quick way to see what a dataport can do is to find a known-good example of a given format (Google "examples of tiled TIFF" for instance) and try importing it. If it works, that's possibly something to cross off the list of what needs to be troubleshot.

If you open a TIFF and see all white, launch the Style panel, select all three channels and apply medium Autocontrast and then remember to press the Update Style button.

Attachments:
cramps-tile.tif
quad-tile.tif
zackthecat.tif

joebocop
514 post(s)
#15-Feb-19 17:25

First, thanks for the autocontrast style tip.

Next, import of my example tiff, or any of your examples, works "as expected", no problems there.

I haven't been successful "linking" (old term, sorry) rather than importing, though. Here is the log window output when I expand the "linked" tiff datasource of my example tiff file:

2019-02-15 08:57:43 -- Create: (New Project) (0.005 sec)

2019-02-15 08:57:58 ** Unknown field with tag 42112 (0xa480) encountered

2019-02-15 08:57:58 ** Unknown field with tag 42112 (0xa480) encountered

2019-02-15 08:57:58 ** Unknown field with tag 42112 (0xa480) encountered

2019-02-15 08:57:58 *** Can not handle image with PhotometricInterpretation=4

Similarly, when I create a datasource from your "zackthecat.tif" file (which is also YCbCr) I get the same issue as with the YCbCr tiff file I shared, albeit without the log pane warning. Opening the datasource's image component shows an all-white screen.

If I copy the data source's image component and paste it into the project, the resulting image components behave as follows.

In the case of zackthecat.tif, with all three channels selected in the style pane, I can right-click autocontrast (med), and the value fields appear to be sensibly populated with float values ranging from -55 to 171. When I click "Update Style", however, the image component window remains all-white.

In the case of my example tiff, with all three channels selected in the style pane, I can right-click autocontrast (med), but none of the value fields' values change in any way. The range remains populated with 0-255. Clicking Update Style, of course, yields no change in the all-white display of the open image component window.

I have a bad feeling that I'm doing something fundamentally incorrect here, so please accept my apologies in advance if the solution is an RTFM'r.

Dimitri


7,411 post(s)
#15-Feb-19 17:37

I'm baffled what it could be. Here's what I did:

1. Clicked on the zackthecat.tif link in my prior post to download it from the forum.

2. Launched Manifold from bin64 (64-bit), using Build 9.0.168.8 (the current cutting edge build).

3. File - Create New Data Source.

4. Chose file: tiff as the Type. Uncheck save cached data between sessions. Navigate to the downloaded .tif and choose that. Press Create Data Source.

5. That creates the data source as you see it in the screenshot below. Open the image and it displays, no need to do anything.

Just for the heck of it, I did File - Link, navigated to the .tif and linked it. Here it is as a second "linked" (same thing) data source:

Attachments:
zack.png
zack2.png

joebocop
514 post(s)
#15-Feb-19 17:52

What the.

These instructions are so basic even I can follow them.

I had not tried File --> Link. My earlier attempts had all been of the Right-click --> Create --> New Datasource (file:tiff) variety.

If instead of that I do File --> Link, both my example image and zackthecat.tif display correctly.

In the case of my example image, a dialog showing "copying" appears for a modest period when I open the image component, but the image does display. This note still displays in the log:

2019-02-15 09:48:06 *** Can not handle image with PhotometricInterpretation=4

Is there a difference between File-->Link and

CREATE DATASOURCE [Data Source] (

  PROPERTY 'Source' '{ "Source": "C:\\\\Users\\\\joe\\\\Desktop\\\\cc_cogt_jpeg-ycbcr_mask.tif", "SourceCacheExternal": false }',

  PROPERTY 'Type' 'tiff'

);

Attachments:
ztc_wtf.png

joebocop
514 post(s)
#15-Feb-19 20:21

Ok, so I had been caching no data like

Under those circumstances, I get a white-only component.

If I instead enable cache, like...

...then it "works", while still misunderstanding the "mask band".

Does that make sense, and does that behaviour equate to what is expected? The absence of cache will preclude the data source from displaying anything?

Thanks again, learning here.

Attachments:
ds_cache.png
ds_no_cache.png
ds_works.png

Dimitri


7,411 post(s)
#16-Feb-19 06:01

Yes, that's correct. As noted in the File - Link topic, "The File - Link command is basically a simplified form of the File - Create - New Data Source command." It checks Save Cache by default. Creating a data source explicitly saying not to use cache is not the same. Leaving the Save Cache box checked in the File - Link dialog, and leaving the Cache Data box checked when using Create Data Source, lets the system get around some of the limitations of read-only linked data. You can assign initial coordinate systems and the system can do (usually) some style manipulations as well that otherwise wouldn't be possible.

Try an experiment with the zackthecat.tif file: in Windows set the file to be read-only. Now, you can see what happens when you try to create a data source on it with the default option to Save Cache unchecked.

It can be trickier to do integration with the results of a GDAL stream because that's something of a black box, but the one thing you do know is that it is not bi-directional. It's a one-way, read-only thing.

A mild surprise is this

2019-02-15 09:48:06 *** Can not handle image with PhotometricInterpretation=4

That's a surprise on the GDAL front and on the Manifold front. GDAL apparently has different ways of expressing itself when it announces a four-band image. Import NAIP through GDAL and you get a separate RGB image and a mask image. Do it this way and you get a PhotometricInterpretation=4. OK.

Manifold should be tweaked to take that and consume it the way it does native NAIP fourband import, as an RGBA so that people can manipulate that fourth band as they like.

Dimitri


7,411 post(s)
#17-Feb-19 05:39

Forgot to mention: I sent in a suggestion to do as discussed in the last sentence above.

adamw


10,447 post(s)
#18-Feb-19 13:11

Yes, this is expected. If you do ... "SourceCacheExternal": false ... = uncheck 'Cache data' when creating the data source, the system will not let you change the style, because it has nowhere to put the changes, the data set is treated as read-only and that includes things like properties that define the style. Consequently, the style remains hard-set to the default 0..255 and that makes the image white.

For now, either link with cache or copy / paste the image to the writable parent data source and work with that image.

For the future, we are considering changing the default style for images to use the range of available values automatically. That way when linking without cache, you still won't be able to change the style (of the image inside the data source, will still be able to change the style if you copy and paste the image into a writable parent data source), but the image will at least make some sense. (Although even then, we'd suggest to save the cache. Without the cache, we have to recompute intermediate levels over and over, for example. If there are specific reasons for not wanting the cache, let's rather talk about them and try to solve them.)

For the mask band, could you reupload the image linked above somewhere else? I get "The specified key does not exist" when I try to download it.

joebocop
514 post(s)
#18-Feb-19 16:13

Thank you Adamw.

Here is the file:

http://gte-rexcfnagudxgnuuhpqphnpuq.s3-us-west-2.amazonaws.com/or_test/cc_cogt_jpeg-ycbcr_mask.tif.

adamw


10,447 post(s)
#19-Feb-19 07:07

Thanks, we reproduced the complaints about photometric interpretation, will take a look into what's going on.

adamw


10,447 post(s)
#19-Feb-19 11:52

A follow up:

We investigated and the file contains two series of images (same data at different levels): the "normal" one and the "mask" one. We are importing the "normal" image fine, but are ignoring the "mask" image. We will adjust the code to import the "mask" image as well (by making unmasked pixels in the "normal" image invisible).
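For anyone who wants to see those image series for themselves, here is a minimal Python sketch (not Manifold's or GDAL's code) that walks the image directories of a classic TIFF and reports which ones are flagged as overviews or masks. It relies on two facts from the TIFF 6.0 specification: the NewSubfileType tag (254) uses bit 0 for reduced-resolution images and bit 2 for transparency masks, and PhotometricInterpretation (262) value 4 means "transparency mask" while 6 means YCbCr. That is consistent with the PhotometricInterpretation=4 message in the log. BigTIFF is not handled, and the function name is made up for the example.

```python
import struct

PHOTOMETRIC = {0: "WhiteIsZero", 1: "BlackIsZero", 2: "RGB", 3: "Palette",
               4: "TransparencyMask", 5: "Separated", 6: "YCbCr", 8: "CIELab"}

def list_ifds(data):
    """Yield one summary dict per image directory (IFD) in a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("only classic TIFF is handled in this sketch")
    while offset:
        (count,) = struct.unpack_from(order + "H", data, offset)
        tags = {}
        for i in range(count):
            entry = offset + 2 + 12 * i  # each directory entry is 12 bytes
            tag, typ, n = struct.unpack_from(order + "HHI", data, entry)
            if n == 1 and typ == 3:      # a single SHORT is stored inline
                (tags[tag],) = struct.unpack_from(order + "H", data, entry + 8)
            elif n == 1 and typ == 4:    # a single LONG is stored inline
                (tags[tag],) = struct.unpack_from(order + "I", data, entry + 8)
        subfile = tags.get(254, 0)       # NewSubfileType bit flags
        yield {
            "width": tags.get(256),
            "height": tags.get(257),
            "photometric": PHOTOMETRIC.get(tags.get(262), tags.get(262)),
            "is_overview": bool(subfile & 1),
            "is_mask": bool(subfile & 4),
        }
        # The offset of the next directory follows the last entry; 0 ends the chain.
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)

# Hypothetical usage on a local copy of the file:
# with open("cc_cogt_jpeg-ycbcr_mask.tif", "rb") as f:
#     for ifd in list_ifds(f.read()):
#         print(ifd)
```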

Thanks for the file.

joebocop
514 post(s)
#04-Feb-20 18:52

Apologies for digging up an old thread.

There seems to be a difference between the way a non-cached ECW and non-cached TIFF are handled.

I am frequently finding myself wanting to use tif and ecw images within a Manifold project, but not needing (or wanting) to import or otherwise cache those image data within the Release 9 project files. I do not need to do any analysis on the data from those images, and the source image files themselves already contain very fast tiled overviews, of which I hope Release 9 can make use. These files can be large-ish (~10gb) and are stored in cloud locations, so a lengthy wait for "caching" isn't desirable in this case; we'd prefer to be able to read only those portions of the file that are being requested by a Map viewport (or whatever). The ECW data port seems to allow this, though at the cost of not being able to apply any styling.

Here is a screen shot of an ECW and a GeoTiff in a Release 9 project. The GeoTiff doesn't seem to display anything, whereas the ECW does.

Here is the Style pane for the ECW

And for the TIF:

Here is the relevant portion of gdalinfo output for the ECW (left) and GeoTiff (right)

First, perhaps I am doing something wrong? Perhaps the TIF display is as a result of some mask band weirdness? If so, thank you for pointing it out. We have a lot of "Cloud Optimized GeoTIFFs" that are YCbCr, rgb+mask, with internal overviews, which is perhaps an exotic format.

Second, conceptually, is it important that image data be "cached" within a project in order to apply styling? Ideally, from my perspective, the style description (JSON, right?) could be applied to tiles "on-the-fly", without the need to save any actual image data within the project.

Thank you again.

Attachments:
ecw_vs_cogtiff.png
gdal_both.png
style_ecw.png
style_tif.png

Dimitri


7,411 post(s)
#05-Feb-20 07:45

There seems to be a difference between the way a non-cached ECW and non-cached TIFF are handled.

Since you mention cache I assume you mean when you create a data source or link to either ECW or TIFF. If you want fine control over cache, use Create - New Data Source.

ECW and TIFF are two very different technologies, so how they behave is different. ECW is designed to stream only those parts of the image you request. TIFF is basically "all or nothing."

this case; we'd prefer to be able to read only those portions of the file that are being requested by a Map viewport (or whatever). The ECW data port seems to allow this, though at the cost of not being able to apply any styling.

There are three cache options when you create a data source on an ECW or TIFF. When the Cache data box is off that means no caching of any kind. When the Cache data box is on, the other two boxes control how cache is used. The big one is whether you save cached data between sessions, since that will grow the map file by the size of the image. Let's assume that's turned off.

When the Cache data box is on, only that data which has been pulled from the server is cached. In the case of ECW, only that data which is streamed to fill the viewport is cached. You need to get that much anyway, so there's no point in turning off the Cache data box when using ECW.

TIFF format does not allow partial streaming. You have to download the entire TIFF image to show any part of it, and that goes for using intermediate levels (which a cloud server might not expose, since those are sidecar files). Just the act of opening a TIFF image within a data source to show it means you have to download the thing into local memory. Downloading it into local memory is the same as downloading it into cache.

But if you turn off the Cache data box, you're saying "don't do that." That's why when you try to create a data source on a TIFF but you also uncheck the Cache data box you don't see any image. There's nothing to look at if you don't download the image into cache.

If you turn off the Cache data box when creating a data source on an ECW, you won't be able to style the ECW image. If you turn off the Cache data box when creating a data source on a TIFF, you won't be able either to see the TIFF or to style it.

Second, conceptually, is it important that image data be "cached" within a project in order to apply styling?

Yes. You need a place to put the styling information for the image. If the data source is cached, that styling information can go into the cache.

from my perspective, the style description (JSON, right?) could be applied to tiles "on-the-fly", without the need to save any actual image data within the project.

That's exactly what happens when you check the Cache data box. Try that with an ECW and you'll see there is no cache subfolder created within the System Data folder, if you don't check the "Save cached data between sessions" box. If you don't check the "Save cached data between sessions" box, there is no image data (or styling data) saved within the project as a persistent cache.

If you don't check the "Save cached data between sessions" box, that's also why the styling disappears, even if you save the project, when you open the project again. The data source is still there, but there's no styling because the system was told to not save cached data between sessions.

There is useful discussion of the above in the File - Create - New Data Source topic, as well as in the Importing and Linking topic.

joebocop
514 post(s)
#05-Feb-20 18:35

As ever, thank you for taking the time, very much appreciated.

On the topic of my ECW file data source and the behaviour of "cache". When I create a data source with only this box checked

and then subsequently "open" the image component in that data source, I am presented with this "Migrating Data" dialog (with the datasource DDL pasted behind for reference), which appears to import the entire image; the process takes ~ 14 minutes to complete.

Thereafter, pans and zooms behave as if the image is local to the map file, and there is no subsequent "Migrating" dialog when I zoom in. Doesn't that indicate that the data port has streamed the entire image (and not just the topmost overview) upon first open in this case?

Here is the resulting component, which remains un-stylable, and which displays an artifact on the rightmost margin of the image.

I post this just because the equivalent workflow in QGIS is fast and easy, with the ecw's topmost overview displaying immediately, and with the ability to immediately style. Subsequent overviews are streamed when requested, on zoom, pan. I believe it would be useful to include the ability to quickly view and style images that reside outside of projects for the sole purpose of viewing, where no processing will be done, similar to an Image Server data source.

On the topic of my TIF, I should clarify that the TIF files we deal with are "Cloud Optimized GeoTIFF" format (https://www.cogeo.org/), and are "optimized" to be streamed. The overviews are internal to the file (not sidecars) and both the data and overviews are arranged in tiles, rather than stripes. The gdalinfo output I pasted above shows the internal format, which is arranged similar to ECW, but which uses JPEG compression internally, rather than whatever wavelet magic is happening in ECW. A link to an example image file in this format is provided in my post above.
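To make the "arranged in tiles" point concrete: in a tiled TIFF the TileOffsets and TileByteCounts tags give the byte position and length of every tile, so a reader only has to work out which tiles intersect the requested window and fetch those. A small Python sketch of that index arithmetic follows; the function name is hypothetical and the 256-pixel tile size is an assumption chosen for the example.

```python
def tiles_for_window(x0, y0, x1, y1, tile=256):
    """Return (col, row) indices of the tiles that intersect a pixel window.

    The window covers pixels x0 <= x < x1 and y0 <= y < y1 at one
    resolution level; tiles are square with side `tile`.
    """
    cols = range(x0 // tile, (x1 - 1) // tile + 1)
    rows = range(y0 // tile, (y1 - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]

# A 100x100 pixel window straddling a tile corner touches four tiles.
print(tiles_for_window(200, 200, 300, 300))
```

The same calculation applies to each internal overview, which is why a zoomed-out view of a large file can be served from a handful of tiles of a coarse level.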

Perhaps what I am actually doing here is requesting a new format be supported, since "cogtiff" deviates from TIF significantly enough in this regard. I'll put that into a suggestion and send it off.

Attachments:
ecw_opened.png
nds_sql_and_copying_dialog.png
new_data_source.png

Dimitri


7,411 post(s)
#06-Feb-20 03:21

and then subsequently "open" the image component in that data source, I am presented with this "Migrating Data" dialog (with the datasource DDL pasted behind for reference), which appears to import the entire image; the process takes ~ 14 minutes to complete.

Never seen that before. Try an experiment: create a data source on an ECW file that is hosted on your local hard disk, not out in a cloud volume. What you'll see is that it does exactly this:

is fast and easy, with the ecw's topmost overview displaying immediately, and with the ability to immediately style. Subsequent overviews are streamed when requested, on zoom, pan.

That's what happens in Manifold, normally, with ECWs as well. What's going on in this case is a) something about the data source, or b) something about the "ecw" format. The "migrating" dialog is normally something Manifold shows only when migrating some sort of older format into a newer version, for example, from Release 8 to 9. Could it be this was something used in 8? Are there any sidecar files accompanying it?

I believe it would be useful to include the ability to quickly view and style images that reside outside of projects for the sole purpose of viewing, where no processing will be done, similar to an Image Server data source.

That's what data sources provide. As you noted about image servers, create a new Bing streets data source (it's one of the favorites) and you'll see the above is what you get.

But all that is up to the host format. Some formats, like PNG, TIFF, JPG, etc., don't support partial multi-resolution grabs like ECW or MrSID do. That's why people use technologies like tile-served imagery (WMS, TMS, image servers, etc.) to provide very large images over the web.

Perhaps what I am actually doing here is requesting a new format be supported, since "cogtiff" deviates from TIF significantly enough in this regard. I'll put that into a suggestion and send it off.

Yes. COG is not a TIFF. I note, by the way, that cogeo's use of NASA's logo to imply that NASA supports the format appears to be misleading: they don't cite any use by NASA. Instead, they cite radiant.earth as republishing some data in COG that they obtained from NASA.

joebocop
514 post(s)
#06-Feb-20 19:33

Thank you.

The ECW and COG files had been stored on Google Drive, exposed locally as "g:\". To eliminate that layer of abstraction I've copied the files from that "g:\" location onto my local desktop, which resides physically on this laptop's local disk (for clarity).

For additional clarity, and because I may have confused Read-Only and Read-Write data source configs when reporting my experiences previously (gah, sorry), here are clearer descriptions of the behaviour.

Neither file has any sidecars associated with it. Neither file has ever been touched by Release 8. The ECW was produced by GlobalMapper, and the TIF by GDAL 2.4.

The image component for ECW data source, a 128*128 UINT8*4 image, under various configurations.

1. Read-Only, No Cache: Image component opens immediately, with no ability to style

2. Read-Write, No Cache: Image component opens immediately, with no ability to style

3. Read-Only, Cache: Image component opens after ~100s "Migrating Data" dialog, with no ability to style

4. Read-Write, Cache: Image component opens after ~100s "Migrating Data" dialog, with ability to style. The 4 bands in "RGBA" (R:0,G:1,B:2,A:3) incorrectly display the transparency layer, however. Changing A:Value=0 results in correct display, though without any transparency.

The image component for TIF data source (pointed at a COG), a 256*256 UINT8*3 image, under various configurations.

1. Read-Only, No Cache: Image component opens immediately, is blank, with no ability to style

2. Read-Write, No Cache: Image component opens immediately, is blank, with no ability to style

3. Read-Only, Cache: Image component opens after ~55s "Migrating Data" dialog, displays correctly (including correctly detecting the MASK as alpha), with no ability to style

4. Read-Write, Cache: Image component opens after ~57s "Migrating Data" dialog, displays correctly (including correctly detecting the MASK as alpha), with ability to style. The 3 bands in "RGB" (R:0,G:1,B:2,A:Value=0) display correctly, including transparency.

I can provide the files to tech; they are ~340mb for the ECW and ~290mb for the COG. Should I email a public URL to those files?

A postscript on the COG format: I'll send in a suggestion to support it natively. ESRI added support for COG and offers a toolset (https://github.com/Esri/OptimizeRasters) for creating and manipulating them, as well as the NASA-favoured LERC-compressed MRF format (https://github.com/nasa-gibs/mrf/blob/master/doc/MUG.md) and (https://github.com/nasa-gibs/mrf/blob/master/spec/mrf_spec.md).

Thank you again.

Dimitri


7,411 post(s)
#07-Feb-20 14:33

Regarding read only, read/write: If you create a data source as read only, you're saying "don't change this in any way" .... that means, no styling.

You can get around that in a way by making a virtual copy. See the various topics in the User Manual for dealing with read-only, web-based data sources, for example, this one, or this one.

What you haven't said is how you bring the images into Manifold. Are you using File - Link or are you using File - Create - Data Source?

I can provide the files to tech

Have you launched a support incident to tech? If you have, then if they ask you to send a link, do that.

If you want folks in the forum here to help you figure out what is going on, then why not post a link here?

Keep in mind it never hurts to file a bug report. Everybody knows that 99% of them aren't bugs, so there's no worry about reporting something that turns out not to be a bug. My gut feel is that this is not a bug but something weird about the files, but then... you never know until you look at all the details.

adamw


10,447 post(s)
#10-Feb-20 11:19

What happens here is that when linking with cache, you are telling Manifold to cache everything, and that just isn't necessary.

The options for cache include 'Cache only data that is expensive to compute dynamically'. That option tells the system to make its own decisions on what to cache vs what to read directly from the original file or files. By default, the option is checked and the system does not put into the cache things that can be obtained from the original file easily. In your screens the option is unchecked, and by unchecking it you are telling the system to stop acting smart and just cache everything, duplicating data if needed.

Check the option back on and things should get back to normal.

For ECW, the system will only cache component properties (style / coordinate system, etc). The size of the .MAPCACHE file is going to be something like 800K because of the initial overhead of just having the initial structures inside it, but it won't get larger. Creating a cache of that size will take pretty much no time.

For TIFF, the system will cache component properties and intermediate levels - unless you have intermediate levels already stored in a separate .RRD file, then the cache will only include component properties. If intermediate levels are absent, they will be cached, yes. To avoid reading the entire file to render it with zoom to fit, for example. The size of the cache will be something like 1/3 or 1/4 of the original file, give or take.
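The "1/3 or 1/4" figure can be sanity-checked with a little arithmetic. Assuming each intermediate level halves the resolution in both directions (an assumption for this sketch, not a statement about how the cache is built internally), every level holds a quarter of the pixels of the one before it, and the total approaches one third of the original as levels are added. Compression will shift the real number on disk.

```python
def overview_fraction(levels, factor=2):
    """Pixel count of `levels` intermediate levels as a fraction of the full image."""
    return sum((1 / factor**2) ** k for k in range(1, levels + 1))

for levels in (1, 2, 4, 8):
    print(levels, round(overview_fraction(levels), 4))
```

With a single level the extra storage is a quarter of the original, and with many levels it converges on a third, which brackets the range quoted above.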

adamw


10,447 post(s)
#10-Feb-20 11:25

Also:

We'll consider adding an option to read cloud-optimized TIFF files directly.

We'll consider adding an option to read data referenced by GDAL VRT files - by invoking GDAL on them.
