Speed and memory strategy for very large image merge
Mike Pelletier


1,859 post(s)
#29-Jan-21 22:55

Beyond having lots of storage for the project and temp files, I'm wondering what is the best strategy when merging lots of imagery (say 500 GB). Should we import all the images and then save the project before doing the actual merge? This wouldn't help with speed because of the additional step, but perhaps it would clear temp files to allow the merge if temp file space is filling up.

Maybe link the images instead of import? Linking is faster but once they are added to the map, seems like they may take just as long to display as importing.

Thoughts?

tjhb

9,627 post(s)
#30-Jan-21 00:54

Avoid opening images.

In practice that means opening just one. Create a map from one image, zoom to its extent alone (or to a tiny window), then drag the other images in, do nothing else, and merge.

dchall8
847 post(s)
#30-Jan-21 19:26

Interesting strategy. It most definitely takes time to render all the images. That strategy works if you are certain you have all the images and they fill the checkerboard you want. The images I get from the government sites have file names like m_2909924_ne_14_1_20140531.tif or USGS_NED_OPR_TX_Lower_SanBernard_B6_2017_dem_14rmt815790_TIFF_2019.tif. These are quarterquad images with the designation for the quarterquad built into the name. What you don't know from looking at one, or four, of these files is what the designations for the adjacent quarterquads are. Generally you are looking for nw, ne, sw, and se for each quad, but not always, and not all the images seem to be valid and uncorrupted. What I have seen is holes in my mosaic, for which I need to figure out the code and retrieve the missing file. Also note that it is very easy to get crosseyed when looking at a list of these files in a folder or in Manifold. I dragged them into a map, merged them, and created a new map for the merged image.
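One way to avoid going crosseyed over the file list is to let a small script look for the holes. The following is only a sketch: it assumes the names follow the pattern of the first example above (m_, a 7-digit quad id, a quadrant code, then zone, resolution and date), and the two extra file names in the usage example are made up for illustration. It reports quads that have at least one quarterquad present but not all four; it cannot know about quads that are missing entirely.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from one example name in the post:
# m_<7-digit quad id>_<quadrant>_<zone>_<resolution>_<yyyymmdd>.tif
NAME_RE = re.compile(
    r"^m_(\d{7})_(nw|ne|sw|se)_\d+_\d+_\d{8}\.tif$", re.IGNORECASE
)
QUADRANTS = ("nw", "ne", "sw", "se")


def missing_quadrants(filenames):
    """Return {quad_id: [missing quadrant codes]} for partially covered quads."""
    seen = defaultdict(set)
    for name in filenames:
        match = NAME_RE.match(name)
        if match:  # names that do not fit the assumed pattern are skipped
            seen[match.group(1)].add(match.group(2).lower())
    return {
        quad: [q for q in QUADRANTS if q not in found]
        for quad, found in sorted(seen.items())
        if len(found) < len(QUADRANTS)
    }


# The first name is from the post; the other two are hypothetical.
files = [
    "m_2909924_ne_14_1_20140531.tif",
    "m_2909924_nw_14_1_20140531.tif",
    "m_2909924_sw_14_1_20140531.tif",
]
print(missing_quadrants(files))
```

With the three names above, the se quarterquad of quad 2909924 is reported as missing. The list of names could just as well come from a directory listing of the download folder.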

The files I see are typically 8MB and I'll work on 36 at a time. So I'm not trying to do files as big as Mike is. But what I do is merge 36 at a time and then bring in 36 more and merge them with the first merged image. As I recall, whether I bring them in or link to them, it takes the same amount of time. My computer is a 7-year-old 3.40 GHz Intel Core i7-4770 (4 cores, 8 threads with hyperthreading) with 16 GB RAM and an NVIDIA GeForce GT 730. I don't remember the time it took to merge these files, but it was not offensive.

This image is 13 miles east/west and 19 miles north/south.

Attachments:
Bandera DEM working.jpg

tjhb

9,627 post(s)
#30-Jan-21 21:03

It's a strategy borrowed mostly from adamw.

If you need to check coverage (of course, understandable) then this can be done by using drawings. I would create either a straight AOI from the rect of each image (simple and quick) or for more detail, a mosaic of areas showing each visible tile for each image.

Rects or tiles can be labelled. With an area fill style, this shows what you need to know, with vector speed, in other words avoiding the overhead of rendering each image.

Let me know if you want code to test this out. (Mostly written for other purposes, just a tweak or two required.)

(It would work similarly to what Global Mapper does for image catalogs, if you are familiar with that.)

Dimitri


6,511 post(s)
#31-Jan-21 06:31

There are some useful ideas in the Example: Create USGS File Names with Transform topic that may help managing many downloaded data sets. I find it very useful to have an index layer labeled with the names of each USGS file.

dchall8
847 post(s)
#31-Jan-21 20:17

That topic is full of information for someone dealing with these images.

tjhb

9,627 post(s)
#31-Jan-21 22:45

Yes, but it is far too long.

It needs dividing up into, I would guess, about six separate topics.

Making “leads” available into that many topics would be a challenge, but I think it is possible.

It also does not address the question asked by the OP.

Offer is still open for trial code.

dchall8
847 post(s)
#31-Jan-21 23:09

Testing my memory here, but are you thinking of the code that takes a list of file names, created in DOS, and then uses Manifold to find those files to link to? I believe that was written to sort through LiDAR files - at least that's what I'm remembering about it. It was a game changer back in the day.

tjhb

9,627 post(s)
#01-Feb-21 00:06

No.

Manifold code that draws AOI areas, or individual tile areas, for images after importing or linking them into Manifold.

So that you can manipulate and manage many images without the overhead of opening them.

Mike Pelletier


1,859 post(s)
#01-Feb-21 00:39

Thanks for the suggestions, everyone. I hadn't thought of linking without viewing before the merge. I actually made the mistake dchall8 mentioned above and pulled in a large number of tiles from the wrong folder. Tim, that code would be handy if it is of interest to you and you have the time.

While the linking approach saves time, I wonder though if it will still create similarly large temp files. If someone thinks they might run into memory troubles for temp files, then I think you could save in groups. Am I correct in thinking that data that is saved in the project somehow doesn't stress the temp file for the final merge of the groups as much as data that was just linked or imported without a save?

Mike Pelletier


1,859 post(s)
#01-Feb-21 20:24

The trouble with linking is that images are placed into a file tree (term?) and not accessible by the dialogue that pops up when creating a new map. Nor can you easily select all the images in the project pane, even with the images filtered, because they are buried in the file tree. They also go through a loading process when added to the map, even if not visible on the current map extent, that is about the same as importing.

I sent in a suggestion to come up with a better process that will hopefully also have some way to better handle the impact on temp file size.

Dimitri


6,511 post(s)
#02-Feb-21 08:07

"to better handle the impact on temp file size."

Well, the amount of space required for temp files depends on the amount of data you are working with, the options you choose, and on the nifty features that Manifold provides. There's also a lot of interaction with Windows.

That Manifold makes liberal use of such storage options is a good thing, because your time is worth way more than the cost of absurdly inexpensive disk space. That is especially true if using a bit of extra disk space can save you from catastrophes like a power or hardware failure in the middle of saving a big project. Redoing work for a big project, just one, can cost you several times more than the cost of an inexpensive big disk. Having plenty of storage space is really cheap insurance.

The general rule of thumb is that you should have three times the size of your project available as free space on disk. The size of your project is what it is, plus as much again to enable a temporarily cached version to get some protection from hardware failures during processing, plus as much again as working space. That advice is usually phrased as "three times the size of the project in free TEMP space" because most folks can't keep track of what's temp and what's not temp space.
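The rule of thumb is easy to check before starting a big job. This is a minimal sketch, not anything Manifold provides: the function names are made up for illustration, the factor of 3 is the rule stated above, and the 500 GB figure is simply the size mentioned at the start of the thread.

```python
import shutil

GB = 1024 ** 3


def meets_rule_of_thumb(project_bytes, free_bytes, factor=3):
    """True if free space is at least `factor` times the project size."""
    return free_bytes >= factor * project_bytes


def check_path(path, project_bytes, factor=3):
    """Check the drive that holds `path`; returns (ok, free_bytes)."""
    free = shutil.disk_usage(path).free
    return meets_rule_of_thumb(project_bytes, free, factor), free


# Example: a 500 GB project checked against the drive holding the current folder.
ok, free = check_path(".", 500 * GB)
print(f"free: {free / GB:.0f} GB, enough for a 500 GB project: {ok}")
```

Point `check_path` at the folder that holds TEMP as well as the one that holds the project, since they may be on different drives.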

How you run your project can also affect the project size. For example, importing takes more room in a project than linking. But when you start merging to create a single raster out of many linked rasters, you still end up reading the data from all those linked rasters into project memory, be it cache in memory, cache on disk, temp files on disk, etc.

Other options can affect the total size of the project in play. For example, when you link a file, did you check the "Save cache" checkbox? See the Cache and Linked Data section of the Importing and Linking topic. If you checked that box, you commanded Manifold to use cache, which means larger project size in memory, which means more use of secondary storage (disk, temp files, page file, etc) for larger projects. Cache is generally a very good thing, well worth the storage space involved.

To take another example, Manifold creates a .SAVEDATA file in a double-tap regime to avoid catastrophic corruption of .map projects if a power failure happens right in the middle of a save. It's part of Manifold's hardening against common system failures, to make Manifold as reliable as possible. That does mean saves take slightly longer and it also means there will be greater use of temp files given the interaction with Windows, limited main memory, etc. But it also means far greater reliability than you get with other software. When I leave an array of programs open on my Windows desktop in the evening, if the next morning I see Windows updated itself and rebooted (!$#!), the one application I never have to worry about is Manifold, if I had several projects left open.

I give those examples to illustrate why as a rule of thumb, it doesn't pay to second-guess what the system is doing in terms of optimally utilizing project files, working files like .MAPCACHE, and purely temporary files that may be in temp space or pagefiles. There are very many moving parts involved in getting performance, being able to work with larger data despite relatively small main memory, and being hardened against disasters as much as possible.

If you find that space is tight, the good news is that a solution is easy and inexpensive: A 6TB Hitachi hard disk is $106 and a Seagate 6TB hard disk is $115. A 12TB Hitachi is $194. Install a larger hard disk if you will be doing 500 GB projects. That will give you plenty of room for projects and for temp space. It also will provide plenty of room for archival storage, so you can save intermediate versions of projects in case you discover an error in workflow and want to go back to some intermediate version, without having to redo everything from the beginning.

"Nor can you easily select all the images in the project pane, even with the images filtered, because they are buried in the file tree."

Here's a quick way to drop many images (I'd do 50 or 100 at a time, using folders to keep them conveniently organized) from within many data sources into a map:

1. Read about how to select items in swaths, where you ctrl-click one and then Shift-ctrl-click another and all in between get selected as well. The Layers Pane and Tables topic has a nice example.

2. Create the map into which the images will be dropped.

3. Choose File - Link and then in the Link dialog choose 50 or 100 or so and link all at once into the folder for that batch.

4. In the Project pane, set it to display images only (button to the right of the Filter box).

5. Starting at the lowest data source, click the + box to open it. Starting at the lowest data source lets the system do the work of scrolling through many as they open up. Open all the data sources. Takes about a second per data source.

6. Ctrl-click the first image to select it, scroll to the last data source and Shift-ctrl-click the last one, to select those and all in between.

7. Drag and drop all the selected items into the map. Done.

If you want to reduce rendering time, use the Layers pane to turn off all the image layers you've dropped into the map. If it's not visible, no time will be spent rendering it. A layer doesn't have to be visible to be used in the Merge dialog. I've done merges of about 100 GB for images, and didn't find the rendering time to be objectionable, but it can add up when merging 100 GB of, say, LiDAR point clouds. Those layers I turn off.

I grant that the above process involves manual effort. But if you use facilities that enable you to select very many TIF files for linking, and then swath selection to select many components to drop into a map, to turn on or off all together in the Layers pane, etc., it goes quickly even if you deal with hundreds of them.

If you have to do this repetitively, automate the process with a script. SQL is not the right choice because that's generally good where you have many records in one component, but not very many components to iterate through. If you have to do a repetitive process manipulating hundreds of components, use a script.
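The batching in step 3 above can be prepared outside Manifold before anything is linked. The sketch below is plain Python working on the file system, not a Manifold script and not using any Manifold API; the function names and the default batch size of 100 are assumptions taken from the "50 or 100 at a time" suggestion.

```python
from pathlib import Path


def batches(items, size):
    """Split a sequence into consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def tif_batches(folder, size=100):
    """Group the .tif files in `folder` (sorted by name) into batches."""
    return batches(sorted(Path(folder).glob("*.tif")), size)


# Example: report how many files would go into each batch for the current folder.
for number, batch in enumerate(tif_batches("."), start=1):
    print(f"batch {number}: {len(batch)} files")
```

Each batch could then be copied or moved into its own subfolder, so that one File - Link operation picks up exactly one batch.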

Mike Pelletier


1,859 post(s)
#02-Feb-21 20:02

Thanks for all the suggestions and commentary!

Well I've got plenty of TBs for storage of the project on a separate spinning disk. At issue is my 500 GB ssd C:/ drive that has Windows installed on it. After freeing up space in Downloads and Documents, I have almost 300 GB of free space. My project is up to 358 GB and will probably grow to 450 GB after today's import. Not up to the 3 times rule unfortunately. Time to upgrade to larger SSD.

I'm importing instead of linking because my understanding is that Temp files are on the C:/ drive and that by importing and saving the project, the Temp files get cleared out. Actually seems like it takes a reboot to fully clear Temp files, as viewed from Explorer report of the C: drive. Essentially moving the data to my large storage drive by importing vs. linking.

Hopefully this effort will be worth it, when it comes time to merge and export to a single .ecw file.

I suppose moving Temp files to the spinning disk via the Environment Variables dialogue in Windows would have solved the issue and allowed the linking process to work. However, I didn't see that in Mfd performance notes and wasn't sure if the whole Temp file issue is more complicated than just this method.
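A quick way to confirm where TEMP actually points after changing it, and how much room is there, is a few lines of Python. This is only a sketch: it reports what a newly started Python process sees, which follows the same user environment variables, but it says nothing about how Manifold itself uses the folder. The function name is made up for illustration.

```python
import os
import shutil
import tempfile

GB = 1024 ** 3


def temp_report():
    """Return the temp folder this process would use and the free space on its drive."""
    temp_dir = tempfile.gettempdir()  # honours TMPDIR, TEMP and TMP, in that order
    usage = shutil.disk_usage(temp_dir)
    return {
        "temp_dir": temp_dir,
        "env_TEMP": os.environ.get("TEMP"),  # None if the variable is not set
        "free_gb": usage.free / GB,
    }


report = temp_report()
print(f"temp folder: {report['temp_dir']}")
print(f"TEMP variable: {report['env_TEMP']}")
print(f"free space there: {report['free_gb']:.0f} GB")
```

Run it from a newly opened command prompt, since programs that were already running keep the old value of the variable.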

Also, I still don't see how to use ctrl shift click and image filter to open up just linked images for dragging on to the map. Works fine for imported images though.

By the way, I normally do this type of work in Global Mapper, which of course is just linking. Its process involves linking all the images and exporting as an ecw. Certainly there are limitations with this but for just exporting an ecw, it generally works well and easily. However, this time it has shifted the image unacceptably, which is why I'm using 9.

Also, I'm generally happy as long as the process works. It's easy to fit within other work.

tjhb

9,627 post(s)
#02-Feb-21 23:58

Mike, for either Manifold 8 or Manifold 9 you should have:

  • one standard local hard disk for the Windows OS, software, and virtual memory
    • preferably an SSD
    • currently ~500GB is normally plenty
  • a second local hard disk, dedicated to the Windows TEMP files (and perhaps extra temporary storage)
    • definitely an SSD, 2TB to 4TB is currently a good choice
    • the user TEMP environment variable must point to a folder on this disk
    • for extra credit this disk should be formatted with (say) 40GB overprovisioning, to avoid occasional long delays on heavy saves
  • a third local disk for project and source data
    • SSD or HDD, as large as you need (could be mirrored for extra safety)
  • network folders if you need them

Mike Pelletier


1,859 post(s)
#03-Feb-21 01:07

Excellent Tim. Thanks and I'll get the second SSD for temp files!

tjhb

9,627 post(s)
#03-Feb-21 01:15

Thanks. P.s. For 9 (not 8) I often open the working version of the current project from a copy on the “TEMP” drive as well, as long as there’s room. That speeds things up even more, and having it on the same drive as TEMP files doesn’t materially slow things down. Meanwhile, the original copy of the project helpfully becomes a fresh backup (and later an independent save target).

jsperr
104 post(s)
#03-Feb-21 06:29

Whew!

I have a 1 TB SSD EVO 860 main C:/ OS drive, an EVO 860 Pro F:/ 512 GB TEMP SSD drive, and an E:/ 4 TB SAS two disk striped raid array for downloads and source data in a cheap, used Lenovo D-20 dual XEON 6 core 96 GB RAM Windows 10 workstation.

It's not the ultimate barn burner -- it's quiet and stable which is really nice -- but I did not realize that my TEMP disk should be as large as 2 - 4 TB.

Dimitri


6,511 post(s)
#03-Feb-21 07:58

"I did not realize that my TEMP disk should be as large as 2 - 4 TB."

It only needs to be that large if you are working with terabyte sized projects. To scale this, the "whole planet" OpenStreetMap database, that is, every road, every POI, every administrative division, every building footprint, etc., etc., for the entire Earth is only 1 TB. How many projects like that do you do?

Most people's GIS use is well under 50 GB for vector projects and usually under 100 GB for raster projects, so they're not going to come close to running out of TEMP space with a system like yours. Your system sounds like a perfectly reasonable system for what most people do.

Another strategy is to not worry about special arrangements of disk to try to max out speed and simply to use the same drive for working space and for TEMP folders. SSD is fast enough for that to be reasonable, and if the interface to your SSD is PCIe (as in a connection to M.2 SSD), there is no need to think about splitting access over different SATA connections. SSD drives don't have any heads that are moving around so you don't have to think about putting different classes of storage on different "spindles," like in the old days before solid state storage.

tjhb

9,627 post(s)
#03-Feb-21 18:35

I agree. Massive TEMP space is less important with Manifold 9 than with Manifold 8, and splitting storage roles between drives is much less important with SSDs than with HDDs.

If I could I would adjust my previous post accordingly.

Mike Pelletier


1,859 post(s)
#03-Feb-21 19:40

Thanks to you both for the advice! Looking forward to getting a 4TB SSD.

FWIW, I've found it to be almost 3 times faster putting project file and temp on local 5 TB spinning disk vs. putting both on network drive or splitting project on network and temp on local 5 TB drive.

Currently processing the image merge at 1,033 records/s and 64MB/s.
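For a rough idea of what that rate means in wall-clock time, here is a back-of-envelope sketch. It assumes the 64 MB/s figure holds for the whole job and that the merge has to move roughly the full project size, about 450 GB as mentioned earlier in the thread; both are assumptions, so treat the result as an order of magnitude only.

```python
def estimate_hours(size_gb, rate_mb_per_s):
    """Hours needed to move `size_gb` gigabytes at a constant `rate_mb_per_s`."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gb * 1024 / rate_mb_per_s  # 1 GB taken as 1024 MB
    return seconds / 3600


print(f"{estimate_hours(450, 64):.1f} hours")
```

At those numbers the estimate comes out at about two hours.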

dchall8
847 post(s)
#02-Feb-21 17:41

You've probably already thought of this, but I freed up half a hard drive by moving files to our servers. The office network servers had 15 TB allocated to personal data storage for the 10 of us. Nobody in the office knew so there was 15 TB available. Before I copied a large batch of files to it I checked with the network contractor about the ramifications. The office was charged by the GB for nightly backups, but if I put my data into a non-backed up folder, I could leave it there with no real consequences.
