Installed Manifold 9 Universal and now it's running extremely slow
IanSchelly

12 post(s)
#24-Jul-20 00:34

I was testing Manifold Viewer 9.0 64 Bit on processing a huge raster against a large data set of property boundaries and it was blazing fast, processing about 10,000 boundaries a second. I got my IT department to buy Manifold 9.00 Universal and install it on my computer. I first ran my test data to make sure everything was working smoothly: 300 property boundaries that used to finish in the blink of an eye now take around 90 seconds. I opened Manifold Viewer 9.0 64 Bit to see if that was running slow too, and sure enough it's now crawling as well, taking about 90 seconds to process the test data.

Looking at my task manager when I run this, it's only using around 10% of my CPU and 28% of my memory. When I was first running Manifold Viewer 9.0 64 Bit, it was basically maxing out my CPU and memory while processing.

So in short: I was testing Manifold Viewer 9.0 64 Bit and it was super fast; I installed Manifold 9.00 Universal and now both are running super slow. Any ideas what's going on and how I can fix this? Or what I should look into to see what's going on?

dale

558 post(s)
#24-Jul-20 01:03

Ian,

you will need to let us know which version(s) you are using.

And, if at all possible, sample data or directions to the data you are using.

IanSchelly

12 post(s)
#24-Jul-20 01:29

Attached are the property polygons and land cover raster that I was testing, along with a txt of the query I'm running. This processed in a flash before Manifold 9.00 Universal got installed. The log said it took 80.953 seconds to run the query; before, it was under a second. I've also attached a screen grab of 'About Manifold System' from the Help menu.

Attachments:
Manifold_system.JPG
Test_data.zip
test_query.txt

dale

558 post(s)
#24-Jul-20 06:05

What version of viewer did you have?

Both Viewer and Universal are identical as far as versions go. Save and export are disabled in Viewer.

Once I have a moment, I'll run your query and report back. Others might beat me to it!

Dimitri

6,104 post(s)
#24-Jul-20 06:12

Haven't looked at the data yet, but a really slow query usually indicates a lack of an index on fields that play a significant role in the query.

If you open the tables used and run Edit - Schema, that will show you what indices are in the schema. If you don't have an index on fields that play a key role, quickly add an index on each such field. I'll bet in that case when you rerun the query it will go much faster.
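Dimitri's reasoning applies to any database, so here is a generic sketch in plain Python (not Manifold internals; the field name mfd_id is just illustrative): without an index, every lookup on a key field is a full scan, while an index, modeled here as a dict built once over the field, answers each lookup directly.

```python
import time

# 100,000 fake records keyed on mfd_id
rows = [{"mfd_id": i, "name": f"parcel {i}"} for i in range(100_000)]

def lookup_scan(rows, key):
    # no index: examine records until the key turns up
    return next(r for r in rows if r["mfd_id"] == key)

index = {r["mfd_id"]: r for r in rows}  # built once, reused per lookup

def lookup_indexed(key):
    return index[key]

t0 = time.perf_counter()
for k in range(0, 100_000, 10_000):
    lookup_scan(rows, k)
scan_time = time.perf_counter() - t0

t0 = time.perf_counter()
for k in range(0, 100_000, 10_000):
    lookup_indexed(k)
indexed_time = time.perf_counter() - t0

print(f"scan: {scan_time:.4f}s, indexed: {indexed_time:.6f}s")
```

The gap grows with table size, which is why a missing index on a field that a query joins or filters on can turn a sub-second query into one that takes minutes.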

Keep in mind that Viewer is Release 9 - it's the same source code, just compiled with options to turn off printing, scripts and writing. Everything else is identically the same code.

In future, by the way, don't trap yourself into a mental approach that works against finding a solution: phrase the issue as "Problem with a query running really slowly," because that guides mental energy towards a fix without pinning yourself in advance to what the root cause must be.

Saying the problem is "9 is running extremely slowly" means you already decided what the problem is - not an inefficient schema but a mysterious slow down in 9. That's almost certainly wrong, but when your debugging process starts with that wrong assumption then when trying to find a fix you're already chasing in the wrong direction.

It could, of course, be that you've found a bug in 9. That's unlikely, but possible. But the best way to debug is to describe the problem without pinning yourself to a decision, too early in the debugging process, of what the problem must be. If it turns out to be a bug, and not just a missing index, without fail the bug will be found and promptly fixed.

One last thing... the easiest way to provide test data is to provide a Manifold project in .mxb format: that's compact and usually will fit within the limits of what you can append on the forum. That would help since your query does not match the test data provided (always best to test exactly the same situation). Could you do that for your test data and query?

Dimitri

6,104 post(s)
#24-Jul-20 09:33

OK, I've taken a look at the data and have tinkered with the query. Here is the test query with names adjusted to match the test data. I also added ; at the ends of the VALUE statements, the Function definition, etc.

--ALTER TABLE [Properties Table] (
-- ADD v03 INT32,
-- ADD v13 INT32,
-- ADD v15 INT32,
-- ADD v19 INT32,
-- ADD v24 INT32,
-- ADD v33 INT32)

VALUE @target NVARCHAR = ComponentCoordSystemAutoXY([Land_cover]);
VALUE @source NVARCHAR = ComponentCoordSystemAutoXY([Properties]);
VALUE @conv TABLE = CALL CoordConverterMake(@target, @source);

FUNCTION ValueForGeom(@g GEOM, @v INT32) FLOAT64 AS (
  SELECT Count(*)
  FROM CALL TileGeomToValues([Land_cover], @g)
  WHERE [value] = @v
) END;

UPDATE [Properties Table] SET
  v03 = ValueForGeom(CoordConvert(@conv, [Geom]), 3),
  v13 = ValueForGeom(CoordConvert(@conv, [Geom]), 13),
  v15 = ValueForGeom(CoordConvert(@conv, [Geom]), 15),
  v19 = ValueForGeom(CoordConvert(@conv, [Geom]), 19),
  v24 = ValueForGeom(CoordConvert(@conv, [Geom]), 24),
  v33 = ValueForGeom(CoordConvert(@conv, [Geom]), 33);

(Uncomment the -- comments in front of ALTER and ADD the first time you run it... those lines are commented out so you can re-run the query while tinkering with the text, just updating the fields added the first time.)

That's a good query to start with.

On the machine I'm using (an old machine) that original query runs in 86 seconds. It's not parallel because there is no THREADS statement in there to make it parallel. Adding THREADS is a first step.

However, parallelization won't help total speed all that much with the test data because much of the time is spent in computations for the two really big objects. Basically, the query gets done instantly on the smaller geoms and then spends its time working on the two big ones. Without parallelization they get done in sequence. With parallelization, they'd get done simultaneously, each in its own parallel thread, so the more big geoms there are in the actual working data the more parallelization will help.

The query depends on the TileGeomToValues() function, which in this case is being used as a brute force way to grab all values within a geometry into a table, many millions of records for the big geoms working against a raster that would have over 12 million records at one value per pixel.

TileGeomToValues() isn't internally parallelized within a geom because there's nothing in there worth parallelizing. Adding THREADS will let the function operate on different geoms in different threads, but that's not going to make the thread running on a single big geom any faster. However, there will be a gain with bigger data where there are multiple big geoms, since those won't be done in sequence but will be done in as many threads as can be mustered.
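That behavior is generic to any per-item parallelism and can be sketched in plain Python (illustrative sleep costs, not Manifold code): wall time is set by the single slowest geom, so threads only help when the heavy work is spread across several geoms.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def process(geom_cost):
    # stand-in for the per-geom TileGeomToValues() work;
    # the cost is modeled as seconds of sleep
    time.sleep(geom_cost)
    return geom_cost

one_big = [0.4] + [0.01] * 8       # one dominant geom among small ones
two_big = [0.4, 0.4] + [0.01] * 8  # two dominant geoms

for geoms in (one_big, two_big):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(process, geoms))
    print(f"{len(geoms)} geoms -> {time.perf_counter() - start:.2f}s")

# With enough threads both runs finish in roughly 0.4s: the big geoms
# run simultaneously, but no single big geom gets any faster.
```

Serially, the second list would take nearly twice as long as the first; in parallel they tie, which is why the more big geoms the working data has, the more THREADS pays off.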

Besides adding THREADS, we can also rewrite the query for slightly more efficiency:

--ALTER TABLE [Properties Table] (
-- ADD v03 INT32,
-- ADD v13 INT32,
-- ADD v15 INT32,
-- ADD v19 INT32,
-- ADD v24 INT32,
-- ADD v33 INT32)

VALUE @target NVARCHAR = ComponentCoordSystemAutoXY([Land_cover]);
VALUE @source NVARCHAR = ComponentCoordSystemAutoXY([Properties]);
VALUE @conv TABLE = CALL CoordConverterMake(@target, @source);

FUNCTION ValueForGeom(@g GEOM, @v INT32) FLOAT64 AS (
  SELECT Count(*)
  FROM CALL TileGeomToValues([Land_cover], @g)
  WHERE [value] = @v
) END;

UPDATE (
  SELECT mfd_id, v03, v13, v15, v19, v24, v33,
    ValueForGeom(CoordConvert(@conv, [Geom]), 3) AS c03,
    ValueForGeom(CoordConvert(@conv, [Geom]), 13) AS c13,
    ValueForGeom(CoordConvert(@conv, [Geom]), 15) AS c15,
    ValueForGeom(CoordConvert(@conv, [Geom]), 19) AS c19,
    ValueForGeom(CoordConvert(@conv, [Geom]), 24) AS c24,
    ValueForGeom(CoordConvert(@conv, [Geom]), 33) AS c33
  FROM [Properties Table] THREADS SystemCpuCount() BATCH 1
)
SET v03 = c03, v13 = c13, v15 = c15, v19 = c19, v24 = c24, v33 = c33;

That query runs in 28 seconds on my machine. It's faster because the two big geoms are computed in their own threads without waiting on each other or on the other geoms, while all the other geoms are computed in their own threads. I used a BATCH of 1 so that both big geoms wouldn't just happen to be grouped within the same thread. If your non-test data has a lot of big geoms, you should see an improvement because all of the big geoms will be run in their own threads. On a machine like a 12 core Ryzen with 24 threads, that could be a big help.
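Why BATCH 1 matters can also be sketched in plain Python. This is a simplified round-robin scheduler with made-up per-geom costs, not Manifold's actual dispatcher: with a large batch, the two expensive geoms can land in the same batch and run sequentially on one thread.

```python
def schedule(costs, n_threads, batch):
    """Hand out fixed-size batches of records to threads round-robin and
    return the simulated wall time, i.e. the busiest thread's total cost."""
    loads = [0.0] * n_threads
    for i in range(0, len(costs), batch):
        # each batch goes, in order, to the next thread
        loads[(i // batch) % n_threads] += sum(costs[i:i + batch])
    return max(loads)

# 2 big geoms (cost 40) among 28 small ones (cost 1), on 4 threads
costs = [40, 40] + [1] * 28
print(schedule(costs, n_threads=4, batch=8))  # 86.0: both big geoms share a batch
print(schedule(costs, n_threads=4, batch=1))  # 47.0: big geoms land on different threads
```

With batch size 1 the two expensive records are guaranteed to be dispatched separately, so neither thread has to wait for the other's big geom.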

So what else could be done? It seems inefficient to re-run TileGeomToValues() for each of the values of interest. That's asking the system to sweep through the raster for all geoms every time. So one can imagine a third variation running TileGeomToValues() just once:

--ALTER TABLE [Properties Table] (
-- ADD v03 INT32,
-- ADD v13 INT32,
-- ADD v15 INT32,
-- ADD v19 INT32,
-- ADD v24 INT32,
-- ADD v33 INT32)

VALUE @target NVARCHAR = ComponentCoordSystemAutoXY([Land_cover]);
VALUE @source NVARCHAR = ComponentCoordSystemAutoXY([Properties]);
VALUE @conv TABLE = CALL CoordConverterMake(@target, @source);

FUNCTION ValuesForGeom(@g GEOM) TABLE AS (
  SELECT
    Sum(CASE [value] WHEN 3 THEN 1 ELSE 0 END) AS c03,
    Sum(CASE [value] WHEN 13 THEN 1 ELSE 0 END) AS c13,
    Sum(CASE [value] WHEN 15 THEN 1 ELSE 0 END) AS c15,
    Sum(CASE [value] WHEN 19 THEN 1 ELSE 0 END) AS c19,
    Sum(CASE [value] WHEN 24 THEN 1 ELSE 0 END) AS c24,
    Sum(CASE [value] WHEN 33 THEN 1 ELSE 0 END) AS c33
  FROM CALL TileGeomToValues([Land_cover], @g)
) END;

-- put values for each geom into temp table
SELECT mfd_id, SPLIT CALL ValuesForGeom(CoordConvert(@conv, [Geom]))
INTO [properties table temp]
FROM [properties table]
THREADS SystemCpuCount() BATCH 1;

-- prepare to join
ALTER TABLE [properties table temp] (ADD INDEX [mfd_id_x] BTREE ([mfd_id]));

-- move values to original table
UPDATE (
  SELECT t.mfd_id,
    v03, v13, v15, v19, v24, v33,
    c03, c13, c15, c19, c24, c33
  FROM [properties table] AS t INNER JOIN [properties table temp] AS s
    ON s.[mfd_id] = t.[mfd_id]
)
SET v03 = c03, v13 = c13, v15 = c15, v19 = c19, v24 = c24, v33 = c33;

-- remove temporary table
DROP TABLE [properties table temp];

That query runs in 66 seconds. I don't understand why it is slower than the second query, but I suspect that the Sum(CASE ...) constructions cost more time than is gained by running TileGeomToValues() once.
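The two query shapes can be compared in plain Python with made-up pixel values (illustrative only, not Manifold code): six filtered sweeps versus one tallying sweep. Both produce identical counts; which one wins in a real engine depends on the per-row cost of the aggregates, which is what the timings above suggest.

```python
from collections import Counter

# Hypothetical pixel values returned for one geom, standing in for the
# TileGeomToValues() result table (values chosen only for illustration)
pixels = [3, 13, 3, 15, 19, 3, 24, 33, 13, 3]
wanted = [3, 13, 15, 19, 24, 33]

# Query 2's shape: one filtered sweep per value of interest
six_pass = {v: sum(1 for p in pixels if p == v) for v in wanted}

# Query 3's shape: a single sweep tallying every value, then lookups
tally = Counter(pixels)
one_pass = {v: tally[v] for v in wanted}

assert six_pass == one_pass
print(one_pass)  # {3: 4, 13: 2, 15: 1, 19: 1, 24: 1, 33: 1}
```

In principle the single sweep touches each pixel once instead of six times, but if the per-pixel tallying work is heavier than a simple equality filter, the six cheap sweeps can still win.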

Other avenues to increase speed come to mind:

1) Chop up the raster into four pieces (copy and paste the image and adjust the rects) and adapt the query to use all four. That may provide greater opportunities for parallelization, depending on how the big geoms are laid out compared to the boundaries between the chopped-up rasters. For example, if a big geom falls entirely within the same piece of a chopped-up raster there will be no difference. Could be worth a try.

2) TileGeomToValues() is a brute force instrument in that it gathers all values for all pixels when you're only interested in some values. It would be good if Manifold had a function, like the other aggregate statistics functions for rasters, that could provide a count of all pixels with a given value, like 3 or 24 or whatever. That could be dramatically faster.

3) A hybrid of 2) and 1) above, implemented internally within Manifold, that applies the specific functions of 2) to individual tiles falling within geoms: sort of like 1), but using the raster's existing organization into tiles instead of chopping it up. That could provide many opportunities for parallelization.
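Ideas 2) and 3) can be sketched together in plain Python (hypothetical tiles, not an existing Manifold API): a targeted per-value count works tile by tile and never materializes one record per pixel, and because per-tile counts are independent, each tile could be handled in its own thread.

```python
def count_value(tiles, v):
    # count pixels equal to v directly, tile by tile, instead of
    # expanding every pixel into a record first; each tile's partial
    # count is independent, so tiles could be processed in parallel
    return sum(row.count(v) for tile in tiles for row in tile)

# two tiny 2x3 tiles of land cover codes, made up for illustration
tiles = [
    [[3, 3, 24], [13, 3, 33]],
    [[15, 3, 3], [19, 24, 3]],
]
print(count_value(tiles, 3))   # 6
print(count_value(tiles, 24))  # 2
```

For a 12-million-pixel raster, returning one integer per value of interest instead of 12 million records is where the dramatic speedup would come from.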

I've submitted this as a case study to Engineering, since 2) and 3) are something Manifold must do internally. They are interesting ideas that are very close to what is already in there.

To summarize:

An initial improvement, the second query above, cuts time from 86 seconds to 28 seconds. The query is not instantaneous because there is a huge amount of work, mainly within two large geoms that each involve millions of records. I wouldn't be surprised if Arc took ten times as long.

It's impressive that TileGeomToValues() works as well as it does being called six times, as in the original query and in the second query above. But that clearly could be improved if Manifold added an aggregate to count up pixels with a given value that could be used instead of TileGeomToValues(). That would drop time significantly, probably to only a few seconds. The third idea above might also drop it even more.

I don't know why the third query runs slower. But that's a question for engineering people who know what they're doing. I know they like digging into such things so I've sent it in. :-)

Ah... one last thing: 9 and Viewer run all three queries with the same timings, as expected.

jsperr

83 post(s)
#26-Jul-20 15:20

I have similar timings when I run the queries. (Hardware details are in my user profile.)

The properties table appears to be a real mess. It has 301 records for what looks to be 50 parcels. A lot of them appear to be duplicates, or parcels displaced from another of the same dimensions. Cleaning up the data might make things run a lot faster -- I might try that at some point.

Ctrl-clicking through the drawing to select a parcel leads to some pretty bizarre results. Parcel selection by drawing a "contained within" box (Ctrl-Shift-drag-release-drag) will highlight some of the problems in the table for a given parcel.

2020-07-26 09:26:47 -- Save: C:\Users\User\Downloads\IanSchelly.map (0.017 sec)

2020-07-26 09:28:28 Render: [Map] (0.626 sec)

2020-07-26 09:31:54 -- Query: [Query] (97.446 sec)

2020-07-26 09:34:34 -- Query: [Query 2] (29.085 sec)

2020-07-26 09:41:27 -- Query: [Query 3] (74.613 sec)


IanSchelly

12 post(s)
#26-Jul-20 22:36

I know the property data is a mess with lots of duplicates, but for reasons too long to type here I need to keep them all. In short, these are self-declared property boundaries, and duplicate boundaries registered under different names are something we are looking into.

I'm using a Dell Precision T7910 workstation with Intel Xeon CPU E5-2630 v3 @ 2.4GHz with 32GB of installed memory. It's got 8 cores and 16 logical processors.

Running the 3 queries as written by Dimitri I get these times:

[Query] (85.732 sec)

[Query 2] (30.186 sec)

[Query 3] (65.021 sec)

I just feel like I'm going insane because when I was testing my code on Viewer, I was processing the data you ran in under a second. When I ran my full data set with 1.4 million properties on a much larger raster it was processing at just over 10,000 records a second, and that took just over 25 minutes. I had to have my IT person purchase and install Release 9 and it just doesn't run the same, so it makes me feel like they either did something wrong during installation or some of the other updates she installed are interfering in some way. Before, when I ran my code it would use close to 99% of my CPU power; now it's down to 30%. But you and I have similar computer specs and are getting similar processing times, so this all just leaves me dumbfounded.

jsperr

83 post(s)
#27-Jul-20 13:58

We now have three nearly identical timing results from three comparable machines running Dimitri's three queries. I don't think there is anything wrong with your installation.

I looked at the records where things slow down (add 1 to the record number displayed on the screen, select it in the properties database and view the drawing) and they seem to be the larger and more complex properties. Perhaps when you first ran your query on the full database it encountered a slew of smaller records that it ripped right through.

I have an .mxb file on my website that is nearly 900 MB in size and contains 14.5 million NYC taxi records. Download it if you like, open it, delete the sumtable, fire up Task Manager and Resource Monitor, and launch the query to rebuild the sumtable. The initial read is about 325,000 records per second and it takes about 225 seconds total to rebuild the sumtable. Resource Monitor shows 11 threads (set in the query) fully saturated.

Please don't get discouraged -- it is great to see so many new names here participating in the forum -- learning, contributing, and bringing fresh perspectives. People with enormous knowledge and experience contribute here freely and generously and have pointed me in the right direction many times.

IanSchelly

12 post(s)
#27-Jul-20 15:13

Thanks for looking into this! After spending way too many hours trying to troubleshoot this, I realized I made a huge oversight. I learned the basics of Manifold on Viewer and was running basic test queries to compare results to ArcMap, and was also logging the times: about 10,000 records a second. I wrote my final query, the one we have all been running, ran it while doing other work, and came back to check the results later without looking at the time. It took my IT department about 3 weeks to buy Manifold, so when it finally got installed and I ran the full data set with my final query, it was running so slow compared to what I remembered. I checked my logs, which I hadn't annotated well, and assumed they were for my final query. This morning I had the epiphany that the time logs were for the simple query that just counts the total number of pixels with no regard to the value (basically getting the area of each property via pixel count). I ran this query and the times matched what was in my logs! So the big takeaway is I'm a moron and I need to take more detailed notes.

But because of all of this, and with Dimitri's help making my code more efficient, my code now runs much faster than it would have without asking here.

Thanks everyone!

tjhb

9,320 post(s)
#24-Jul-20 09:54

[Overlapped with Dimitri's post above, which I will read and digest.]

So far just reading the test_query...

It is very inefficient, both in the FUNCTION ValueForGeom(), and in the UPDATE statement.

There is no point in doing multiple calls (here 6) to CoordConvert() for each identical Geom in the UPDATE statement, and no point in calling TileGeomToValues() multiple times for the same converted Geom inside the function.

At a basic level, you're doing at least 12 really difficult things here, each time you need to do just one.

If you had a more efficient approach when using Viewer, then I would almost bet my house that the code was very different from this.

This code needs rewriting completely, following a clear restatement of its purpose (you don't say what that is).

That is the problem.

IanSchelly

12 post(s)
#24-Jul-20 18:39

I'm going to be reading through all of Dimitri's thread in detail, but just to answer a few quick questions I saw when I glanced through these replies.

I was/am using Viewer 9.0.172 and just bought Manifold System 9.0.172.3, and have been running my original code on both. The purpose of this code is to get the pixel counts of each unique raster value per property, so I can calculate the area of each land cover type for every property. The end result is that new fields are added to the property polygons for each unique value, which are then attributed with the pixel count of that value in each property polygon.

While my code may be really inefficient as I'm just learning all of this, it ran extremely fast on both my test data, which I provided here, and the entire data sets, which are much larger, when I first had Viewer installed. After Manifold System 9.0.172.3 was installed and I ran the same code as before, it ran extremely slow. And now when I run it on Viewer it's running slow as well. So my initial thought was something happened during the install that is not allowing Manifold to harness all the power of my computer.

So in short: Manifold Viewer ran super fast, I bought and installed Manifold System 9.0.172.3, and now both run slow.

Dimitri

6,104 post(s)
#25-Jul-20 09:43

So my initial thought was something happened during the install that is not allowing Manifold to harness all the power of my computer.

It's easy to take the first step towards testing that proposition. On a virgin machine, try first 9.0.172 and then 9.0.172.3 to see what happens, using identically the same data set (I've attached it to make such a test easy). If on a virgin machine both 172 and 172.3 go fast, and then on your production machine exactly the same code using exactly the same project goes slow, then it's time to look at what are the differences in the two machines.

Build 9.0.172 is the current "classic" build for both Viewer and 9. You can download it and install it, either as a Windows Installer installation or as a portable installation, on a completely fresh machine that's never had Manifold installed on it before.

Build 9.0.172.3 is the current "cutting edge" build for both Viewer and 9. It likewise can be downloaded and installed, but only as a portable installation.

I recommend using the portable installation in all test cases, so you can be absolutely sure there is nothing about Manifold's code that is causing differences. Just for the heck of it, I compared a Windows Installer 172 against a portable 172.3, with no difference. So for convenience I'd just use the portable installation throughout.

Attached to this post is an .mxb project that contains your test data and the three queries I published earlier, the first of which is the test query you posted. Use that project for all tests so in all cases you are making an apples to apples comparison.

Test all three queries running Viewer using build 9.0.172 on a completely virgin machine and note the timings. To prove to yourself it is not a difference between Viewer and Release 9, download Release 9 for build 9.0.172 and try that on the same machine and note the timings (for the first 30 days of a serial number you don't have to activate it, so you can try your 9 license on other machines during that time without using up an activation key).

What you'll see is that timings are the same for build 9.0.172 for Viewer and 9. That's expected, given that Viewer is 9. Timings should be identical give or take very slight effects from how Windows might be caching disk access, an effect that will be minimized if you use SSD.

Next, repeat the above on exactly the same machine using exactly the same project, but this time download and launch build 9.0.172.3 versions of Viewer and 9. You'll see the timings are the same, also as expected. Why? Because nothing between 9.0.172 and 9.0.172.3 changed in the TileGeomToValues function or in other parts of the infrastructure that the queries use.

The tests above are simple and easy for anybody on this thread to repeat. They fairly certainly prove the problem is not a difference between Viewer and 9, and the problem is not a difference between 9.0.172 and 9.0.172.3. Don't waste any time chasing that as a possibility.

It seems pretty clear there was something else, other than the builds used, that was different between testing reported "for 9.0.172" and for "after 9.0.172.3 was installed".

What are such things that might be different? It's a matter of details. It seems unlikely that somehow Windows changed in a very fundamental way, just by happenstance, when 9.0.172.3 was launched. After all, there is no "installation" of 172.3. It's a portable installation, so there's no "installation." It's just unzipping a zip file and then launching a .exe. There's nothing at all about that which in any way changes Windows. So it seems that the changes before and after, are about something else. Some ideas:

1. Different queries were run. I can adjust the three queries with an almost imperceptible change so they'll run instantly, albeit not quite right. If you don't notice the change, you might think the difference between running the queries with that change in 172 and running them as published in 172.3 is installing 172.3. Nope. It's the small change in the query.

2. Different data sets were used. Run the queries using a Properties drawing that has few or no really big areas and it goes much faster. You can simulate this by adjusting Query 2 in the example project I've attached to change the line

FROM [Properties Table] THREADS SystemCpuCount() BATCH 1

to

FROM [Properties Table] WHERE GeomCoordCount([Geom]) < 200 THREADS SystemCpuCount() BATCH 1

That restricts the operation of the query to exclude the big areas.

3. Something in Windows or in third party software (antivirus software, etc) is interfering with execution, perhaps by checking temp files or some executables on the fly for malware. Such packages can decide one executable is OK and another is not.

4. Something else that's a difference, such as launching 172 from SSD with the project in SSD and launching 172.3 and the project from slow storage connected over a slow network.

If you try the sample project with both 172 and 172.3 and you get around 20 to 30 seconds execution time for Query 2, that's expected performance on a typical machine running a not too fast (ie SATA connected) SSD. If you have a machine with the same hardware and Windows software that runs the sample project queries significantly faster than using 172 or 172.3, then that's a very big surprise and worth looking at in considerable detail.

By the way, just to try out the impact of threads, running Query 2 in the sample project on a Threadripper with 48 threads took 19.5 seconds. That's slightly better than expected given it is only two big areas in the sample project that take most of the time.

Attachments:
IanSchelly.mxb

IanSchelly

12 post(s)
#27-Jul-20 00:04

Thanks once again Dimitri. I've been working from home because of COVID and have been accessing my production computer remotely, but I'll try these tests out on a virgin machine tomorrow and hopefully get to the bottom of this.

IanSchelly

12 post(s)
#27-Jul-20 15:17

I got it figured out, Dimitri. You can read my detailed reply above to user tjhb, but in short: I tested a simpler query a month ago, and when I finally got M9 installed and ran the complex code in this thread, I assumed my previous timed results were for the complex code, since a few weeks had passed since I last tested it on Viewer. But all was not lost! Thanks for helping to adjust my code, because it runs much faster than what I posted to start this thread. Thanks a lot for your help!

Dimitri

6,104 post(s)
#27-Jul-20 19:24

My pleasure. Don't beat yourself up for an oversight. That's easy to do and you're just getting started.

I think we all can look forward to it getting faster yet. Because of the discussion in this thread a good idea came up in terms of a useful function that could be added, to get counts by specific value, instead of getting that using a less targeted function. I feel confident that will get added sooner or later and then this workflow and related tasks will get significantly faster.

That's the way it should be: real life work probing strengths and weaknesses and people working together to evolve better solutions. :-)

Manifold User Community Use Agreement Copyright (C) 2007-2019 Manifold Software Limited. All rights reserved.