Problem with Dissolve
pslinder1
220 post(s)
#10-Feb-21 01:09

I have a drawing with 3M+ polygons and I am trying to do a Dissolve transform on it, grouping the results by 15 different values. Manifold 9 has been running for over 5 hours. Is there a problem?

I am using a Xeon computer with 24 cores and Manifold is only using about 2% of the total CPU power.

tjhb

9,550 post(s)
#10-Feb-21 05:47

Is there a problem?

Why are you trying to dissolve on 15 different values?

Better: create a composite value combining those 15, then make a BTREE index [correction: BTREEDUP or BTREEDUPNULL] on the result, then dissolve on that.

~Instantaneous?
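
A rough sketch of what I mean, with hypothetical field names [f1], [f2] standing in for the 15 (adjust to yours):

-- build one composite key, then index it and dissolve on it
ALTER TABLE [Drawing Table] (ADD [key] NVARCHAR);
UPDATE [Drawing Table] SET [key] =
  CAST([f1] AS NVARCHAR) & '|' & CAST([f2] AS NVARCHAR); -- ...and so on for the rest
ALTER TABLE [Drawing Table] (ADD INDEX [key_x] BTREEDUP ([key]));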

pslinder1
220 post(s)
#10-Feb-21 16:18

I am not sure what you mean by a composite value. Currently I have a field in the drawing's table that has only 15 unique values, and I am trying to dissolve each area into one of those 15. How can I use a "composite value" to achieve the same end and make it go faster? I have never used the indexes you mentioned. Could you spell it out a little more for a noob?

apo
136 post(s)
#10-Feb-21 17:08

Just a question to better understand one dimension of your problem: are the 15 values referring to the same variable (column), or to 15 variables?

tjhb

9,550 post(s)
#10-Feb-21 17:57

Thanks apo, you nailed my misunderstanding exactly! My fault only, perfectly clear now.

Might need further discussion, but the first thing is to get a BTREEDUP index (or BTREEDUPNULL, if there might be null values) on the field being used for the dissolve. Building that index will take time, but it will be time well spent.
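
For example, assuming the dissolve field is called [landuse] and the drawing's table is [Drawing Table] (adjust to your actual names):

-- duplicates-allowed index on the dissolve field
ALTER TABLE [Drawing Table] (
  ADD INDEX [landuse_x] BTREEDUP ([landuse])
);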

I would also run GeomNormalize before attempting to combine geometry.
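
Something like (same assumed names):

-- normalize geometry in place before combining
UPDATE [Drawing Table] SET [Geom] = GeomNormalize([Geom]);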

Then either a dissolve, or perhaps better, a GeomUnion query with grouping, with manual control of threads (and even, in this case, batches).
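
A sketch, again with assumed names; THREADS and BATCH control the parallelism:

-- grouped union across all cores; BATCH 1 hands each thread
-- one (expensive) record at a time
SELECT [landuse], GeomUnionAreas([Geom]) AS [Geom]
INTO [Dissolved]
FROM [Drawing Table]
GROUP BY [landuse]
THREADS SystemCpuCount() BATCH 1;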

pslinder1
220 post(s)
#10-Feb-21 20:01

Why do you think a BTREEDUP index will help? There are no nulls in the GEOM field.

apo
136 post(s)
#10-Feb-21 20:05

Just because you GROUP BY, and indexes help a lot, even more than a lot. If you have no NULLs then BTREEDUP is the right one, because with 3M records and 15 values you will have duplicates.

tjhb

9,550 post(s)
#10-Feb-21 20:25

As apo said. There are no nulls -> use BTREEDUP not BTREEDUPNULL. You misread.

Substituting then: because the SQL engine will then know that there is a finite number of groups, and moreover which group each record is in.

That makes all the difference in the world.

Try it.

pslinder1
220 post(s)
#10-Feb-21 19:35

Just one field with only 1 of 15 possible values for each record.

apo
136 post(s)
#10-Feb-21 19:53

Then I would follow tjhb's advice: first set an index, and then use a query with GeomUnionAreas.

The number of polygons is one thing, their complexity is another. How complex are your polygons? Could you describe them a bit more?

pslinder1
220 post(s)
#10-Feb-21 20:04

The polygons were created from a land use raster (15x15 meters) using the Trace function in Manifold. It is very large, covering 900,000 square kilometers.

apo
136 post(s)
#10-Feb-21 20:11

Then the complexity of the shapes might be one problem. I'm testing that on my side, but one guess is that it might be a good idea to split your 3M polygons down into convex parts and combine those afterwards. Sounds nuts, but it might be easier to couple simple tasks on a greater number of shapes than to try to merge complex shapes. I had this issue a few years ago in M8 and this two-step approach did the trick.

As said, I'm testing that right now on a dataset, so this is just a guess.

tjhb

9,550 post(s)
#10-Feb-21 20:15

Then can we discuss why you haven't tried our suggestions?

First, add an index (as above).

Secondly, normalize geometry (not topology).

Thirdly, use a Union query, with multiple threads (and possibly large clusters).

Just try it, and report.

pslinder1
220 post(s)
#10-Feb-21 20:33

Thank you both for the advice. I am in the process of creating the index. Curious about the dissolve: does it not use multiple threads, and is that why Union is better?

apo
136 post(s)
#10-Feb-21 20:42

If you have a look at the area (dissolve) function details (select the View Query button), you'll see it is an alias for the GeomUnionAreas function.

apo
136 post(s)
#10-Feb-21 22:06

Having done my test: part of the problem will remain even with the index (though the index improves the calculation time a lot), because it comes from the dimensionality of your problem.

Trying to dissolve a large number of polygons into a restricted number of classes asks the system to geometrically compare and union a very large number of pairs.

My test on 10M triangles to be merged into 16 classes took 100 minutes, but merging them first into 4,000x16 classes and then into 16 classes took me 7 minutes. I would suggest you first merge, for example, by region and class, and then by class only. This way you reduce the dimensionality of the pairing operations. I would call that a hierarchical merge; see the sketch below.
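
A sketch of the two-step merge (assumed names: [Drawing Table], class field [landuse], and a hypothetical [region] field you fill first, e.g. from a coarse grid):

-- step 1: merge within (region, class) groups
SELECT [region], [landuse], GeomUnionAreas([Geom]) AS [Geom]
INTO [Step 1]
FROM [Drawing Table]
GROUP BY [region], [landuse]
THREADS SystemCpuCount();

-- step 2: merge the far fewer step-1 results by class only
SELECT [landuse], GeomUnionAreas([Geom]) AS [Geom]
INTO [Dissolved]
FROM [Step 1]
GROUP BY [landuse]
THREADS SystemCpuCount();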

pslinder1
220 post(s)
#10-Feb-21 23:33

Thanks a lot for the insight and the help. That makes a lot of sense.

tjhb

9,550 post(s)
#11-Feb-21 00:39

Remember to report back, because this is a forum.

pslinder1
220 post(s)
#12-Feb-21 22:37

I ended up breaking the data set into 40 separate files and running the operations on each. That proved fairly fast.

On the single large dataset, creating the indexes did not seem to improve anything. Same with normalizing the geometry. I did continue to use the dissolve function in the template instead of the Union query, so that could still be the problem. I did notice, when breaking up the datasets, that if a .map file (I created a separate project for each chunk) went over 1 GB it became interminably slow. Under 1 GB, things were fairly speedy.

Dimitri


6,436 post(s)
#15-Feb-21 08:56

I did notice, when breaking up the datasets, that if a .map file (I created a separate project for each chunk) went over 1 GB it became interminably slow. Under 1 GB, things were fairly speedy.

There's nothing special about 1 GB as a magic boundary for Manifold, but it could be that, with the way your system is set up, going above that results in disk thrashing. You don't say what you're doing for disk, how that's organized in terms of page files and such, how much RAM you have, what else is running (40 tabs open in Google Chrome?), and so on.

I ended up breaking the data set into 40 separate files and running the operations on each. That proved fairly fast.

That seems incredibly unnecessary, like there absolutely has to be a way in 9 not to have to do that. My strong impression is that something basic has been overlooked, or the query being used is ordering the system to do something very inefficiently, or some other detail. Weird performance issues are often all about the details of what's being done. To help out, we need to know all details about your data and what you're doing.

Could you post your starting data, a description of what you want to accomplish, and the query you are currently using? That will eliminate guesswork and will allow everyone to provide specific advice on how to do what you want fast.
