applying filtering/selection within GeomOverlayTopologyIdentityPar @drawing, @overlay
yves61
438 post(s)
#22-Feb-21 10:06

I have a PARCELS drawing with approximately 1 million records and a BUILDINGS drawing, also with approximately 1 million records.

I am trying to cut the BUILDINGS drawing using the PARCELS drawing as a cookie cutter.

First, to be clear, I know how to do this using the TRANSFORM - OVERLAY method.

I am successful with this on my computer at smaller scales: say, for a number of SOURCE (Buildings) records times number of OVERLAY (Parcels) records of up to 62,500,000,000 (roughly 250,000 x 250,000).

Beyond that scale, on my computer, it takes hours and sometimes produces no results at all, and it becomes less efficient as the size grows.

So my aim now is to scale down the PARCELS and BUILDINGS drawings by filtering/selecting on a common gemNISCODE field value. But I am struggling with how and where to integrate this filtering on the gemNISCODE field value within the TRANSFORM - OVERLAY query.

On top of that, I would also like to loop through all gemNISCODE values within the BUILDINGS and PARCELS drawings. I am not quite sure whether I need to use an EXECUTE WITH or a script.

Any help of course is much appreciated.

The auto-generated code so far is below

-- $manifold$
--
-- Auto-generated
--
-- Overlay
-- Layer: Gbg40000
-- Field: Geom
-- Operation: identity
-- Overlay: Adp40000
-- Overlay selection only: FALSE
-- Overlay fields: adp_{name}
-- Result: (new table)
-- Result type: geom
-- New drawing: Gbg_cutBy_Adp
-- New table: Gbg_cutBy_Adp Table
-- Resources: all CPU cores, all GPU cores
-- Transform selection only: FALSE
--
-- prepare begin
CREATE TABLE [Gbg_cutBy_Adp Table] (
  [mfd_id] INT64,
  [OIDN] INT32,
  [UIDN] INT32,
  [gemNISCODE] NVARCHAR,
  [adp_mfd_id] INT64,
  [adp_OIDN] INT32,
  [adp_UIDN] INT32,
  [adp_gemNISCODE] NVARCHAR,
  [Geom] GEOM,
  INDEX [mfd_id_x] BTREE ([mfd_id]),
  INDEX [Geom_x] RTREE ([Geom]),
  PROPERTY 'FieldCoordSystem.Geom' ComponentFieldCoordSystem([Gbg40000 Table], 'Geom')
);
CREATE DRAWING [Gbg_cutBy_Adp] (
  PROPERTY 'Table' '[Gbg_cutBy_Adp Table]',
  PROPERTY 'FieldGeom' 'Geom'
);
-- prepare end
VALUE @drawing TABLE = CALL ComponentFieldDrawing([Gbg40000 Table], 'Geom');
VALUE @overlay TABLE = [Adp40000];
DELETE FROM [Gbg_cutBy_Adp Table];
INSERT INTO [Gbg_cutBy_Adp Table] (
  [OIDN], [UIDN], [gemNISCODE], [adp_mfd_id], [adp_OIDN], [adp_UIDN], [adp_gemNISCODE],
  [Geom]
) SELECT
  [s_OIDN], [s_UIDN], [s_gemNISCODE], [o_mfd_id], [o_OIDN], [o_UIDN], [o_gemNISCODE],
  [s_Geom]
FROM CALL GeomOverlayTopologyIdentityPar(@drawing, @overlay, 0, ThreadConfig(SystemCpuCount()));

This code works well on smaller numbers of records, but for my 1 million x 1 million record Source and Overlay drawings it takes too much time on my computer and becomes less efficient as the size increases.

tjhb
10,094 post(s)
#22-Feb-21 11:40

I am trying to cut the BUILDINGS drawing using the PARCELS drawing as a cookie cutter.

Are you sure? This seems backwards. It implies that, in general, buildings are larger than parcels, or that buildings usually straddle multiple parcels.

I would expect the opposite situation, and the opposite procedure.

Anyway, what is the underlying purpose? That is what matters.

yves61
438 post(s)
#22-Feb-21 13:33

Yes indeed. Of the roughly 1,000,000 building records, some 300,000 buildings straddle multiple parcels.

I'd like to calculate the footprint percentage of each building (or each part of a straddling building) on each parcel.

Parcels may have multiple buildings (straddling or not).

So I thought I need to use GeomOverlayTopologyIdentityPar to identify each (part of a) building (straddling or not) on each parcel, then sum the (parts of) buildings that fall within each parcel and do the arithmetic.

But in the end my computer's memory is not large enough to tackle the whole 1 million x 1 million records in one batch either.

Any ideas? How to optimize?

Dimitri


7,413 post(s)
#22-Feb-21 14:21

I hate to guess about this stuff, so I'll just ask: could you tell us a bit about your system (hardware and software)? This is a big job, so it probably gets into territory where hardware (what CPU? how much RAM? what kind of storage: local, a shared drive on a remote system, hard disk, SSD, etc.?) and software (64-bit Windows? Windows 10?) matter.

yves61
438 post(s)
#22-Feb-21 17:02

I am running this on a local laptop computer.

Meanwhile, while working on this, I made some renamings and changes to the initial code, which will probably read more easily.

******** START CODE ************

-- $manifold$
--
-- Auto-generated
--
-- Overlay
-- Layer: Buildings
-- Field: Geom
-- Operation: identity
-- Overlay: Parcels
-- Overlay selection only: FALSE
-- Overlay fields: parcels_{name}
-- Result: (new table)
-- Result type: geom
-- New drawing: Buildings_cookieCut_Parcels
-- New table: Buildings_cookieCut_Parcels Table
-- Resources: all CPU cores, all GPU cores
-- Transform selection only: FALSE
--
-- prepare begin
CREATE TABLE [Buildings_cookieCut_Parcels Table] (
  [mfd_id] INT64,
  [OIDN] INT32,
  [UIDN] INT32,
  [gemNISCODE] NVARCHAR,
  [gemNAAM] NVARCHAR,
  [calcArea] FLOAT64,
  [parcels_mfd_id] INT64,
  [parcels_UIDN] INT32,
  [parcels_OIDN] INT32,
  [parcels_calcArea] FLOAT64,
  [calcArea_CutBldg] FLOAT64,
  [pctCutBldgonSourceBldg] FLOAT64,
  [pctCutBldgonParcel] FLOAT64,
  [Geom] GEOM,
  INDEX [mfd_id_x] BTREE ([mfd_id]),
  INDEX [Geom_x] RTREE ([Geom]),
  PROPERTY 'FieldCoordSystem.Geom' ComponentFieldCoordSystem([Buildings Table], 'Geom')
);
CREATE DRAWING [Buildings_cookieCut_Parcels] (
  PROPERTY 'Table' '[Buildings_cookieCut_Parcels Table]',
  PROPERTY 'FieldGeom' 'Geom'
);
-- prepare end
VALUE @drawing TABLE = CALL ComponentFieldDrawing([Buildings Table], 'Geom');
VALUE @overlay TABLE = [Parcels];
DELETE FROM [Buildings_cookieCut_Parcels Table];
INSERT INTO [Buildings_cookieCut_Parcels Table] (
  [OIDN], [UIDN], [gemNISCODE], [gemNAAM], [calcArea],
  [parcels_mfd_id], [parcels_UIDN], [parcels_OIDN], [parcels_calcArea],
  [calcArea_CutBldg], [pctCutBldgonSourceBldg], [pctCutBldgonParcel],
  [Geom]
) SELECT
  [s_OIDN], [s_UIDN], [s_gemNISCODE], [s_gemNAAM], [s_calcArea],
  [o_mfd_id], [o_UIDN], [o_OIDN], [o_calcArea],
  GeomArea([s_Geom], 0),
  GeomArea([s_Geom], 0) / [s_calcArea],
  GeomArea([s_Geom], 0) / [o_calcArea],
  [s_Geom]
FROM CALL GeomOverlayTopologyIdentityPar(@drawing, @overlay, 0, ThreadConfig(SystemCpuCount()))
-- WHERE [s_gemNISCODE] = '11001'

******** END CODE ************

Now with the above code I cookie-cut the Buildings drawing with the Parcels drawing, get the individual area for each (part of a) building, calculate the percentage ratio of each (part of a) building versus the original building area, and likewise versus the parcel area.

This is running well. I also tested adding a WHERE clause, which works fine.

Two more aims I would like to achieve:

1. As the million Buildings records x million Parcels records are too large for my computer's memory to handle swiftly, I aim to filter the Buildings and Parcels drawing tables on a common field: gemNISCODE.

Then loop over all gemNISCODE values and execute the above Transform - Overlay as a sub-routine.

How could I put this into a script, or EXECUTE it from another query, passing parameters for the gemNISCODE values? (A rough sketch of what I have in mind is below, after aim 2.)

2. Calculate the final footprint ratio of all the (parts of) buildings for each Parcel they are contained in. I suppose that a mere

SELECT [parcels_mfd_id], SUM([pctCutBldgonParcel]) AS footprintratio FROM [Buildings_cookieCut_Parcels Table] GROUP BY [parcels_mfd_id]

will do.
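Regarding aim 1, something along these lines is roughly what I have in mind - an untested sketch, assuming EXECUTE WITH accepts a text parameter this way and that the overlay function will take filtered SELECTs as its table arguments:

EXECUTE WITH (@niscode NVARCHAR = '11001') [[
  INSERT INTO [Buildings_cookieCut_Parcels Table] (
    [OIDN], [UIDN], [gemNISCODE], [gemNAAM], [calcArea],
    [parcels_mfd_id], [parcels_UIDN], [parcels_OIDN], [parcels_calcArea],
    [calcArea_CutBldg], [pctCutBldgonSourceBldg], [pctCutBldgonParcel],
    [Geom]
  ) SELECT
    [s_OIDN], [s_UIDN], [s_gemNISCODE], [s_gemNAAM], [s_calcArea],
    [o_mfd_id], [o_UIDN], [o_OIDN], [o_calcArea],
    GeomArea([s_Geom], 0),
    GeomArea([s_Geom], 0) / [s_calcArea],
    GeomArea([s_Geom], 0) / [o_calcArea],
    [s_Geom]
  FROM CALL GeomOverlayTopologyIdentityPar(
    (SELECT * FROM [Buildings] WHERE [gemNISCODE] = @niscode),
    (SELECT * FROM [Parcels] WHERE [gemNISCODE] = @niscode),
    0, ThreadConfig(SystemCpuCount()))
]];

A script (or a calling query) could then run this once per value returned by SELECT DISTINCT [gemNISCODE] FROM [Parcels].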

As usual , any help is welcome.

Dimitri


7,413 post(s)
#22-Feb-21 18:33

What is your disk drive?

yves61
438 post(s)
#22-Feb-21 18:49

Dimitri

I’ve got a local C: drive hosting applications: 52.7 GB free space out of a total of 184 GB.

I’ve got a local D: drive hosting data: 430 GB free space out of a total of 931 GB.

oeaulong

521 post(s)
#22-Feb-21 19:00

And where is your TEMP folder located?

yves61
438 post(s)
#22-Feb-21 19:52

Both TMP and TEMP on C:\Users\...\AppData\Local\Temp

The TEMP folder is currently using 486 MB.

tjhb
10,094 post(s)
#22-Feb-21 20:05

Yes, this is nuts.

You don't have enough TEMP space for a large project.

184GB is normally enough for the Windows OS and applications, and possibly the swap file too (even with 32GB RAM), but squeezing TEMP into that space as well is just not right if you are doing serious work.

I'm not sure if that's the problem, but it stands out.

NB it can be hard to tell how much TEMP space is actually in use, especially in heavy use. Windows doesn't always measure this accurately when it is busy.

StanNWT
196 post(s)
#26-Oct-21 20:43

Hi Tim,

I just encountered an issue with Manifold 9.0.175; it's not specific to this version, at least I don't think so. I've noticed that the temp space Manifold is using is in the path:

C:\Users\username\AppData\Local\Temp

I'd like it to be on a Thunderbolt-connected 4-drive Samsung 860 EVO SSD RAID 0 in my NAS, which has 3 TB+ of free space, as I only have 800 GB free on my Samsung 960 Pro 2 TB C: drive and don't want to run out of space during large projects that get to half a TB in size. Is there a Windows setting to change the user temp space on the C: drive, or is there a command-line setting I can use in Manifold, for Manifold only, to point it at a different swap drive?

rk
621 post(s)
#26-Oct-21 20:50

TEMP variable

https://georeference.org/forum/t157446.3

yves61
438 post(s)
#22-Feb-21 21:03

Edited: 22:00

Running this code on 1 million building records x 1 million parcel records, the TEMP folder went up to 3.64 GB.

tjhb
10,094 post(s)
#22-Feb-21 21:19

Measured how? It can be hard to measure.

How big is the Manifold project, if you save the building records and parcel records together, prior to processing?

You need more room, even if only in principle!

Looking further ahead, I am wondering if this is a case where a "manual" SQL join can do a better job than a built-in overlay, which is partly blind.

We have many helpful functions to deploy when we know the data.
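For instance (just an untested sketch of the kind of thing I mean, using the field names from your query; the output table name is made up, and I am assuming the intersecting overlay returns s_ / o_ fields like the topology one does), pairing each building with every parcel it intersects and clipping it manually:

-- untested sketch: pair each building with the parcels it intersects,
-- clip the building to each parcel, then compute areas and ratios
VALUE @buildings TABLE = CALL ComponentFieldDrawing([Buildings Table], 'Geom');
VALUE @parcels TABLE = [Parcels];
SELECT
  [s_OIDN], [o_OIDN], [GeomPart],
  GeomArea([GeomPart], 0) AS [partArea],
  GeomArea([GeomPart], 0) / [s_calcArea] AS [pctOfBuilding],
  GeomArea([GeomPart], 0) / [o_calcArea] AS [pctOfParcel]
INTO [Buildings_manualJoin]
FROM (
  SELECT
    [s_OIDN], [o_OIDN], [s_calcArea], [o_calcArea],
    GeomClip([s_Geom], [o_Geom], TRUE, 0) AS [GeomPart]
  FROM CALL GeomOverlayIntersectingPar(@buildings, @parcels, 0,
    ThreadConfig(SystemCpuCount()))
);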

Of course you could post sample data for experiment.

yves61
438 post(s)
#24-Feb-21 12:16

will reply shortly with sample data.

As suggested, please find attached some sample data. The file size has been reduced drastically, just for example purposes.

Please have a look at the Comments, and also at the inline comments in qry1_

As always, to all members of this forum, many thanks in advance for suggestions.

Attachments:
M9_example.mxb

tjhb
10,094 post(s)
#24-Feb-21 20:22

Thanks! That is great.

Will try some things out--bearing in mind what you say, it is a small subset.

yves61
438 post(s)
#04-Mar-21 09:48

To cope with (limited) computer memory and disk space - and in case I need to filter down the large, original, million-object Source and Overlay drawings to smaller proportions - I found the following works for my needs:

a) for the Source

run a custom SQL SELECT query1 on my million-record table/drawing (as Source)

create a drawing2 based on this query1

reference that drawing2 in the Transform (overlay) pane

b) for the Overlay

run a custom SQL SELECT query2 on my million-record table/drawing (as Overlay)

create a drawing3 based on this query2

reference that drawing3 in the Transform (overlay) pane

c) hit the Transform button, or the Edit query button if need be

All of the above can be written in one SQL query.
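For example, a condensed (untested) sketch of that single query: the two VALUE lines of the auto-generated query above become filtered SELECTs, and the INSERT ... SELECT part stays the same. (I am not sure whether a filtered result still carries the spatial index of the original table, so speed may differ.)

-- rough sketch: filter Source and Overlay inline on one gemNISCODE value
VALUE @drawing TABLE = (SELECT * FROM [Buildings] WHERE [gemNISCODE] = '11001');
VALUE @overlay TABLE = (SELECT * FROM [Parcels] WHERE [gemNISCODE] = '11001');
-- ...then the same INSERT INTO [Buildings_cookieCut_Parcels Table] ... SELECT ...
-- FROM CALL GeomOverlayTopologyIdentityPar(@drawing, @overlay, 0,
--   ThreadConfig(SystemCpuCount())) as posted earlier in the thread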

Dimitri


7,413 post(s)
#05-Mar-21 06:53

I respectfully note that when you do the above you're not filtering down the large, original, million object drawings to smaller proportions. The queries you write are still working with all of those objects on the fly. For example, a drawing created from a query that does a SELECT to pull a subset of records from a million records is not a static thing: it is running that SELECT query against the full million record table(s).

You're right that all the a, b, c parts of your procedure can be written in a single query. That single query also will be working on all of the many objects on the fly.

If doing the a, b, c procedure results in significantly faster performance than what you got before, that's an indication that the SQL you were using before perhaps was not the most performant approach, but that the SQL represented by the a, b, c procedure is more performant.

In such cases it pays to examine the differences between what was done before and in the new procedure, to see if what was done before could be slightly re-written for better performance.
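For example (a rough sketch just to illustrate the difference; the component names are made up):

-- a drawing built on this query re-runs the SELECT against the full
-- million-record table every time the drawing is used:
--   SELECT * FROM [Buildings] WHERE [gemNISCODE] = '11001'
--
-- whereas SELECT ... INTO copies the matching records once, into a new,
-- static table that no longer touches the original:
SELECT * INTO [Buildings_11001] FROM [Buildings] WHERE [gemNISCODE] = '11001';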

dchall8
1,008 post(s)
#23-Feb-21 06:21

If you break it into 4 pieces, will it run?

yves61
438 post(s)
#23-Feb-21 08:02

On my laptop computer, with the present settings, it runs fine up to 250,000 records x 250,000 records in approximately 15 minutes.

I see I could downsize the @drawing (Buildings) and @overlay (Parcels) tables by filtering these tables on a joint locality field [gemNISCODE] and use a SELECT INTO query that creates new (filtered) tables for Buildings and Parcels. There are 300 distinct [gemNISCODE] values.

So I understand I will probably need to write a script, loop over all distinct [gemNISCODE] values, and execute the TRANSFORM - OVERLAY as a sub-query.

Or are there other possibilities?

Could anyone share some code to help me further in the right direction?

I am not dexterous in scripting and programming.

Dimitri


7,413 post(s)
#23-Feb-21 12:18

My gut feel is that this is simply a large job that is very disk intensive and your computer has a slow disk setup.

Usually in a hard-disk-based system, when things go fast up to a critical amount of data and then start going slow, that's a sign that Windows has transitioned from all jobs fitting into memory to swapping to disk, which is very much slower.

When the job fits into available RAM with no need to swap out to disk, it goes fast. When the job gets larger than what fits easily into whatever RAM Windows leaves for it, there's a lot of thrashing to slow disk.

The simplest way to make it faster is to replace your conventional hard disk with an SSD. Then, when Windows starts swapping to "disk" the storage is very much faster so there is less impact. You should be using SSD anyway, and there are plenty of inexpensive SATA interface SSDs that are designed to replace laptop hard disks. It's remarkably easy to do and there are free utilities for moving your system into a new, bootable SSD that replaces the original hard disk.

I could be overlooking something simple, so maybe I'll think of something more useful than the above.
