applying filtering/selection within GeomOverlayTopologyIdentityPar @drawing, @overlay
yves61
205 post(s)
#22-Feb-21 10:06

I have a PARCELS drawing with approx 1 million records and a BUILDINGS drawing with also approximately 1 million records.

I am trying to cut the BUILDINGS drawing by using the PARCELS drawing as a cookie cutter.

First, to be clear: I know how to do this using the TRANSFORM - OVERLAY method.

I am successful with this on my computer at smaller scales: say, for number of SOURCE (Buildings) records times number of OVERLAY (Parcels) records = 62.500.000.000.

Beyond that scale, on my computer, it takes hours and often produces no results at all, and becomes less efficient too.

So my aim now is to scale down the PARCELS and BUILDINGS drawings by filtering/selecting on a common gemNISCODE field value. But I am struggling with how and where to integrate this filtering on the gemNISCODE field value within the TRANSFORM - OVERLAY query.

On top of that, I would also like to loop through all gemNISCODE values within the BUILDINGS and PARCELS drawings. I am not quite sure if I need to use an EXECUTE WITH or a script.
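This is not Manifold SQL, but the payoff of filtering both tables on a shared key can be illustrated with a small Python sketch. The row identifiers and code values below are invented stand-ins for gemNISCODE data; the point is only that pairs of candidate geometries form within a code value instead of across the whole datasets:

```python
# Toy illustration: partitioning two datasets on a shared key (standing in
# for gemNISCODE) shrinks the overlay workload from one huge N x M job to
# many small per-key jobs.
from collections import defaultdict

buildings = [("b1", "11001"), ("b2", "11001"), ("b3", "11002")]
parcels = [("p1", "11001"), ("p2", "11002"), ("p3", "11002")]

def group_by_code(rows):
    groups = defaultdict(list)
    for rid, code in rows:
        groups[code].append(rid)
    return groups

b_groups = group_by_code(buildings)
p_groups = group_by_code(parcels)

# Unpartitioned: every building is a candidate against every parcel.
total_pairs = len(buildings) * len(parcels)

# Partitioned: candidate pairs only form within a shared code value.
partitioned_pairs = sum(
    len(b_groups[c]) * len(p_groups.get(c, []))
    for c in b_groups
)
```

With the sample rows above, `total_pairs` is 9 while `partitioned_pairs` is 4; with 300 roughly equal gemNISCODE groups the reduction is far more dramatic.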

Any help of course is much appreciated.

The auto-generated code so far is below

-- $manifold$
--
-- Auto-generated
--
-- Overlay
-- Layer: Gbg40000
-- Field: Geom
-- Operation: identity
-- Overlay: Adp40000
-- Overlay selection only: FALSE
-- Overlay fields: adp_{name}
-- Result: (new table)
-- Result type: geom
-- New drawing: Gbg_cutBy_Adp
-- New table: Gbg_cutBy_Adp Table
-- Resources: all CPU cores, all GPU cores
-- Transform selection only: FALSE
--
-- prepare begin
CREATE TABLE [Gbg_cutBy_Adp Table] (
  [mfd_id] INT64,
  [OIDN] INT32,
  [UIDN] INT32,
  [gemNISCODE] NVARCHAR,
  [adp_mfd_id] INT64,
  [adp_OIDN] INT32,
  [adp_UIDN] INT32,
  [adp_gemNISCODE] NVARCHAR,
  [Geom] GEOM,
  INDEX [mfd_id_x] BTREE ([mfd_id]),
  INDEX [Geom_x] RTREE ([Geom]),
  PROPERTY 'FieldCoordSystem.Geom' ComponentFieldCoordSystem([Gbg40000 Table], 'Geom')
);
CREATE DRAWING [Gbg_cutBy_Adp] (
  PROPERTY 'Table' '[Gbg_cutBy_Adp Table]',
  PROPERTY 'FieldGeom' 'Geom'
);
-- prepare end
VALUE @drawing TABLE = CALL ComponentFieldDrawing([Gbg40000 Table], 'Geom');
VALUE @overlay TABLE = [Adp40000];
DELETE FROM [Gbg_cutBy_Adp Table];
INSERT INTO [Gbg_cutBy_Adp Table] (
  [OIDN], [UIDN], [gemNISCODE], [adp_mfd_id], [adp_OIDN], [adp_UIDN], [adp_gemNISCODE],
  [Geom]
) SELECT
  [s_OIDN], [s_UIDN], [s_gemNISCODE], [o_mfd_id], [o_OIDN], [o_UIDN], [o_gemNISCODE],
  [s_Geom]
FROM CALL GeomOverlayTopologyIdentityPar(@drawing, @overlay, 0, ThreadConfig(SystemCpuCount()));

This code works fine on smaller numbers of records, but for my 1 million x 1 million record Source and Overlay drawings it takes too much time on my computer, and it becomes less efficient as the size increases.

tjhb

9,550 post(s)
#22-Feb-21 11:40

I am trying to achieve to cut the BUILDINGS drawing by using the PARCELS drawing as a cookie cutter.

Are you sure? This seems backwards. It implies that, in general, buildings are larger than parcels, or that buildings usually straddle multiple parcels.

I would expect the opposite situation, and the opposite procedure.

Anyway, what is the underlying purpose? That is what matters.

yves61
205 post(s)
#22-Feb-21 13:33

Yes indeed. Within the bunch of 1.000.000 records, some 300.000 buildings straddle multiple parcels.

I'd like to calculate the footprint percentage of each building (or part of a straddling building) on each parcel.

Parcels may have multiple buildings (straddling or not).

So I thought I need to use GeomOverlayTopologyIdentityPar to identify each (part of a) building (straddling or not) on each parcel, then sum each (part of a) building that falls within each parcel and do the arithmetic.
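The arithmetic in question can be sketched in a few lines of plain Python (not the actual query; the areas below are made-up numbers, and the variable names mirror the poster's calcArea columns only for readability):

```python
# One cut piece of a building that straddles a parcel boundary:
building_area = 200.0  # full source building area (the s_calcArea role)
parcel_area = 500.0    # area of the parcel the piece falls in (o_calcArea)
part_area = 150.0      # area of the cut piece itself (GeomArea of the result)

# Share of the original building that lies on this parcel:
pct_on_building = part_area / building_area  # 150 / 200 = 0.75

# Share of the parcel covered by this piece of building:
pct_on_parcel = part_area / parcel_area      # 150 / 500 = 0.30
```

Summing `pct_on_parcel` over all pieces landing on one parcel then gives that parcel's built footprint ratio.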

But in the end my computer's memory is not large enough to tackle the whole 1 million x 1 million records in one batch either.

Any ideas ? How to optimize ?

Dimitri


6,436 post(s)
#22-Feb-21 14:21

I hate to guess about this stuff, so I'll just ask: could you tell us a bit about your system (hardware and software)? This is a big job, so it probably gets into territory where hardware (what CPU? How much RAM? What kind of storage? (local? shared drive on a remote system? hard disk? SSD? etc) and software (64 bit windows? Windows 10?) matter.

yves61
205 post(s)
#22-Feb-21 17:02

I am running this on a local laptop computer.

Meanwhile, while working on this, I made some renames and changes to the initial code, which should read more easily.

******** START CODE ************

-- $manifold$
--
-- Auto-generated
--
-- Overlay
-- Layer: Buildings
-- Field: Geom
-- Operation: identity
-- Overlay: Parcels
-- Overlay selection only: FALSE
-- Overlay fields: parcels_{name}
-- Result: (new table)
-- Result type: geom
-- New drawing: Buildings_cookieCut_Parcels
-- New table: Buildings_cookieCut_Parcels Table
-- Resources: all CPU cores, all GPU cores
-- Transform selection only: FALSE
--
-- prepare begin
CREATE TABLE [Buildings_cookieCut_Parcels Table] (
  [mfd_id] INT64,
  [OIDN] INT32,
  [UIDN] INT32,
  [gemNISCODE] NVARCHAR,
  [gemNAAM] NVARCHAR,
  [calcArea] FLOAT64,
  [parcels_mfd_id] INT64,
  [parcels_UIDN] INT32,
  [parcels_OIDN] INT32,
  [parcels_calcArea] FLOAT64,
  [calcArea_CutBldg] FLOAT64,
  [pctCutBldgonSourceBldg] FLOAT64,
  [pctCutBldgonParcel] FLOAT64,
  [Geom] GEOM,
  INDEX [mfd_id_x] BTREE ([mfd_id]),
  INDEX [Geom_x] RTREE ([Geom]),
  PROPERTY 'FieldCoordSystem.Geom' ComponentFieldCoordSystem([Buildings Table], 'Geom')
);
CREATE DRAWING [Buildings_cookieCut_Parcels] (
  PROPERTY 'Table' '[Buildings_cookieCut_Parcels Table]',
  PROPERTY 'FieldGeom' 'Geom'
);
-- prepare end
VALUE @drawing TABLE = CALL ComponentFieldDrawing([Buildings Table], 'Geom');
VALUE @overlay TABLE = [Parcels];
DELETE FROM [Buildings_cookieCut_Parcels Table];
INSERT INTO [Buildings_cookieCut_Parcels Table] (
  [OIDN], [UIDN], [gemNISCODE], [gemNAAM], [calcArea], [parcels_mfd_id], [parcels_UIDN], [parcels_OIDN], [parcels_calcArea], [calcArea_CutBldg], [pctCutBldgonSourceBldg], [pctCutBldgonParcel],
  [Geom]
) SELECT
  [s_OIDN], [s_UIDN], [s_gemNISCODE], [s_gemNAAM], [s_calcArea], [o_mfd_id], [o_UIDN], [o_OIDN], [o_calcArea], GeomArea([s_Geom], 0), GeomArea([s_Geom], 0) / [s_calcArea], GeomArea([s_Geom], 0) / [o_calcArea],
  [s_Geom]
FROM CALL GeomOverlayTopologyIdentityPar(@drawing, @overlay, 0, ThreadConfig(SystemCpuCount()))
-- WHERE [s_gemNISCODE] = 11001

******** END CODE ************

Now with the above code I cookie cut the Buildings drawing with the Parcels drawing, get the individual area of each (part of a) building, calculate the percent ratio of each (part of a) building versus the original building area, and likewise versus the parcel area.

This is running well. I also tested adding a WHERE clause, which runs well.

Two more aims I would like to achieve :

1. As the million Buildings records x million Parcels records are too large for my computer's memory to handle swiftly, I aim to filter the Buildings and Parcels drawing tables on a common field, gemNISCODE.

Then loop over all gemNISCODE values and execute the above Transform - Overlay as a sub-routine.

How could I put this into a script, or EXECUTE it from another query, passing parameters for the gemNISCODE values?
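One generic shape for such a driver, sketched here in Python rather than in Manifold's scripting API: build the query text once per code value and hand each finished statement to whatever function actually executes SQL. The `run_query` callable and the `QUERY_TEMPLATE` body below are hypothetical placeholders, not real Manifold API calls:

```python
# Hypothetical per-code driver loop. run_query is a stand-in for the host
# environment's "execute this SQL text" entry point; the template body is
# abbreviated and must be replaced with the real overlay query.
QUERY_TEMPLATE = """
VALUE @code NVARCHAR = '{code}';
-- filter @drawing and @overlay on [gemNISCODE] = @code here,
-- then run the TRANSFORM - OVERLAY body against the filtered tables
"""

def run_for_all_codes(codes, run_query):
    # One small overlay job per gemNISCODE value instead of one huge job.
    for code in codes:
        sql = QUERY_TEMPLATE.format(code=code)
        run_query(sql)

# Example: collect the generated statements instead of executing them.
generated = []
run_for_all_codes(["11001", "11002"], generated.append)
```

The list of distinct codes would itself come from a `SELECT DISTINCT [gemNISCODE]` over either table; with roughly 300 values the loop runs 300 modest jobs.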

2. Calculate the final footprint ratio of all the (parts of) buildings for each Parcel they are contained in. I suppose that a mere

SELECT [parcels_mfd_id], SUM([pctCutBldgonParcel]) AS footprintratio FROM [Buildings_cookieCut_Parcels Table] GROUP BY [parcels_mfd_id]

will do.
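For what that GROUP BY / SUM step computes, here is an equivalent plain-Python sketch (the rows are invented sample values; only the aggregation logic matters):

```python
# Sum the per-piece parcel percentages into one footprint ratio per parcel,
# mirroring: SELECT parcels_mfd_id, SUM(pctCutBldgonParcel) ... GROUP BY ...
from collections import defaultdict

rows = [  # (parcels_mfd_id, pctCutBldgonParcel) - made-up sample data
    (1, 0.10),
    (1, 0.25),
    (2, 0.40),
]

footprint_ratio = defaultdict(float)
for parcel_id, pct in rows:
    footprint_ratio[parcel_id] += pct
# parcel 1 totals ~0.35, parcel 2 totals ~0.40
```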

As usual , any help is welcome.

Dimitri


6,436 post(s)
#22-Feb-21 18:33

What is your disk drive?

yves61
205 post(s)
#22-Feb-21 18:49

Dimitri

I’ve got a local C: drive / hosting applications

52,7 GB free space on a total of 184 GB

I’ve got a local D: drive / hosting data

430 GB free space on a total of 931 GB

oeaulong

304 post(s)
#22-Feb-21 19:00

And where is your TEMP folder located?

yves61
205 post(s)
#22-Feb-21 19:52

Both TMP and TEMP on C:\Users\...\AppData\Local\Temp

The TEMP is using 486 MB.

tjhb

9,550 post(s)
#22-Feb-21 20:05

Yes, this is nuts.

You don't have enough TEMP space for a large project.

184GB is normally enough for the Windows OS and applications, and possibly the swap file too (even with 32GB RAM), but squeezing TEMP into that space as well is just not right if you are doing serious work.

I'm not sure if that's the problem, but it stands out.

NB it can be hard to tell how much TEMP space is actually in use, especially in heavy use. Windows doesn't always measure this accurately when it is busy.

yves61
205 post(s)
#22-Feb-21 21:03

Edited: 22:00

On running this code on 1 million building records x 1 million parcel records, the TEMP folder went up to 3,64 GB.

tjhb

9,550 post(s)
#22-Feb-21 21:19

Measured how? It can be hard to measure.

How big is the Manifold project, if you save the building records and parcel records together, prior to processing?

You need more room, even if only in principle!

Looking further ahead, I am wondering if this is a case where a "manual" SQL join can do a better job than a built-in overlay, which is partly blind.

We have many helpful functions to deploy when we know the data.

Of course you could post sample data for experiment.

yves61
205 post(s)
#24-Feb-21 12:16

will reply shortly with sample data.

As suggested, please find attached some sample data. The file size has been reduced drastically, just for example purposes.

Please have a look at the Comments, and also the inline comments in qry1_

As always, to all members on this forum, many thanks in advance for suggestions .

Attachments:
M9_example.mxb

tjhb

9,550 post(s)
#24-Feb-21 20:22

Thanks! That is great.

Will try some things out--bearing in mind what you say, it is a small subset.

dchall8
808 post(s)
#23-Feb-21 06:21

If you broke it into 4 pieces, will it run?

yves61
205 post(s)
#23-Feb-21 08:02

On my laptop computer, with the present settings, it runs fine up to 250.000 records x 250.000 records within approx 15 minutes.

I see I could downsize the @drawing (Buildings) and @overlay (Parcels) tables by filtering these tables on a joint locality field, [gemNISCODE], and use a SELECT INTO query that creates new (filtered) tables for Buildings and Parcels. There are 300 distinct [gemNISCODE] values.

So I understand I will probably need to write a script that LOOPs over all distinct [gemNISCODE] values and executes the TRANSFORM - OVERLAY as a sub-query.

Or are there other possibilities ?

Could anyone share some code for helping me further in the right direction ?

I am not dextrous in scripting and programming.

Dimitri


6,436 post(s)
#23-Feb-21 12:18

My gut feeling is that this is simply a large job that is very disk-intensive, and your computer has a slow disk setup.

Usually in a hard disk based system when things go fast up to a critical amount of data and then they start going slow, that's a sign the Windows system has transitioned from all jobs fitting into memory into swapping to disk, which is very much slower.

When the job fits into available RAM with no need to swap out to disk it goes fast. When the job gets larger than what fits easily into whatever RAM memory Windows leaves for it, then there's a lot of thrashing to slow disk.

The simplest way to make it faster is to replace your conventional hard disk with an SSD. Then, when Windows starts swapping to "disk" the storage is very much faster so there is less impact. You should be using SSD anyway, and there are plenty of inexpensive SATA interface SSDs that are designed to replace laptop hard disks. It's remarkably easy to do and there are free utilities for moving your system into a new, bootable SSD that replaces the original hard disk.

I could be overlooking something simple, so maybe I'll think of something more useful than the above.

Manifold User Community Use Agreement Copyright (C) 2007-2019 Manifold Software Limited. All rights reserved.