Subscribe to this thread
Home - General / All posts - Delete identical objects on overlapping shapefile
mapgraphics10 post(s)
#18-Jun-18 00:52

I have two shapefiles, each showing rivers and nothing else. The areas of the shapefiles overlap, so that some rivers appear on both shapefiles. The attributes on each overlapping object are identical on both overlapping shapefiles. How can I select (on one shapefile) only the rivers that appear on the other shapefile? (I have a nagging feeling there's something really simple that I'm missing here....)

Charles Syrett

Map Graphics

tjhb
10,094 post(s)
#18-Jun-18 01:15

In principle:

DELETE FROM [Drawing 1]

WHERE [Geom (I)] IN

    (SELECT [Geom (I)] FROM [Drawing 2])

;

Oops, you didn't want DELETE, but SELECT. So

UPDATE [Drawing 1]

SET [Selection (I)] = 

    [Geom (I)] IN 

    (SELECT [Geom (I)] FROM [Drawing 2])

;

This requires that common objects be identical in every respect. (It is harder if they don't, involving criteria and perhaps subjective judgements.)

LeRepère
153 post(s)
#18-Jun-18 01:20

As I understand it, the Except statement could also do the job.

<SELECT query> EXCEPT <query> - Return a table of all records in the <SELECT query> results but not in the <query> results.

SELECT Name FROM Invoices

  EXCEPT SELECT Name FROM Students;

Returns all names found in the Invoices table except those who are students.

tjhb
10,094 post(s)
#18-Jun-18 01:22

Quite right, if you only need a list.

mapgraphics10 post(s)
#18-Jun-18 17:49

Thanks for this. I'm unfamiliar with the geekier aspects of Manifold...I spend most of my time in a graphics environment! Is what you've written here a Query? I tried it as a Query, inserting the Drawing name, and for "Geom" I tried the column name, but I received an error message saying "Token expected". What am I missing here?

Charles Syrett

Map Graphics

Dimitri


7,413 post(s)
#19-Jun-18 09:04

Yes, it's a query. It's a query written for Release 8, where intrinsic fields have an (I) in their names. A Release 9 query would be different, since intrinsics are computed on the fly and, by tradition, the geometry field is named just plain Geom.

Are you using Release 8 or Release 9?

tjhb
10,094 post(s)
#19-Jun-18 23:06

Sorry for not having replied to this. I thought I had--but it was all in my head.

I had a question about this:

I tried it as a Query, inserting the Drawing name, and for "Geom" I tried the column name, but I received an error message saying "Token expected".

What is the column in question? I makes me think I may have misunderstood the objective. Going back...

The areas of the shapefiles overlap, so that some rivers appear on both shapefiles. The attributes on each overlapping object are identical on both overlapping shapefiles.

I understood this to mean that river geometry is identical between some objects in two (or more) drawings. And that where geometry is identical, other attributes also match.

If that is the case, then the query only needs to address the [Geom (I)] column--which contains geometry in Manifold 8 (usually just [Geom] in Manifold 9 as Dimitri says).

Other attributes/columns need not come into it, unless you want to verify that they also match (which is possible).

Bear in mind also that the name of the geometry column is "Geom (I)", including the "(I)" as Dimitri says, but in Manifold 8 SQL the "..." quotes are replaced with [...] quotes, required here because of the space within the column name.

tjhb
10,094 post(s)
#20-Jun-18 08:04

I think I get it now. Overlaps are typically partial, not complete.

But so, what is the objective? To clip, to combine...? (Not to delete, which would also remove parts not overlapping, I think.)

Sample data would really help.

mapgraphics10 post(s)
#21-Jun-18 21:37

The data is from the Canadian federal government. They've reorganized their topographic data so that it's grouped by scale, then province. I'm working with the 1: 250,000 provincial data.

Back in the day, this data was grouped by topographic map sheets with lat-long limits. But here's the thing: in reorganizing the data by provinces, they haven't used the provincial boundaries as boundaries for the data. Instead, the data edges still follow the old sheet edges, outside the provincial boundaries.

My area of interest is the two sides of the Ottawa River, which is the boundary between Quebec and Ontario. The Ontario data extends a little ways into Quebec, and vice versa. Hence the overlap.

Perfect world, I'd find a nice clean way to delete any objects within Ontario that fall into the Quebec data set, and vice versa. I did find a graphic way to do this in Global Mapper involving cropped exports, but it tied up the 'puter for hours.

Data can be found here: http://ftp.geogratis.gc.ca/pub/nrcan_rncan/vector/canvec/shp/Hydro/

Thanks!

adamw


10,447 post(s)
#25-Jun-18 16:26

I went to look into the data and deleting duplicates appears to be very easy because each object is tagged with what seems to be a globally unique ID.

Here is what I did:

Downloaded ZIP archives for ON and QC, unpacked both. Created a new MAP file. Imported all SHP files from the ON archive, moved created components into a folder named 'ON'. Imported all SHP files from the QC archive, moved created components into a folder named 'QC'. Opened 'watercourse_1' drawing (ON), I assume this is the data you are interested in. Dropped 'watercourse_1 2' drawing (QC) into the same window. Formatted one of the drawings to thick gray, the other to thin blue, saw something like this (the objects overlap):

(This is when I zoomed closer to the objects, then Alt-clicked those that seem to be on top of each other to see what fields they contain and noted that the values in 'feature_id' coincide and look like globally unique identifiers.)

Opened a command window using View - New Command Window - SQL. Ran the following query (I ran it in three steps, you can run it as a whole):

--SQL9

 

-- step 1

ALTER TABLE [watercourse_1 Table]

  (ADD INDEX [feature_id_x] BTREE ([feature_id]));

 

-- step 2

ALTER TABLE [watercourse_1 Table 2]

  (ADD INDEX [feature_id_x] BTREE ([feature_id]));

 

-- step 3

DELETE FROM [watercourse_1 Table]

WHERE [feature_id] IN (SELECT [feature_id] FROM [watercourse_1 Table 2]);

The money step is the third one which deletes objects from one drawing which have a matching [feature_id] in the other drawing. But if you run that step without preparation, it will run slowly. To make it run fast, I build indexes on the involved fields in steps 1 and 2 - technically, for this particular case you can get away doing just step 2, but it is generally a good idea to build indexes on both (who knows what other queries we will need, so let's just reflect that [feature_id] is unique in both drawings in the data).

The steps run in 0.830 + 3.009 + 15.465 = a bit more than 18 seconds on a test machine, and after that the overlapping objects are only left in 'watercourse_1 2' (QC).

Attachments:
qc-on-overlap.png

mapgraphics10 post(s)
#26-Jun-18 03:49

Nice...thanks. Certainly faster than the 4 hours or so it took me to do it the other way!

Dimitri


7,413 post(s)
#18-Jun-18 08:24

There is a related, possibly useful discussion of interactive GUI visual approaches in the Example: Edit Covered Objects topic.

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.