Subscribe to this thread
Home - General / All posts - grouping lines that are touching
vincent

1,724 post(s)
#22-Aug-18 20:43

Hi,

I'm wondering if there is a simple way to group lines that are touching ? See that attached jpg. I would like to obtain one object from the selected (red) lines. In fact, I want to obtain one object for each isolated group of lines. I attached a map file also (M8).

Is there any function in M8 to do that ? Or should I go with SQL ? Or a script ?

My goal is to discard every group of lines that I do not need in my analysis, to save processing time.

Thank you !

Attachments:
net.jpg
network.zip

tjhb

8,348 post(s)
#22-Aug-18 22:29

There's no built-in function for this in 8 or 9.

It requires a script, which will "walk" the line networks, starting from every free end, incrementally adding its neighbour or neighbours at its opposite end, then repeating for each combined cluster until no more neighbours are found.

So the first step is to list all neighbours at each end of each line, and store these in a dictionary (ID -> list, list). Then initialize a stack with all lines having free ends--these are minimal initial "clusters". Pop each cluster from the stack and check for forward neighbours. If there are none, skip to the next cluster. If there are forward neighbours, add these to the cluster by collecting their IDs and adjusting the list(s) of the cluster's neighbours, then push the enlarged cluster back onto the stack. Continue until the stack is empty. (That's DFS, you could also do BFS using a queue.)

You'd also need to manage what "forward" means for each line, swapping start and end where lines converge. And check for repeat visits, which would show a closed loop (usually a topology error but not always) and prevent the search from finishing.

Something like that. I may have made errors and left things out.

A mistake already: you need to flag lines as they are added to a cluster, to avoid adding the same line traced from different free ends. (That deals with closed loops as well.)

tjhb

8,348 post(s)
#23-Aug-18 00:41

I left a few other things out.

Each cluster would initially be given the ID of the line it is started from, and would keep that ID as neighbouring lines (IDs) are collected for the cluster.

At the end of that process, clusters would consist of a cluster ID and a list of line IDs added to it.

You would then assign the cluster ID to all lines in its list, and use SQL AllBranches() to combine line geometry by cluster ID.

You don't need to do anything with geometry between the first step (find touching neighbours) and the last step (combine geometry). The clustering in between just matches IDs (integers), so is very fast.

Dimitri

5,085 post(s)
#23-Aug-18 06:11

This is a classic task in graph theory: find all connected components, that is, subgraphs, in the overall supergraph. See https://en.wikipedia.org/wiki/Connected_component_(graph_theory)

If you have a copy of the GGF (general graph facilities) library issued years ago by Manifold's predecessor, CDA, you could use the connected component detection functions to quickly find all subgraphs. GGF is not parallel but algorithmically it can't be beat, not even today. It also has the advantage of saving you from reading a truly massive amount of very difficult, specialist literature trying to find a good algorithm to adapt. (Not everything is as clear and as easy to read as https://arxiv.org/pdf/1307.2187.pdf )

In modern times you'd want to parallelize this, which is straightforward. There's surprisingly much algorithmic work that has been done on parallelizing subgraph detection because it is one of the methods used in image analysis. Finding "groups" of pixels that are "similar" can be done by building subgraphs of similar pixels where two similar pixels being next to each other is taken as the pixels being connected, and parallelizing such subgraph detection is job one when it comes to image analysis.

Just for purely historic interest, I've attached a PDF of the relevant chapter of GGF doc. You can see it was somewhat of a specialty product. :-) There's always been some slight interest in porting this into contemporary Manifold as a utility library for work with graphs, but given the very limited audience for such functions and the truly massive effort involved to migrate it, that's never been sensible. That's especially true given the effort that would be required to implement modern work.

For example, in recent years there has been a lot of effort put into work on "uncertain" graphs, because much of what is stored in a database as an edge may or may not be there at any given moment, so there are probabilities attached. That's analogous to using an epsilon in real-life "networks" where maybe because of the accuracy with which road lines were recorded what is really intended to be a connected network actually has gaps.

The best that has happened is that some of the Release 8 network functions are derived from GGF, and those I suppose might appear in 9 some day.

Attachments:
vol1_chap_6.pdf

artlembo

3,096 post(s)
#23-Aug-18 01:03

this can be done relatively easy, if you are willing to live with some 'anomalies'. A 3-step process would be (I changed the name of your drawing to 'A'):

1. buffer the lines (drawing A) ever so slightly, and then union them together

SELECT g

FROM SELECT unionall(buffer([Geom (I)],10)) AS g

FROM A

)

SPLIT BY branches(g)

you can import a drawing from the above query - let's call the drawing [B]. This creates a drawing with a single area feature. You need to bust those areas apart into their component pieces.

2. go back into the GUI, and Decompose the areas you created in step 1 using the Transform bar.

3. now that you have decomposed elements for the areas (drawing B), so, assign the ID number from each area to the lines that are inside them:

UPDATE

 (SELECT A.[Geom (I)], B.[ID] AS bid,magnitude 

  FROM A, B

  WHERE contains(b.[Geom (I)],a.[Geom (I)])

  )

SET magnitude = bid

I got lazy here, and just updated your field called [magnitude] with the ID number from the decomposed areas. But, you can certainly use another column.

That's it.

Now, I said there are some anomalies. How far should you buffer them (10, 5, 0.0001). You'll have to decide that.

artlembo

3,096 post(s)
#23-Aug-18 01:30

OK, here is a way to do it in a single query:

UPDATE

 (SELECT A.[Geom (I)],uid, aid 

  FROM A, (SELECT g, [island], cstr(Area([island]))&"_"&cstr(CoordCount([island])) AS aid

     FROM (SELECT unionall(buffer([Geom (I)],10)) AS g

   FROM A

   )

  SPLIT BY islands(g) AS [island]AS B

  WHERE contains([island],a.[Geom (I)])

  )

SET uid = aid

I added a field called uid in your table. The above query actually takes the unioned areas and busts them apart using the Islands function. Then, to get a "unique" id, I concatenated the coordinate count and the area of the busted up area (yes, I suppose there is a small chance that two different areas will have the same number of coordinates and the same area). But, this will do what you want and you don't have to use the GUI at this point.

BTW, to test it, I only ran it with about 16 lines, so be prepared to wait. Maybe run this, and then go and get a cup of coffee.

adamw


8,217 post(s)
#23-Aug-18 15:24

OK, I had some fun with this as well.

See attached MXB file.

It contains two queries and a script. Open the MXB in a new instance of 9, then copy and paste both queries and the script into your MAP file above, then run them in sequence: Step 1 -> Step 2 -> Step 3.

Step 1: query. Takes the drawing of lines, runs GeomOverlayTouchingPar on it to find all touching lines and puts IDs of all touching lines into a temporary table (temp). Also creates a second temporary table (temp_out) for the script.

Step 2: script, C#. Reads IDs of all touching lines (temp), merges all touching lines into clusters, then writes the cluster ID for each line (temp_out).

Step 3: query. Takes the drawing of lines, joins the cluster ID for each line (temp_out), groups lines by cluster ID and merges all lines within a cluster into a single geom (GeomMergeLines and then GeomNormalize to join branches). Outputs the result into yet another temporary table (temp_clusters), adds an RTREE index for the geoms, copies coordinate system info from the original drawing, then creates a new drawing (temp_clusters_d).

Try opening temp_clusters_d, it should contain what you want.

PS: The code should perhaps be cleaned up a bit. Ie, I am using MFD_ID to identify clusters and these are 64-bit ints, but I am treating them as 32-bit ints in the script. Also, the script should perhaps create the table it writes to by itself, without the help from Step 1, or at least it should start with deleting everything from that table.

Attachments:
query-script-query.mxb

vincent

1,724 post(s)
#23-Aug-18 20:33

Wow ! These are all great inputs !

I'll test and read all this and report what is faster. I started with Art's. Easy to implement, but not fast enough. I already have a version of Tim's suggestion, but I have to improve it with his ideas. Regarding Adam's workflow, I suspect it is the faster, using M9. Got to run it.

vincent

1,724 post(s)
#24-Aug-18 13:50

Adam's solution took around 5 seconds to run. Hard to beat !

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.