Subscribe to this thread
Home - General / All posts - Field - Comma separated Unique values
RAR80 post(s)
#22-Jan-21 13:22

Hi all,

I need some help please.

I have a table in Manifold 9 latest version that has a field populated with comma separated words like:

Marty, Joe, Michael, Marty, Jean, Joe, Jean, ...

...

How do we calculate that field please to remove duplicates and get only unique values like:

Marty, Joe, Michael, Jean

Thank you,

geozap
165 post(s)
#22-Jan-21 14:25

Export to excel or some other spreadsheet, transpose, use the remove duplicates function or some other similar function that calc or libre office might have, transpose again, import back to manifold, clear from the original table all but the id field and the geom field, and use join dialog to combine data from the original and the "cleared" table.

Dimitri


6,436 post(s)
#22-Jan-21 14:45

You could do that entirely in place in Manifold, using SQL.

First, search and replace in that field to replace the sequence of comma and space character with just a comma, so the list only has commas between the names. You can then use the StringToTokens function with the comma , character as the separator character. You can use DISTINCT on the resulting table to pull only distinct values, and then assemble them using the StringJoinTokens aggregate function. You could probably put all that within a single, mighty Expression.

RAR80 post(s)
#22-Jan-21 15:18

Thank you geozap and Dimitri!

I will try as suggested.

Sloots

497 post(s)
#22-Jan-21 17:28

Don't underestimate the power of Regular Expressions....

Assuming you have a table named "tabel" with two fields named "names" and "unique_names" both of type NVARCHAR. The first contains names separated by a comma and a space, you can execute the following SQL statement:

UPDATE [Tabel] SET

  [unique_names] = StringRegexpReplace([names]'\\b(\\w+),\\s(?=.*\\b\\1,?\\s?)''''i');

Fast and no need to leave Manifold.

Chris


http://www.mppng.nl/manifold/pointlabeler

RAR80 post(s)
#22-Jan-21 17:53

Chris,

This is fantastic!!!

Thank you so much!

LandSystems54 post(s)
#22-Jan-21 20:38

Really useful. Thanks for sharing Sloots this is a great example.

lionel

691 post(s)
#22-Jan-21 21:27

really nice

manifold use \\ instead \ so has it own syntax

Is there a way to debug SQL ?

at Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript (regex101.com)

Attachments:
regexp.map
regx101.com_2021-01-22_22-21-26.png


union

Dimitri


6,436 post(s)
#24-Jan-21 07:07

manifold use \\ instead \ so has it own syntax

No, that is incorrect. Regular expression syntax in Manifold is the usual syntax.

The \\ appears because of SQL syntax, not regular expression syntax.

The regular expression string is being passed as an argument to the StringRegexpReplace SQL function that is being used within an SQL statement (the UPDATE statement). Any characters that have special meaning to SQL must be escaped, so the text string you pass to the function is preserved literally and not parsed as a special SQL command.

A backslash \ character has special meaning in SQL because it itself is the escape character. It means to take the next character literally and not as a control character. So if you just wrote

\b(\w+)

that is a command to the SQL parser to use the literal text

b(w+)

That's not what you want, as you want the \ characters to stay in there, themselves treated as literal text and not interpreted as SQL controls. The convention in Manifold (as I think is the pretty much universal convention in other packages) is that if you want the escape character to be taken literally, you repeat it: \\ means to take the backslash character as a literal character, not as a control character meaning "escape the next character."

Therefore, when the SQL parser sees

\\b(\\w+)

in a string, it knows that you want it to use the literal text

\b(\w+)

See the "Literals" section in the Queriestopic.

Note that escape characters and how to escape characters with special meaning vary widely among SQL systems, since these are extensions to the SQL standard. Oracle uses a single quote ' character as an escape character within running text, which is highly confusing because a single quote ' character is also used to mark the beginning and end of a string. (!)

SQL Server is also weird, allowing you to specify an escape character of your choosing, but for use in LIKE statement.

PostgreSQL/PostGIS is especially weird, using E in front of strings, but within strings using \\ to escape \ sequences for special characters.

Manifold's use of \ as a universal escape character in all settings with no need for ESCAPE or other declarations is very simple, hard to get wrong, and totally in sync with centuries of C and C++ practice that every literate person since Plato has known and loved. (OK... maybe not every literate person...)

By the way, congratulations to Sloots for a truly insightful contribution. This is one of the smartest things I've seen in a long time, really beautiful for the combination of power, speed, and brevity.

RAR80 post(s)
#24-Jan-21 23:12

Chris,

That was really good for a word.

Do you please have a variant for anything duplicated between commas - I mean ', '?

Thank you again for your help!

RAR

Sloots

497 post(s)
#25-Jan-21 06:57

I don't understand what you mean exactly. Can you post an example?

(The current version works with words separated by ', ')

Chris


http://www.mppng.nl/manifold/pointlabeler

RAR80 post(s)
#25-Jan-21 13:17

Sloots,

I mean if these names were not just first name but First name Middle name and Last name separated by comma space.

Thank you,

RAR80 post(s)
#25-Jan-21 13:29

Sloots,

As an example from the former one:

Marty Hans Robert , Joe Hall Betton, Michael, Marty Hans Robert, Jean Hank Gullam, Joe Hall Betton, Jean Hank Gullam, ...

...

Thank you,

Sloots

497 post(s)
#25-Jan-21 16:40

This should do the trick.

UPDATE [Tabel] SET

  [unique_names] = StringRegexpReplace([names]'\\b([A-Za-z0-9_\\s]+),\\s(?=.*\\b\\1(,\\s)?)''''i');

It works okay if the tekst is separated by <space><comma> ' ,'. In your example there is a additional <space> after the first Marty Hans Robert that causes problems. Remove these first.

Chris


http://www.mppng.nl/manifold/pointlabeler

RAR80 post(s)
#25-Jan-21 17:11

Chris,

This is perfect!

Thank you so much!!!

RAR

Manifold User Community Use Agreement Copyright (C) 2007-2019 Manifold Software Limited. All rights reserved.