Handling LineFeed In CSV
joebocop
340 post(s)
#30-Jul-18 19:01

Using the CSV data port, how do I handle "LF" characters as text?

In the attached screen capture, a field technician has accidentally added a LF in one of the fields. When I create a data source using this CSV in 9.0.167.7, the LF causes the data port to add a superfluous row to the table. I create the data source using the parameters

"SourceDecimal": ".", "SourceDelimiter": ",", "SourceFieldHeader": true, "SourceForceText": true, "SourceQuote": "\\""

I had hoped that, since the field's value was wrapped in double quotes, declaring the SourceQuote parameter would tell 9 to ignore the pesky LF therein.

Any guidance? Thank you!

Attachments:
linefeed_notepad.PNG
linefeed_row.PNG
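For comparison, here is a minimal sketch of how an RFC 4180-style parser treats this input. The sample string is hypothetical (constructed to resemble the screenshot, which I can't see); Python's csv module keeps a bare LF inside a quoted field rather than starting a new record, which is the behaviour I was hoping SourceQuote would give me.

```python
import csv
import io

# Hypothetical data resembling the attachment: the "notes" field
# contains a bare LF inside its double quotes.
raw = 'id,notes,tech\n"1","line one\nline two","JB"\n'

# csv.reader follows RFC 4180: a quoted field may span lines, so the
# embedded LF stays inside the field instead of ending the record.
rows = list(csv.reader(io.StringIO(raw)))
# rows[1] is ['1', 'line one\nline two', 'JB'] -- two records total,
# not three.
```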

Dimitri

5,037 post(s)
#30-Jul-18 19:38

Open the csv text file in an editor that can handle search and replace for LF and CR characters. Replace the CRLF sequence with some temp character not otherwise used, get rid of the loose LFs, and then replace the temp character with CRLF again. Once you have text in a table in 9, you can search and replace for special characters using \u escapes, as discussed for the Replace Text, All transform template.
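The same temp-character trick is easy to script. A minimal sketch in Python, assuming the file uses CRLF record endings and that the placeholder byte "\x00" never occurs in the data (both assumptions, not facts from the thread); stray LFs are converted to spaces here, but they could just as well be deleted:

```python
def strip_loose_linefeeds(text: str, placeholder: str = "\x00") -> str:
    """Remove bare LFs while preserving CRLF record endings."""
    text = text.replace("\r\n", placeholder)   # protect real record ends
    text = text.replace("\n", " ")             # neutralize stray LFs
    return text.replace(placeholder, "\r\n")   # restore CRLF
```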

joebocop
340 post(s)
#30-Jul-18 23:01

Thanks Dimitri. This workflow is related to my need to ingest ~40 csv files per day; would love to avoid find and replace.

If the csv data port can't handle it (wasn't sure whether I was using it correctly), I'll write some python to do the LF replace before touching the file with 9. Thanks.

Dimitri

5,037 post(s)
#31-Jul-18 06:39

This workflow is related to my need to ingest ~40 csv files per day;

Ah, well, you didn't say it wasn't a one-time thing. :-) You can do that in 9 with a script as well, just not using the default .csv dataport. Read the file in as text, manipulate it, write it out, and then import it. Or, since it is just text anyway, go ahead and create a table from the thing when you bring it in the first time. Use whatever scripting language you want. I prefer V8, but hey, if python is your thing, no problem...

It's always a tricky thing when somebody breaks a format in some weird way. As far as I can see from the example you've provided, it is not even just a matter of being able to specify a custom end-of-record character: your data uses CR LF as the end of record, while a lone LF is not one.

I'd like to see the CSV dataport extended to enable specification of a custom character for end of record. You could then specify CR. You'd still have to clean up extra LF characters that appear to the tune of 40 csv files a day, but then if your field personnel are damaging data here and there you'll have some extra work in any case.

While we're on the topic, I'd also like to see the "CSV" dataport extended to handle fixed width fields and fixed length records, so that even without any field or record separators you could extract data.
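For what it's worth, fixed-width extraction is simple enough to sketch while waiting on the dataport. A hypothetical example, assuming the column widths are known in advance (the widths and sample record below are made up for illustration):

```python
def parse_fixed_width(record: str, widths: list[int]) -> list[str]:
    """Split one fixed-length record into fields by position alone;
    no field or record separators are needed."""
    fields, pos = [], 0
    for w in widths:
        fields.append(record[pos:pos + w].strip())
        pos += w
    return fields

# With assumed widths 6/4/2, "JOE   2018AK" splits into three fields.
```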

Dimitri

5,037 post(s)
#31-Jul-18 18:20

Forgot to mention... I have filed both of the above as suggestions...

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.