Subscribe to this thread
Home - General / All posts - Date/time format in M9 and M8
vincent

1,931 post(s)
#29-Jul-20 16:45

Hi,

On my computer, the SQL Datetime format is not the same for M8 and M9. There are like this :

M8 : 2020-03-19 19:08:55

M9 : 03/19/2020 19:08:55

It looks like M8 is using the Regional Parameters of my computer and M9 is not.

Is there a way to change the format in M9 ?

Thank you.

tjhb

9,476 post(s)
#29-Jul-20 18:19

In short, no.

See http://www.georeference.org/forum/t143927.7#143928.

This creates an almost unavoidable trap for non-US users when importing date-time from CSV format, for example, among others.

Once data is correct within Manifold (and known to be in "correct" order, in en-US terms) it can be ~managed using DateTime... functions, to manage elements of the date-time type separately--and perhaps store them in separate fields, for safety.

It's not a great situation. In my opinion it makes Manifold's date-time data type unsafe to use for most of the world.

dale

568 post(s)
#30-Jul-20 00:35

Same as SA units. Kilometre and metre.

tjhb

9,476 post(s)
#30-Jul-20 01:14

Can you expand? Do you mean the SI vs US spelling?

dale

568 post(s)
#30-Jul-20 04:42

My apologies, yes I do. It's still commonplace to spell *metres so SI spelling, in my part of the world.

As far as my extremely cursory search of the manual goes, I have only US distances (feet, inches, chains, Tlalcuahuitl, etc) or US metric (*meters) as a choice.

tjhb

9,476 post(s)
#30-Jul-20 07:42

SI units are SI. US spelling of them is plain stupid. (Lives at stake.)

I wonder if we could ask Manifold to adopt international standards, except for the USA if absolutely necessary. (I am going to look up what Canada does.)

dale

568 post(s)
#30-Jul-20 11:06

Would a localisation file work?

Dimitri


6,280 post(s)
#30-Jul-20 11:53

No, as unit names (currently) are not localizable.

vincent

1,931 post(s)
#30-Jul-20 12:59

Canada : SI rules.

tonyw
619 post(s)
#30-Jul-20 17:14

(I am going to look up what Canada does.)

Hi Tim, I'm in Canada. It's a mash-up, sometimes of a very serious nature. in technical work we follow the SI spelling ("metre") for units. However because we are right next door to the USA and given their population is 10 times Canada's and there is major spill over the border of their newscasts, media, different people in Canada use both SI standard and US variants. Dates are probably the worst for ambiguity. Seeing 03-04-2020 is really ambiguous. So is that April 3, 2020? No most times the majority of Canadians mean March 4, 2020 substituting 03 for March.

That spills into data sets I see. If we are lucky and we have enough dates in a data set there will be a number larger than 12 in one of the positions in the date and from that we can figure out which system the author used, a number larger than 12 can only be "day" in that position. So 04-28-2020 and 28-04-2020 can only be April 28, 2020.

The federal government in Canada has jurisdiction over weights and measures and has official guidelines (but not that everyone follows the official guidelines) for dates writing either big to small 2020-04-03 (April 3, 2020) or small to big, 03-04-2020 for April 3, 2020. Either way is unambiguous, month is always in the middle. However I'd say the majority of Canadians put the day in the middle and we don't know what the author's preference when we acquire data. We can only examine the data looking for clues (a number larger than 12).

Time is another problem. I use the 24 hour clock in data, it's unambiguous. However you see a lot of data with 12 hour clocks and AM and PM. Worse is if the time is a text field being a mix of numbers and text. Argh.

adamw


9,447 post(s)
#24-Aug-20 11:51

With 'meter' vs 'metre', what specifically you want to happen:

(a) have a unit named 'meter' and a separate unit named 'metre' with the same scale factor (same for kilometers, etc), so that when you tell that the coordinate system is in 'metre's, the choice sticks and shows 'metre' on other systems?

(b) have a user-level option to spell 'meter' as 'metre' on your machine?

Or something else?

We can allow localizing unit names, that would work like (b). This isn't terribly easy and names in coordinate system definitions stored in metadata would remain unchanged ('meter' or 'm') with only the UI translating them (to 'meter') on the fly, but if that's what is needed here, we can do it.

tjhb

9,476 post(s)
#24-Aug-20 14:17

(c) Generally and by default, use the standard international spelling for the SI unit named 'metre'.

Perhaps allow localization to 'meter' as an option.

tjhb

9,476 post(s)
#24-Aug-20 15:30

It might be different if Manifold were a US product. (I don't think it would be different even then, but it might.)

Manifold is not a US product, it is (at least now) now clearly international, which is something it takes admirably seriously in other ways. (Collations, RTL order, double-byte characters, translatable UI, ... though not yet dates, the subject of this thread.)

It is in line with that to treat international units by their international scientific names.

Dimitri


6,280 post(s)
#24-Aug-20 15:40

the standard international spelling for the SI unit named 'metre'.

Fair enough, but there's a practical issue, arising from cultural asymmetries in awareness of spelling variations.

Given the ubiquity of Americanisms worldwide, everybody internationally knows what a "meter" is. They may look at it, roll their eyes, and think "Oh, those Americans...", but they'll know what it is.

In contrast, in the US, using "metre" by default will be thought by some users to be a different unit of measure from "meter". "meter" is so universally used in the US, the standard spelling even within the scientific community, and there is so little exposure to international spellings in many settings within the US, that this is a factor. Some folks really won't know that a "metre" is a "meter."

I'd therefore hesitate to change the default, and would suggest a) or b).

adamw


9,447 post(s)
#24-Aug-20 15:59

OK, let me ask a different question then. Suppose we switch to 'metre'. How should we add an option to use 'meter' for those who want it?

tjhb

9,476 post(s)
#24-Aug-20 16:35

Dimitri is right. This particular argument is not worth resolving, because no one can be misled. There is no context where we can mistake a length for, say, a parking meter.

Manifold uses US English generally, and 'meter' is in line with that. In principle it may be wrong (I think so), but in practice it's no more difficult than using 'color' (which is not).

mlinth
441 post(s)
#23-Aug-20 20:08

I've run into the same issue. I have some dates in dd/mm/yyyy format in text, and M9 casts them in mm/dd/yyyy format: i.e. this:

values (cast('31/12/2020' as datetime), cast('01/12/2020' as datetime), cast('2020-12-01' as datetime))

Gets me 12/31/2020 0:00:00, 1/12/2020 0:00:00, 12/1/2020 0:00:00

Adam wrote something back in, er, 2018 about adding some date conversion functions. Are these on the way, or have I missed them? I will send in a suggestion, to be on the safe side. Otherwise the only thing I can do right now is to reformat the date string into yyyy-mm-dd and do the cast on that, i.e.

cast(StringSubstringLen('01/12/2020', 6, 4) &'-' &  StringSubstringLen('01/12/2020', 3, 2) &'-' &  StringSubstringLen('01/12/2020', 0, 2)  as datetime)

It's a bit of a banana skin at the moment, has to be said.

adamw


9,447 post(s)
#24-Aug-20 11:46

The conversion functions are not yet available. We view parsing dates as part of a bigger issue of formatting / parsing back typed data in general. This is a relatively big bunch of work, as such it has to compete with other big bunches of work, and so far other bunches have been winning.

Maybe there is something that we could do about the dates specifically without waiting for the entire bunch. We'll see.

tjhb

9,476 post(s)
#24-Aug-20 15:09

FWIW I don't think conversion functions are enough in the case of dates.

The errors here are so universal and so hard to detect that they become very serious.

Manifold's treatment of dates is currently unstable, because its behaviour is essentially undefined.

Conversion functions will change that, yes, but only for users who know about the problem and make an extra effort to control it.

What is needed here is a solution that will work for users who do not know about the problem, or who are not currently thinking about it. Or, nearly as good, a solution that will fail with a meaningful warning whenever the default processing does not make sense.

(The current behaviour of silently "fixing" one case in a hundred potentially 18 cases out of 30, but leaving all others to be silently wrong, is nuts.)

I do not know the solution, but I suspect a large part of it may be to respect the user locale by default.

Dates are an important enough data type to merit their own treatment, even if that entails exceptions to general rules. it

adamw


9,447 post(s)
#24-Aug-20 15:43

I guess I might be missing something.

Where specifically do the errors occur?

We know of a couple of cases with imports where dates are stored as text - most notably, CSV. There, if the text is ambiguous, eg, 3/10/2020 instead of 3-Oct-2020, the import might make a wrong guess. This is similar to how it was in 8 and similar to how it is in all other applications (like Access) and components (like various ODBC drivers for CSV) which try to parse dates from text automatically - unless the user specifies the format the dates are in, some or all of the guesses might end up being wrong. What we should do to fix this in 9 is allow specifying the format for the dates.

What are other cases where the current parsing of dates which does a best-match guess could be problematic and why specifying the exact format would not be a good solution for them?

tjhb

9,476 post(s)
#24-Aug-20 16:13

My first point is, I suppose, that a date like '3/10/2020' is almost never ambiguous. Not to the user (because of custom) and not to the user's Windows system (because of locale).

The only time a date is ambiguous is if there is no such context. And in that case, a date like '3/31/2020' is also ambiguous. It is either 31 March 2020, or (as it would always be in New Zealand, and in most of the world) wrong. Without context, who knows?

Manifold should not see it as ambiguous either. It should either know or throw, never guess.

Secondly, a case-by-case approach is not good enough. It is no good if successive dates in a table, stored in text format, are interpreted with opposite day-month order on parsing. That is worse than useless and dangerous.

Third, there are at least two more important contexts, besides CSV. One is manual data entry, the other is parsing SQL literals. These should follow defined rules, aligned with user culture, and be consistent from one date to the next. (And throw a type error if a supplied date makes no sense in the current culture, even if it would in another.)

Fourth, even if it is necessary to store all dates in one format, it must be possible to show them in a format valid for the user.

Yes, it would be good to be able to specify the exact date format as well, even going against applicable custom/locale (useful for writing code across borders). But that is not enough to solve the problem or be safe.

adamw


9,447 post(s)
#24-Aug-20 16:33

I think it is more complex than that.

What is the context for importing a CSV file? The default date format selected by the user for his account? But the dates in the file can be incompatible with it - eg, the default format could be set to MM/DD/YY, but the dates in the file could be formatted as YYYY/MM/DD with four digits clearly specifying that the first token is year - or MM-DD-YY with a different separator - or MMM-DD-YY with a name of the month instead of the number. In all these cases the format used by the dates in the file will differ from the format selected by the user on the system. If we say that we should tolerate different separators, treat MM and MMM as the same thing, etc, that's fine, but then let's say openly that we need some guessing to be done, just more intelligent than the current (and ideally specify the desired rules or at least list cases of current misguesses which we want to fix).

Not only that, but what if the file was created on a system with different settings? With the rule being to follow the default date format, the format on the system the file is being imported is irrelevant, it's the format on the system the file was generated that is important.

I agree that the current situation - which is: use a fixed format when printing, best-match guess any format when parsing - is not ideal. That's why we want to change it. But we don't see much of a choice but to let specify the desired format exactly both ways whenever this is necessary.

tjhb

9,476 post(s)
#24-Aug-20 16:50

I don't know. My gut feeling is to parse only the formats established in the system locale, and treat everything else as an error (giving NULL where appropriate). That way no one is ever misled, at worst disappointed. Disappointed is more easily fixed.

But I am also thinking what does Excel do, what does LibreOffice Calc do. I think they handle this well.

adamw


9,447 post(s)
#24-Aug-20 17:05

They preview the data values and allow selecting the format for each field. If the format is not selected or a particular value doesn't match it, the value is left as text. There's quite a bit of guessing even after the format is specified, eg, different separators are fine, etc. We can do something similar, yes.

mlinth
441 post(s)
#24-Aug-20 21:59

Thanks for the response and the thoughtful discussion.

What makes the current behaviour dangerous for me is that it is very easy for data to get changed without noticing. At no point do I get to say what happens with my data so, as Tim points out, my DD/MM/YYYY dates get silently imported wrong if the day is less than the 13th of a month.

I'm avoiding this by importing anything with a date via Excel or JSON*.

Personally, I would rather the CSV dataport not guess, and import date fields as text instead. For conversion, I like the Postgres to_date(text,text) functions, which take as arguments the string to convert and the format of the string. As a user I have to consciously state what the format is.

Anyway, I appreciate that this is not a trivial issue, and the approach above might suit me but Manifold has to cater for all users.

Thanks again,

M

* an aside, but Manifold is by far the best tool I have used for working with JSON files.

adamw


9,447 post(s)
#25-Aug-20 08:41

Thanks.

We do have an option to import all data in a CSV file as text - 'Read all fields as text'. However, this option applies to all fields, not just those that get recognized as dates. Plus all options like that are currently kind of hard to reach, you have to link to specify them. We also lack a conversion function that would take a string and a date format and return a date.

We'll likely do this: (a) add conversion functions to make conversion possible (to do entirely in Manifold without resorting to scripts) even if in a roundabout way, (b) extend CSV with specific options for parsing dates (keep text / try parsing a generic DMY / MDY / YMD / try parsing a single specific format), (c) allow specifying options during imports easier.

Manifold User Community Use Agreement Copyright (C) 2007-2019 Manifold Software Limited. All rights reserved.