Subscribe to this thread
Home - General / All posts - Float32 numbers typed in tables changing
geozap97 post(s)
#24-Oct-18 14:56

I create a table and I add a float32 field in it. Then I type 2.51 in a new row but the number becomes 2.509999990463257. Similar changes happen with most of the numbers I tried.Numbers with 0.5 as decimals (5.5, 100.5 etc) stay the same. The difference is very small, but what is happening? This difference may cause problems. For example it can change the result of a selection: Selection greater or equal to 2.51 fails to select 2.51 because it has become 2.509999990463257.

I checked also with float64, and the change in numbers doesn't happen in that case.

geozap97 post(s)
#24-Oct-18 16:13

I forgot to mention, I am talking about M9.

tjhb

8,335 post(s)
#25-Oct-18 22:32

I've been trying to find you a nice link where this is explained. I'll keep trying, but in the meantime here is a short answer.

This is an artefact of representing decimal fractions in binary.

In any base system, not all fractions can be expressed exactly in a finite number of digits. Think of 1/3 in base 10. You need an infinite series of 3s after the decimal point to express this fraction exactly. Any finite expression is approximate.

That is the case because the decimal digits in base 10 express inverse powers of 10 (a/10^1 + b/10^2 + c/10^3 + ...), and there is no finite series of inverse powers of 10 that sum to exactly 1/3.

On the other hand, 1/2 can be expressed exactly in decimal digits, because 5/10^1 is exactly 1/2.

The same thing is true in binary, but for inverse powers of 2. 1/2 can be expressed exactly, because 1/2^1 is (trivially) 1/2. But 1/10 or 2/10 cannot, because there is no series a/2^1 + b/2^2 + c/2^3... which sums to exactly 1/10 or 2/10. Those two numbers can be expressed exactly in a finite expansion in base 10, but not in base 2. The same is true for many other fractions.

(This can be expressed better, in terms of primes.)

Back to your example, 2.51 (in base 10). Can it be expressed exactly in base 2? I should work this out explicitly but for now I will draw the lazy inference that it can--but that doing so requires more decimal digits than are available in FLOAT32.

When you type decimal 2.51 and it is stored as FLOAT32, it becomes the nearest binary approximation available in that type, which is not exact. Conversion back to decimal shows a slightly different number than what was entered.

When you type decimal 2.51 and it is stored as FLOAT64, it can be expressed exactly, given the larger number binary digits. When it converted back to decimal, there is no loss of precision.

At least, that is my inference. I should check the actual expansions of 2.51 as FLOAT32 and FLOAT64 if I were not so lazy.

tjhb

8,335 post(s)
#25-Oct-18 23:35

I think my inference regarding 2.51 in FLOAT64 is incorrect. I think that it can't be represented exactly in either FLOAT32 or FLOAT64, and the apparent difference probably comes down to different rounding for display. But this is a lazy inference too, with no testing (for now).

The best thread on this I have come across is Why can't decimal numbers be represented exactly in binary?

tjhb

8,335 post(s)
#26-Oct-18 00:47

That's correct.

In FLOAT32, 2.51 is reproduced (decimal -> binary -> decimal) as

2.50999999046325684000

In FLOAT64, 2.51 is reproduced as

2.50999999999999979000

These are the values shown in Manifold 8, with display precision set to 20 places.

Manifold 9 rounds floating-point values for display to 15 places (sometimes 16 places). This can't currently be adjusted.

When the FLOAT32 value is rounded in Manifold 9, to 15 places, we get

2.509999990463257

When the FLOAT64 value is rounded to 15 places, we get

2.51[0000000000000]

geozap97 post(s)
#26-Oct-18 06:27

I think I got the idea. Thanks.

So, I suppose to avoid problems like the one I mentioned in my first post, we have better always use float62 (double in Manifold 8) instead of float32, even when only 2 decimals seem enough (like with currency).

tjhb

8,335 post(s)
#26-Oct-18 06:40

It is not so easy to avoid this issue. It’s built in to all CPUs that use binary representation.

The number of decimal places (in base 10) is not relevant.

It is the actual fraction involved (and the number base) that is important.

Sure, using float64 rather than float32 can give a closer approximation for a given decimal value—but an approximation remains an approximation, so tests like =, >, < remain susceptible with both data types. (As in the excellent example in your first post. That applies equally for float64.)

If you are using a language that has a built-in Decimal type, then you can use that (at some cost in speed).

SQL9 does not have a Decimal type. Python and C# do, for example.

Understanding what is going on is the only good approach.

It was a great question I think. It’s not as much talked about as it should be.

Dimitri

5,082 post(s)
#26-Oct-18 06:52

SQL9 does not have a Decimal type. [...]

It’s not as much talked about as it should be.

Very true. Oracle, SQL Server, DB2, and PostgreSQL do have a DECIMAL/PRECISION type, as far as I know. ArcGIS does not. Do you think 9 should introduce this?

tjhb

8,335 post(s)
#26-Oct-18 22:09

I had to think about that one!

On balance, I think no, not necessarily. Not unless and until you start marketing a Manifold product (or derivative*) for, say, options pricing, or accounting, or (even) as a general spreadsheet replacement.

[Or even maybe cardboard derivatives.]

For real-world [geographic] floating-point values, continuity is the important thing, and qualitative rather than discrete comparisons.

For the time being, if someone wants to build (say) a financial package on top of Manifold 9 and Radian, they will in any case be writing their own core methods, or at the very least custom scripts and script functions, which can then use the .NET Decimal type as required. They would probably only be using SQL9 for superfast transport, and perhaps to benefit from free data parallelism.

Besides that, the eventual possibility of adding a DECIMAL type to SQL9 is to my mind complicated by its having (so far) only one native numeric type, which as I understand it is FLOAT64. (I need to do some more work to clarify this, extending it to xN variants for example.)

In both those respects, I mean that there may be higher priorities.

geozap97 post(s)
#26-Oct-18 10:14

"Sure, using float64 rather than float32 can give a closer approximation for a given decimal value—but an approximation remains an approximation, so tests like =, >, < remain susceptible with both data types."

So, when using float64 you see on screen a value closest to the one you have typed, but because your typed value might still be a little bit different than the the one stored, you still might get a wrong result in tests like <,>,=, right? If I understand well, float64 doesn't improve things compared to float32 in this issue and tests like 10.61000000000000000000000001=10.61 (float64-many stored decimals) and 10.610000000000001=10.61 (float32-less stored decimals) will equally fail. (the numbers and the decimals are just a random example)

Moreover you might get confused when using float64, because you see on screen a value equal to the one you typed, but still stored different in the 25th decimal.

Dimitri

5,082 post(s)
#26-Oct-18 12:09

Yes and no. The key thing to understand, perhaps simplifying a bit, is that the binary representation is consistent, so the "susceptible" bit is often not a factor in real life use.

Fire up your copy of the Nwind sample database and try:

SELECT * FROM [Products] WHERE [Unit Price] = 21.35;

That picks out the Chef Anton's Gumbo Mix product every time, at a Unit Price of 21.35. So why does that work if float64 numbers representing 21.35 may be different in the 15th decimal place?

The answer is that when the decimal value 21.35 got stored into the table it got stored into exactly the same binary representation using a float64 as the value 21.35 becomes when used in the query. As far as the computer is concerned, they're the same.

Tim's right, in that this is not as much talked about as it should, and sometimes when systems try to dodge the details by talking about precision to a given number of decimal places they may be implementing a technique that conceals the difference between float64 representation in binary and true, decimal numbers.

So, if you have a "money" format that automatically rounds those very long float64 representations of a value like 10.61 to just the nearest two digits after the decimal point, well, it's still a float64 but it is just being displayed as if it had only two decimal digits. Simple methods like rounding display to a given level of precision can go a long way towards avoiding any issues.

Mike Pelletier


1,529 post(s)
#26-Oct-18 14:58

Would changing the variable to an integer value be a practical way to solve the problem?

SELECT * FROM [Products] WHERE CInt([Unit Price]) > 21.35

Perhaps this could be added to the Mfd9's query builder until and if the decimal/precision type is introduced?

I was blissfully unaware of this issue! Thanks all.

tjhb

8,335 post(s)
#26-Oct-18 22:37

Would changing the variable to an integer value be a practical way to solve the problem?

If the source field is cast as integer, then comparison to a floating-point value is a bit spurious.

Your example is equivalent to

SELECT * FROM [Products] WHERE CAST([Unit Price] AS INT64) >= 22;

or more simply

SELECT * FROM [Products] WHERE [Unit Price] >= 22;

To use a non-integer value as the threshold, you need a non-integer source value.

By contrast, consider this potentially problematic example. Say the source field [Unit price] is stored as FLOAT32. Now run this query:

SELECT * FROM [Products] WHERE [Unit Price] >= 2.51;

What happens here, for a value entered into the FLOAT32 column as "2.51"?

Answer: no records, because "2.51" in the table means its FLOAT32 representation, which is less than the FLOAT64 representation of the "2.51" literal read by the compiler. (See above for the figures.) The SQL9 compiler always assumes FLOAT64 for numeric values, even for integers.

It would be the same for = as for >=. No records returned.

This can be fixed by matching the literal type to the source type (where it is not FLOAT64):

SELECT * FROM [Table] WHERE [Unit Price] >= CAST(2.51 AS FLOAT32);

That returns the expected record. (Same for =.) This should be done.

Dimitri

5,082 post(s)
#27-Oct-18 06:26

Taking the Products table from Nwind as an example, it is interesting to see the difference between what float64 and float32 produce. Create a new, float32 field called [Unit Price Float32] and then, using the transform pane, copy the contents of the Unit Price field into that new field. You'll see some values like 23.25 stay 23.25 while other values, like 21.35, are displayed as 21.350000381469727.

Tim's admonition to use CAST to maintain comparative order is correct, as it is not as simple a matter as just slicing off unwanted digits deep past the decimal point. 9.2, for example, is represented as 9.199999809265137 as a float32.

I have to admit to taking the lazy way out by always using float64. That only works, of course, until it doesn't work, and then you have to think about stuff instead of sweeping the need for extra thought under the rug. I think we need a DECIMAL or PRECISION type. That does not eliminate the many issues involved in binary representation of numbers, but at least it provides a well-known way of managing them.

adamw


8,204 post(s)
#27-Oct-18 10:44

A small note:

Manifold 9 rounds floating-point values for display to 15 places (sometimes 16 places). This can't currently be adjusted.

In 9, we don't round during display by default, we are displaying the shortest sequence of digits that converts back to the exact same numeric value.

tjhb

8,335 post(s)
#27-Oct-18 12:11

I think you have mentioned this before Adam, but I didn't remember it.

To adjust the wording slightly, I hope without adjusting the meaning:

We are displaying the shortest sequence of decimal digits that converts exactly to the stored binary value.

But is this true only for FLOAT64 values?

I am thinking for example of the representation of the nearest binary values to 2.51, given above, for FLOAT32 and FLOAT64. Only the displayed FLOAT64 value seems the shortest available.

Slightly related.

adamw


8,204 post(s)
#27-Oct-18 18:07

Yes, this is correct - the shortest sequence of decimal digits that converts exactly to the stored binary value.

This is for printing FLOAT64 values. We are currently printing FLOAT32 values by converting them to FLOAT64 and printing the result. The conversion from FLOAT32 to FLOAT64 is exact, but the printout has extra digits - maybe we should cut them off.

The binary representation of 2.51 is this:

The exponent is 40 (hexadecimal), this is stored exactly in both FLOAT32 and FLOAT64.

The mantissa (the stored part without the implied upper bit) is 20 A3 D7 0A 3D 70 A... and then this continues but FLOAT64 already ends. FLOAT32 ends earlier - after 20 A3 D7, so this loses some data, and if we convert it to FLOAT64 padding the mantissa with zeroes and print it out, this looks worse than 2.51. Again, we should perhaps take into account that we are printing a 32-bit value, that will print it as 2.51, but we have to make sure this does not break anything first (have to think about it, it might backfire).

tjhb

8,335 post(s)
#30-Oct-18 00:59

Thanks Adam, that helps very much.

If and when you do this

Propagate numeric types not to FLOAT64 but to the narrowest type appropriate for the result based on the types of operands.

perhaps it would make sense to treat FLOAT32 values in the same way as FLOAT64 are treated now? That is, to show the shortest sequence of decimal digits that converts exactly to the stored binary FLOAT32 value.

But not to change anything for FLOAT32 display before then?

adamw


8,204 post(s)
#30-Oct-18 08:15

We probably don't have to make these changes together, and can make them in any order.

We'll see.

Manifold User Community Use Agreement Copyright (C) 2007-2017 Manifold Software Limited. All rights reserved.