Add arbitrary precision decimal floating point #71
Conversation
On Mon, Nov 29, 2021 at 07:09:04AM -0800, LdBeth wrote:
Resolves proposal #69
Ready for review.
You can view, comment on, or merge this pull request online at:
#71
Let me say first that I am not a fan of decimal float. But since
you already wrote it and it is certainly more useful than some
domains that we already have, there is no obstacle in principle
to including it.
Now some specific remarks. First, trivialities: in Makefile.in
(and in general) I normally try to minimize the size of diffs,
which in the case of file lists in Makefile.in means adding entries
in the relevant place (mostly in alphabetical order), splitting
a line if it overflows. So no larger reformattings. Second
triviality: our files should have a newline at the end (your diff
produced a file with no newline at the end).
For inclusion of new code we normally also want tests (in the
src/input subdirectory there are several test files which
may give you an idea how this is done).
You probably want DecimalFloat to be recognized by the Spad
compiler. For this you will need changes to
src/interp/compiler.boot, probably in compSel1 and constant_coerce.
Now, concerning transcendental functions, I feel that your
approach of copying and modifying code is somewhat problematic.
Namely, we do improve such code from time to time, and having a
duplicate means that the version in decimal will get stale (have
no improvements) or we will have to do the work twice, separately
for each version. Unless you have a strong reason to do
otherwise, the simplest approach is to convert to Float,
use the Float function, and convert back. Conversion to Float
should have cost comparable to the cost of a multiplication in
decimal float, so performance-wise this should be faster
than having the function work in decimal format (since operations
in Float should be faster).
…--
Waldek Hebisch
On Wed, Dec 01, 2021 at 09:01:38AM -0800, LdBeth wrote:
Thank you for the review comments, @hebisch
> For this you will need changes to
src/interp/compiler.boot, probably in compSel1 and constant_coerce.
These are mainly hacks to produce double float literals for the Common Lisp compiler.
Besides, I haven't figured out a way to add the support there without breaking the rest of the code.
I mean extra lines like:
domain = $Float and op = "float" and m = ["DecimalFloat"] =>
argl is [mant, exp, 10] =>
[["Sel", ["DecimalFloat"], "float"], mant, exp], m, e]
Basically, this code needs to replace
[["Sel", ["Float"], "float"], mant, exp]
by
[["Sel", ["DecimalFloat"], "float"], mant, exp], m, e]
The Spad compiler uses a somewhat different representation of code
than the interpreter...
> Unless you have strong reason to do
otherwise the simplest approach is to convert to float,
use float function and convert back.
Converting to `Float` is easy; however, converting back without significant precision loss is not
straightforward and requires complex computations.
Hmm, both require computation and both may lose some precision (0.5-1 lsb).
You already have the main part: float(m, exp, 2)$DecimalFloat will do the work.
Say, `1.2`, is `177088743107611695514*2^- 67` which gives `1.1999993663_703848278`
when fed to the `float(m, e, p)$DecimalFloat` function.
Hmm, with 40 digits precision I get:
(4) -> 1.2 - float(177088743107611695514, -67, 2)
(4) - 0.2710505431_213761085 E -20
Type: Float
(5) -> 1.1999993663_703848278 - float(177088743107611695514, -67, 2)
(5) - 0.6336296151_7220271050_5431213761_085 E -6
Type: Float
This is a massive error, you have only about 20 _bits_ of accuracy.
In such a case I would look for bugs in the decimal arithmetic routines.
Maybe you mixed up decimal with binary precision somewhere?
Although `convert10` inside `Float`
can do the job correctly, there are at least 2 problems:
1. elision of trailing zeros. The General Decimal Arithmetic requires using full precision when the result is inexact.
2. different rounding mode. `Float` uses binary round-to-even, while `DecimalFloat` uses round-half-up. `exp 1.3` gives `3.6692966676_192442204` in binary float, and in decimal it can give `3.6692966676_192442205`
the General Decimal Arithmetic has these special requirements on `log10`, `power` and `exp` so I have
adapted the algorithm a little.
Hmm, I do not see in your code anything that will handle exact rounding.
For your purpose rounding in Float is irrelevant: 'exp' in Float
just produces a rational approximation to the correct value of 'exp'
which is supposed to have relative error not exceeding 2^-bits. For
exact rounding you need to check in the rounding code if the available
accuracy is enough to perform rounding according to the decimal
rules. If not, increase accuracy and repeat. So you need
something like:
roundExactlyIfCan(x : Float, eps : Float) : Union(DecimalFloat, "failed")
....
with appropriate implementation and loop like:
fbits := initial_bits(digits)
obits := bits()$Float
repeat
bits(fbits)$Float
eps := float(1, -fbits, 2)$Float
xf := float(m, e, 10)$Float
ef := exp(xf)$Float
edu := roundExactlyIfCan(ef, eps)
edu case DecimalFloat =>
bits(obits)$Float
return edu::DecimalFloat
fbits := 3*fbits quo 2
roundExactlyIfCan is somewhat tricky to implement correctly and
efficiently. Naively, roundExactlyIfCan can just treat its input
as an exact rational and compute the best approximation by a decimal
fraction. If the distance to the neighbouring decimal fractions is
larger than 2*eps then you know that you got correct rounding.
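The naive strategy just described (treat the input as an exact rational, find the best decimal approximation, and accept it only when the available accuracy can decide the rounding direction) can be sketched in Python with stdlib `fractions`. The function name and signature are illustrative, not FriCAS API, and round-half-up is assumed as the decimal rule:

```python
from fractions import Fraction

def round_exactly_if_can(x, eps, digits):
    """Half-up rounding of an exact rational x to `digits` significant
    decimal digits, returned as (mantissa, exponent) with
    x ~= mantissa * 10**exponent; returns None when x is within eps
    of a tie, i.e. the accuracy cannot decide the rounding direction."""
    if x == 0:
        return Fraction(0), 0
    m, e = abs(Fraction(x)), 0
    # scale the mantissa into [10^(digits-1), 10^digits)
    while m >= 10 ** digits:
        m /= 10
        e += 1
    while m < 10 ** (digits - 1):
        m *= 10
        e -= 1
    n = int(m)        # integer part of the scaled mantissa
    frac = m - n      # exact fractional part, in [0, 1)
    # reject when a perturbation of size eps (scaled like m) could
    # flip the tie-breaking decision
    if abs(frac - Fraction(1, 2)) <= eps * m / abs(x):
        return None
    mant = n + 1 if frac > Fraction(1, 2) else n
    return (mant if x > 0 else -mant), e
```

As the message notes, treating FriCAS floats as exact rationals like this can be prohibitively expensive for bignum exponents; this sketch only illustrates the acceptance test, not an efficient implementation.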
However, FriCAS exponents can be bignums and the implied computation
with exact fractions may be prohibitively expensive. Instead,
roundExactlyIfCan should compute an approximation to
ed := (log(2)*e)/log(10) and use the integer part of this as the
exponent of the decimal approximation and exp of the fractional part
to correct the mantissa. Of course this messing with logs
introduces errors of its own, so roundExactlyIfCan must use
sufficient accuracy so that the errors from this approximation
are smaller than eps, and then use a total estimate of the error to
decide if the approximation can be correctly rounded. Note that
in FriCAS when you use 20 digits for the mantissa you may have
1000 digits in the exponent. Then the log computation must deliver
hundreds of good digits in order to avoid complete loss of
accuracy. To get an idea of what I mean you can compare
2.0^(2^911-1)
and
exp(log(2.0^(2^911-1)))
Mathematically they should be equal, but at 20 digits accuracy
you get completely different numbers. Note that exact rounding
at intermediate stages does not help: exact rounding means that
the second expression will consistently give you one result (the
same for all implementations of exact rounding), but wildly
different from the first expression.
Note that roundExactlyIfCan has some similarity to what direct
implementations of log and exp need to do for exact rounding.
The tradeoff here is that by implementing roundExactlyIfCan you
can have only one implementation, supporting multiple uses.
In exp and log you can take advantage of specific properties
of exp and log. However, after arguments are reduced to the standard
range this advantage essentially vanishes. I see no reason
to copy the main body of log and exp.
BTW: the main part of exp1 is purely integer computation, completely
independent of the base. So, if you really feel that you need it
you can just export the integer calculation from Float and call
the exported version.
BTW2: I plan to implement more functions using a similar
principle as used in exp1 (that is, computing rational
approximations purely in integer arithmetic). Then
all this rational approximation code will probably go
into a separate package.
…--
Waldek Hebisch
On Wed, Dec 01, 2021 at 04:47:26PM -0800, LdBeth wrote:
Yes, please do so. btw right now I find `log(x)$Float` gives bad approximation when `x` is close enough to 1
```
(16) -> log 1.0000001
(16) 0.9999999500_0002764028 E -7
Type: Float
```
Which is `0.999999950000003333333` with higher precision.
Hmm,
(1) -> x1 := 1.0000001
(1) 1.0000001
Type: Float
(3) -> l1 := log(x1)
(3) 0.9999999500_0002764028 E -7
Type: Float
(4) -> digits(50)
(4) 20
Type: PositiveInteger
(5) -> l1 - log(x1)
(5) 0.4318995886_1415005798_73530061 E -28
Type: Float
(7) -> x1 - 1.0000001
(7) 0.2430694534_5387021006_899885833 E -20
Type: Float
AFAICS accuracy is as expected. Note the last result: using the
literal form at a different precision you are computing the log of
a different number.
…--
Waldek Hebisch
On Wed, Dec 01, 2021 at 04:47:26PM -0800, LdBeth wrote:
Indeed, this is due to a bug: I didn't give enough precision in division when the length
of the right operand is large.
Note that ATM dvide is effectively doing its own rounding towards
-infinity, which means that it does not agree with the rounding
rule that you gave. Also, if you want to perform correct
rounding you need to either have a single rounding at the end
or _very_ carefully prove that what you do has an equivalent
effect. In particular this affects multiplication and
division by integers; it looks like currently they may
round wrongly (I do not know if you want them to always round correctly).
Note: I do not promise correct rounding in Float, so for operations
where you want correct rounding Float is usually a bad example.
With the correction been made `float(177088743107611695514, -67, 2)$DecimalFloat`
gives `1.2000000000_000000000`.
There are two other problems. Exponentiation used in float may
lead to nontrivial loss of accuracy for large exponents. If
binary powering is in use, then powering implies a number
of multiplications of the order of length(e), where length(e) is
the length of the exponent in bits. Each may cause an error of the
order of the least significant bit, so you need something like
2*log2(length(e)) extra bits of accuracy. A more efficient version
would work with the log of the exponent, where you need to increase
accuracy to length(e).
This is already a problem in the Float version; apparently nobody
tried it for large exponents (normally constants have exponents in
a reasonably small range where the problem is not visible).
Another problem: if you want correct rounding in float, then
you need the dance of computing an approximation, checking if it can
be rounded without doubt, and repeating at higher precision when
needed. As I wrote, there is no promise of correct rounding in
Float, so I do not consider this a problem for the float version.
I do not know if you want to promise correct rounding for float;
if you do, then it probably makes sense to have an internal version
with no effort for correct rounding (it makes little sense to
spend a lot of effort on correct rounding of things that you
will throw out before the user can see them).
…--
Waldek Hebisch
On Wed, Dec 01, 2021 at 10:30:21PM -0800, LdBeth wrote:
> Note that ATM dvide is effectively doing its own rounding towards
> -infinity, which means that it does not agree with the rounding
> rule that you gave.
My code for `dvide` is different from `Float`, the basic idea is:
1. integer division, if it can do exact division then return exact result.
2. it is guaranteed that `LENGTH y.mantissa - LENGTH r` is positive after step 1.
3. get the fraction r / y with exceeded precision. remove trailing zeros.
You have
mw := (r * POW10(ew)) quo y.mantissa
in this step. quo is discarding the remainder, which is equivalent to
rounding.
4. adds up integer part with the fraction part
That performs a _second_ rounding. In general two roundings give a
different result than a single rounding. Statistically, most of the
time the result from two roundings will be the same as from a single
rounding. So it is hard to check your routines by testing (testing
will catch crude errors, but normally misses the more subtle ones).
For the basic four operations there is a simple, obviously correct
algorithm: do the operation treating your numbers as exact rationals
and then find the best approximation by a decimal fraction of the
requested precision (where "best" is decided by the rounding rule).
To have faster code, implementations normally take various shortcuts.
But for each shortcut you need a proof that it gives the
same result as the naive version. Even addition requires a proof.
For example, with a round-towards-+infinity rule, adding
anything positive would give a strictly bigger result, so the
current addition code would be wrong. In other words,
correctness of the current addition code depends on the fact that
the rounding rule is from the "round to nearest" family. The point
of the above is that close to a decision boundary a tiny change
can lead to different rounding. Unless you already did so,
I suggest Googling for "table maker's dilemma"; there are
references which explain the related issues more deeply.
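The double-rounding hazard described above is easy to see with Python's stdlib decimal module (which implements the General Decimal Arithmetic spec): rounding 1.49 half-up in two stages gives a different answer than a single rounding. This is only an illustration of the general pitfall, not FriCAS code:

```python
from decimal import Decimal, ROUND_HALF_UP

x = Decimal("1.49")

# single rounding, straight to integer precision: 1.49 -> 1
one_step = x.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

# two roundings: 1.49 -> 1.5 -> 2
two_step = x.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP) \
            .quantize(Decimal("1"), rounding=ROUND_HALF_UP)

print(one_step, two_step)  # 1 2
```

Statistically most inputs round the same way either route, which is exactly why testing rarely catches this class of bug.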
…--
Waldek Hebisch
On Thu, 02 Dec 2021 23:10:08 +0800, hebisch wrote:
in this step. quo is discarding the remainder, which is equivalent to
rounding.
I suppose `(r * POW10(ew)) quo y.mantissa` only does truncation in
decimal, and with excess precision it won't affect the analogue of
round-half-up, since only one decimal digit after the precision is
used. No?
It is actually quite close to working out the result as a fixed point
number and giving the separated exponent part.
On Thu, 02 Dec 2021 23:10:08 +0800, hebisch wrote:
Unless you already did so,
I suggest Googling for "table maker's dilemma"; there are
references which explain the related issues more deeply.
I see. That means it is reasonable to have transcendental functions
with up to 1 ulp error for arbitrary precision float. And it does in
turn affect decimal float as well.
I guess that gives me the idea that it is better to use a different
class of algorithms for decimal floating point.
Best,
---
LDB
On Fri, Dec 03, 2021 at 12:17:54AM -0800, LdBeth wrote:
On Thu, 02 Dec 2021 23:10:08 +0800,
hebisch wrote:
>
> in this step. quo is discarding the remainder, which is equivalent to
> rounding.
I suppose `(r * POW10(ew)) quo y.mantissa` only does truncation in
decimal, and with excess precision it won't affect the analogue of
round-half-up, since only one decimal digit after the precision is
used. No?
Unless you prove otherwise, "since only one decimal digit after the
precision is used" is likely to be wrong: if you are close to a tie,
digits very far away can flip the result in a different direction.
Note that you have signed numbers and quo uses a funny rule to
determine the sign of the remainder, so even if the code works for one
combination of signs it does not mean that it will work for all.
BTW: you said that ties round upward but did not say if this also
applies to negative numbers.
…--
Waldek Hebisch
On Fri, 03 Dec 2021 21:33:46 +0800, hebisch wrote:
Unless you prove otherwise, "since only one decimal digit after the
precision is used" is likely to be wrong: if you are close to a tie,
digits very far away can flip the result in a different direction.
Note that you have signed numbers and quo uses a funny rule to
determine the sign of the remainder, so even if the code works for one
combination of signs it does not mean that it will work for all.
BTW: you said that ties round upward but did not say if this also
applies to negative numbers.
First I would explain the rounding. Since the Decimal Arithmetic spec
specifies that the sign bit is separated from the mantissa,
`round(x) IS sign(x) * round(abs(x))` is presumed.
And in addition to that, to stay compatible to some extent with the
original `Float` and `DoubleFloat`, `IF x = 0 THEN sign(x) = 1`.
So floating zero is always positive.
About the divide function:
From the Lisp code and the Common Lisp spec, we can know that:
[q, r] := divide(x, y)$Integer
q * y + r = x IS ENSURED
AND
q IS x/y rounded towards zero
AND
q = x quo y
(Guarding that r ~= 0) and rewriting (for y ~= 0), we can get:
q + r / y = x / y
so at least we know the sign of the remainder won't cause trouble,
because `q` must have the same sign as `r / y`. Due to how the sign
is manipulated in typical floating point, (- a) + (- b) MUST BE
IDENTICAL TO - (a + b), and given the rounding rule, there should be
no rounding problem caused by sign.
Hence the only question left is whether we can deduce
`q' + r' quo y = x' quo y`, when q', r', x' are q, r, x multiplied
by the same power of 10. And in fact it is true.
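The `divide(x, y)$Integer` contract cited above (quotient rounded toward zero, with `q * y + r = x`, as Common Lisp's TRUNCATE guarantees) can be checked in Python. Note that Python's `//` floors rather than truncates, so a truncating version must be built explicitly; this is an illustrative check, not FriCAS code:

```python
def divide(x: int, y: int) -> tuple[int, int]:
    """Truncating division in the style of Common Lisp's TRUNCATE:
    q is x/y rounded toward zero, and q*y + r == x."""
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q
    return q, x - q * y

for x, y in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    q, r = divide(x, y)
    assert q * y + r == x                    # exactness of the pair
    assert r == 0 or (r < 0) == (x < 0)      # remainder takes the sign of x
```

For example `divide(-7, 2)` gives `(-3, -1)`, whereas Python's `divmod(-7, 2)` floors to `(-4, 1)`; the truncating rule is the one that makes `q` share the sign of `r / y` as argued above.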
…---
LDB
On Fri, Dec 03, 2021 at 06:42:10AM -0800, LdBeth wrote:
On Fri, 03 Dec 2021 21:33:46 +0800,
hebisch wrote:
>
> Unless you prove otherwise, "since only one decimal digit after the
> precision is used" is likely to be wrong: if you are close to a tie,
> digits very far away can flip the result in a different direction.
> Note that you have signed numbers and quo uses a funny rule to
> determine the sign of the remainder, so even if the code works for one
> combination of signs it does not mean that it will work for all.
>
> BTW: you said that ties round upward but did not say if this also
> applies to negative numbers.
First I would explain the rounding. Since the Decimal Arithmetic spec
specifies that the sign bit is separated from the mantissa,
`round(x) IS sign(x) * round(abs(x))` is presumed.
And in addition to that, to stay compatible to some extent with the
original `Float` and `DoubleFloat`, `IF x = 0 THEN sign(x) = 1`.
So floating zero is always positive.
About the divide function:
From the Lisp code and the Common Lisp spec, we can know that:
[q, r] := divide(x, y)$Integer
q * y + r = x IS ENSURED
AND
q IS x/y rounded towards zero
AND
q = x quo y
(Guarding that r ~= 0) and rewriting (for y ~= 0), we can get:
q + r / y = x / y
so at least we know the sign of the remainder won't cause trouble,
because `q` must have the same sign as `r / y`. Due to how the sign
is manipulated in typical floating point, (- a) + (- b) MUST BE
IDENTICAL TO - (a + b), and given the rounding rule, there should be
no rounding problem caused by sign.
OK.
Hence the only question left is whether we can deduce
`q' + r' quo y = x' quo y`, when q', r', x' are q, r, x multiplied
by the same power of 10. And in fact it is true.
IIUC you basically say that
(q, r) := divide(x.mantissa, y.mantissa)
ew :I := LENGTH y.mantissa - LENGTH x.mantissa + digits() + 1
mw := (r * POW10(ew)) quo y.mantissa
q*POW10(ew) + mw
is doing the same thing as
ew :I := LENGTH y.mantissa - LENGTH x.mantissa + digits() + 1
mw := (x.mantissa*POW10(ew)) quo y.mantissa
In that case, why bother with splitting the quotient into two stages?
The second version is simpler and likely faster.
BTW: It seems that you should do
ew :I := LENGTH y.mantissa - LENGTH x.mantissa + digits() + 1
ew := max(ew, 0)
mw := (x.mantissa*POW10(ew)) quo y.mantissa
Otherwise there will be an error due to a negative power of 10.
Another thing is the removal of trailing digits. AFAICS it should be
done in round and maybe in display routines, but nowhere else.
With an earlier version of your code I get:
(1) -> digits(4)$DecimalFloat
(1) 20
Type: PositiveInteger
(2) -> x := ***@***.***
(2) 1001
Type: DecimalFloat
(3) -> digits(3)$DecimalFloat
(3) 4
Type: PositiveInteger
(4) -> x/1
(4) 1001
Type: DecimalFloat
(5) -> ***@***.***
(5) 1.00 E 3
Type: DecimalFloat
(6) -> ***@***.***)
(6) 1001
Type: DecimalFloat
ATM it seems that the result of division has an extra unwanted digit
of accuracy.
Note: the above was intended to test removal of trailing zeros
(which you say should be done) but it looks like a separate issue.
The point is that before round you have a possible trailing guard
digit which round may remove. But AFAICS round does not remove
zeros which may be exposed when the guard digit is removed by round.
BTW: I am not sure what the spec exactly says about removal of
trailing zeros. Surely, hardware decimal float cannot
magically shorten its representation and the trailing zeros must
be there. So, logically the only place which really needs
to deal with trailing zeros is the printing routine. Or maybe
mantissa and exponent (if they are considered part of the
spec).
…--
Waldek Hebisch
On Sat, 04 Dec 2021 08:08:10 +0800, hebisch wrote:
ATM it seems that result of division had extra unwanted digit
of accuracy.
Err, I didn't expect shrinking of precision to happen.
The cause is
```
if r = 0 then
[q, x.exponent - y.exponent]
```
The result lacks rounding. Doing a `LENGTH` in DecimalFloat is more
expensive than `length` for binary, and since when `r = 0` the `q`
would be exact, I chose not to round the result.
That is a mistake indeed, I shall fix it.
BTW: It seems that you should do
ew :I := LENGTH y.mantissa - LENGTH x.mantissa + digits() + 1
ew := max(ew, 0)
mw := (x.mantissa*POW10(ew)) quo y.mantissa
Otherwise there will be an error due to a negative power of 10.
it is
```
ew :I := LENGTH y.mantissa - LENGTH r + digits() + 1
```
So `x.mantissa * POW10(ew)` would not make sense if `ew` were negative.
`dvide$Float` uses `shift2`, which would be fine.
Doing `(q, r) := divide(x.mantissa, y.mantissa)` ensures
`LENGTH r <= LENGTH y.mantissa`, so `ew` is always positive.
In that case, why bother with splitting the quotient into two stages?
The second version is simpler and likely faster.
Another thing is removal of trailing digits. AFAICS it should be
done in round and maybe display routines, but nowhere else.
It is due to requirements from the General Decimal Arithmetic.
Setting precision to 9 digits,
divide(1, 3) ==> 0.333333333
divide(5, 2) ==> 2.5
divide(1, 10) ==> 0.1
divide(12, 12) ==> 1
divide(2.400, 2.0) ==> 1.20
And,
| There is a one-to-one mapping between abstract representations and the
| result of this conversion. That is, every abstract representation has
| a unique to-scientific-string representation.
Which means that, for example, float(1000, -3)$DF should be
displayed as `1.000`, different from float(1, 3)$DF, which is
displayed as `1E3`.
For the reasoning behind that, see the FAQ on Decimal Arithmetic:
Why are the encodings unnormalized?
http://speleotrove.com/decimal/decifaq5.html#unnlay
Why is the arithmetic unnormalized?
http://speleotrove.com/decimal/decifaq4.html#unnari
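For reference, Python's stdlib decimal module implements the same General Decimal Arithmetic spec, so the divide examples and the unnormalized display behaviour above can be reproduced directly (trailing zeros preserved on exact division, and distinct abstract representations printing differently):

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 9  # "setting precision to 9 digits"
    results = [
        Decimal(1) / Decimal(3),            # 0.333333333
        Decimal(5) / Decimal(2),            # 2.5
        Decimal(1) / Decimal(10),           # 0.1
        Decimal(12) / Decimal(12),          # 1
        Decimal("2.400") / Decimal("2.0"),  # 1.20
    ]
print([str(r) for r in results])

# unnormalized coefficients survive into to-scientific-string:
# the analogues of float(1000, -3)$DF and float(1, 3)$DF differ
print(Decimal((0, (1, 0, 0, 0), -3)))  # 1.000
print(Decimal((0, (1,), 3)))           # 1E+3
```

The last two lines construct `1000E-3` and `1E3` from (sign, coefficient digits, exponent) tuples, matching the one-to-one mapping between abstract representations and their string forms that the spec requires.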
…---
LDB
On Fri, Dec 03, 2021 at 05:50:05PM -0800, LdBeth wrote:
On Sat, 04 Dec 2021 08:08:10 +0800,
hebisch wrote:
>
> ATM it seems that result of division had extra unwanted digit
> of accuracy.
Err, I didn't expect shrinking of precision to happen.
That is a normal thing when precision is settable. You calculate
critical things at higher precision and then use the result in lower
precision calculations...
The cause is
```
if r = 0 then
[q, x.exponent - y.exponent]
```
The result lacks rounding. Doing a `LENGTH` in DecimalFloat is more
expensive than `length` for binary, and since when `r = 0` the `q`
would be exact, I chose not to round the result.
That is a mistake indeed, I shall fix it.
Yes, computing decimal length exactly has its costs. But if you
depend on length for correctness you (sometimes) need to pay the
cost. Length is critical for correct display, and the WIDTH function
in src/interp/i-output.boot tries to be correct and reasonably
fast. To tell the truth, it looks like length10 can currently
give a wrong result for very large numbers. Namely, for correct
operation you need a lower bound on log10(abs(x)). The magic constant
4004/13301 is an estimate of log(2)/log(10) from _above_. So,
once length(x) is big enough you will overestimate and may get too
large a result.
BTW1: Why don't you replace x by abs(x) at the beginning of length10?
IIUC you want to ignore the sign when determining length.
BTW2: Code in WIDTH makes a significant effort to avoid computing
powers of 10: it uses pre-computed small powers and computes a power
of 10 only if the number is quite close to a power of 10. Also, instead
of a quotient it is enough to compare with a power of 10, which is
cheaper. Also note that the 10000 bound on exactly computed length
is a pragmatic compromise of the display subsystem. The method with
magic constants as it is now would work up to about lengths of order
10e9. Tweaking constants one can get up to 10e12. Replacing
doubles with Float should allow handling any practically reachable
length.
> BTW: It seems that you should do
>
> ew :I := LENGTH y.mantissa - LENGTH x.mantissa + digits() + 1
> ew := max(ew, 0)
> mw := (x.mantissa*POW10(ew)) quo y.mantissa
>
> Otherwise there will be an error due to a negative power of 10.
it is
```
ew :I := LENGTH y.mantissa - LENGTH r + digits() + 1
```
I looked at the diff at the link you gave.
So, `x.mantissa * POW10(ew)` would not make sense if `ew` is negative.
The `dvide$Float` uses `shift2` which would be fine.
Doing a `(q, r) := divide(x.mantissa, y.mantissa)` can ensure
`LENGTH r <= LENGTH y.mantissa`, so `ew` is always positive.
> In that case, why bother with splitting the quotient into two stages?
> The second version is simpler and likely faster.
Now there are even more reasons to ask this.
>
> Another thing is removal of trailing digits. AFAICS it should be
> done in round and maybe display routines, but nowhere else.
It is due to requirements from the General Decimal Arithmetic.
Setting precision to 9 digits,
divide(1, 3) ==> 0.333333333
divide(5, 2) ==> 2.5
divide(1, 10) ==> 0.1
divide(12, 12) ==> 1
divide(2.400, 2.0) ==> 1.20
Those are examples. What is the rule? Looking at the first four
examples one could suspect the rule "remove trailing zeros from
exact results", but this is clearly not true in the last example.
Do I understand correctly that
divide(10^9*divide(1, 3), 333333333)
should give 1 (as opposed to 1.000_000_00)?
What about
divide(2.400, 2)
divide(2400, 2)
divide(12e5, 2e5)
divide(1200, 200)
What about the case when x is of higher precision with an "exact"
representation, the same for y, the quotient is not exact,
but after the second quo you get several trailing zeros?
That is something like
divide(12300000001, 123)
where 12300000001 is produced at higher precision.
What about addition and multiplication?
Anyway, AFAICS you have zeros-removal code in at least two places;
it makes sense to have a single routine for this. Also, it
looks fishy to round _after_ zeros-removal. Namely, if
you actually round, then zeros-removal was not needed.
And,
| There is a one-to-one mapping between abstract representations and the
| result of this conversion. That is, every abstract representation has
| a unique to-scientific-string representation.
Which means that, for example, float(1000, -3)$DF should be
displayed as `1.000`, different from float(1, 3)$DF, which is
displayed as `1E3`.
That is almost clear, but does not explain what should be the result
of a given operation. The "almost" part deals with the fact that in
the usual scientific representation 3.33e2 is considered equal to
0.333e3 and in turn equal to 333.
For the reasoning behind that, see the FAQ on Decimal Arithmetic:
Why are the encodings unnormalized?
http://speleotrove.com/decimal/decifaq5.html#unnlay
Why is the arithmetic unnormalized?
http://speleotrove.com/decimal/decifaq4.html#unnari
The links above (and a lot of material on that site) are just
propaganda. For implementation we need clear rules (which
I was not able to find after a quick look at the site).
…--
Waldek Hebisch
On Sat, 04 Dec 2021 14:09:15 +0800, hebisch wrote:
BTW1: Why don't you replace x by abs(x) at the beginning of length10?
IIUC you want to ignore the sign when determining length.
length(-x) for positive x will be 1 smaller than length(x); since
right now I'm using quo in the length10 computation to correct the
estimate, it won't affect the result. However, if length10 were
changed accordingly to improve efficiency then ... it would need
such a change.
> divide(2.400, 2.0) ==> 1.20
Those are examples. What is the rule? Looking at the first four
examples one could suspect the rule "remove trailing zeros from
exact results", but this is clearly not true in the last example.
That last example isn't exceptional; it is because
2.400 is 2400E-3 and 2.0 is 20E-1, and divide(2400, 20) is exact.
The result is (2400/20)E(-3-(-1)) = 120E-2, i.e. 1.20.
There is indeed an algorithm described in the spec
http://speleotrove.com/decimal/daops.html#refdivide
However I chose not to follow it directly, because it involves a
chain of subtractions and shifts in base 10, far less efficient than
the one in Float. Actually, many implementations of that spec use
text strings to represent floating point numbers, and shifting in
base 10 would be easy that way, but it isn't very convenient to
work with strings in SPAD.
So instead of calculating the division digit by digit, I chose to get
the remainder, calculate a "decimal expansion" from it, then add it
to the quotient.
I am aware now that with the current trailing-zero removal approach,
if the decimal expansion has a zero in it, and a precision setting
happens to make it the last digit, the result would be
different from the definition in the spec. Thus I'm planning to
correct it by adding a test for exact division/sqrt before performing
trailing zero removal.
Do I understand correctly that
divide(10^9*divide(1, 3), 333333333)
should give 1 (as opposed to 1.000_000_00)?
Yes.
What about the case when x is of higher precision with an "exact"
representation, the same for y, the quotient is not exact,
but after the second quo you get several trailing zeros?
That is something like
divide(12300000001, 123)
where 12300000001 is produced at higher precision.
What about addition and multiplication?
The spec does not cover particular issues with multiple-precision
implementations. Assuming we follow the definition directly,
divide(12300000001, 123) should be 100000000E0.
The spec for addition and multiplication is close enough to a
typical floating point implementation:
http://speleotrove.com/decimal/daops.html#refaddsub
http://speleotrove.com/decimal/daops.html#refmult
…---
LDB
On Sat, Dec 04, 2021 at 01:05:20AM -0800, LdBeth wrote:
On Sat, 04 Dec 2021 14:09:15 +0800,
hebisch wrote:
>
> BTW1: Why don't you replace x by abs(x) at the beginning of length10?
> IIUC you want to ignore the sign when determining length.
length(-x) for positive x will be 1 smaller than length(x); since
right now I'm using quo in the length10 computation to correct the
estimate, it won't affect the result. However, if length10 were
changed accordingly to improve efficiency then ... it would need
such a change.
Well, the current version _may_ work. My point was that using abs(x) is
obviously correct and there seems to be no advantage in not using it.
> > divide(2.400, 2.0) ==> 1.20
>
> Those are examples. What is the rule? Looking at first four
> examples one can suspect rule "remove trailing zeros from
> exact results", but this is clearly not true in last example.
That last example isn't exceptional; it is because
2.400 is 2400E-3 and 2.0 is 20E-1, and divide(2400, 20) is exact.
The result is (2400/20)E(-3-(-1)) = 120E-2, i.e. 1.20.
There is indeed an algorithm described in the spec
http://speleotrove.com/decimal/daops.html#refdivide
However I chose not to follow it directly, because it involves a
chain of subtractions and shifts in base 10, far less efficient than
the one in Float. Actually, many implementations of that spec use
text strings to represent floating point numbers, and shifting in
base 10 would be easy that way, but it isn't very convenient to
work with strings in SPAD.
AFAICS the following (untested) implements the spec:
shorten(q : I, eq : I, adjust : I) : % ==
while adjust > 0 repeat
(q1, r1) := divide(q, 10)
r1 ~= 0 => break
q := q1
adjust := adjust - 1
round([q, eq - adjust])
dvide(x : %, y : %) : % ==
mx := x.mantissa
my := y.mantissa
ex := x.exponent
ey := y.exponent
adjust := max(LENGTH(my) - LENGTH(mx) + digits() + 1, 0)
(q, r) := divide(mx*POW10(adjust), my)
-- if inexact, then just round
r ~= 0 => round([q, ex - ey - adjust])
shorten(q, ex - ey, adjust)
This "chain of subtractions and shifts in base 10" is really long
division. adjust above normally is larger than their adjust
when they stop long division. If division is exact then the
loop in shorten makes sure that adjust is the same as when
there is an early exit from long division.
Remarks:
- just calling round is OK only because of the round-half-up rule;
otherwise round would also need the remainder.
- one could optimize by replacing LENGTH(my) by an upper estimate
and LENGTH(mx) by a lower estimate (such estimates are much
cheaper than exact length).
- in shorten most of the time there is no need for round. But
one would need to compute the length of q (or an estimate of the
length) to know.
- as you noted, shorten could be optimized (say using binary
search).
- AFAICS the same shorten could be used in square root.
BTW: For sqrt IIUC they require round-half-even even if
another rounding mode is in use for other operations.
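The untested sketch above can be transcribed into Python for experimentation. Names mirror the sketch; `length10` stands in for `LENGTH`, plain `(mantissa, exponent)` tuples stand in for the DecimalFloat record, and the final rounding of the inexact branch is elided (marked in a comment), so this only exercises the adjust/shorten bookkeeping:

```python
def length10(n: int) -> int:
    """Number of decimal digits of |n| (LENGTH in the sketch)."""
    return len(str(abs(n)))

def shorten(q: int, eq: int, adjust: int) -> tuple[int, int]:
    """Strip trailing zeros introduced by over-scaling, moving the
    exponent back toward its ideal value (the spec's exact case)."""
    while adjust > 0:
        q1, r1 = divmod(q, 10)
        if r1 != 0:
            break
        q = q1
        adjust -= 1
    return q, eq - adjust

def dvide(mx: int, ex: int, my: int, ey: int, digits: int) -> tuple[int, int]:
    """Spec-style decimal division on (mantissa, exponent) pairs."""
    adjust = max(length10(my) - length10(mx) + digits + 1, 0)
    q, r = divmod(mx * 10 ** adjust, my)
    if r != 0:
        # inexact: a final rounding step to `digits` digits would go here
        return q, ex - ey - adjust
    return shorten(q, ex - ey, adjust)
```

For example `dvide(2400, -3, 20, -1, 9)` (i.e. 2.400 / 2.0 at 9 digits) scales by 10^8, divides exactly, and shorten strips the over-scaling back down to `(120, -2)`, the spec's `1.20`.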
…--
Waldek Hebisch
On Tue, Dec 21, 2021 at 09:55:47PM -0800, LdBeth wrote:
Sorry for the delayed reply, I have been busy with end-of-term stuff for a while.
I hope you will be able to finish this.
Yes, however the spec does not mention what to do if the precision
of operand is higher than current precision. If we insist on correct rounding
in that case, I guess I have to do a correct implementation of `round-half-even`.
I think we should not make exceptions due to precision, that is,
do what the spec says...
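For reference, the difference between the two tie-breaking rules under discussion is easy to see with Python's stdlib decimal module (whose spec-conformant `ROUND_HALF_UP` rounds ties away from zero, so it covers the negative-number question raised earlier). This is an illustration of the rules, not of the PR's code:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

ties = ["0.5", "1.5", "2.5", "-2.5"]
half_even = [str(Decimal(s).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))
             for s in ties]
half_up = [str(Decimal(s).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
           for s in ties]
print(half_even)  # ['0', '2', '2', '-2']
print(half_up)    # ['1', '2', '3', '-3']
```

The two rules only disagree exactly on ties, which is why an implementation that is correct under one family of rules can silently round wrongly under the other.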
…--
Waldek Hebisch
Commits:
- Improve conformance to decarith; add testcase
- And fix Makefile.in update.
- fix bug in division
- DecimalFloat support in SPAD
- fix obvious problems in divide and sqrt
- fix sqrt
- new exp wip still need exp1
- add cached exp1
- stage changes