Jean-François COLONNA

CMAP (Centre de Mathématiques APpliquées) UMR CNRS 7641, Ecole Polytechnique, CNRS, 91128 Palaiseau Cedex, France

france telecom, France Telecom R&D

[

[

[

(CMAP28 WWW site: this page was created on 05/14/1997 and last updated on 11/28/2020 11:52:38 -CET-)

IN PROGRESS/EN COURS:

- 1-INTRODUCTION:
- 2-IN PROGRESS/EN COURS:
- 3-IN PROGRESS/EN COURS:
- 3.1-IN PROGRESS/EN COURS:
- 3.2-IN PROGRESS/EN COURS:
- 3.3-IN PROGRESS/EN COURS:
- 3.4-IN PROGRESS/EN COURS:
- 3.5-IN PROGRESS/EN COURS:
- 3.6-IN PROGRESS/EN COURS:
- 3.7-IN PROGRESS/EN COURS:
- 3.8-IN PROGRESS/EN COURS:
- 3.9-IN PROGRESS/EN COURS:
- 3.10-IN PROGRESS/EN COURS:
- 3.11-IN PROGRESS/EN COURS:
- 3.12-IN PROGRESS/EN COURS:
- 3.13-IN PROGRESS/EN COURS:
- 3.14-IN PROGRESS/EN COURS:
- 3.15-IN PROGRESS/EN COURS:

#!/bin/csh set history=1000 alias ll 'ls -al'IN PROGRESS/EN COURS

ll | moreIN PROGRESS/EN COURS

|archives |------------------- | |courriers | |------------------- | | |in | | |-- | | | | | |out | | --- | |articles | -------- |donnees |------- | |resultats |------------------- | |bruts | |----- | | | |images | ------------------- | |animations | |---------- | | | |fixes | ----- |programmes ------------------- |C |------------------- | |sources | |------- | | | |executables | ----------- |Fortran ------------------- |sources |------- | |executables -----------IN PROGRESS/EN COURS

- the compiler replaces 32-bit computations with 64-bit ones,
- the compiler translates in the same way the two expressions (AxB)xC and Ax(BxC).
- the syntax used to describe a mathematical model (as exhibited with this first "experiment"),
- the compiler used, including its release and the options (as exhibited with this second "experiment"),
- the processor,
- the heterogeneity of the systems,...
- 'A' and 'B' are two equal numbers stored in the 80-bit floating point unit ('FPU'). At that time, it is possible to add them, to multiple them,... and to compare them without ambiguity.
- Suppose that due to a lack of floating point resources, the number A is moved to memory (and then in a 64-bit format) and suppose that the optimizer "ignores" (as it is the case unfortunately...) that move.
- Imagine that some instructions later, 'A' (now a 64-bit number) and 'B' (still a 80-bit number) have to be compared; 'A' is moved from memory to a floating point register and 26 (null) bits are added to it. A and B are no more equal!

IN PROGRESS/EN COURS

int a=10000,b=0,c=8400,d,e=0,f[8401],g; main(){for(;b-c;){f[b++]=a/5;}for(;d=0,g=c*2;c-=14,printf("%.4d",e+d/a),e=d%a) {for(b=c;d+=f[b]*a,f[b]=d%--g,d/=g--,--b;d*=b);}}

IN PROGRESS/EN COURS

main(){int N=100000,n=N,a[N],x;while(--n)a[n]=1+1/n; for(;N>9;printf("%d",x)) for(n=N--;--n;a[n]=x%n,x=10*a[n-1]+x/n);}

[see the C program example]

[see a K program example]

[see the C program example]

[see a K program example]

[see the C program example]

[see a K program example]

[see the C program example]

[see a K program example]

[see the C program example]

[see a K program example]

[see the C program example]

[see a K program example]

#define Fpow(x,y) pow((double)x,(double)y)

[see the C program example]

[see a K program example]

[see the C program example]

[see a K program example]

[see the C program example]

[see a K program example]

[see the C program example]

[see a K program example]

As an example of the consequence of the

S = 0 0

S = S + 1 n n-1

S = n n

#define N "an arbitrary integer positive value..."

main() { int n; float sigma=0;

for (n=1 ; n <= N ; n++) { sigma = sigma + 1.0; }

printf("\n computed sum = %f",sigma); }

Another consequence is

float MUL(x,y) float x,y; { return(x*y); }

main() { float A=3.1,B=2.3,C=1.5; float X,Y;

X=MUL(MUL(A,B),C); Y=MUL(A,MUL(B,C));

printf("\n (%f x %f) x %f = %f\n",A,B,C,X); printf("\n %f x (%f x %f) = %f\n",A,B,C,Y); }

(3.10 * 2.30) * 1.50 = 10.695000 = 0x412B1EB8 #### #

3.10 * (2.30 * 1.50) = 10.694999 = 0x412B1EB7 #### #

This last experiment could be tried with the following program:

main() { float A=3.1,B=2.3,C=1.5; float X,Y;

X=(A*B)*C; Y=A*(B*C);

printf("\n (%f x %f) x %f = %f",A,B,C,X); printf("\n %f x (%f x %f) = %f",A,B,C,Y); }

Then, the best thing to do is to use the program defining the '

The property of associativity for addition is lost too. Using 32-bit floating point numbers (in order to simplify the hexa-decimal printing -see the same experiment us ing 64 bit double precision-), the following progam (where the function '

float ADD(x,y) float x,y; { return(x+y); }

main() { float A=1.1,B=3.3,C=5.5; float X,Y;

X=ADD(ADD(A,B),C); Y=ADD(A,ADD(B,C));

printf("\n (%f x %f) x %f = %f\n",A,B,C,X); printf("\n %f x (%f x %f) = %f\n",A,B,C,Y); }

(1.10 + 3.30) + 5.50 = 9.900000 = 0x411E6666 # #

1.10 + (3.30 + 5.50) = 9.900001 = 0x411E6667 # #

[Obviously a similar anomaly can be seen with the distributivity of the multiplication over the addition]

When studying

2 X = (R+1)X - RX n n-1 n-1

X 0

| --------------- | | \|/ \|/ ' '

2 X = (R+1)X - RX 1 0 0

| --------------- | | \|/ \|/ ' '

2 X = (R+1)X - RX 2 1 1

| --------------- | | \|/ \|/ ' '

2 X = (R+1)X - RX 3 2 2

| --------------- | | \|/ \|/ ' '

etc...

main() { double R=3.0; double X=0.5; int n;

for (n=0 ; n <= 80 ; n++){if ((n%10) == 0){printf("\n iteration(%04d) = %9.6f",n,X);}else{}

X = (R+1)*X - R*X*X;}}

O200 Silicon Graphics (R10000, IRIX 6.5.5m, cc 7.2.1):

option '-O2' option '-O3'

X(00) = 0.500000 0.500000 X(10) = 0.384631 0.384631 X(20) = 0.418895 0.418895 X(30) = 0.046399 0.046399 X(40) = 0.320184 0.320192 X(50) = 0.063747 0.059988 X(60) = 0.271115 1.000531 X(70) = 1.328462 1.329692 X(80) = 0.817163 0.021952

16 16 10 + 1 = 10

O200 Silicon Graphics (processeur R10000, IRIX 6.5.5m, Java):

(R+1)X-R(XX) (R+1)X-(RX)X ((R+1)-(RX))X RX+(1-(RX))X X+R(X-(XX))

X(0000) = 0.5 0.5 0.5 0.5 0.5 X(0500) = 1.288736212247168 0.007057813075738616 1.2767485100695732 1.246534177059494 0.03910723014701789 X(1000) = 1.3327294162589722 0.916560711983132 1.207710752523091 0.27770146115891703 0.26663342726567785 X(1500) = 1.1448646685382955 0.4481000759915065 0.3102077001456977 0.015374092695375374 0.9841637252962943 X(2000) = 1.0548628914440754 0.896126931497168 0.6851138190159249 0.009229885271816535 0.3860923315999224 X(2500) = 1.292802584458599 0.06063433547953646 1.174118726001978 0.6922411856638806 0.020878761210912034 X(3000) = 1.0497821908090537 0.0219606878364607 1.3287403237319588 0.11354602472378028 0.13270749449424302 X(3500) = 0.8115039383609847 1.3213031319440816 0.6545151597367076 0.5760786099237328 1.324039473116061 X(4000) = 0.04922223042798102 1.3203298564077224 0.09243804931690679 0.9496284087750142 1.316597313359563 X(4500) = 0.4745896653599724 0.32865616721789603 0.018965010461877246 0.25384661313701296 0.18512853535354462

PC (processeur Pentium II, LINUX Mandrake 7.0, Java):

(R+1)X-R(XX) (R+1)X-(RX)X ((R+1)-(RX))X RX+(1-(RX))X X+R(X-(XX))

X(0000) = 0.5 0.5 0.5 0.5 0.5 X(0500) = 1.2887362122471679 0.00705781307573862 1.2767485100695732 1.2465341770675666 0.03910723014701789 X(1000) = 1.3327294162589722 0.91656071198313205 1.207710752523091 0.6676224369769922 0.26663342726567785 X(1500) = 1.1448646685382955 0.44810007599150647 0.31020770014569771 0.41049165176544455 0.98416372529629426 X(2000) = 1.0548628914440754 0.89612693149716804 0.68511381901592494 1.0026346845706315 0.3860923315999224 X(2500) = 1.3328681064703032 0.06063433547953646 1.1741187260019781 0.0154001182074282 0.02087876121091203 X(3000) = 1.2956769824648844 0.0219606878364607 1.3287403237319588 0.50504896336548377 0.13270749449424302 X(3500) = 0.19193027175727995 0.37986077053509781 0.6545151597367076 0.38299434265835819 1.324039473116061 X(4000) = 1.2491385720940165 0.96017143401896088 0.09243804931690679 0.6565274346305322 1.316597313359563 X(4500) = 0.00644889182443986 1.3185465795235045 0.01896501046187725 0.94966313327336349 0.18512853535354462

It is mandatory to give without ambiguity the name of the operating system and of the compiler, as well as their releases, when reporting that kind of experiment...

By the way this very small program cannot be wrong and does not use any numerical methods of approximations. All these wrong results shows that this program does not compute the values of the

scale = 010 020 030 040 050 060 070 080 090 100 n = 035 070 105 136 169 199 230 265 303 335

The preceding experiments were about non linear systems. But what is about

main() {The computed values (after a 'gcc' compilation without optimization on a Linux PC) are as follows:double B=4095.1;double A=B+1;

double x=1;

int n;

printf("initialisation x = %.16f\n",x);

for (n=1 ; n <= 9 ; n++){x = (A*x) - B;

printf("iteration %01d x = %+.16f\n",n,x);}}

initialisation x = +1.0000000000000000 iteration 1 x = +1.0000000000004547 iteration 2 x = +1.0000000018630999 iteration 3 x = +1.0000076314440776 iteration 4 x = +1.0312591580864137 iteration 5 x = +129.0406374377594148 iteration 6 x = +524468.2550088063580915 iteration 7 x = +2148270324.2415719032287598 iteration 8 x = +8799530071030.8046875000000000 iteration 9 x = +36043755123945184.0000000000000000

x = 1

A-B = 1 (mathematically speaking...)

Then, it is very important

B = 4095.1 = {0x40affe33,33333333} A = B+1 = 4096.1 = {0x40b00019,9999999a}

A-B = 1.0000000000004547 = {0x3ff00000,00000800} | 1.0 = {0x3ff00000,00000000}

The consequences are numerous; numerical results obtained with a computer will depend on:

These problems will be encountered

On march 2004, this phenomenon can be seen with the following program:

double f(x) double x; { return(1/x); }

main() { double x1=f(3.0); volatile double x2;

x2=x1;

if (x2 != x1) { printf("DIFFERENT\n"); } }

A universal solution (independent of the processor and of the compiler) to this problem seems to test numbers by means of functions, thus moving them to the stack using the 64-bit format. For example, the following test:

A > B

fIFGT(A,B)

unsigned int fIFGT(x,y) double x; double y; { return(x > y); }

Using functions in order to execute the elementary instructions (addition, substraction, multiplication, division, tests,...) of a program offers another advantage: it allows a

At last,

The N-body problem is a useful example of these problems.

So, why do we need real numbers to do physics?

[More information]

Copyright © France Telecom R&D and CMAP (Centre de Mathématiques APpliquées) UMR CNRS 7641 / Ecole Polytechnique, 1997-2020.