david/eppeer
I tried out the EPPEER code, which uses two-step Runge-Kutta methods and OpenMP, because I’m thinking of writing a shared-memory parallel ODE solver code myself.
I downloaded the code from
http://www.mathematik.uni-marburg.de/~schmitt/peer/eppeer.zip
unzipped, and ran
gfortran -c mbod4h.f90
gfortran -c ivprkp.f90
gfortran -c -fopenmp ivpepp.f90
gfortran -fopenmp ivprkp.o ivpepp.o mbod4h.o ivp_pmain.f90
./a.out
I had to fix one line that was trying to open a logfile and failed. I also set
export OMP_NUM_THREADS=4
This runs the code with increasingly tight tolerances on a 400-body problem. The output was (I killed it before it finished the really tight tolerance run(s)
tol, err, otime, cpu 0.10E-01 0.10702 2.9556 10.534
steps,rej,nfcn: 337 88 1399
tol, err, otime, cpu 0.10E-02 0.93692E-01 4.9853 18.585
steps,rej,nfcn: 605 159 2465
tol, err, otime, cpu 0.10E-03 0.66604E-01 7.9798 30.365
steps,rej,nfcn: 994 244 4015
tol, err, otime, cpu 0.10E-04 0.47637E-01 12.026 46.477
steps,rej,nfcn: 1534 324 6175
tol, err, otime, cpu 0.10E-05 0.24241E-01 18.239 70.756
steps,rej,nfcn: 2338 415 9391
If I understand correctly, the last column is total CPU time; the next to last is wall time. For comparison, I ran it without parallelism:
export OMP_NUM_THREADS=1
Then I got the following:
tol, err, otime, cpu 0.10E-01 0.10702 10.382 10.382
steps,rej,nfcn: 337 88 1399
tol, err, otime, cpu 0.10E-02 0.93692E-01 18.297 18.297
steps,rej,nfcn: 605 159 2465
tol, err, otime, cpu 0.10E-03 0.66604E-01 29.814 29.815
steps,rej,nfcn: 994 244 4015
tol, err, otime, cpu 0.10E-04 0.47637E-01 45.854 45.855
steps,rej,nfcn: 1534 324 6175
tol, err, otime, cpu 0.10E-05 0.24241E-01 69.725 69.726
steps,rej,nfcn: 2338 415 9391
tol, err, otime, cpu 0.10E-06 0.53727E-02 105.47 105.48
steps,rej,nfcn: 3539 484 14195
The numbers of function evaluations were identical, confirming that the computations being performed were the same. The speedup (about 3x) is very nice. We should be able to achieve something similar with extrapolation.
These results are actually plotted in the user guide, at the end of Section 4.