david/eppeer

I tried out the EPPEER code, which uses two-step Runge-Kutta (peer) methods and OpenMP, because I’m thinking of writing a shared-memory parallel ODE solver myself.

I downloaded the code from

http://www.mathematik.uni-marburg.de/~schmitt/peer/eppeer.zip

unzipped, and ran

gfortran -c mbod4h.f90 
gfortran -c ivprkp.f90 
gfortran -c -fopenmp ivpepp.f90 
gfortran -fopenmp ivprkp.o ivpepp.o mbod4h.o ivp_pmain.f90
./a.out
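
For what it's worth, the same build also works as a single command, since the files are already listed with modules ahead of their users (my own shortcut, not from the package; the output name is arbitrary):

gfortran -fopenmp mbod4h.f90 ivprkp.f90 ivpepp.f90 ivp_pmain.f90 -o eppeer_test
./eppeer_test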

I had to fix one line that tried to open a logfile and failed. I also set

export OMP_NUM_THREADS=4
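
As a quick sanity check that the setting took effect, a tiny standalone program can report what OpenMP sees (omp_get_max_threads is standard OpenMP; the file name is just mine):

program check_threads
  use omp_lib
  implicit none
  ! print the number of threads OpenMP will use for parallel regions
  print *, 'max threads:', omp_get_max_threads()
end program check_threads

gfortran -fopenmp check_threads.f90 -o check_threads
./check_threads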

The driver solves a 400-body problem at a sequence of increasingly tight tolerances. The output was as follows (I killed it before it finished the really tight tolerance runs):

 tol, err, otime, cpu  0.10E-01 0.10702      2.9556      10.534    
 steps,rej,nfcn:  337   88     1399
 tol, err, otime, cpu  0.10E-02 0.93692E-01  4.9853      18.585    
 steps,rej,nfcn:  605  159     2465
 tol, err, otime, cpu  0.10E-03 0.66604E-01  7.9798      30.365    
 steps,rej,nfcn:  994  244     4015
 tol, err, otime, cpu  0.10E-04 0.47637E-01  12.026      46.477    
 steps,rej,nfcn: 1534  324     6175
 tol, err, otime, cpu  0.10E-05 0.24241E-01  18.239      70.756    
 steps,rej,nfcn: 2338  415     9391

If I understand correctly, the last column is total CPU time; the next to last is wall time. For comparison, I ran it without parallelism:

export OMP_NUM_THREADS=1

Then I got the following:

 tol, err, otime, cpu  0.10E-01 0.10702      10.382      10.382    
 steps,rej,nfcn:  337   88     1399
 tol, err, otime, cpu  0.10E-02 0.93692E-01  18.297      18.297    
 steps,rej,nfcn:  605  159     2465
 tol, err, otime, cpu  0.10E-03 0.66604E-01  29.814      29.815    
 steps,rej,nfcn:  994  244     4015
 tol, err, otime, cpu  0.10E-04 0.47637E-01  45.854      45.855    
 steps,rej,nfcn: 1534  324     6175
 tol, err, otime, cpu  0.10E-05 0.24241E-01  69.725      69.726    
 steps,rej,nfcn: 2338  415     9391
 tol, err, otime, cpu  0.10E-06 0.53727E-02  105.47      105.48    
 steps,rej,nfcn: 3539  484    14195

The numbers of function evaluations were identical, confirming that the two runs performed the same computation. The speedup on 4 threads (about 3.5x at the loosest tolerance, approaching 3.8x at the tightest tolerance both runs completed) is very nice. We should be able to achieve something similar with extrapolation; a rough sketch of what I mean is at the end of this note.

These results are actually plotted in the user guide, at the end of Section 4.
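
To make the extrapolation remark concrete, here is a rough sketch of the kind of loop I have in mind (mine, not EPPEER's; the interface and names are made up for illustration). In a GBS-type extrapolation step, the first-column tableau entries come from independent modified-midpoint passes with different substep counts, so the loop over them parallelizes directly; dynamic scheduling helps because the work per entry is uneven, and the right-hand side f must be thread-safe.

subroutine extrap_first_column(f, neq, t0, y0, bigh, kmax, nseq, t1col)
  ! Sketch: fill the first extrapolation column T(k,1), k = 1..kmax,
  ! each entry from a modified-midpoint (Gragg) pass with nseq(k) substeps
  ! over the macro step bigh. The kmax passes are mutually independent.
  implicit none
  interface
     subroutine f(t, y, dydt)            ! ODE right-hand side (must be thread-safe)
       double precision, intent(in)  :: t, y(:)
       double precision, intent(out) :: dydt(:)
     end subroutine f
  end interface
  integer, intent(in)  :: neq, kmax, nseq(kmax)
  double precision, intent(in)  :: t0, y0(neq), bigh
  double precision, intent(out) :: t1col(neq, kmax)
  integer :: k, j, n
  double precision :: h, ym(neq), yp(neq), dy(neq), tmp(neq)

!$omp parallel do schedule(dynamic) default(shared) &
!$omp   private(k, j, n, h, ym, yp, dy, tmp)
  do k = 1, kmax
     n = nseq(k)
     h = bigh / n
     ym = y0                              ! z_0
     call f(t0, ym, dy)
     yp = ym + h*dy                       ! z_1 (explicit Euler starter)
     do j = 1, n - 1                      ! midpoint sweeps: z_{j+1} = z_{j-1} + 2h f(t_j, z_j)
        call f(t0 + j*h, yp, dy)
        tmp = ym + 2.0d0*h*dy
        ym = yp
        yp = tmp
     end do
     call f(t0 + bigh, yp, dy)
     t1col(:, k) = 0.5d0*(ym + yp + h*dy) ! Gragg smoothing step
  end do
!$omp end parallel do
end subroutine extrap_first_column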