Requirements:
OpenFFT requires FFTW, a C compiler, and MPI. Fortran users may also need a Fortran compiler to compile the Fortran sample program.
Step 1: Download and install FFTW (version 3) from the FFTW website. Assume that FFTW is installed in /opt/fftw3.
Step 2: Download the OpenFFT tarball and extract it. Assume that OpenFFT is extracted to /opt/openfft1.0.
Step 3: Edit the makefile in the root folder of OpenFFT and set CC (the C compiler and its flags) and LIB (the FFTW library path and linker flags) to match your environment. Fortran users may also set FC (the Fortran compiler) to compile the sample program. The makefile contains sample settings of CC (and FC) and LIB for several environments. On the K computer, add -Dkcomp to CC for performance optimization. For example:
CC = mpicc -O3 -I/opt/fftw3/include -I./include
LIB = -L/opt/fftw3/lib -lfftw3
FC = mpif90 -O3 -I/opt/fftw3/include -I./include
Step 4: Run make in the root folder of OpenFFT to compile and install the OpenFFT library. The library will then be available at /opt/openfft1.0/lib/libopenfft.a.
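Step 5: Compile the user program and link it against the OpenFFT library. List -lopenfft before -lfftw3, because libopenfft.a depends on FFTW and the linker searches static libraries from left to right. For a C program:
mpicc -O3 -o userprogram userprogram.c -I/opt/fftw3/include -I/opt/openfft1.0/include \
  -L/opt/openfft1.0/lib -lopenfft -L/opt/fftw3/lib -lfftw3
For a Fortran program:
mpif90 -O3 -o userprogram userprogram.f90 -I/opt/fftw3/include -I/opt/openfft1.0/include \
  -L/opt/openfft1.0/lib -lopenfft -L/opt/fftw3/lib -lfftw3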
Domain Decomposition
OpenFFT adopts a 2-D decomposition method that can reuse data when transposing from one axis to another, which reduces the total volume of communication. The decomposition is also adaptive: it automatically switches between 1-D and 2-D decomposition depending on the number of processes and the data size. OpenFFT decomposes the data in the orders abc, cab, and cba to perform the 1-D FFTs along the c-, b-, and a-axes, respectively. Please refer to the publications for details.
Calling OpenFFT from a C user program
The C sample programs, especially sample01.c, illustrate how to call OpenFFT from a C user program. This involves the following steps.
Step 1: Include the OpenFFT header file, openfft.h, in the program.
#include <openfft.h>
Step 2: Initialize OpenFFT by calling openfft_initialize().
openfft_initialize(N1,N2,N3,measure_time,print_memory,
&My_Max_NumGrid,&My_NumGrid_In,My_Index_In,&My_NumGrid_Out,My_Index_Out);
Input: the three dimensions of the data, N1, N2, and N3, and the flags measure_time and print_memory (0: disabled, 1: enabled).
Output: arrays are allocated and the following variables are initialized:
My_Max_NumGrid: the maximum number of grid points allocated to a process during the transformation, used for allocating the local arrays.
My_NumGrid_In: the number of grid points allocated to a process at the start of the transformation.
My_Index_In: the six indexes of the grid points allocated to a process at the start of the transformation.
My_NumGrid_Out: the number of grid points allocated to a process at the end of the transformation.
My_Index_Out: the six indexes of the grid points allocated to a process at the end of the transformation.
Step 3: Once openfft_initialize() has returned, these variables can be used to allocate and initialize the local input and output arrays.
- Allocate the local input and output arrays based on My_Max_NumGrid, the maximum number of grid points allocated to a process during the transformation.
input = (dcomplex*)malloc(sizeof(dcomplex)*My_Max_NumGrid);
output = (dcomplex*)malloc(sizeof(dcomplex)*My_Max_NumGrid);
- Initialize the local input array from the global input array. A process is allocated My_NumGrid_In grid points, stored contiguously in abc order from (a, b, c) = (as, bs, cs) to (ae, be, ce) of the 3-D global array, where:
as = My_Index_In[0]
bs = My_Index_In[1]
cs = My_Index_In[2]
ae = My_Index_In[3]
be = My_Index_In[4]
ce = My_Index_In[5]
Step 4: Call openfft_execute() to transform input into output.
openfft_execute(input, output);
Step 5: Obtain the result from the local output array. After openfft_execute() returns, a process holds My_NumGrid_Out grid points, stored contiguously in cba order from (c, b, a) = (cs, bs, as) to (ce, be, ae) of the 3-D global array, where:
cs = My_Index_Out[0]
bs = My_Index_Out[1]
as = My_Index_Out[2]
ce = My_Index_Out[3]
be = My_Index_Out[4]
ae = My_Index_Out[5]
Step 6: Finalize the calculation by calling openfft_finalize().
openfft_finalize();
Calling OpenFFT from a Fortran user program
The Fortran sample program sample05.f90 illustrates how to call OpenFFT from a Fortran user program. The procedure is the same as in C, except that the indexes returned by OpenFFT must be increased by 1, because they are 0-based while Fortran arrays are 1-based by default.
Step 1: Use the standard iso_c_binding module, which defines Fortran equivalents of C types (integer(C_INT) for int, real(C_DOUBLE) for double, complex(C_DOUBLE_COMPLEX) for dcomplex, etc.), and include the OpenFFT Fortran interface, openfft.fi.
use, intrinsic :: iso_c_binding
include 'openfft.fi'
Step 2: Initialize OpenFFT by calling openfft_initialize().
call openfft_initialize(%VAL(N1), %VAL(N2), %VAL(N3), %VAL(measure_time), &
     %VAL(print_memory), My_Max_NumGrid, My_NumGrid_In, My_Index_In, &
     My_NumGrid_Out, My_Index_Out)
Input: the three dimensions of the data, N1, N2, and N3, and the flags measure_time and print_memory (0: disabled, 1: enabled).
Output: arrays are allocated and the following variables are initialized:
My_Max_NumGrid: the maximum number of grid points allocated to a process during the transformation, used for allocating the local arrays.
My_NumGrid_In: the number of grid points allocated to a process at the start of the transformation.
My_Index_In: the six indexes of the grid points allocated to a process at the start of the transformation.
My_NumGrid_Out: the number of grid points allocated to a process at the end of the transformation.
My_Index_Out: the six indexes of the grid points allocated to a process at the end of the transformation.
Step 3: Once openfft_initialize() has returned, these variables can be used to allocate and initialize the local input and output arrays.
- Allocate the local input and output arrays based on My_Max_NumGrid, the maximum number of grid points allocated to a process during the transformation.
allocate(input(My_Max_NumGrid))
allocate(output(My_Max_NumGrid))
- Initialize the local input array from the global input array. A process is allocated My_NumGrid_In grid points, stored contiguously in abc order from (a, b, c) = (as, bs, cs) to (ae, be, ce) of the 3-D global array, where:
as = My_Index_In(1) + 1
bs = My_Index_In(2) + 1
cs = My_Index_In(3) + 1
ae = My_Index_In(4) + 1
be = My_Index_In(5) + 1
ce = My_Index_In(6) + 1
Step 4: Call openfft_execute() to transform input into output.
call openfft_execute(input, output)
Step 5: Obtain the result from the local output array. After openfft_execute() returns, a process holds My_NumGrid_Out grid points, stored contiguously in cba order from (c, b, a) = (cs, bs, as) to (ce, be, ae) of the 3-D global array, where:
cs = My_Index_Out(1) + 1
bs = My_Index_Out(2) + 1
as = My_Index_Out(3) + 1
ce = My_Index_Out(4) + 1
be = My_Index_Out(5) + 1
ae = My_Index_Out(6) + 1
Step 6: Finalize the calculation by calling openfft_finalize().
call openfft_finalize()
Benchmark
The figures below show benchmark results for OpenFFT and a couple of other packages, measured on a Cray XC30 machine and the K computer. The data size is 256^3 points.
(a) Cray XC30
(b) K computer