OpenFFT - An Open Source Parallel Package for 3-D FFTs

Manual of OpenFFT Version 1.0

Installation

Requirements: OpenFFT requires FFTW, a C compiler, and MPI. Fortran users may also need a Fortran compiler to compile the Fortran sample program.

Step 1: Download and install FFTW. The examples below assume that FFTW is installed in /opt/fftw3.

Step 2: Download and extract the OpenFFT tarball. The examples below assume that OpenFFT is extracted to /opt/openfft1.0.

Step 3: In the makefile in the root folder of OpenFFT, set CC (the C compiler) and LIB (the library path to FFTW) to match your environment. Fortran users may also set FC (the Fortran compiler) to compile the Fortran sample program. The makefile contains sample CC, FC, and LIB settings for several environments. On the K computer, add -Dkcomp to CC for performance optimization.

CC = mpicc -O3 -I/opt/fftw3/include -I./include
LIB = -L/opt/fftw3/lib -lfftw3
FC = mpif90 -O3 -I/opt/fftw3/include -I./include

Step 4: Issue make to compile and install the OpenFFT library. The library will be available at /opt/openfft1.0/lib/libopenfft.a.

Step 5: Link the OpenFFT library when compiling a user program. Because libopenfft.a calls FFTW, list -lopenfft before -lfftw3 so that the symbols are also resolved when FFTW is linked statically.

mpicc -O3 -o userprogram userprogram.c -I/opt/fftw3/include -I/opt/openfft1.0/include -L/opt/openfft1.0/lib -lopenfft -L/opt/fftw3/lib -lfftw3
mpif90 -O3 -o userprogram userprogram.f90 -I/opt/fftw3/include -I/opt/openfft1.0/include -L/opt/openfft1.0/lib -lopenfft -L/opt/fftw3/lib -lfftw3

Sample Programs

sample01.c: This program transforms input data values to output data values. It can be executed with an arbitrary number of processes. Its input and output should match the corresponding values in c06fxf3.r. It takes no input parameters.
sample02.c: This program benchmarks the performance of OpenFFT and reports timing and GFLOPS results. It can be executed with an arbitrary number of processes. Time is measured by MPI_Wtime(). An optional numeric input parameter specifies the size of each of the 3 dimensions. If no input parameter is provided, the program runs with a default size of 64^3 data points.
sample03.c: This program is similar to sample02.c, except that time is measured by OpenFFT.
sample04.c: This program is similar to sample03.c, with more time measurements.
sample05.f90: This Fortran program illustrates how to call OpenFFT from a Fortran user program.

Domain Decomposition

OpenFFT adopts a 2-D decomposition method that reuses data when transposing from one dimension to another, which reduces the total volume of communication. The decomposition is also adaptive: it automatically switches between 1-D and 2-D depending on the number of processes and the data size. OpenFFT decomposes in the order of abc, cab, and cba for performing the 1-D FFTs along the c-, b-, and a-axes, respectively. Please refer to the publications for details.

Calling OpenFFT from a C user program

The C sample programs, especially sample01.c, illustrate how to call OpenFFT from a C user program. The procedure involves the following steps.

Step 1: Include the OpenFFT header file, openfft.h, in the program.

#include <openfft.h>

Step 2: Initialize OpenFFT by calling openfft_initialize().

openfft_initialize(N1,N2,N3,measure_time,print_memory, &My_Max_NumGrid,&My_NumGrid_In,My_Index_In,&My_NumGrid_Out,My_Index_Out);

     Input: the 3 dimensions of the data (N1, N2, N3), and the flags measure_time and print_memory (0: disabled, 1: enabled).
     Output: arrays allocated and variables initialized.
             My_Max_NumGrid: the maximum number of grid points allocated to a process, used for allocating local arrays.
             My_NumGrid_In: the number of grid points allocated to a process when starting.
             My_Index_In: the 6 indexes of grid points allocated to a process when starting.
             My_NumGrid_Out: the number of grid points allocated to a process when finishing.
             My_Index_Out: the 6 indexes of grid points allocated to a process when finishing.

Step 3: After openfft_initialize() is called, important variables are initialized, and can be used for allocating and initializing local data input and output arrays.

Allocate the local data input and output arrays based on variable My_Max_NumGrid, which is the maximum number of grid points allocated to a process during the transformation.

input = (dcomplex*)malloc(sizeof(dcomplex)*My_Max_NumGrid);

output = (dcomplex*)malloc(sizeof(dcomplex)*My_Max_NumGrid);

Initialize the local input array from the global input array. A process is allocated My_NumGrid_In grid points, taken contiguously from A_as B_bs C_cs to A_ae B_be C_ce of the 3-D global array, where:

as = My_Index_In[0]
bs = My_Index_In[1]
cs = My_Index_In[2]
ae = My_Index_In[3]
be = My_Index_In[4]
ce = My_Index_In[5]
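
The mapping between the six start/end indexes and the local input array can be sketched as follows. This is an illustration under stated assumptions, not code taken from OpenFFT: it assumes that the global array is stored in row-major abc order with c varying fastest, that the end indexes are inclusive, and that the points of a process form one contiguous run in that ordering. The type cplx is a stand-in for dcomplex, and the helper names abc_offset and fill_local_input are hypothetical.

```c
#include <assert.h>

/* Stand-in for OpenFFT's dcomplex so that the sketch compiles on its own. */
typedef struct { double r, i; } cplx;

/* Linear offset of grid point (a, b, c) in an N1 x N2 x N3 array stored
 * in abc order (assumption: row-major, c varies fastest). */
long abc_offset(long a, long b, long c, long N2, long N3)
{
    return (a * N2 + b) * N3 + c;
}

/* Copy the contiguous run owned by this process from the global array
 * into the local input array.
 * idx holds {as, bs, cs, ae, be, ce}, laid out like My_Index_In;
 * the end indexes are assumed to be inclusive.
 * Returns the number of points copied, which under these assumptions
 * should equal My_NumGrid_In. */
long fill_local_input(const cplx *global, cplx *local,
                      const int idx[6], long N2, long N3)
{
    long first = abc_offset(idx[0], idx[1], idx[2], N2, N3);
    long last  = abc_offset(idx[3], idx[4], idx[5], N2, N3);
    assert(first <= last);

    for (long g = first; g <= last; g++)
        local[g - first] = global[g];

    return last - first + 1;
}
```

Comparing the returned count with My_NumGrid_In is a cheap way to check that the assumed ordering matches what the library reports.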

Step 4: Call openfft_execute() to transform input to output.

openfft_execute(input, output);

Step 5: Obtain the result stored in the local output array. Upon exiting, a process holds My_NumGrid_Out grid points, taken contiguously from C_cs B_bs A_as to C_ce B_be A_ae of the 3-D global array, where:

cs = My_Index_Out[0]
bs = My_Index_Out[1]
as = My_Index_Out[2]
ce = My_Index_Out[3]
be = My_Index_Out[4]
ae = My_Index_Out[5]
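
Reading the result back can be sketched in the same way. The output is distributed in cba order, so the sketch below assumes that a varies fastest, that the end indexes are inclusive, and that the points of a process form one contiguous run in that ordering. Under those assumptions it recovers the global (a, b, c) coordinates of the k-th element of the local output array. The helper names cba_offset and local_output_coords are hypothetical and not part of OpenFFT.

```c
#include <assert.h>

/* Linear offset of grid point (c, b, a) in cba order
 * (assumption: a varies fastest). */
long cba_offset(long c, long b, long a, long N1, long N2)
{
    return (c * N2 + b) * N1 + a;
}

/* Recover the global coordinates of the k-th local output element.
 * idx holds {cs, bs, as, ce, be, ae}, laid out like My_Index_Out;
 * the end indexes are assumed to be inclusive. */
void local_output_coords(long k, const int idx[6], long N1, long N2,
                         long *a, long *b, long *c)
{
    long first = cba_offset(idx[0], idx[1], idx[2], N1, N2);
    long last  = cba_offset(idx[3], idx[4], idx[5], N1, N2);
    assert(k >= 0 && first + k <= last);

    long g = first + k;      /* global linear offset in cba order */
    *a = g % N1;
    *b = (g / N1) % N2;
    *c = g / (N1 * N2);
}
```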

Step 6: Finalize the calculation by calling openfft_finalize().

openfft_finalize();

Calling OpenFFT from a Fortran user program

The Fortran sample program, sample05.f90, illustrates how to call OpenFFT from a Fortran user program. The procedure is similar to calling from C, except that the indexes returned by OpenFFT must be increased by 1 to match Fortran's default 1-based array indexing.

Step 1: Include the Fortran interface and the standard iso_c_binding module, which defines the equivalents of C types (integer(C_INT) for int, real(C_DOUBLE) for double, complex(C_DOUBLE_COMPLEX) for dcomplex, etc.).

use, intrinsic :: iso_c_binding
include 'openfft.fi'

Step 2: Initialize OpenFFT by calling openfft_initialize().

openfft_initialize(%VAL(N1),%VAL(N2),%VAL(N3),%VAL(measure_time),%VAL(print_memory),My_Max_NumGrid,My_NumGrid_In,My_Index_In,My_NumGrid_Out,My_Index_Out)

     Input: the 3 dimensions of the data (N1, N2, N3), and the flags measure_time and print_memory (0: disabled, 1: enabled).
     Output: arrays allocated and variables initialized.
             My_Max_NumGrid: the maximum number of grid points allocated to a process, used for allocating local arrays.
             My_NumGrid_In: the number of grid points allocated to a process when starting.
             My_Index_In: the 6 indexes of grid points allocated to a process when starting.
             My_NumGrid_Out: the number of grid points allocated to a process when finishing.
             My_Index_Out: the 6 indexes of grid points allocated to a process when finishing.

Step 3: After openfft_initialize() is called, important variables are initialized, and can be used for allocating and initializing local data input and output arrays.

Allocate the local data input and output arrays based on variable My_Max_NumGrid, which is the maximum number of grid points allocated to a process during the transformation.

allocate (input(My_Max_NumGrid))

allocate (output(My_Max_NumGrid))

Initialize the local input array from the global input array. A process is allocated My_NumGrid_In grid points, taken contiguously from A_as B_bs C_cs to A_ae B_be C_ce of the 3-D global array, where:

as = My_Index_In(1) + 1
bs = My_Index_In(2) + 1
cs = My_Index_In(3) + 1
ae = My_Index_In(4) + 1
be = My_Index_In(5) + 1
ce = My_Index_In(6) + 1

Step 4: Call openfft_execute() to transform input to output.

openfft_execute(input, output)

Step 5: Obtain the result stored in the local output array. Upon exiting, a process holds My_NumGrid_Out grid points, taken contiguously from C_cs B_bs A_as to C_ce B_be A_ae of the 3-D global array, where:

cs = My_Index_Out(1) + 1
bs = My_Index_Out(2) + 1
as = My_Index_Out(3) + 1
ce = My_Index_Out(4) + 1
be = My_Index_Out(5) + 1
ae = My_Index_Out(6) + 1

Step 6: Finalize the calculation by calling openfft_finalize().

openfft_finalize()

Benchmark

The figures below show benchmark results for OpenFFT and a couple of other packages, measured on a Cray XC30 machine and the K computer with 256^3 data points.

(a) Cray XC30

(b) K computer