Crystallographic Fast Fourier Transform
Andrzej Kudlicki, Maga Rowicka, Zbyszek Otwinowski

"The writing of such programs is an undertaking of substantial complexity, which has deterred all but the bravest.''
Gerard Bricogne, ITC Part B

Crystallographic FFT

The solution

The need to have efficient crystallographic FFT routines is obvious, as it allows computation of Fourier transforms of data with symmetries in asymmetric unit only; saving both CPU time and memory. So far, there has been no general, space-group efficient implementation of the FFT. Ideally, such algorithms should operate only on the asymmetric unit and their speed should be similar to P1 FFT transform of the same amount of data. A partial solution has been proposed by Lynn Ten Eyck in 1973, but it only works for a small number of space groups; all FFTs found in crystallographic software are either based on his ideas, or P1 FFTs. Subsequently, the problem has attracted lots of attention and was the subject of more than 20 publications. In particular, a general approach has been proposed by Gerard Bricogne (1996), but without a prescription how to design algorithms for a large number of space groups. We have developed a new approach to crystallographic Fast Fourier Transform (FFT). It provides recipes for algorithms for all 230 space groups. The presented algorithms allow to reduce both computation time and memory usage by a factor equal to the number of symmetry operators in the crystallographic group. The solution is ultimate: it has reached the theoretical limit of computational complexity and it is highly efficient for current computer architecture. The basic idea of the FFT is to decompose a transform of size N into two transforms of size N/2 (or 3 of size N/3, etc.), and then collect the transform in a single step. For some space groups this procedure is performed recursively.
Properties of our algorithms

CPU: The collection step is a O(N) calculation, hence the total number of floating point operations is only slightly larger than the number of operations needed to compute FFT of the asymmetric unit.

Memory use: The algorithms operate only on data from the asymmetric unit, so the do not use any memory for storing redundant data.

Memory access: Despite their complexity, our algorithms result in regular memory access patterns, which helps optimize the use of processor cache.

In-place calculation: The algorithm is designed for calculating symmetric Fourier transforms in-place, i.e. transformed data will be replaced into the same memory area. This helps save more memory, allowing small calculations to fit entirely in the machine's cache, and preventing swapping while processing large datasets.

One-step symmetry decomposition

In our approach we combine successfully the FFT-induced decomposition with the crystallographic symmetry.
To this end an asymmetric unit has to be carefully chosen. Let Gamma denote a computational grid and let Gamma_0\subset \Gamma denote an asymmetric unit. Let G denote the set of symmetry operators from the crystallographic group considered, and let S_g denote an action of the symmetry operator g\in G. We are looking for an asymmetric unit Gamma_0 being a regular subgrid of Gamma and satisfying an additional condition: From this condition, it follows that the computational grid cannot contain points on symmetry elements, e.g. rotational axes or mirror planes. It is possible to find such a computational grid and an asymmetric unit satisfying the above conditions, for 113 crystallographic groups (see Tables below). Such grids are typically non-contiguous, but that allows them to be evenly spaced, and as such valid FFT grids. By definition, any subset of the unit cell is an asymmetric unit if its elements and their images transformed by the group`s symmetry operators fill up the entire unit cell. Consequently, there are many possible choices of an asymmetric unit for each space group. Our choice is a non-standard one, but the data can be transformed from one asymmetric unit into another by a single permutation, which is much faster than the FFT.

Computation of the crystallographic FFT is very similar to the usual FFT. The first step is to calculate Y, a normal, P1 FFT on one of the subgrids. One is enough, because all others are related to it by symmetry operators. The formula for recovery of the final Fourier transform F resembles the "collection'' step in standard FFT. The only difference is that, instead of mixing data from points on different grids one uses data from one grid, taking into account the action of symmetry operators involved. Shifting the computational grid

To the right: examples of shifting the computational grid with respect to the origin. These shifts are expressed in 12th's of grid lenght, so displacement by 1/2 is denoted by a '6'. For clarity, this and other graphics are presented in 2-D.

The FFT computational grid cannot contain points on symmetry elements, e.g. rotational axes or mirror planes. In many cases this requires shifting the grid. Decimation

To the right: a prototype for a reciprocal space asymmetric unit (filled circles in right panel).

Asymmetric units are non-contiguous in the real space and contiguous in the reciprocal space. Examples of decimation in real space:
1. Two-fold symmetry and Four-fold symmetry

To the right: examples of grid decimation. Left panel: the subgrid (blue) consists of every second grid point along x, right panel: subgrid consits of 1/4 of all the points. A denotes the decimation matrix for each case.  2. Eight-fold symmetry

Decomposition of computational grid into subgrids for symmetries with no points on axes: 8-grid decimation 2x2y2z. Data values are defined at centers of colored cubes. Cubes of any chosen color represent a valid choice of an asymmetric unit, consisting of a regular grid. Unit cell asymmetric unit
3. Three-fold symmetry

We divide the grid G into three subgrids G0, G1 and G2:
G0 = {(x,y): x+y=0 mod3}
G1 = {(x,y): x+y=1 mod3}
G2 = {(x,y): x+y=2 mod3}
In other words, the grid G0 consists of points whose sum of coordinates is divisible by 3. From the above definition, it is easily seen that subgrids G0, G1, and G2 are mutually disjoint. Since every integer equals 0, 1 or 2 modulo 3, it follows that G=G0+G1+G2. The subdivision of G is depicted on the right. All points from G lie in the centers of colored triangles. Points from G0 are red, points from G1 are green and G2 is yellow. Observe that every point from G0 is transformed into a point from G2, and further into a point from G1.

Asymmetric unit and unit cell for data with three-fold rotational axes (3(x+y), e.g. space group P3). Data values are defined at centers of every second yellow triangle.

Recursive decomposition and special points

Some symmetries do not allow for a decomposition into subgrids with no points on symmetry operators. In such case it is possible to decompose the original grid into a sum of regular subgrids, some of which do not have points on symmetry elements. To grids without points on symmetry elements we apply the algorithm. Remaining grids are decomposed further. This procedure is repeated recursively until one-point grids are reached. The final Fourier transform can be reconstructed from P1 FFTs of subgrids with no special point and one-point subgrids. As previously, Fourier transform need to be computed only on subgrids not related to each other by symmetry operators. The resulting shape of asymmetric unit is fractal-like.

The crystallographic fast Fourier transform. The recursive symmetry reduction.  An asymmetric unit for the diagonal mirror yx, which consists of several regular subgrids of different sizes.

Asymmetric units in reciprocal space

Let us start with the simplest case, when the asymmetric unit in the real space is given by a single regular grid. Then the first approximation of an asymmetric unit in the reciprocal space, a natural prototype, will be a grid dual to the real-space asymmetric subgrid The prototype is thus defined by the same decimation matrix, as the non-contiguous grid in the real space. Observe, that the reciprocal space AU prototype contains the same number of data points, as the asymmetric unit in the real space. Due to the action of symmetry operators (their form in grid coordinates is modified by the shift of the
real-space grid), some points have undesirable properties: some prototype points have symmetric images in the same prototype, and the value of the Fourier transform in some prototype points is zero.
These points must be excluded from the AU. However, there are always points outside of the prototype, that can be used instead. The points removed from prototype are grouped into several classes. Grid points in each class form regular 1-D or 2-D shapes, and all have the same distance from their out-of-prototype counterparts. An immediate consequence of the above is the possibility of describing the AU in reciprocal space by the decimation matrix, supplemented with a list of groups of special points. For some space groups, more than one prototype shall be required. Constructing the reciprocal space ASU. The prototype (grid) and an asymmetric unit (spheres) for a group with a 4-fold symmetry axis. The number of prototype points not in AU is equal to the number of out-of-prototype points in the asymmetric unit. Data points of different colors have different isotropy groups.

Computing the transform

The symmetric step of the Fourier Transform calculation is very similar to a step in the Cooley-Tukey FFT algorithm, with radix (number of points, for which values are mixed) equal to the number of symmetry operators. The main difference is that the points, whose values are mixed, are related by "mixing operators'', derived from the underlying crystallographic symmetry operators. Also, points that are images of themselves under some of the operators (have a non-trivial isotropy group) must be processed with a lower radix, than other points.

Next, a looping unit is defined within the AU. It consists of points, that together with their images under the action of the mixing operators cover the entire asymmetric unit. There are many possible looping units, and a scoring function is employed to create one as regular as possible. Each looping unit is then filled with regular blocks, over which the loops are performed.

Since it is a linear operation, it can be thought of as a multiplication by a matrix of twiddle factors. What needs to be supplied is the relation between memory addresses of all accessed data, and a description of the way the matrix M changes while moving from one address to another. Again, all this information can be calculated from the symmetry operators, and encoded in a small number of integers. A schematic picture of the unit cell, asymmetric unit, and looping unit for a 4-element symmetry group A schematic graph of the symmetric part of the FT calculation (for 4 operators). At each step of the loop, accessed and modified are four values at points related by mixing operators. The matrix M is different at each point, but its elements change in a regular manner. 