Finite Difference Time – Domain Modelling of Metamaterials : GPU Implementation of Cylindrical Cloak

Finite difference time–domain (FDTD) technique can be used to model metamaterials by treating them as dispersive material. Drude or Lorentz model can be incorporated into the standard FDTD algorithm for modelling negative permittivity and permeability. FDTD algorithm is readily parallelisable and can take advantage of GPU acceleration to achieve speed–ups of 5x–50x depending on hardware setup. Metamaterial scattering problems are implemented using dispersive FDTD technique on GPU resulting in performance gain of 10x–15x compared to conventional CPU implementation.


Introduction
Standard FDTD algorithm cannot cater for negative values of permittivity or permeability.This is because of the Courant stability criterion.As soon as the permeability or permittivity becomes less than unity the algorithm will not be stable.A metamaterial object can be modelled as a dispersive substance using either the Lorentz or Drude dispersive models.These models can yield negative values of permittivity (or permeability) for certain frequency ranges [1].Using these dispersive models, FDTD update equations are modified and permittivity and permeabilities are replaced with terms dependent on frequency of operation.
Two problems are chosen for GPU implementation.First is the electromagnetic wave scattering by a slab with negative permittivity and permeability; also known as DNG (double negative) medium.Second problem is the simulation of cylindrical cloak.An incident Gaussian pulse on DNG slab will undergo dispersion resulting in different frequency components being separated.Refractive index and transmission coefficient are calculated numerically to ascertain the validity of implementation.The cylindrical cloak was first proposed and tested by Pendry et.al. [2].The first FDTD implementation was by Zhao et.al [3] and implemented on Comsol, a commercial electromagnetic simulation software.Simulations are implemented on Matlab, C++ and GPU.Performance comparison reveals a 10-15 times increase in performance with GPU implementations.Performance gain is greater for larger problem sizes and greater simulation times.

Drude Dispersion Model
In ideal conditions the permittivity (and permeability) of a material remain constant for any frequency and throughout the structure of that material.Speed of electromagnetic waves in such a medium remain constant if frequency changes.Additionally, there is no loss in energy as the waves pass through the medium.
In reality, such a material does not exist.Speed of EM waves varies with frequency of operation.Also, there is a loss associated with the material.A material is dispersive if its permittivity or permeability is dependent on frequency [4,Ch. 10].
The relative permittivity in Drude model is given by Where, ω p is plasma frequency and γ is collision frequency.Setting γ = 0 and ∞ = 1, relative permittivity comes out to be negative for ω/ω p > 1 (figure 1).Thus, Drude model can be effectively used to model metamaterials with permittivity or permeability less than one by incorporating it into FDTD update equations.

FDTD Update Equations
Faraday's Law and Ampere's Law in differential form are given by From [5], the update equations for a wave propagating in z direction are given by ( In the conventional FDTD algorithm future magnetic field components are first computed from past electric field components (eq.4).Using the updated magnetic field components, future electric field components are then calculated (eq.5) and the simulation proceeds in a leap-frog manner [5].

FDTD Update Equations Based on Drude Model
Electric flux density and electric field are related by Where = r 0 and r for Drude model is given by equation 1. Substituting Drude model r , equation 6 can be written as Following the treatment of [3] and [4], frequency domain quantities can be converted to time-domain using the relationships jω → ∂/∂t and ω 2 → −∂ 2 /∂t 2 .Moreover, fields multiplying with ω 2 p are averaged in time.Secondorder difference scheme is used, both for single and double derivatives to keep all the terms in accordance with the second-order nature of whole expression.This will result in easier implementation.The final form is given by This is also known as the auxiliary update equation for electric field where a e -e e are scalars given by A similar procedure can be carried out to obtain auxiliary update equation for magnetic field.
For a wave propagating in z direction, FDTD update equations are and

GPU Considerations
With the evolution of 3D graphics arose a need for faster computing means to handle real-time graphics processing.The graphics processing unit or GPU was originally meant to act as a separate processor to handle graphics computations.A modern GPU has several hundred small processors that can work in parallel.Generally, these processors are referred to as shaders.The idea is to divide a problem into smaller sub-problems meant to execute in parallel.To take advantage of GPU acceleration a problem must have either task-parallel or data-parallel nature.

Task-Parallelism
In task-parallelism computation consist of several independent tasks that run concurrently.These tasks may be completely unrelated but the end result is dependent on their outputs.Consider summation of a series, where we want to compute the value of e x from Taylor series.The mathematical expression is given by A single-threaded conventional implementation would have to calculate all the terms one-by-one and then sum up the result in the end.However, in a multi-threaded implementation, each thread would calculate only one term.All the treads will perform their computation in parallel and the end result is then summed up.The computation-intensive task of calculating factorials and higher powers of x is parallelised and results in significant reduction of computation time.

Data-Parallelism
In data-parallelism same operation is performed on individual elements of data.A simple example is that of scalar matrix multiplication.Consider an array of size n being multiplied with a scalar constant c.The result of each multiplication can be calculated independently by assigning a separate thread for the task.This is illustrated in figure 3 where input array is A[] and resultant array is B[].Dataparallelism applies to any scenario where values in resultant data array only depends on values from input array.FDTD is a good example of data-parallelism and a GPU implementation can take advantage of accelerated computing.

Problem Specification
An electromagnetic wave travelling in z direction is incident on a slab with negative values of permittivity and permeability (DNG) at the frequency of operation [6].Sinusoidal wave, Gaussian pulse and Ricker wavelet are used as sources.Transmission and reflection coefficients are calculated at the air-slab interface.Refractive index of slab for a range of frequencies is also computed.By varying parame- [2] cA [3] cA [4] cA [5] cA [6] cA [n] ... ters, the simulation can be scaled to any desired frequency or wavelength.

Simulation Results
The simulation is run for both lossless and lossy cases with sinusoidal, Gaussian and Ricker wavelet sources.The slab parameters are set such that at frequency of operation, f 0 , the permittivity and permeability of slab are both negative and result in a refractive index n = −1.

Simulation Parameters
The number of spatial steps is set as 4096 and simulation is run for 4 × 4096 time steps.The slab is located between steps 1365 and 2731.∆z or spatial step is set as 3 mm and time step, ∆t, is set as 50 ps.Frequency of operation is f 0 = 0.1953125 GHz and Courant number for this configuration comes out to be S c = 0.5.In order to obtain relative permittivity and permeability of −1 at required f 0 , plasma frequencies are set as ω 2 pm = ω 2 pe = 2 × (2πf 0 ) 2 with ∞ = µ ∞ = 1.First order absorbing boundary condition (ABC) are applied on fields at end points.

Incident and Transmitted Fields
Simulation with Gaussian pulse reveals that low frequency components are reflected at the interface which is confirmed from the transmission and reflection coefficients obtained for the air-slab interface.At f 0 , the transmission coefficient is 1 and there are no reflections when a sinusoidal source with f 0 is incident on the slab.Under steady-state conditions, transmitted wave inside the slab has negative phase velocity while energy is propagating in +ẑ direction as expected.

Refractive Index
Following [6], the refractive index was calculated from Where, k 0 was the wave number set as ω 0 /c and the fields were recorded at locations z 1 = 1415∆z and z 2 = 1424∆z.For both, Gaussian pulse and Ricker wavelet, Re(n) was −1 at f 0 while Im(n) was sufficiently close to 0.  The most common 2D configurations are T E z or T M z polarisation where the problem space is confined to xy-plane.
In T E z , electric field components are transverse to z-axis and vice versa.For 2D simulation of DNG slab, T M z and It is important to note here that DNG slab medium is assumed anisotropic.The auxiliary update equations (9 and

Simulation Geometry
The DNG slab interface is perpendicular to incident plane wave propagating in +y direction.Periodic boundary conditions (PBC) are applied at x = 0 and x = x max .In y direction the grid is terminated at both ends by perfectly matched layer (PML) to absorb any incoming waves.The solution geometry is depicted in figure 11.The arrangement of field nodes is shown in figure 12. Magnetic field components are updated first and then used to update electric field.
The solution space is bounded in y direction at both ends by H x which acts as the boundary for PML.H x at these end points is set to 0 so that any out-going fields are reflected back into PML.Essentially, the j size of H x arrays is one greater than H y and E z arrays.

Simulation Parameters and Results
The solution space is 512 cells in both x and y directions without taking into account the width of PML, which is 50 cells wide.The plane wave source is located 10 cells from the lower PML layer.The spatial and temporal steps are set as ∆x = ∆y = ∆ = 3 mm and ∆t = 50 ps, respectively; with a Courant number of 0.5.Frequency of operation is f 0 = 1.5625GHz.The DNG slab parameters are same as in the case of 1D DNG simulation.
Figure 13 shows the refractive index obtained for a range of frequencies with a Gaussian pulse excitation.Again, real part of refractive index at f 0 is close to -1, whereas, imaginary part is close to 0.   If we can somehow make light bend around the corners of an object so that it continues on its trajectory then that object will appear invisible.Invisibility has, for ages, remained confined to only fiction books but in modern day world of optical transformations and artificially engineered materials, this no longer is a mystery.
An electromagnetic cloak at microwave frequencies was proposed and tested for the first time by Pendry et.al. [2].The cloak is cylindrical in shape meant to hide a circular object.This is probably the simplest implementation sufficient for proof of concept.To bend light around the circular object the cloaking material must exhibit negative values of permittivity and permeability.The cloaking medium itself is anisotropic.The first attempt at modelling this electromagnetic cloak using the FDTD algorithm was by Zhao et.al. [3].Their FDTD implementation was tested using simulation software Comsol.

Problem Specification
The implementation in [3] is for T E z while in this paper T M z approach is followed as in the case of DNG slab problems.The cloaking parameters are same and given by Here, r a and r b are the inner and outer radii of cloaking shell.r a and r b are chosen as 0.1 m and 0.

Conclusions
The goal of this paper was to analyse GPU performance gain of dispersive FDTD and GPU implementation of lossless cylindrical cloak.It was shown that materials with negative permeability and permittivity can be effectively modelled using Drude dispersive model.The 1D and 2D cases of slab problem was then implemented using C++ and GPU.A comprehensive performance analysis showed that GPU was able to perform much better both, as the problem domain and maximum time stepping are increased.For smaller problems, however, it is better to use Matlab or C++ due to data copying overhead associated with GPU implementation.
The lossless case of cylindrical cloak, as presented by Zhao et.al. in [3], was implemented using Matlab, C++ and finally on GPU with minor modifications.The simulations showed adequate similarity to accepted results.

Figure 14
Figure 14: Simulation geometry 11)Equations 10 and 11 drive the FDTD algorithm which give future values of By and Dx from past fields.Equations 8 and 9 are auxiliary equations which give future fields H y and E x at n + 1.A dry run without any scatterer is carried out before actual simulation to record incident fields.After simulation any post-processing is done to calculate required parameters like refractive index.The whole algorithm is depicted in figure2.
Compute Transmission and Reflection CoefficientsCompute Refractive Index of Slab Play Time-Domain Simulation Movie It is well known that we can only see things when light bounces off them and reaches our eyes.This is the reason why transparent objects can sometimes be difficult to spot,

Table 1 :
[3]rating system, platform/configuration and compiler/toolchain used for performance testing for T E z presented in[3].The update equations for D z , H x and H y remain the same as in the case of 2D DNG problem.Figure15shows E z under steady-state.Matlab, C++, CUDA and OpenCL implementations are tested with respect to space and time.Operating system, platform/configuration and compiler/toolchain for these implementations are listed in table 1. Simulations use double data type for arrays on 64 bit optimised platform.The CPU is an Intel Core 2 Duo E8400 @3.00 GHz with 4 GB RAM.For CUDA simulations, GPU is GTX 500 Ti with 192 shaders and 1 GB of memory.Visual C++ 2010 Express is the IDE used on 64 bit Windows 7. The flavour of linux is Fedora 14 64 bit.Hardware and software configuration is listed in table 2.
2 m respectively.The FDTD implementation is first tested on Matlab, then implemented using C++ and finally on GPU.The problem geometry is depicted in figure14.The FDTD update equations for T M z are completely analogous to those Figure 15: Ez under steady state for lossless cloak

Table 2 :
The numerical results showed good agreement with theoretical values in the caseCPUIntel Core 2 Duo E8400 @3.00 GHz RAM 4.00 GB DDR2 GPU nVidia Geforce GTX 550 Ti 1 GB Hardware and software used for performance testing of DNG slab problem