I/O in ExaFLOW

Exascale computing will serve very large capability jobs as well as workflows with many instances of large-scale simulations. Either case implies extremely heavy I/O for reading and writing data, and for storing these on a large-scale filesystem; this applies in particular to fluid-flow simulations. Data I/O is, however, an emerging bottleneck in high-performance computing (irrespective of application or discipline) because the hardware speed-ups of computation and I/O are diverging. This will remain true even with new I/O technologies such as burst buffers, and non-volatile memory will only gradually help. To reduce the amount of data to be stored and handled, we propose two solution paths: parallelization of I/O, and I/O data reduction and compression via application-dependent filtering. The main objective of both is to alleviate the performance bottleneck caused by data transfer from memory to disk.
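
As an illustration of the first path, the sketch below shows how a distributed flow field could be written with a single collective MPI-IO call via mpi4py, so that all ranks participate in one parallel write instead of funnelling data through a single process. The file name, array size and contiguous layout are illustrative assumptions, not a prescription of the ExaFLOW implementation.

    # Minimal sketch: each MPI rank writes its slice of a distributed field
    # with one collective MPI-IO call (file name and sizes are illustrative).
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    local_n = 1_000_000                               # points owned by this rank
    field = np.full(local_n, rank, dtype=np.float64)  # stand-in for a flow field

    fh = MPI.File.Open(comm, "snapshot.dat",
                       MPI.MODE_CREATE | MPI.MODE_WRONLY)
    offset = rank * local_n * field.itemsize          # contiguous per-rank layout
    fh.Write_at_all(offset, field)                    # collective parallel write
    fh.Close()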

Unsteady fluid-flow simulations produce large amounts of raw data describing the flow physics as a huge collection of time-dependent scalar, vector and tensor data, similar to real-world measurements. In this representation, however, the underlying flow phenomena (e.g. vortices) are contained only implicitly, and since each such feature may be discretized by hundreds of grid points and many time steps, there is enormous data-reduction potential if feature-based data are stored instead of the 'brute-force' raw data. We will investigate problem-specific filters with the goal of reducing the amount of I/O data in situ, so that the ratio of I/O to floating-point operations is improved for exascale computing and physically interesting features are extracted. The findings for fluid dynamics will also be applicable to other disciplines using computational methods and to other users, including industry.
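
As a sketch of such an application-dependent filter, the example below keeps only grid points whose Q-criterion marks them as part of a vortex and discards the rest before I/O. The toy velocity field, grid size and zero threshold are illustrative assumptions; a solver would pass its own data and a tuned cutoff.

    # Minimal sketch of an in-situ feature filter: store only vortical points
    # (Q-criterion > threshold) instead of the full raw field.
    import numpy as np

    def q_criterion(u, v, w, dx):
        # Velocity-gradient tensor on a uniform grid of spacing dx;
        # grads[i][j] = d c_i / d x_j.
        grads = [np.gradient(c, dx) for c in (u, v, w)]
        S2 = O2 = 0.0
        for i in range(3):
            for j in range(3):
                S = 0.5 * (grads[i][j] + grads[j][i])   # strain-rate part
                O = 0.5 * (grads[i][j] - grads[j][i])   # rotation-rate part
                S2, O2 = S2 + S * S, O2 + O * O
        return 0.5 * (O2 - S2)      # Q > 0: rotation dominates strain

    # Toy velocity field on a 64^3 grid, standing in for solver output.
    n, dx = 64, 1.0 / 64
    x = np.linspace(0.0, 1.0, n)
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    u, v, w = -np.sin(np.pi * Y), np.sin(np.pi * X), np.zeros_like(X)

    q = q_criterion(u, v, w, dx)
    mask = q > 0.0                   # feature filter: vortical regions only
    indices = np.flatnonzero(mask)   # store indices + values, not the raw field
    print(f"kept {indices.size} of {q.size} points "
          f"({100.0 * indices.size / q.size:.1f}%)")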

Strong scaling at exascale using a mixed Continuous Galerkin-Hybridizable Discontinuous Galerkin (CG-HDG) approach.

Continuous Galerkin (CG) algorithms are well established in the numerical methods community and have been widely used in CFD software, as they offer good single-node performance thanks to the compact problem size per node compared with discontinuous Galerkin methods [5]. However, they suffer from complex inter-node communication patterns, since all elements meeting at a vertex or edge must communicate; reductions across many nodes are therefore required for some degrees of freedom, particularly on unstructured meshes. Conversely, discontinuous Galerkin methods lead to a minimal pair-wise communication pattern, since elements connect only through faces [6], but this comes at the expense of greater computational cost on each individual node. To achieve exascale performance, an approach is needed that offers good single-node performance, including the use of accelerators and coprocessors, while minimizing the communication cost.
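
The hybridizable variant of DG addresses exactly this trade-off: element-interior unknowns are statically condensed onto trace unknowns living on element faces, so the globally coupled (and communicated) system is much smaller, while interior recovery stays element-local. The sketch below illustrates the underlying linear algebra; the randomly generated symmetric positive-definite block system is a stand-in for an assembled discretisation matrix, not a real PDE operator.

    # Minimal sketch of static condensation, the algebraic core of HDG:
    # only the trace (face) unknowns couple globally.
    import numpy as np

    rng = np.random.default_rng(0)
    n_i, n_b = 20, 6                    # interior and trace (face) unknowns
    A = rng.standard_normal((n_i + n_b, n_i + n_b))
    A = A @ A.T + (n_i + n_b) * np.eye(n_i + n_b)   # SPD stand-in matrix

    Aii, Aib = A[:n_i, :n_i], A[:n_i, n_i:]
    Abi, Abb = A[n_i:, :n_i], A[n_i:, n_i:]
    f = rng.standard_normal(n_i + n_b)
    fi, fb = f[:n_i], f[n_i:]

    # Condense interiors: S lam = g is the only system solved across nodes.
    S = Abb - Abi @ np.linalg.solve(Aii, Aib)       # Schur complement on trace
    g = fb - Abi @ np.linalg.solve(Aii, fi)
    lam = np.linalg.solve(S, g)                     # global trace solve

    # Recover interior unknowns element-locally, with no communication.
    ui = np.linalg.solve(Aii, fi - Aib @ lam)

    # Consistency check against the monolithic solve.
    u = np.linalg.solve(A, f)
    assert np.allclose(np.concatenate([ui, lam]), u)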

Mesh adaptivity, heterogeneous modelling and resilience.

Realistic flow problems involving turbulence quickly require large-scale simulation capabilities. The most crucial aspect in these situations is the proper representation of small-scale flow structures in time-dependent transport problems. This must be accomplished with minimal dissipation, as errors accumulated at small scales may become dominant when propagated through large computational domains over long integration times. Reliably simulating turbulent flows with significant regions of separation requires schemes based on high-order accurate discretisations, e.g. spectral element methods. Such discretisation techniques offer sufficient accuracy and fast, nearly exponential convergence, as well as large-scale parallelism and flexibility in prescribing mesh topology. An additional aspect of real flow problems is that simulations may in practice be heterogeneous, with different physical models and/or algorithms applied in different regions.

Flexibility in mesh topology is instrumental, as simulation accuracy depends strongly on the quality of the mesh, which in turn must be adjusted to the (a priori unknown) flow, potentially with heterogeneous approaches. This is why mesh generation is considered a significant bottleneck in modern and future CFD. As more powerful HPC resources enable the simulation of complex, more realistic and industrially relevant flow problems, reliable mesh generation becomes more problematic, resulting in significant uncertainties in the simulation result. Although numerical uncertainties arise from many sources, including errors due to spatial and temporal discretisations or incomplete convergence, they can be minimised during the simulation by appropriate adaptation of the grid structure to the dynamic flow solution. Such automated mesh adaptivity, combining error estimation with dynamic refinement, is considered an essential feature for large-scale, computationally expensive simulations. It considerably improves the efficient use of resources, simplifies grid generation and ensures a consistent accuracy (depending on the chosen measure) of the solution.
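
As an illustration, one common family of error estimators in spectral element codes monitors the decay of the modal expansion coefficients within each element: slow decay signals under-resolution. The sketch below is a minimal version of such an indicator; the tail width, thresholds and coefficient data are illustrative assumptions.

    # Minimal sketch of a spectral-decay error indicator: the energy in the
    # highest modes of each element, relative to the total, decides whether
    # to refine, coarsen, or keep it. Thresholds are illustrative.
    import numpy as np

    def decay_indicator(coeffs):
        # coeffs: modal coefficients of one element, lowest to highest mode.
        energy = coeffs ** 2
        return energy[-2:].sum() / energy.sum()   # tail-to-total energy ratio

    def flag_elements(elements, refine_tol=1e-2, coarsen_tol=1e-6):
        flags = []
        for c in elements:
            eta = decay_indicator(c)
            flags.append("refine" if eta > refine_tol
                         else "coarsen" if eta < coarsen_tol
                         else "keep")
        return flags

    # Two toy elements: rapidly decaying (well-resolved) vs slowly decaying.
    p = 8
    smooth = np.exp(-2.0 * np.arange(p + 1))   # well-resolved element
    rough = 1.0 / (1.0 + np.arange(p + 1))     # under-resolved element
    print(flag_elements([smooth, rough]))       # ['coarsen', 'refine']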

With the number of computing cores expected to exceed one million on an exascale platform, the risk of hardware failures, whether permanent or transient, becomes statistically significant; some estimates put the mean time between failures on an exascale platform on the order of minutes. This is a serious challenge that must be addressed at both the hardware and the algorithmic level. Regardless of the source of the fault, resilience requires the ability to recover, with some fidelity, the lost results. For this we shall pursue the development of low-storage, low-complexity models, formulated in situ and executing in a mirror state on different nodes; if a node fails, these models can be activated to recover the lost solution and regenerate the associated operators. While such a strategy helps recovery in cases where the hardware signals a fault, we will additionally pursue the development of suitable error indicators to detect silent errors, e.g. bit-flips caused by radiation or low-power operation. This combination, embedded into the key computational engine of the CFD models, will ensure fault tolerance and resilience even on very large platforms.
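
As a minimal illustration of a silent-error indicator, the sketch below bounds the step-to-step change of the solution norm, a cheap plausibility check that fires on implausible jumps (such as a high-order bit-flip) and triggers rollback to a lightweight in-memory copy. The time stepper, the injected fault and the threshold are illustrative stand-ins, not the project's recovery models.

    # Minimal sketch of silent-error detection with rollback: a cheap bound on
    # the relative change of the solution norm flags implausible jumps and
    # restores a lightweight in-memory copy.
    import numpy as np

    def step(u):
        return 0.99 * u                  # stand-in for one solver time step

    u = np.ones(1024)
    backup = u.copy()                    # low-cost local "mirror" copy
    max_jump = 0.05                      # plausible relative change per step

    for n in range(100):
        prev_norm = np.linalg.norm(u)
        u = step(u)
        if n == 42:                      # inject a silent fault: one huge value
            u[7] = 1e30
        jump = abs(np.linalg.norm(u) - prev_norm) / prev_norm
        if jump > max_jump:              # indicator fires: roll back and redo
            print(f"step {n}: silent error suspected, rolling back")
            u = step(backup.copy())
        backup = u.copy()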

A final crucial aspect is flexibility in applying different physical models and/or solution algorithms to different flow regions, which enables more of the relevant flow physics in complex geometries to be captured at reasonable cost. Such heterogeneous computing combines different modelling methods, such as Reynolds-Averaged Navier-Stokes (RANS), Large-Eddy Simulation (LES) and Direct Numerical Simulation (DNS), embedding high-resolution zones computed with more costly algorithms within a surrounding, cheaper calculation. Hybrid RANS-LES or RANS-DNS is a promising route for simulating problems of practical engineering interest.
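
As a toy illustration of such zonal coupling, the sketch below advances two 1D diffusion zones with different effective viscosities, standing in for, say, a RANS zone and an embedded higher-fidelity zone, and exchanges interface data each step. The grid sizes, coefficients and the simple averaging at the interface are illustrative assumptions, not the project's coupling scheme.

    # Toy zonal coupling: two 1D diffusion zones with different effective
    # viscosities (stand-ins for different turbulence models) advance
    # independently and exchange interface values every step.
    import numpy as np

    nx, dx, dt = 50, 0.02, 1e-4
    nu_left, nu_right = 0.5, 0.05        # different "models" per zone

    left = np.ones(nx + 1)               # hot zone, owns the interface point
    right = np.zeros(nx + 1)             # cold zone, owns the interface point

    def diffuse(u, nu, ghost_left, ghost_right):
        ue = np.concatenate([[ghost_left], u, [ghost_right]])
        return u + nu * dt / dx**2 * (ue[2:] - 2.0 * ue[1:-1] + ue[:-2])

    for _ in range(1000):
        # Exchange: each zone sees the other's first interior value as a ghost.
        gl, gr = right[1], left[-2]
        left = diffuse(left, nu_left, 1.0, gl)     # fixed far-left boundary
        right = diffuse(right, nu_right, gr, 0.0)  # fixed far-right boundary
        # Enforce a single shared interface value (simple averaging).
        iface = 0.5 * (left[-1] + right[0])
        left[-1] = right[0] = iface
    print(f"interface temperature: {iface:.3f}")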

Error control, mesh adaptivity and heterogeneous modelling are recognised as essential aspects of, and important challenges for, exascale CFD workflows.

The main goal of the project is to address current algorithmic bottlenecks to enable the use of accurate CFD codes for problems of practical engineering interest. The focus will be on different simulation aspects including:

  • accurate error control and adaptive mesh refinement in complex computational domains,
  • solver efficiency via mixed discontinuous and continuous Galerkin methods and appropriate optimised preconditioners,
  • strategies to ensure fault tolerance and resilience,
  • heterogeneous modelling to allow for different solution algorithms in different domain zones,
  • parallel input/output for extreme data, employing novel data reduction algorithms,
  • energy awareness of high-order methods.

Specifically, we are going to address the following problems:

  • In complex flow simulations, a priori knowledge of the flow physics and of the regions within the domain that contain the dominant flow features is generally not available, making the development of adaptive techniques crucial for large-scale computational problems. From the perspective of algorithmic development, this splits broadly into two investigations: scalable, load-balanced mesh-refinement strategies, and effective error estimators, based on the spectral discretisation within each element, that indicate which regions of the flow domain require additional resolution or coarsening.
  • Communication topologies in exascale systems will be inherently heterogeneous and necessitate new algorithms whose communication patterns align with the underlying network infrastructure. Exascale systems will require hierarchical parallelization strategies, essentially differentiating between intra- and inter-node parallelism. Achieving good efficiency on both levels while exploiting existing algorithms is a challenge, and we therefore propose a combination of different algorithmic approaches.
  • On the next generation of large-scale computing platforms, the number of computing cores will be so large that the probability of hardware faults during a large-scale simulation becomes significant. It is thus essential that algorithms be resilient to such faults, allowing the computation to detect them and recover. Ensuring fault tolerance and resilience is a critical component of the development of simulation tools suitable for the exascale.
  • In complex flow simulations, the physics of the flow can differ drastically in different regions of the domain. Heterogeneous modelling allows the use of different representations of the physics depending on the level of detail required. At the exascale, a key challenge is maintaining scalable performance when interfacing the models in adjacent regions.
  • Due to the deep memory hierarchy of large-scale systems, I/O is becoming one of the key bottlenecks to overcome. This problem is compounded in CFD simulations, which discretise the flow domain with a large number of data points and represent the flow by a collection of scalar and vector data at these points, leading overall to data fields that contain an order of magnitude more data than the mesh degrees of freedom. This “raw” data contains the flow physics not only implicitly but also redundantly, i.e. multiple data points describe the same physical phenomenon with similar or identical data. We propose innovative in-situ data-reduction schemes, in conjunction with the exploitation of parallel I/O strategies, to alleviate these problems.
  • Finally, independent of the problem domain, the energy consumption of exascale systems is becoming a limiting factor. Hardware solutions will likely have to work in conjunction with energy-efficient and energy-aware algorithms and implementations to maintain energy consumption at an acceptable level.
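
As a small illustration of energy awareness at the application level, the sketch below measures the energy of a code section by reading the Linux RAPL package-energy counter before and after it. The sysfs path is the usual Intel RAPL location and is an assumption; it may differ between machines or require elevated privileges, and the kernel being timed is a stand-in.

    # Minimal sketch of measuring a kernel's energy via the Linux RAPL
    # counter (path is an assumption; the counter reads in microjoules
    # and may wrap, which this sketch ignores).
    import time

    RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

    def read_energy_uj():
        with open(RAPL) as f:
            return int(f.read())

    e0, t0 = read_energy_uj(), time.perf_counter()
    total = sum(i * i for i in range(10_000_000))   # stand-in solver kernel
    e1, t1 = read_energy_uj(), time.perf_counter()

    joules = (e1 - e0) / 1e6
    print(f"energy: {joules:.2f} J, mean power: {joules / (t1 - t0):.1f} W")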

The project has the following five main objectives:

  1. Mesh adaptivity, heterogeneous modelling and resilience.
  2. Strong scaling at exascale using a mixed Continuous Galerkin-Hybridizable Discontinuous Galerkin (CG-HDG) approach.
  3. I/O in ExaFLOW.
  4. Validation and application use cases.
  5. Energy-efficient algorithms.