**Nek5000:** The code can be downloaded and installed from either a Subversion (SVN) repository or a Git repository. Links to both repositories are given at: http://nek5000.mcs.anl.gov/install/

The Git repository always mirrors the SVN repository. There are no official releases, because Nek community users and developers prefer immediate access to their contributions. However, since the software is updated constantly, tags for stable releases as well as for the latest releases are available, so far only in the Git mirror of the code.

The reason for this is that the SVN repository is maintained mainly for senior users who already have their own coding practices, and it will remain hosted at Argonne National Laboratory (accessed with an ANL account), while the Git repository is hosted on GitHub. A similar procedure is followed for the documentation: developers and users are free to contribute by editing and adding descriptions of features, and their changes are pushed back to the repository as pull requests. These allow the Nek team to assess whether the changes are ready for publication. All information about these procedures is documented on the homepage http://nek5000.mcs.anl.gov/. KTH maintains a close collaboration with the Nek team at ANL.

Every day the code is run through a series of regression tests via Buildbot (to be transferred to Jenkins). The checks range from functional testing and compiler-suite testing to unit testing. Not all solvers are covered by unit tests yet, but work is ongoing in this direction. Successful Buildbot runs determine whether a version of the code is deemed stable.

A suite of examples is distributed with the source code, illustrating modifications of the geometry and of the solvers as well as implementations of various routines. Users are encouraged to submit their own example cases for inclusion in the distribution.

The ExaFLOW use cases involving Nek5000 will be packaged as examples and included in the repository for future reference.

**Nektar++:** The code is a tensor-product-based finite element package designed to allow one to construct efficient classical low-polynomial-order h-type solvers (where h is the size of the finite element) as well as higher-order p-type piecewise polynomial solvers. The framework currently has the following capabilities:

- Representation of one, two and three-dimensional fields as a collection of piecewise continuous or discontinuous polynomial domains.
- Segment, plane and volume domains are permissible, as well as domains representing curves and surfaces (dimensionally-embedded domains).
- Hybrid-shaped elements, i.e. triangles and quadrilaterals, or tetrahedra, prisms and hexahedra.
- Both hierarchical and nodal expansion bases.
- Continuous or discontinuous Galerkin operators.
- Cross-platform support for Linux, Mac OS X and Windows.
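
To make the difference between hierarchical and nodal expansion bases concrete, the sketch below evaluates, in one dimension, the "modified" hierarchical basis described in Karniadakis and Sherwin's spectral/hp textbook and a Lagrange (nodal) basis through Gauss-Lobatto-Legendre points. We assume this textbook basis is representative of Nektar++'s hierarchical expansions; the function names are hypothetical and are not part of the Nektar++ API.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def modified_basis(P, xi):
    """Hierarchical 'modified' 1D basis of order P >= 1 at points xi in [-1, 1].

    Modes 0 and P are linear vertex modes; modes 1..P-1 are bubble modes that
    vanish at both ends, so raising P only appends modes (hierarchical property).
    """
    assert P >= 1
    xi = np.asarray(xi, dtype=float)
    phi = np.empty((P + 1, xi.size))
    phi[0] = 0.5 * (1.0 - xi)
    phi[P] = 0.5 * (1.0 + xi)
    for p in range(1, P):
        # Jacobi P_{p-1}^{(1,1)}(xi) = 2/(p+1) * d/dxi of Legendre P_p(xi)
        jacobi = 2.0 / (p + 1) * leg.Legendre.basis(p).deriv()(xi)
        phi[p] = 0.25 * (1.0 - xi) * (1.0 + xi) * jacobi
    return phi

def gll_points(P):
    """Gauss-Lobatto-Legendre points: the end points plus the roots of P_P'."""
    interior = np.sort(leg.Legendre.basis(P).deriv().roots().real)
    return np.concatenate(([-1.0], interior, [1.0]))

def nodal_basis(P, xi):
    """Lagrange (nodal) basis through the P + 1 GLL points, evaluated at xi."""
    nodes = gll_points(P)
    xi = np.asarray(xi, dtype=float)
    phi = np.ones((P + 1, xi.size))
    for j in range(P + 1):
        for m in range(P + 1):
            if m != j:
                phi[j] *= (xi - nodes[m]) / (nodes[j] - nodes[m])
    return phi
```

Both sets span the same polynomial space of degree P. The hierarchical modes are equal to 1 or 0 at the element ends (only the vertex modes couple neighbouring elements), whereas each nodal mode is equal to 1 at its own GLL point and 0 at all others.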

Nektar++ comes with a number of solvers and also allows one to construct a variety of new solvers. In this project we will primarily be using the incompressible Navier-Stokes solver.

**SBLI:** The SBLI code solves the governing equations of motion for a compressible Newtonian fluid using a high-order discretisation with shock capturing. An entropy-splitting approach is used for the Euler terms, and all spatial discretisations are carried out using a fourth-order central-difference scheme. Time integration is performed using compact-storage Runge-Kutta methods with third- and fourth-order options. Stable high-order boundary schemes are used, along with a Laplacian formulation of the viscous and heat-conduction terms that prevents the odd-even decoupling associated with central schemes.
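
The idea behind compact-storage (low-storage, "2N") Runge-Kutta schemes can be sketched as follows: every stage overwrites the same two registers, so memory use does not grow with the number of stages. The coefficients below are Williamson's classic three-stage, third-order set; this is an illustrative assumption, since the coefficients actually used in SBLI are not given here, and the function name is hypothetical.

```python
import math

# Williamson (1980) 2N-storage third-order coefficients (illustrative assumption;
# SBLI's own coefficients may differ).
A = (0.0, -5.0 / 9.0, -153.0 / 128.0)
B = (1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0)
C = (0.0, 1.0 / 3.0, 3.0 / 4.0)  # stage times as fractions of dt

def rk3_low_storage(rhs, q, t, dt, nsteps):
    """Integrate dq/dt = rhs(t, q) keeping only two registers, q and dq.

    With NumPy arrays the updates would be done in place
    (dq *= a; dq += dt * rhs(...); q += b * dq), so memory use stays at two
    arrays regardless of the number of stages.
    """
    for _ in range(nsteps):
        dq = 0.0
        for a, b, c in zip(A, B, C):
            dq = a * dq + dt * rhs(t + c * dt, q)
            q = q + b * dq
        t += dt
    return q

# Usage: dy/dt = -y, y(0) = 1, integrated to t = 1. For a third-order scheme the
# error should fall by roughly 2**3 = 8 when dt is halved.
for n in (50, 100):
    y = rk3_low_storage(lambda t, y: -y, 1.0, 0.0, 1.0 / n, n)
    print(n, abs(y - math.exp(-1.0)))
```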

**NS3D:** The DNS code *ns3d* is based on the complete Navier-Stokes equations for compressible fluids, assuming an ideal gas and Sutherland's law for air. The differential equations are discretized in the streamwise and wall-normal directions with 6th-order compact or 8th-order explicit finite differences. Time integration is performed with a four-step, 4th-order Runge-Kutta scheme. Implicit and explicit filtering in space and time can be applied if resolution or convergence problems occur. The code has been continuously optimized for vector and massively parallel computer systems, up to the current Cray XC40 system. Boundary conditions for sub- and supersonic flows can be specified appropriately at the boundaries of the integration domain. A grid transformation is used to cluster grid points in regions of interest, e.g. near a wall or a corner. For parallelization, the domain is split into several subdomains, as illustrated in the figure.

*Illustration of grid lines (black) and subdomains (red). A small step is located at Re_x = 3.3 × 10^5.*
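
As an illustration of what a 6th-order compact finite difference involves, the sketch below implements the standard tridiagonal scheme of Lele (1992) with alpha = 1/3 on a periodic grid. This is a simplifying assumption: ns3d applies its compact scheme in the non-periodic streamwise and wall-normal directions, which need boundary closures not shown here, and its exact coefficients may differ. The names are hypothetical.

```python
import numpy as np

# Lele's tridiagonal sixth-order coefficients (assumed representative, not ns3d's code).
ALPHA, A_COEF, B_COEF = 1.0 / 3.0, 14.0 / 9.0, 1.0 / 9.0

def compact6_derivative(f, h):
    """First derivative of periodic samples f with uniform spacing h.

    Solves  alpha*f'_{i-1} + f'_i + alpha*f'_{i+1}
          = a*(f_{i+1} - f_{i-1})/(2h) + b*(f_{i+2} - f_{i-2})/(4h).
    """
    n = f.size
    idx = np.arange(n)
    lhs = np.eye(n)
    lhs[idx, (idx - 1) % n] = ALPHA  # cyclic sub-diagonal
    lhs[idx, (idx + 1) % n] = ALPHA  # cyclic super-diagonal
    rhs = (A_COEF * (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * h)
           + B_COEF * (np.roll(f, -2) - np.roll(f, 2)) / (4.0 * h))
    # A dense solve keeps the sketch short; production codes use a (cyclic) tridiagonal solver.
    return np.linalg.solve(lhs, rhs)

def max_error(n):
    """Maximum error of d/dx sin(x) on an n-point periodic grid over [0, 2*pi)."""
    x = 2.0 * np.pi * np.arange(n) / n
    return np.max(np.abs(compact6_derivative(np.sin(x), 2.0 * np.pi / n) - np.cos(x)))

for n in (16, 32, 64):
    print(n, max_error(n))
```

The error should drop by roughly 2^6 = 64 each time the number of points is doubled, which is the signature of a sixth-order scheme; the implicit (left-hand-side) coupling is what allows this accuracy on a compact five-point stencil.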

- Written by Anna Palaiologk
- Category: Project Description

The main goal of ExaFLOW is to address key algorithmic challenges in CFD (Computational Fluid Dynamics) to enable simulation at exascale, guided by a number of use cases of industrial relevance, and to provide open-source pilot implementations.

The project comprises the following four use cases:

- McLaren Front Wing run by Imperial College London and McLaren Racing
- Wing profile NACA4412 run by KTH Stockholm and the University of Southampton
- Automotive run by the Automotive Simulation Center Stuttgart
- Jet in crossflow run by KTH Stockholm and the University of Stuttgart

- Written by Anna Palaiologk

# Energy efficient algorithms

The energy use of HPC systems is an important consideration at the exascale. In order to meet the 20 MW power target, the entire HPC environment (from the datacentre to the hardware and the software) needs to be made more energy efficient; this includes the scientific applications.

There is an important distinction between **energy-efficient and energy-aware** algorithms: an energy-efficient algorithm targets minimal power consumption but is effectively static, whereas an energy-aware algorithm has some knowledge of its power consumption (either at runtime or from a database populated post-execution) and can adapt runtime parameters to reduce energy use. Additionally, the tension between the entropy produced by fast numerical calculations and the time to solution has to be considered: a numerical task might be solved by a slow-running procedure at minimal energy cost, or by a fast-running one at a high energy rate. A metric or cost model that takes both time to solution and energy to solution into account therefore has to be developed. An exascale system is designed to run large problems in as short a time as possible; however, energy considerations may not always make this possible, and acceptable trade-offs must be found.
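
One widely used family of metrics that weighs both quantities is the energy-delay product, E·t^w: w = 0 is pure energy to solution, w = 1 is the classic EDP, and w = 2 (ED²P) favours speed more strongly. The sketch below compares two hypothetical run configurations; the numbers are invented for illustration, and the choice of this metric is an assumption, not the cost model the project will develop.

```python
from dataclasses import dataclass

@dataclass
class Run:
    name: str
    time_s: float        # time to solution in seconds
    avg_power_w: float   # mean power draw during the run in watts

    @property
    def energy_j(self) -> float:
        return self.time_s * self.avg_power_w

def energy_delay(run: Run, w: int = 1) -> float:
    """Energy-delay metric E * t**w; a larger w weights time to solution more heavily."""
    return run.energy_j * run.time_s ** w

# Hypothetical configurations: a fast, power-hungry run and a slower, frugal one.
fast = Run("fast", time_s=100.0, avg_power_w=400.0)
slow = Run("slow", time_s=115.0, avg_power_w=300.0)

for w in (0, 1, 2):
    best = min((fast, slow), key=lambda r: energy_delay(r, w))
    print(f"w={w}: prefer {best.name}")
# w=0: prefer slow
# w=1: prefer slow
# w=2: prefer fast
```

The preferred configuration flips as w grows, which is exactly the trade-off described above: the "right" answer depends on how much a given site or user values time relative to energy.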

Many algorithms used in numerical modelling today have a long legacy and are known to work well on systems with limited parallelism. To reach the exascale, however, a break with the status quo is needed; many of these algorithms must be redesigned from the ground up to expose further, massive parallelism and exploit the strengths of an exascale system. This presents an ideal opportunity to incorporate energy considerations into algorithm development as well.

The limitations of an exascale system, as projected today, will primarily lie in the data movement aspects of an application: reading from and writing to memory and disk, and moving data across the network. Floating-point operations, on the other hand, will be 'cheap' in terms of time and energy, and keeping the processing cores busy will be a major challenge. Exascale algorithms should therefore minimise data movement and perform more computation per byte moved, thus increasing the 'Instructions per Cycle' (IPC) rate. These computational challenges align with the power challenges: accessing data from cache, memory or the network has higher energy costs than integer or floating-point operations.
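
The link between data movement and useful work is often summarised with the roofline model: attainable performance is bounded by min(peak FLOP rate, arithmetic intensity × memory bandwidth), where arithmetic intensity is the number of floating-point operations per byte moved. The machine figures below are hypothetical and the byte counts assume no cache reuse beyond what is stated; the sketch only shows why low-intensity kernels leave cores idle.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations per byte moved to or from memory."""
    return flops / bytes_moved

def roofline(ai: float, peak_gflops: float, bw_gbs: float) -> float:
    """Attainable GFLOP/s under the roofline model."""
    return min(peak_gflops, ai * bw_gbs)

# Hypothetical node: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
PEAK, BW = 1000.0, 100.0

# Double-precision AXPY y = a*x + y: 2 flops per element; read x and y, write y (24 bytes).
axpy_ai = arithmetic_intensity(2, 24)
# Dense n x n matrix multiply: 2n^3 flops, assuming each of the 3 matrices moves once.
n = 1000
gemm_ai = arithmetic_intensity(2 * n**3, 3 * n**2 * 8)

for name, ai in (("axpy", axpy_ai), ("gemm", gemm_ai)):
    perf = roofline(ai, PEAK, BW)
    print(f"{name}: AI={ai:.3f} flop/byte, {perf:.1f} GFLOP/s ({100 * perf / PEAK:.1f}% of peak)")
```

Under these assumptions the AXPY kernel is limited by bandwidth to under 1% of peak, while the matrix multiply reaches the compute ceiling; restructuring algorithms toward the latter regime is what "minimising data movement" buys.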

At the same time, energy consumption can also be reduced by changing the frequency at which the computing elements run, possibly without a large loss of computational performance. For many cycles a core is not doing useful work because it is waiting for data from memory or remote nodes, or waiting in synchronisation barriers. Such 'active' waiting can be done at lower frequency and lower power without limiting the computational performance.
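
A toy model makes the frequency argument explicit. It assumes that a phase's runtime splits into compute time, which scales as 1/f, and memory-stall time, which does not depend on core frequency, and that power is a static part plus a dynamic part proportional to f³ (voltage scaling roughly with frequency). Both assumptions and all constants are illustrative, not measurements, and the function names are hypothetical.

```python
def phase_time(f_ghz, compute_gcycles, stall_s):
    """Runtime in seconds: the compute part scales with 1/f, stalls are frequency-independent."""
    return compute_gcycles / f_ghz + stall_s

def phase_energy(f_ghz, compute_gcycles, stall_s, p_static_w=50.0, k_dyn=10.0):
    """Energy in joules = (static + k * f^3) * runtime. The cubic term is a modelling assumption."""
    power_w = p_static_w + k_dyn * f_ghz ** 3
    return power_w * phase_time(f_ghz, compute_gcycles, stall_s)

# (label, compute work in 10^9 cycles, memory-stall time in seconds)
phases = (("memory-bound", 1.0, 9.0), ("compute-bound", 27.0, 0.0))
for label, cycles, stall in phases:
    t_hi, t_lo = phase_time(3.0, cycles, stall), phase_time(2.0, cycles, stall)
    e_hi, e_lo = phase_energy(3.0, cycles, stall), phase_energy(2.0, cycles, stall)
    print(f"{label}: 3 -> 2 GHz is {100 * (t_lo / t_hi - 1):.1f}% slower "
          f"and uses {100 * (1 - e_lo / e_hi):.1f}% less energy")
```

In this model, lowering the clock of the memory-bound phase costs under 2% in runtime while saving well over half of the energy, whereas the compute-bound phase becomes 50% slower. This is why frequency scaling is most attractive for phases dominated by waiting.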

An important prerequisite for developing energy-efficient or energy-aware algorithms is the ability to measure power. Several in-band and out-of-band solutions with varying levels of accuracy and resolution exist: from node-level measurements on Cray systems (starting with the XC30 range), to Intelligent Platform Management Interface (IPMI) and Running Average Power Limit (RAPL) reports, to plug-in power measurement boards that can measure the power use of individual system components (CPU, accelerators, memory, network, disks) separately. HLRS, for instance, uses a measurement system with high-frequency sampling of power consumption on a small cluster. This system consists of two dual-socket nodes with an FDR InfiniBand interconnect and an independent measurement system that collects the power consumption of various components (CPUs, GPU, per-node power consumption). The sampling frequency ranges from 12 kHz up to 100 kHz. The computational performance is measured at the same time, and both measurements are aligned with the progress of the code so that they can be attributed to specific locations within the program. This provides insight into the energy-use behaviour of the program.

For a system whose power consumption is so large that it requires coordination with the local power distributor, it will be important not only to reduce power consumption but also to request power at a constant, steady level; significant changes in the power draw must be avoided.
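
Once power samples and code-progress markers share a common clock, attributing energy to program regions amounts to integrating the power trace between region timestamps. The sketch below does this with the trapezoidal rule on a synthetic trace sampled at 12 kHz; the sample format, region markers and power values are hypothetical and do not describe the actual data format of the HLRS measurement system.

```python
import numpy as np

def energy_per_region(t_s, power_w, regions):
    """Integrate a sampled power trace over code regions.

    t_s: ascending sample times in seconds; power_w: power samples in watts;
    regions: iterable of (name, start_s, end_s) markers on the same clock.
    Returns joules per region, using linear interpolation at the region
    boundaries and the trapezoidal rule in between.
    """
    energy = {}
    for name, start, end in regions:
        inside = (t_s > start) & (t_s < end)
        t = np.concatenate(([start], t_s[inside], [end]))
        p = np.interp(t, t_s, power_w)
        energy[name] = float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t)))
    return energy

# Synthetic 1 s trace sampled at 12 kHz, with power ramping linearly from 200 W.
times = np.arange(12000) / 12000.0
power = 200.0 + 100.0 * times
markers = [("setup", 0.0, 0.5), ("solve", 0.5, 0.9)]
for name, joules in energy_per_region(times, power, markers).items():
    print(f"{name}: {joules:.2f} J")
# setup: 112.50 J
# solve: 108.00 J
```

Dividing each region's energy by its duration gives its mean power, which is also a simple way to spot the abrupt changes in power draw that the paragraph above says must be avoided.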

- Written by Ralf Schneider

# Validation and application use cases

The techniques developed will be implemented in open-source modules, integrated into larger CFD packages including Nek5000, Nektar++, NS3D and SBLI, and verified using real-world problems provided by the academic and industrial partners. The test cases used for validation include:

*McLaren front wing.*

We use the representative geometry of a three-component front wing with a complex end plate and a representative wheel geometry at an experimental Reynolds number. The geometry was investigated experimentally in a PhD project at Imperial College London, which therefore provides benchmark data. The test case contains complex geometry and ground effect, while also being representative of the aerodynamics found in many other applications, such as the open flap and slat geometry of a wing during landing and take-off.

*NACA 4412 airfoil.*

This family of analytically defined airfoils is a standard test case in CFD, mainly for low-order codes or RANS methods. The case is based on a two-dimensional geometry with periodic boundary conditions applied in the spanwise direction, which typically has an extent of 10-20% of the chord. Looking toward exascale computing (and beyond), the challenge is to run this sort of test case at Reynolds numbers of O(10^6-10^7) with different approaches, including DNS, LES with resolved near-wall layers, and LES with wall models (including heterogeneous models as discussed in the proposal). Note that a resolved DNS of such a case will require about 10-100 billion grid points. The adaptive mesh strategy will therefore be of particular importance in the turbulent wake of the airfoil, where little is known about the flow a priori. The proposed simulations will be compared with planned experiments in the KTH wind tunnel at comparable Reynolds numbers.

*Jet in Crossflow.*

This generic flow case of high practical relevance arises when a fluid jet issuing through a wall enters the boundary layer developing along that wall. As its understanding is important in many real applications, this flow has been the subject of a number of experimental and numerical studies over the last decades. At the same time, this flow case, with its moderately complicated geometry and complex, fully three-dimensional dynamics, provides an ideal tool for testing numerical algorithms and tools. Its benefit for testing the feature detection and I/O reduction strategies is that it contains a variety of length scales due to boundary-layer turbulence, flow instability and breakdown of the jet. We consider a circular pipe perpendicular to the flat plate.

*Industrial use cases from the automotive sector.*

We select representative use cases from different phases of the vehicle design process: conceptual design, preliminary and detailed design, and product validation. Geometry data are provided by the OEM members of the ASCS, and relevant CFD problems will be defined in close cooperation with the OEMs. A further focus is the comparison between state-of-the-art commercial CFD software tools used in the automotive industry and the new open-source exascale techniques developed in this project.

The target codes cover both incompressible and compressible flow.

- Written by Ralf Schneider