Author: José Flich (UPV)
Contributors: Enrique S. Quintana-Ortí (UPV), David Rodríguez (UPV)
With the end of Dennard scaling and the ever-increasing performance demands of conventional HPC numerical codes as well as emerging applications (for instance, machine learning), the computer architecture landscape has become much more diverse in recent years. As a result, we can now find processors that aim to increase the floating-point operations per second per Watt (FLOPS/W) following different paths, for example, exploiting very wide SIMD units, integrating a large number of cores, and/or using more conventional, yet heterogeneous, designs that combine general-purpose processors with graphics processing units (GPUs). Recent trends in HPC confirm that a hybrid architecture combining general-purpose processors (CPUs), GPUs, and even specialised accelerators has become the preferred node type for a wide range of workloads of interest to HPC and data centers, including machine learning (ML), Big Data, and scientific simulation.
In addition to well-established accelerators like GPUs, reconfigurable hardware devices such as Field Programmable Gate Arrays (FPGAs) have also been gaining popularity over the last few years in high performance computing (HPC) environments, as confirmed by successful projects such as Catapult. FPGAs are nowadays offered by cloud providers as components attached to the virtual machines that users rent. Moreover, in an attempt to bound response time to users and minimize power consumption when providing Deep Learning services, Google has developed a new kind of accelerator, the Tensor Processing Unit (TPU), which delivers substantially higher throughput and a more favorable performance-per-watt ratio than contemporary CPUs and GPUs.
Figure: Overall eFlows4HPC project approach (architectural optimizations in orange).
The eFlows4HPC project aims to promote the adoption of heterogeneous architectures based on GPUs, FPGAs, and custom accelerators in European industry. Moreover, the project explores the application of Deep Learning technologies to accelerate parts of the applications, which represents one of its innovation goals. eFlows4HPC will also incorporate additional emerging solutions at the hardware level: reconfigurable hardware, such as FPGAs, has matured and is nowadays considered a regular component of HPC and Big Data systems. The target applications conventionally run in full 64-bit floating-point precision; the project will explore more energy-efficient implementations that trade off intermediate precision against performance and energy while maintaining the precision of the final results.
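To make the idea concrete, the following is a minimal, hypothetical sketch (in Python/NumPy, not part of the eFlows4HPC software stack) of mixed-precision iterative refinement: the bulk of the work is performed in 32-bit precision, while the residual correction in 64-bit precision recovers the accuracy of the final result.

```python
# Hypothetical sketch: trading intermediate precision for performance/energy
# while preserving the precision of the final result (mixed-precision
# iterative refinement for a dense linear system A x = b).
import numpy as np

def solve_mixed_precision(A, b, iters=5):
    A32 = A.astype(np.float32)
    # Low-precision solve: this is where most of the work (and energy) goes.
    # A real implementation would factorize A32 once and reuse the factors.
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        # Residual and update are accumulated in full 64-bit precision,
        # so the final solution recovers double-precision accuracy.
        r = b - A @ x
        x += np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((512, 512)) + 512 * np.eye(512)  # well-conditioned system
b = rng.standard_normal(512)
x = solve_mixed_precision(A, b)
print("final residual norm:", np.linalg.norm(b - A @ x))
```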
The specific objectives of the hardware adaptation are:
- The identification of specific parts (kernels) of the application Pillars under consideration, in order to improve key performance indicators such as energy consumption and execution time.
- The assessment and optimization of such kernels, and their customisation to specific heterogeneous architectures, including GPUs, FPGAs and custom accelerators, with special emphasis on the European Processor Initiative (EPI).
- The investigation of specific kernels to improve energy efficiency by trading off accuracy/precision against energy/performance where possible.
- The investigation of how the storage components of the software stack can benefit from novel storage technologies in order to improve the performance of the Pillars of the project.
- The specification of metrics that will enable the evaluation of the improvements achieved with the project results.
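As a simple illustration of the last objective, the sketch below shows the kind of kernel-level metrics that could be derived from measured execution time and energy; the function name and all numbers are placeholders, not project results.

```python
# Hypothetical KPI computation for an optimised kernel versus a baseline.
# All measurement values below are illustrative placeholders.

def kernel_kpis(flops, time_s, energy_j, base_time_s, base_energy_j):
    return {
        "speedup": base_time_s / time_s,               # baseline time / optimised time
        "energy_reduction": base_energy_j / energy_j,  # baseline energy / optimised energy
        "gflops_per_watt": flops / energy_j / 1e9,     # FLOPS/W = operations per joule
    }

# Example: a kernel performing 2e12 floating-point operations.
print(kernel_kpis(flops=2e12, time_s=1.6, energy_j=180.0,
                  base_time_s=4.0, base_energy_j=520.0))
```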