Special Issue on “Programming, Resource Management and Autotuning Tools for Heterogeneous HPC”
• Major category: Engineering and Technology - Zone 4
• Subcategory: Computer Science, Theory and Methods - Zone 4
Future High Performance Computing (HPC) systems face complex challenges deriving from the push towards Exascale, the limits of the power grid to support such large infrastructures, and emerging classes of applications imposing quality of service requirements other than pure throughput.
To address such challenges, heterogeneous computing architectures have emerged as a solution to achieve both higher performance and lower energy consumption. Systems that couple GPGPUs and other many-core accelerators with traditional HPC processors dominate the current Green500 and Top500 lists. Even higher degrees of heterogeneity can be achieved by introducing reconfigurable fabrics and/or application- or domain-specific accelerators.
The cost of heterogeneity lies in the complexity of management. Writing and managing HPC applications is already a challenging task, requiring the cooperation of domain experts and HPC experts. The introduction of heterogeneous architectures makes development and runtime management even more complex. Furthermore, at Exascale levels, hardware failures become sufficiently likely that computations running on such large infrastructures need to take them into account.
As a result, challenges include the management of heterogeneous resources, the energy efficiency of computation, and the capability to meet timing constraints in the face of transient or long-lasting hardware failures. To solve such issues, manual control of computing resources will not suffice. New programming, resource management, and autotuning models and tools are needed to tackle such challenges effectively.
Papers submitted to the special issue should have a strong emphasis on multi-node parallelism. Topics to be covered in this special issue include, but are not limited to, the following:
- Runtime resource management for heterogeneous HPC systems;
- Power, thermal, and performance prediction and management;
- Programming models integrating parallelism at the multi-node level with other aspects, including resource management, access to heterogeneous resources, access to advanced storage (e.g., converging Big Data and HPC), and fault management;
- Strategies, frameworks and methodologies for autotuning and self-management of the application and system.