XcalableMP PGAS Programming Language: From Programming Model to Applications.

Bibliographic Details:
Place / Publishing House: Singapore : Springer Singapore Pte. Limited, 2020. ©2021.
Year of Publication: 2020
Edition: 1st ed.
Language: English
Physical Description: 1 online resource (265 pages)
Table of Contents:
  • Intro
  • Preface
  • Contents
  • XcalableMP Programming Model and Language
  • 1 Introduction
  • 1.1 Target Hardware
  • 1.2 Execution Model
  • 1.3 Data Model
  • 1.4 Programming Models
  • 1.4.1 Partitioned Global Address Space
  • 1.4.2 Global-View Programming Model
  • 1.4.3 Local-View Programming Model
  • 1.4.4 Mixture of Global View and Local View
  • 1.5 Base Languages
  • 1.5.1 Array Section in XcalableMP C
  • 1.5.2 Array Assignment Statement in XcalableMP C
  • 1.6 Interoperability
  • 2 Data Mapping
  • 2.1 nodes Directive
  • 2.2 template Directive
  • 2.3 distribute Directive
  • 2.3.1 Block Distribution
  • 2.3.2 Cyclic Distribution
  • 2.3.3 Block-Cyclic Distribution
  • 2.3.4 Gblock Distribution
  • 2.3.5 Distribution of Multi-Dimensional Templates
  • 2.4 align Directive
  • 2.5 Dynamic Allocation of Distributed Array
  • 2.6 template_fix Construct
  • 3 Work Mapping
  • 3.1 task and tasks Construct
  • 3.1.1 task Construct
  • 3.1.2 tasks Construct
  • 3.2 loop Construct
  • 3.2.1 Reduction Computation
  • 3.2.2 Parallelizing Nested Loop
  • 3.3 array Construct
  • 4 Data Communication
  • 4.1 shadow Directive and reflect Construct
  • 4.1.1 Declaring Shadow
  • 4.1.2 Updating Shadow
  • 4.2 gmove Construct
  • 4.2.1 Collective Mode
  • 4.2.2 In Mode
  • 4.2.3 Out Mode
  • 4.3 barrier Construct
  • 4.4 reduction Construct
  • 4.5 bcast Construct
  • 4.6 wait_async Construct
  • 4.7 reduce_shadow Construct
  • 5 Local-View Programming
  • 5.1 Introduction
  • 5.2 Coarray Declaration
  • 5.3 Put Communication
  • 5.4 Get Communication
  • 5.5 Synchronization
  • 5.5.1 Sync All
  • 5.5.2 Sync Images
  • 5.5.3 Sync Memory
  • 6 Procedure Interface
  • 7 XMPT Tool Interface
  • 7.1 Overview
  • 7.2 Specification
  • 7.2.1 Initialization
  • 7.2.2 Events
  • References
  • Implementation and Performance Evaluation of Omni Compiler
  • 1 Overview
  • 2 Implementation
  • 2.1 Operation Flow
  • 2.2 Example of Code Translation
  • 2.2.1 Distributed Array
  • 2.2.2 Loop Statement
  • 2.2.3 Communication
  • 3 Installation
  • 3.1 Overview
  • 3.2 Get Source Code
  • 3.2.1 From GitHub
  • 3.2.2 From Our Website
  • 3.3 Software Dependency
  • 3.4 General Installation
  • 3.4.1 Build and Install
  • 3.4.2 Set PATH
  • 3.5 Optional Installation
  • 3.5.1 OpenACC
  • 3.5.2 XcalableACC
  • 3.5.3 One-Sided Library
  • 4 Creation of Execution Binary
  • 4.1 Compile
  • 4.2 Execution
  • 4.2.1 XcalableMP and XcalableACC
  • 4.2.2 OpenACC
  • 4.3 Cooperation with Profiler
  • 4.3.1 Scalasca
  • 4.3.2 tlog
  • 5 Performance Evaluation
  • 5.1 Experimental Environment
  • 5.2 EP STREAM Triad
  • 5.2.1 Design
  • 5.2.2 Implementation
  • 5.2.3 Evaluation
  • 5.3 High-Performance Linpack
  • 5.3.1 Design
  • 5.3.2 Implementation
  • 5.3.3 Evaluation
  • 5.4 Global Fast Fourier Transform
  • 5.4.1 Design
  • 5.4.2 Implementation
  • 5.4.3 Evaluation
  • 5.5 RandomAccess
  • 5.5.1 Design
  • 5.5.2 Implementation
  • 5.5.3 Evaluation
  • 5.6 Discussion
  • 6 Conclusion
  • References
  • Coarrays in the Context of XcalableMP
  • 1 Introduction
  • 2 Requirements from Language Specifications
  • 2.1 Images Mapped to XMP Nodes
  • 2.2 Allocation of Coarrays
  • 2.3 Communication
  • 2.4 Synchronization
  • 2.5 Subarrays and Data Contiguity
  • 2.6 Coarray C Language Specifications
  • 3 Implementation
  • 3.1 Omni XMP Compiler Framework
  • 3.2 Allocation and Registration
  • 3.2.1 Three Methods of Memory Management
  • 3.2.2 Initial Allocation for Static Coarrays
  • 3.2.3 Runtime Allocation for Allocatable Coarrays
  • 3.3 PUT/GET Communication
  • 3.3.1 Determining the Possibility of DMA
  • 3.3.2 Buffering Communication Methods
  • 3.3.3 Non-blocking PUT Communication
  • 3.3.4 Optimization of GET Communication
  • 3.4 Runtime Libraries
  • 3.4.1 Fortran Wrapper
  • 3.4.2 Upper-layer Runtime (ULR) Library
  • 3.4.3 Lower-layer Runtime (LLR) Library
  • 3.4.4 Communication Libraries
  • 4 Evaluation
  • 4.1 Fundamental Performance
  • 4.2 Non-blocking Communication
  • 4.3 Application Program
  • 4.3.1 Coarray Version of the Himeno Benchmark
  • 4.3.2 Measurement Result
  • 4.3.3 Productivity
  • 5 Related Work
  • 6 Conclusion
  • References
  • XcalableACC: An Integration of XcalableMP and OpenACC
  • 1 Introduction
  • 1.1 Hardware Model
  • 1.2 Programming Model
  • 1.2.1 XMP Extensions
  • 1.2.2 OpenACC Extensions
  • 1.3 Execution Model
  • 1.4 Data Model
  • 2 XcalableACC Language
  • 2.1 Data Mapping
  • Example
  • 2.2 Work Mapping
  • Restriction
  • Example 1
  • Example 2
  • 2.3 Data Communication and Synchronization
  • Example
  • 2.4 Coarrays
  • Restriction
  • Example
  • 2.5 Handling Multiple Accelerators
  • 2.5.1 devices Directive
  • Example
  • 2.5.2 on_device Clause
  • 2.5.3 layout Clause
  • Example
  • 2.5.4 shadow Clause
  • Example
  • 2.5.5 barrier_device Construct
  • Example
  • 3 Omni XcalableACC Compiler
  • 4 Performance of Lattice QCD Application
  • 4.1 Overview of Lattice QCD
  • 4.2 Implementation
  • 5 Performance Evaluation
  • 5.1 Result
  • 5.2 Discussion
  • 6 Productivity Improvement
  • 6.1 Requirement for Productive Parallel Language
  • 6.2 Quantitative Evaluation by Delta Source Lines of Codes
  • 6.3 Discussion
  • References
  • Mixed-Language Programming with XcalableMP
  • 1 Background
  • 2 Translation by Omni Compiler
  • 3 Functions for Mixed-Language
  • 3.1 Function to Call MPI Program from XMP Program
  • 3.2 Function to Call XMP Program from MPI Program
  • 3.3 Function to Call XMP Program from Python Program
  • 3.3.1 From Parallel Python Program
  • 3.3.2 From Sequential Python Program
  • 4 Application to Order/Degree Problem
  • 4.1 What Is Order/Degree Problem
  • 4.2 Implementation
  • 4.3 Evaluation
  • 5 Conclusion
  • References
  • Three-Dimensional Fluid Code with XcalableMP
  • 1 Introduction
  • 2 Global-View Programming Model
  • 2.1 Domain Decomposition Methods
  • 2.2 Performance on the K Computer
  • 2.2.1 Comparison with Hand-Coded MPI Program
  • 2.2.2 Optimization for SIMD
  • 2.2.3 Optimization for Allocatable Arrays
  • 3 Local-View Programming Model
  • 3.1 Communications Using Coarray
  • 3.2 Performance on the K Computer
  • 4 Summary
  • References
  • Hybrid-View Programming of Nuclear Fusion Simulation Code in XcalableMP
  • 1 Introduction
  • 2 Nuclear Fusion Simulation Code
  • 2.1 Gyrokinetic PIC Simulation
  • 2.2 GTC
  • 3 Implementation of GTC-P by Hybrid-View Programming
  • 3.1 Hybrid-View Programming Model
  • 3.2 Implementation Based on the XMP-Localview Model: XMP-Localview
  • 3.3 Implementation Based on the XMP-Hybridview Model: XMP-Hybridview
  • 4 Performance Evaluation
  • 4.1 Experimental Setting
  • 4.2 Results
  • 4.3 Productivity and Performance
  • 5 Related Research
  • 6 Conclusion
  • References
  • Parallelization of Atomic Image Reconstruction from X-ray Fluorescence Holograms with XcalableMP
  • 1 Introduction
  • 2 X-ray Fluorescence Holography
  • 2.1 Reconstruction of Atomic Images
  • 2.2 Analysis Procedure of XFH
  • 3 Parallelization
  • 3.1 Parallelization of Reconstruction of Two-Dimensional Atomic Images by OpenMP
  • 3.2 Parallelization of Reconstruction of Three-Dimensional Atomic Images by XcalableMP
  • 4 Performance Evaluation
  • 4.1 Performance Results of Reconstruction of Two-Dimensional Atomic Images
  • 4.2 Performance Results of Reconstruction of Three-Dimensional Atomic Images
  • 4.3 Comparison of Parallelization with MPI
  • 5 Conclusion
  • References
  • Multi-SPMD Programming Model with YML and XcalableMP
  • 1 Introduction
  • 2 Background: International Collaborations for the Post-Petascale and Exascale Computing
  • 3 Multi-SPMD Programming Model
  • 3.1 Overview
  • 3.2 YML
  • 3.3 OmniRPC-MPI
  • 4 Application Development in the mSPMD Programming Environment
  • 4.1 Task Generator
  • 4.2 Workflow Development
  • 4.3 Workflow Execution
  • 5 Experiments
  • 6 Eigen Solver on the mSPMD Programming Model
  • 6.1 Implicitly Restarted Arnoldi Method (IRAM), Multiple Implicitly Restarted Arnoldi Method (MIRAM) and Their Implementations for the mSPMD Programming Model
  • 6.2 Experiments
  • 7 Fault-Tolerance Features in the mSPMD Programming Model
  • 7.1 Overview and Implementation
  • 7.2 Experiments
  • 8 Runtime Correctness Check for the mSPMD Programming Model
  • 8.1 Overview and Implementation
  • 8.2 Experiments
  • 9 Summary
  • References
  • XcalableMP 2.0 and Future Directions
  • 1 Introduction
  • 2 XcalableMP on Fugaku
  • 2.1 Performance of XcalableMP Global View Programming
  • 2.2 Performance of XcalableMP Local View Programming
  • 3 Global Task Parallel Programming
  • 3.1 OpenMP and XMP Tasklet Directive
  • 3.2 A Proposal for Global Task Parallel Programming
  • 3.3 Prototype Design of Code Transformation
  • 3.4 Preliminary Performance
  • 3.5 Communication Optimization for Manycore Clusters
  • 4 Retrospectives and Challenges for Future PGAS Models
  • 4.1 Low-Level Communication Layer for PGAS Model
  • 4.2 XcalableMP as a DSL for Stencil Applications
  • 4.3 XcalableMP API: Compiler-Free Approach
  • 4.4 Global Task Parallel Programming Model for Accelerators
  • References