3 Cr. (Hrs.:3 Lec.)
Provides an overview of multi-core, multi-processor, and heterogeneous computer architectures and their runtime systems. Students will implement applied computational models and simulations using an array of high-performance computing systems to explore notions of scalability, extensibility, heterogeneity, and performance in these environments. Software engineering issues of specification, maintainability, validation and verification, and versioning will be explored. Lastly, data modeling will be central to mapping large-scale problem sets to differing hardware platforms. Topics include high-performance architectures, heterogeneous computing, parallel programming, software tools and packages (Python, SciPy), algorithm design, characteristics of commonly used numerical methods, mapping of solution methods to modern multi-processor systems, and performance. Students may not take this course for both 400- and 500-level credit.
Prerequisite: CSCI 332 and (M 426 or CSCI 477)
E1. Know how to work in a UNIX/Linux environment to manipulate files, use and integrate existing software packages and libraries, and compile and execute custom programs.
E2. Understand the basics of algorithmic analysis, including asymptotic Big-O complexity. (CSCI 332)
E3. Understand the formal steps for creating a mathematical or computational model. (M 426 or CSCI 477)
R1. Be familiar with basic computer architecture principles, including the SIMD and MIMD execution models, data caches, shared and distributed memory, multi-core processors, and graphics processing units (GPUs).
R2. Be able to set up a virtual machine and install multiple operating systems on it.
R3. Understand basic concepts of parallel programming, including local vs. shared data, data dependencies, race conditions, multi-threaded programming with OpenMP, multi-process programming with MPI, and GPU computing.
R4. Know how to develop, analyze (Big-O complexity), and code both serial and parallel algorithms to solve scientific problems.
R5. Know how to test and debug both serial and parallel programs.
R6. Know how to submit programs for execution on a multi-user HPC system through a job queuing system.
R7. Understand how to measure, interpret, and report the performance of code, including its speedup on a multiprocessor system.
R8. Understand basic compiler optimization options and know how to use them to evaluate and improve code performance.
R9. Learn about cloud computing options and the MapReduce computational paradigm.
R10. Design and implement non-trivial serial and parallel programs, analyze their algorithmic performance, identify performance barriers such as data contention, bottlenecks, and dependencies, and discuss strategies for overcoming these barriers.
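As an illustration of the performance reporting named in R7, the following is a minimal Python sketch of how speedup and parallel efficiency might be computed from measured run times, alongside the Amdahl's law bound often used to interpret them. The function names and the timings in the example are hypothetical and are not taken from the course materials.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel for the same problem size."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E = S / p, where p is the processor count."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return speedup(t_serial, t_parallel) / n_procs


def amdahl_limit(parallel_fraction, n_procs):
    """Amdahl's law upper bound on speedup: 1 / ((1 - f) + f / p)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)


if __name__ == "__main__":
    # Hypothetical wall-clock timings in seconds (assumed, not measured).
    t_1, t_8 = 120.0, 20.0
    print("speedup on 8 processors:", speedup(t_1, t_8))
    print("efficiency on 8 processors:", efficiency(t_1, t_8, 8))
    # Assumes 90% of the serial run time is parallelizable.
    print("Amdahl bound (f = 0.9, p = 8):", amdahl_limit(0.9, 8))
```

Comparing the measured speedup against the Amdahl bound is one way to judge whether a performance barrier of the kind listed in R10 comes from the serial fraction of the algorithm or from overheads such as contention and communication.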