Parallel programming was once a niche field reserved for government labs, research universities, and a few forward-looking industries, but today it is a requirement for most applications.
Until about 2006, CPU designers achieved performance gains by increasing clock speed, improving execution optimization, and enlarging caches. In new chips, performance improvements come instead from hyperthreading, multiple cores, and cache size. Hyperthreading and multicore CPUs, however, bring almost no benefit to most current software, because it was designed to run sequentially.
Therefore, the performance lunch isn’t free any more. Now is the time to analyze applications to identify the CPU-sensitive operations that could benefit from parallel computing.
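By way of illustration (a minimal sketch, not part of the official course material), consider a CPU-bound dot product: the sequential loop runs on a single core no matter how many are available, while a single OpenMP directive distributes the iterations across all of them:

    #include <omp.h>
    #include <stdio.h>

    #define N 10000000

    int main(void) {
        static double a[N], b[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++) { a[i] = i * 0.5; b[i] = i * 2.0; }

        /* CPU-sensitive loop: without the directive it uses one core only.
           OpenMP splits the iterations among threads and combines the
           per-thread partial sums (reduction). */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i] * b[i];

        printf("dot product = %f (max threads: %d)\n", sum, omp_get_max_threads());
        return 0;
    }

Compiled with gcc -fopenmp, the same loop scales with the number of cores, which is exactly the kind of opportunity such an analysis is meant to uncover.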
The aim of this course is to provide an introduction to the architecture of parallel processing systems, along with the programming paradigms (OpenMP, MPI and CUDA) essential for exploiting them.
Learning the main aspects of modern, heterogeneous high-performance computing systems (pipelined/superscalar processors, shared-memory/message-passing multiprocessors, vector processors, GPUs) and basic programming skills for high-performance computing (cache optimization, vectorization, OpenMP, MPI, CUDA).
At the end of the course the student will be able to describe the architecture of modern heterogeneous high-performance computing systems and to design, implement and evaluate parallel programs using OpenMP, MPI and CUDA.
Basic knowledge of computer architecture and fair programming skills in C/C++.
Lectures, practicals, homework assignments and projects developed autonomously by students.
Some of these topics (6, 7 and 10) are intended for students attending the 9-credit course.
Slides, tutorials, and code samples provided during the course.
Office hours: by appointment, arranged by email or via TEAMS.
DANIELE D'AGOSTINO (President)
ANNALISA BARLA
GIORGIO DELZANNO (President Substitute)
NICOLETTA NOCETI (Substitute)
In agreement with the calendar approved by the Degree Program Board of Computer Science.
The exam consists of the discussion of a project, carried out individually or in a small group (2-3 students), plus a short individual oral exam on key topics presented in the course.
The project will consist of the parallelization of a sequential algorithm using OpenMP, MPI+OpenMP and CUDA.
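As a purely illustrative sketch (the algorithm to parallelize is assigned during the course, and the loop body here is a placeholder), a hybrid MPI+OpenMP version typically distributes the data across MPI processes and parallelizes the local work across OpenMP threads:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each MPI process owns a contiguous chunk of the index space. */
        int chunk = N / size;
        int begin = rank * chunk;
        int end   = (rank == size - 1) ? N : begin + chunk;

        /* OpenMP threads parallelize the local computation; the loop body
           is a stand-in for the real kernel. */
        double local = 0.0;
        #pragma omp parallel for reduction(+:local)
        for (int i = begin; i < end; i++)
            local += 1.0 / (1.0 + (double)i);

        /* Combine the per-process results on rank 0. */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("result = %f (%d processes x up to %d threads each)\n",
                   total, size, omp_get_max_threads());

        MPI_Finalize();
        return 0;
    }

In a CUDA version, the per-process loop would typically become a kernel launched over a grid of GPU threads.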
The project will be evaluated not only on the basis of the performance achieved by the parallel code, but also on how the code analysis, the adopted parallelization strategies and the achieved performance are presented and discussed in the report.
This means, for example, that parallel programming concepts are used properly, results are presented in a suitable and meaningful way, and overheads and issues are correctly identified and discussed.