CODE 90535
ACADEMIC YEAR 2024/2025
CREDITS
9 cfu, year 2, COMPUTER SCIENCE 10852 (LM-18) - GENOVA
6 cfu, year 2, COMPUTER SCIENCE 10852 (LM-18) - GENOVA
6 cfu, year 1, COMPUTER ENGINEERING 11160 (LM-32) - GENOVA
6 cfu, year 2, COMPUTER ENGINEERING 11160 (LM-32) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR INF/01
LANGUAGE English
TEACHING LOCATION GENOVA
SEMESTER 1st Semester
TEACHING MATERIALS AULAWEB

OVERVIEW
Parallel programming was once a niche field reserved for government labs, research universities, and a few forward-looking industries, but today it is a requirement for most applications. Until 2006, CPU designers achieved performance gains by improving clock speed, execution optimization, and cache size. Today, the performance improvement in new chips comes instead from hyperthreading, multiple cores, and larger caches. Hyperthreading and multicore CPUs have almost no positive impact on most current software, because it was designed in a sequential fashion: the free performance lunch is over. Now is the time to analyse applications to identify the CPU-intensive operations that could benefit from parallel computing. The aim of this course is to provide an introduction to the architecture of parallel processing systems, along with the programming paradigms (OpenMP, MPI and CUDA) essential for exploiting them.

AIMS AND CONTENT

LEARNING OUTCOMES
Learning the main aspects of modern, heterogeneous high-performance computing systems (pipeline/superscalar processors, shared-memory/message-passing multiprocessors, vector processors, GPUs) and basic programming skills for high-performance computing (cache optimization, vectorization, OpenMP, MPI, CUDA).
AIMS AND LEARNING OUTCOMES
At the end of the course the student will be able to:
- understand why and how the compiler is one of the most important tools for HPC;
- identify program hotspots and possible strategies to improve the execution time;
- vectorise the code by interacting with the compiler and profiling tools;
- parallelise simple algorithms using OpenMP, MPI and CUDA;
- evaluate and discuss the performance of parallel programs.

PREREQUISITES
Basic knowledge of computer architecture and fair programming skills in C/C++.

TEACHING METHODS
Lectures, practicals, homework, and projects developed autonomously by students.

SYLLABUS/CONTENT
1. The processor architecture: performance of a pipeline system and its analytical evaluation; overall structure of a pipeline processor; pipeline hazards (structural, data, control) and their impact on performance; reducing hazards and/or their impact with hardware techniques.
2. Instruction-level parallelism in sequential programs.
3. The importance of cache-aware software.
4. Multiprocessor computers: the purpose of a parallel computer; limits of parallel computers (Amdahl's law, communication delays); MIMD computers: shared memory, distributed memory with shared address space, distributed memory with message-passing.
5. High-level parallel programming on a shared address space: the OpenMP directives. Practicals with OpenMP.
6. Message-passing MIMD computers: overall organization; cooperation among processes: message-passing communication and synchronization; blocking vs. non-blocking, point-to-point vs. collective, implementation aspects; non-blocking communication and instruction reordering.
7. High-level parallel programming with message-passing: the SPMD paradigm, the Message Passing Interface (MPI) standard. Practicals with MPI.
8. SIMD parallel computers: vector processors.
9. Modern descendants of SIMD computers: vector extensions, GPUs.
10. GPU programming with CUDA.
11. Architecture of large-scale computing platforms.
Some of these topics (6, 7, 10) are for students attending the 9-credit course.

RECOMMENDED READING/BIBLIOGRAPHY
Slides, tutorials, and code samples provided during the course.

TEACHERS AND EXAM BOARD
DANIELE D'AGOSTINO
Office hours: by appointment via email or TEAMS.

Exam Board
DANIELE D'AGOSTINO (President)
ANNALISA BARLA
GIORGIO DELZANNO (President Substitute)
NICOLETTA NOCETI (Substitute)

LESSONS
LESSONS START
In agreement with the calendar approved by the Degree Program Board of Computer Science.

Class schedule
The timetable for this course is available here: Portale EasyAcademy

EXAMS
EXAM DESCRIPTION
The exam consists of the discussion of a project, carried out individually or in a small group (2-3 students), plus a short individual oral exam on key topics presented in the course. The project consists of the parallelization of a sequential algorithm using OpenMP, MPI+OpenMP and CUDA.

ASSESSMENT METHODS
The project will be evaluated not only on the performance achieved by the parallel code, but also on how the code analysis, the adopted parallelization strategies, and the achieved performance are presented and discussed in the report. This means, for example, that parallel concepts are used properly, results are presented in a suitable and meaningful way, and overheads and issues are correctly identified and discussed.

Exam schedule
Exam date   Time   Location  Degree type  Note
08/01/2025  09:00  GENOVA                 Exam by appointment
04/06/2025  09:00  GENOVA                 Exam by appointment
02/07/2025  09:00  GENOVA                 Exam by appointment
10/09/2025  09:00  GENOVA                 Exam by appointment