
LARGE-SCALE COMPUTING

CODE 101799
ACADEMIC YEAR 2021/2022
CREDITS
  • 9 CFU during the 1st year of 10852 COMPUTER SCIENCE (LM-18) - GENOVA
  • 6 CFU during the 2nd year of 9011 MATEMATICA (LM-40) - GENOVA
  • 5 CFU during the 2nd year of 8732 INGEGNERIA ELETTRONICA (LM-29) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR INF/01
LANGUAGE English
TEACHING LOCATION
  • GENOVA
SEMESTER 1st Semester
TEACHING MATERIALS AULAWEB

    OVERVIEW

    Large-scale computing generally refers to the capability of hardware and software systems to adapt dynamically to an increasing load, typically by employing multiple distributed nodes to complete a given processing task. In the Big Data era, large-scale computing models and frameworks have become necessary for data-intensive computations, a class of applications that process large volumes of data with a data-parallel approach based on the MapReduce paradigm.
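
    As an illustration of the data-parallel approach mentioned above, the following is a minimal single-process sketch of the MapReduce pattern applied to word counting. It is not course material: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, and a real framework such as Hadoop or Spark would run the map and reduce steps on different nodes rather than in one Python process.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially (hypothetical driver)."""
    # Each document stands in for an input split handled by one mapper.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data big compute", "data parallel compute"]
    print(word_count(splits))
```

    Because each call to map_phase depends only on its own split, and each call to reduce_phase depends only on the values of one key, both steps could in principle be distributed across nodes without shared state.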

    AIMS AND CONTENT

    LEARNING OUTCOMES

    Learning the theoretical, methodological, and technological fundamentals of advanced data processing architectures, large-scale distributed environments, and data-intensive programming, including Docker, HDFS, Hadoop, Spark, and Cloud/IoT platforms.

    AIMS AND LEARNING OUTCOMES

    The course has three specific aims: 

    1. To introduce students to the main concepts and methodologies of distributed computing, such as the CAP theorem, partitioning and replication, fault tolerance, and coordination.
    2. To let students acquire knowledge and hands-on skills through practical assignments on functional, concurrent, and distributed programming in languages such as Python and Scala.
    3. To let students test their acquired knowledge and skills in a final project developed on an Apache Hadoop/Spark cluster, using libraries for batch and streaming data processing such as DataFrames, Spark Streaming, and MLlib.

    PREREQUISITES

    Good programming skills and a solid background in operating systems, databases, algorithms, and data structures.

    TEACHING METHODS

    In-person and online lectures, assignments, lab sessions, and a final project.

    SYLLABUS/CONTENT

    • Introduction to Distributed Systems and Cloud Computing 
    • Distributed data systems and shared-nothing architectures
    • Partitioning & Replication
    • Fault Tolerance
    • CAP Theorem
    • Hadoop & MapReduce (incl. HDFS, Hadoop Runtime)
    • Spark (Internals, RDD Programming, Dataframes, Spark Streaming)

    RECOMMENDED READING/BIBLIOGRAPHY

    Materials and references are available in the AulaWeb module of the course.

    TEACHERS AND EXAM BOARD

    Exam Board

    GIORGIO DELZANNO (President)

    GIOVANNA GUERRINI

    BARBARA CATANIA (President Substitute)

    FEDERICO DASSERETO (Substitute)

    LESSONS

    LESSONS START

    Beginning of the first semester

    Class schedule

    All class schedules are posted on the EasyAcademy portal.

    EXAMS

    EXAM DESCRIPTION

    Evaluation of the assignments submitted during the semester on GitHub and AulaWeb

    Evaluation of the material related to the final project proposal (slides, presentation, source code on GitHub)

    Discussion of assignments and final project

    Exam schedule

    Date        Time   Location  Type                 Notes
    17/01/2022  09:00  GENOVA    Exam by appointment
    14/06/2022  09:00  GENOVA    Exam by appointment
    05/09/2022  09:00  GENOVA    Exam by appointment
    09/01/2023  09:00  GENOVA    Exam by appointment