CODE 61884
ACADEMIC YEAR 2018/2019
CREDITS
SCIENTIFIC DISCIPLINARY SECTOR INF/01
LANGUAGE English
TEACHING LOCATION
  • GENOVA
SEMESTER 1st Semester
TEACHING MATERIALS AULAWEB

OVERVIEW

When the size of structured and unstructured data exceeds the capacity of conventional database management systems, advanced tools and methods are required for capturing, storing, and managing it. Such huge amounts of data are usually stored in large-scale distributed environments and processed with dedicated data processing environments; they may already be available or may arrive as a stream at processing time, and dedicated tools are usually required to manage them.

AIMS AND CONTENT

LEARNING OUTCOMES

Students will be provided with a sound grounding in the theoretical, methodological, and technological fundamentals of data management for advanced data processing architectures, with specific reference to large-scale distributed environments. Students will learn the key elements of NoSQL and stream-based systems, as well as basic issues in parallel and distributed query processing, multi-query processing, and high-throughput transactional systems. Students will be involved in project activities.

AIMS AND LEARNING OUTCOMES

DESCRIBE the principles of data management in distributed systems, environments for large-scale data processing, systems for large-scale data management, and approaches to data stream management

UNDERSTAND the differences between traditional data processing and management and large-scale data processing and management

UNDERSTAND the differences between the presented approaches to large-scale data processing and management

SELECT the system and methodology for large-scale data processing or management suitable for a given application context

USE some of the presented systems for large-scale data processing and management to solve simple problems

USE at least one of the presented systems for large-scale data processing and management to solve non-trivial problems

ANSWER questions related to large-scale data processing and management

SOLVE exercises related to the design of large-scale data stores in one of the presented systems and to the interaction with such systems through the available languages

PREREQUISITES

Prerequisites correspond to basic notions of data management in traditional systems:

  • Data model, notion of schema and instance
  • Conceptual data model
  • Relational model (logical model)
  • Conceptual design 
  • Logical design
  • Basics of normalization theory
  • Relational algebra
  • SQL
  • Indexes
  • Transactions

TEACHING METHODS

For LM in Computer Science: lectures, project, and independent study

For LM in Computer Engineering: lectures, project (optional), and independent study

SYLLABUS/CONTENT

Introduction to data management in distributed systems

Introduction to Big Data
Introduction to distributed architectures
Principles of large scale data management
Architectural approaches for large scale data management

Environments for large scale data processing (data-intensive computing)

Batch processing and MapReduce paradigm
From (Hadoop) MapReduce to Spark
High level languages for large scale data processing

Systems for large-scale data management

Introduction to NoSQL systems
NoSQL data models
Column-family data stores
Graph-based data stores

Stream-based data management

Introduction to stream data management
Models and languages for stream-data management
Large-scale stream data management

RECOMMENDED READING/BIBLIOGRAPHY

Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart. Web Data Management. Cambridge University Press, 2011.

Martin Kleppmann. Designing Data-Intensive Applications. O'Reilly, 2017.
Material and references provided by the instructors.

TEACHERS AND EXAM BOARD

Exam Board

BARBARA CATANIA (President)

GIOVANNA GUERRINI (President)

LAURA DI ROCCO

ELENA ZUCCA

LESSONS

LESSONS START

Tuesday, October 2nd (lesson 0, introduction to the course), 9.00, Room 216
From Tuesday, October 9th: timetable available online

EXAMS

EXAM DESCRIPTION

Written examination, oral examination (including project discussion).

LM in Computer Science: written exam (parts I and II) and oral exam (including project discussion)

LM in Computer Engineering: students can choose between:

  • written exam (parts I and II), or
  • written exam (part I) and oral exam (including project discussion)

ASSESSMENT METHODS

Details on how to prepare for the examination and the required degree of knowledge for each topic will be provided during the lessons.

During the semester, we will propose some group assignments as well as a project, to be developed on one of the presented systems.
The project is mandatory for LM in Computer Science and optional for LM in Computer Engineering. If a student enrolled in the LM in Computer Engineering chooses to work on the project, the written exam is reduced to part I.

If the exercises receive a positive evaluation:

  • the written exam consists of a set of questions (part I) and exercises (part II) on the basic topics of the course; the goal of this test is to verify the understanding of the main issues addressed during the lessons;
  • the oral exam consists, for students who worked on the project, of an in-depth discussion of the solutions they developed for the given project, in order to assess not only whether the student has reached an appropriate level of knowledge, but also whether she/he has acquired the ability to critically analyze issues related to data management in large-scale environments.

If the exercises receive a negative evaluation:

  • the written exam consists of a set of questions and exercises on the basic topics of the course; the goal of this test is to verify the understanding of the main issues addressed during the lessons;
  • the oral exam consists of: (i) for students who worked on the project, an in-depth discussion of the solutions they developed for the given project, in order to assess not only whether the student has reached an appropriate level of knowledge, but also whether she/he has acquired the ability to critically analyze issues related to data management in large-scale environments; (ii) theoretical and/or practical questions on the course topics, with particular reference to those for which deficiencies emerged in the written test or in the project development.

Exam schedule

Exam date | Time | Location | Degree type | Notes
15/02/2019 | 09:00 | GENOVA | | Exam by appointment
26/07/2019 | 09:00 | GENOVA | | Exam by appointment
20/09/2019 | 09:00 | GENOVA | | Exam by appointment