When the size of structured and unstructured data exceeds the capacity of conventional database management systems, advanced tools and methods are required for capturing, storing, and managing data. Such large amounts of data are usually stored in large-scale distributed environments and processed using dedicated data processing environments. The data may already be available or may arrive as a stream at processing time, and specific tools are usually required to manage them.
Students will be provided with a sound grounding in the theoretical, methodological, and technological fundamentals of data management for advanced data processing architectures, with specific reference to large-scale distributed environments. Students will learn key elements of NoSQL and stream-based systems as well as basic issues in parallel and distributed query processing, multi-query processing, and high-throughput transactional systems. Students will be involved in project activities.
DESCRIBE the principles for data management in distributed systems, environments for large-scale data processing, systems for large-scale data management and approaches for data stream management
UNDERSTAND the differences between traditional data processing and management and large-scale data processing and management
UNDERSTAND the differences between the presented approaches for large-scale data processing and management
SELECT the system and the methodology for large-scale data processing or management, suitable in a given application context
USE some of the presented systems for large-scale data processing and management, for solving simple problems
USE at least one of the presented systems for large-scale data processing and management for solving non-trivial problems
ANSWER questions related to large-scale data processing and management
SOLVE exercises related to the design of large-scale data stores in one of the presented systems and the interaction with such systems through the available languages
Prerequisites correspond to basic notions of data management in traditional systems:
For LM in Computer Science: Class, project and outside preparation
For LM in Computer Engineering: Class, project (optional) and outside preparation
Introduction to data management in distributed systems
Introduction to Big Data; Introduction to distributed architectures; Principles of large-scale data management; Architectural approaches for large-scale data management
Environments for large-scale data processing (data-intensive computing)
Batch processing and the MapReduce paradigm; From (Hadoop) MapReduce to Spark; High-level languages for large-scale data processing
Systems for large-scale data management
Introduction to NoSQL systems; NoSQL data models; Column-family data stores; Graph-based data stores
Stream-based data management
Introduction to stream data management; Models and languages for stream data management; Large-scale stream data management
Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart. Web Data Management. Cambridge University Press, 2011.
Martin Kleppmann. Designing Data-Intensive Applications. O'Reilly, 2017.
Material and references provided by the instructors.
Office hours: by appointment via email. Office: Valle Puggia – 301
Office hours: by appointment via email. Office: Valle Puggia – 328
BARBARA CATANIA (President)
GIOVANNA GUERRINI (President)
LAURA DI ROCCO
ELENA ZUCCA
Tuesday, October 2nd (lesson 0, introduction to the course), 9.00, Room 216. From Tuesday, October 9th: schedule online.
ADVANCED DATA MANAGEMENT
Written examination, oral examination (including project discussion).
LM in Computer Science: written exam (parts I and II) and oral exam (including project discussion)
LM in Computer Engineering: students can choose between:
Details on how to prepare for the examination and the required degree of knowledge for each topic will be provided during the lessons.
During the semester, we will propose some group work activities as well as a project, to be developed on one of the presented systems. The project is mandatory for LM in Computer Science and optional for LM in Computer Engineering. If a student enrolled in the LM in Computer Engineering chooses to work on the project, the written exam will be reduced to part I.
In case of a positive evaluation of the exercises:
In case of a negative evaluation of the exercises: