CODE 61884
ACADEMIC YEAR 2024/2025
CREDITS
9 CFU, year 2, COMPUTER SCIENCE 10852 (LM-18) - GENOVA
6 CFU, year 1, COMPUTER ENGINEERING 11160 (LM-32) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR INF/01
LANGUAGE English
TEACHING LOCATION GENOVA
SEMESTER 1st Semester
TEACHING MATERIALS AULAWEB

OVERVIEW
When the size of structured and unstructured data exceeds the capacity of conventional database management systems, advanced tools and methods are required for capturing, storing, and managing data. Such large volumes of data are usually stored in large-scale distributed environments and processed with dedicated data processing environments, and they usually require specific management tools. Semantic information plays a relevant role in this context.

AIMS AND CONTENT
LEARNING OUTCOMES
Learning the theoretical, methodological, and technological fundamentals of data management for advanced data processing architectures, with specific reference to large-scale distributed environments: key elements of NoSQL and stream-based systems, as well as basic issues in parallel and distributed query processing, multi-query processing, and high-throughput transactional systems.

AIMS AND LEARNING OUTCOMES
At the end of the course, students will be able to:

6 CFU
DESCRIBE the principles for data management in distributed systems, environments for large-scale data processing, and systems for large-scale data management
UNDERSTAND the differences between traditional data processing and management and large-scale data processing and management
UNDERSTAND the differences between the presented approaches for large-scale data management
SELECT the system and the methodology for large-scale data management suitable for a given application context
USE some of the presented systems for large-scale data management to solve simple problems
USE at least one of the presented systems for large-scale data management to solve non-trivial problems
ANSWER questions related to large-scale data management
SOLVE exercises related to data design in some of the presented systems and to the interaction with such systems through the available languages

9 CFU
UNDERSTAND the differences between traditional data processing and management and large-scale (semantic) data processing and management
UNDERSTAND the differences between the presented approaches for large-scale (semantic) data management
SELECT the system and the methodology for large-scale (semantic) data management suitable for a given application context
USE some of the presented systems for large-scale (semantic) data management to solve simple problems
USE at least one of the presented systems for large-scale (semantic) data management to solve non-trivial problems
ANSWER questions related to large-scale (semantic) data management
SOLVE exercises related to data design in some of the presented systems and to the interaction with such systems through the available languages

PREREQUISITES
Prerequisites correspond to basic notions of data management in traditional systems:
Data model, notion of schema and instance
Conceptual data model
Relational model (logical model)
Conceptual design
Logical design
Basics of normalization theory
Relational algebra
SQL
Index
Transaction

TEACHING METHODS
For LM in Computer Science: Class, lab, and project
For LM in Computer Engineering: Class, lab, and project

SYLLABUS/CONTENT
Recap on large-scale distributed architectures and data-intensive computing [only 6 CFU]
    Recap on big data and distributed architectures
    Principles of large-scale data management
    Architectural approaches for large-scale data management
    Recap on environments for large-scale data processing (MapReduce, Spark)
Systems for large-scale data management [6-9 CFU]
    Introduction to NoSQL systems
    NoSQL data models
    Key-value data stores
    Document-based data stores
    Column-family data stores
    Graph-based data stores
Semantic data management [only 9 CFU]
    The role of semantics in data management
    Models, languages, and systems for semantic data management
    Knowledge graphs and ontologies for data integration

RECOMMENDED READING/BIBLIOGRAPHY
Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart. Web Data Management. Cambridge University Press, 2011.
P.J. Sadalage, M. Fowler. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley, 2013.
Jeff Carpenter, Eben Hewitt. Cassandra: The Definitive Guide. O'Reilly Media, 2016.
Ian Robinson, Jim Webber, Emil Eifrem. Graph Databases: New Opportunities for Connected Data, 2nd Edition. O'Reilly, 2015.
Additional material and references provided by the instructors.

TEACHERS AND EXAM BOARD
BARBARA CATANIA
Office hours: Appointment by email or by Microsoft Teams
Office: Valle Puggia – 327
GIOVANNA GUERRINI
Office hours: Appointment by email or by Microsoft Teams
Office: Valle Puggia – 301

Exam Board
BARBARA CATANIA (President)
DANIELE D'AGOSTINO
GIOVANNA GUERRINI (President Substitute)

LESSONS
LESSONS START
In agreement with the calendar approved by the Degree Program Board of Computer Science.

Class schedule
The timetable for this course is available on the Portale EasyAcademy.

EXAMS
EXAM DESCRIPTION
Written examination (the number of questions depends on the number of CFU); project development (mandatory); oral examination.
During the semester, we will propose some group assignments, to be developed on one of the presented systems.

ASSESSMENT METHODS
Details on how to prepare for the examination and the required degree of knowledge for each topic will be provided during the lessons.
The written exam consists of a set of open and closed questions and exercises on the basic topics of the course. The open and closed questions verify the understanding of the main issues addressed during the lessons; the exercises check the ability to select the right system for a given scenario and to solve simple problems related to data modeling and querying in large-scale data management systems.
The project consists of the design and implementation of a data store for one of the systems presented in the course. The presentation of the work carried out helps us assess the ability to use at least one system for large-scale data management to solve non-trivial problems, and the ability to communicate (in writing) the results of the activity clearly and completely.
The oral exam consists of the presentation of the solutions developed by the student for the given project, in order to assess whether the student has reached an appropriate level of knowledge and the ability to communicate (orally) the results of the activity clearly and completely. For students who do not successfully complete the assignments, the oral exam will also include theoretical questions and/or exercises on the course topics, to better assess their understanding of the main issues addressed during the lessons.

Exam schedule
Date        Time   Location  Type     Note
09/01/2025  09:00  GENOVA    Written
06/02/2025  09:00  GENOVA    Written
11/06/2025  09:00  GENOVA    Written
17/07/2025  09:00  GENOVA    Written
11/09/2025  09:00  GENOVA    Written