CODE 61884
ACADEMIC YEAR 2017/2018
CREDITS 9 CFU, year 1, INFORMATICA 9014 (LM-18)
SCIENTIFIC DISCIPLINARY SECTOR INF/01
LANGUAGE English
TEACHING LOCATION
SEMESTER 1st semester
TEACHING MATERIALS AULAWEB

OVERVIEW

When the size of structured and unstructured data exceeds the capacity of conventional database management systems, advanced tools and methods are required for capturing, storing and managing data. Such large volumes of data are usually stored in large-scale distributed environments and processed with dedicated data processing platforms. The data may already be available or may arrive as a stream at processing time, and specific tools are usually required to manage them.

AIMS AND CONTENT

LEARNING OUTCOMES

Students will be provided with a sound grounding in the theoretical, methodological, and technological fundamentals of data management for advanced data processing architectures, with specific reference to large-scale distributed environments. Students will learn the key elements of NoSQL and stream-based systems, as well as basic issues in parallel and distributed query processing, multi-query processing, and high-throughput transactional systems. Students will be involved in project activities.

TEACHING METHODS

Class, project and outside preparation

SYLLABUS/CONTENT

Introduction to data management in distributed systems
  - Introduction to Big Data
  - Introduction to distributed architectures
  - Principles of large scale data management
  - Architectural approaches for large scale data management
Environments for large scale data processing (data-intensive computing)
  - Batch processing and MapReduce paradigm
  - From (Hadoop) MapReduce to Spark
  - High level languages for large scale data processing
Systems for large-scale data management
  - Introduction to NoSQL systems
  - NoSQL data models
  - Column-family data stores
  - Graph-based data stores
Stream-based data management
  - Introduction to stream data management
  - Models and languages for stream-data management
  - Large-scale stream data management

RECOMMENDED READING/BIBLIOGRAPHY

Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart. Web Data Management. Cambridge University Press, 2011.
Martin Kleppmann. Designing Data-Intensive Applications. O'Reilly, 2017.
Additional material and references provided by the instructors.

TEACHERS AND EXAM BOARD

BARBARA CATANIA
Office hours: Appointment by email
Office: Valle Puggia – 301

GIOVANNA GUERRINI
Office hours: Appointment by email
Office: Valle Puggia – 328

Exam Board
BARBARA CATANIA (President)
LAURA DI ROCCO
GIOVANNA GUERRINI
ELENA ZUCCA

LESSONS

LESSONS START

Tuesday, October 17th 2017

Class schedule: ADVANCED DATA MANAGEMENT

EXAMS

EXAM DESCRIPTION

Written examination, oral examination (including project discussion).

ASSESSMENT METHODS

Details on how to prepare for the examination and the required degree of knowledge for each topic will be provided during the lessons. During the semester, we will propose some group work assignments (exercises) as well as a project, which must be submitted just before the written examination.

If the exercises are evaluated positively:
  - the written exam consists of a set of questions and exercises on the basic topics of the course; the goal of this test is to verify the understanding of the main issues addressed during the lessons;
  - the oral exam consists of an in-depth discussion of the solutions developed by the student for the given project, in order to assess not only whether the student has reached an appropriate level of knowledge, but also whether she/he has acquired the ability to critically analyze issues related to data management in large scale environments.

If the exercises are evaluated negatively:
  - the written exam consists of a set of questions and exercises on the basic topics of the course; the goal of this test is to verify the understanding of the main issues addressed during the lessons;
  - the oral exam consists of: (i) an in-depth discussion of the solutions developed by the student for the given project, in order to assess not only whether the student has reached an appropriate level of knowledge, but also whether she/he has acquired the ability to critically analyze issues related to data management in large scale environments; (ii) theoretical questions and/or exercises on the topics covered in the course, with particular reference to topics for which deficiencies emerged in the written test or in the project development.

Exam schedule

Date        Time   Location  Degree type  Note
16/02/2018  09:00  GENOVA    Exam by appointment
27/07/2018  09:00  GENOVA    Exam by appointment
21/09/2018  09:00  GENOVA    Exam by appointment
28/02/2019  09:00  GENOVA    Exam by appointment