Skip to main content
CODE 101798
ACADEMIC YEAR 2023/2024
CREDITS
SCIENTIFIC DISCIPLINARY SECTOR INF/01
LANGUAGE English
TEACHING LOCATION
  • GENOVA
SEMESTER 1° Semester
TEACHING MATERIALS AULAWEB

OVERVIEW

The course introduces data engineering skills that are at the basis of structured data analysis and Business Intelligence. Students will llearn the basics of structured data modeling for analysis, develop an in-depth understanding of data warehouse design and data manipulation and gain practical experience in formulating OLAP (online analytical processing) queries in SQL. Large scale data analysis frameworks will also be introduced. In the practical activities, students will work with large data sets in a data warehouse environment to design and populate a data warehouse, query it and create dashboards, using BI and ETL tools as well as OLAP servers. In the final project, such skills will be applied to build a small, basic data warehouse, populate it with data, and create dashboards and other visualizations to analyze and communicate the data to a broad audience.

AIMS AND CONTENT

LEARNING OUTCOMES

Learning the theoretical, methodological, and technological fundamentals of data management and analysis in decision support systems, with a specific reference to data warehousing architectural and design issues, as well as key elements of data integration and governance, data quality and cleaning, ExtractionTransformation-Loading processes, conceptual, logical, and physical design of data warehouses, storage architectures and scalable parallel processing, use of data warehouses for business reporting and online analytical processing.

AIMS AND LEARNING OUTCOMES

DESCRIBE the principles for data analysis and large-scale data analysis
UNDERSTAND the differences between data management (OLTP) systems and data analysis (OLAP) systems
UNDERSTAND the differences between design issues and methodologies for databases and for datawarehouses
UNDERSTAND the main issues in data quality, data integration, data cleaning, and data governance
UNDERSTAND the main issues in data warehouse design, with specific reference to conceptual design, ROLAP logical design, view selection, physical design and ETL design
UNDERSTAND the main issues in large scale data analysis 
UNDERSTAND the use of precomputation and materialized views in data warehousing, and SELECT the most appropriate set of views 
UNDERSTAND the main OLAP operators and the SQL extensions to express them, and APPLY them to formulate analytical queries
APPLY the methodology to design a data warehouse, starting from operational data and business questions through conceptual design, logical design, view selection and physical design
SELECT the most adequate systems and languages for a given analysis context
USE some of the presented systems for data exploration, data reconciliation, data warehouse storage, data reporting and OLAP querying
USE some of the presented systems for constructing a datawarehouse from a given operational dataset and for performing non-trivial analyses on it

PREREQUISITES

Fundamentals of database models, languages, and systems.

  • Entity-relationship (conceptual) model
  • Relational (logical) model
  • Schema normalization
  • Relational algebra & SQL
  • Indexes
  • Transactions

TEACHING METHODS

Class, hands-on lab activities, and outside preparation.

SYLLABUS/CONTENT

The course will present the main architectural and design issues related to data management and analysis in data support systems (data warehousing), comparing them with traditional transactional systems.

  • Introduction. IT technologies to support decisions. Differences between OLAP and OLTP. Data warehousing and Data mining. Business Intelligence.
  • Data integration and data quality.
  • Data models for data warehouses. Conceptual data model. Dimensions, measures and hierarchies. Multidimensional data model. ROLPAP models: star and snowflake schemas.
  • Back-end.Architectures. Storage structures and indexes in OLAP. Materialized views. Optimization of OLAP queries.
  • Data warehouse design. Conceptual, logical and physical design.
  • ETL functionalities and approaches.
  • Front-end. OLAP queries and reporting. SQL OLAP extensions. 
  • Large scale data analysis: Hive and SparkSQL

RECOMMENDED READING/BIBLIOGRAPHY

  • ​M. Golfarelli, S. Rizzi. Data Warehouse Design. Mc-Graw Hill 2009. 
  • R. Kimball, M. Ross. The Data Warehouse Toolkit. Wiley, 2013.
  • C. Jensen, T. Bach Pedersen, C. Thomsen. Multidimensional Databases and Data Warehousing. Morgan&Claypool, 2010.
  • A.Vaisman, E. Zimányi. Data Warehouse Systems: Design and Implementation. Springer, 2014

TEACHERS AND EXAM BOARD

Exam Board

GIOVANNA GUERRINI (President)

GIANNA REGGIO

BARBARA CATANIA (President Substitute)

LESSONS

LESSONS START

In agreement with the calendar approved by the Degree Program Board of Computer Science.

Class schedule

L'orario di tutti gli insegnamenti è consultabile all'indirizzo EasyAcademy.

EXAMS

EXAM DESCRIPTION

Written examination, oral examination (including project discussion).

ASSESSMENT METHODS

Details on how to prepare for the examination and the required degree of knowledge for each topic will be provided during the lessons.

The written exam consists of a set of open and closed questions and exercises on basic topics of the course: the goal of open and closed questions is to verify the understanding of the main issues addressed during the lessons, the aim of exercises is to check the ability to apply the notions, the languages and the methodology, and to solve simple problems related to data modeling and querying in data warehousing.

The assignments consist in the design and the implementation of a data warehouse (back and front-end); the presentation of the performed activity help us in checking the ability of using at least one system for data  for non-trivial data analysis tasks and the ability of communicating (in a written way) he result of the activity in a clear and complete way. 

The oral exam consists of the presentation of the solutions developed by the student for the assignments, in order to assess whether the student has reached an appropriate level of knowledge and the ability of communicating (in an oral way) the result of the activity in a clear and complete way.

Exam schedule

Data Ora Luogo Degree type Note
09/01/2024 14:00 GENOVA Scritto
09/02/2024 14:00 GENOVA Scritto
11/06/2024 09:00 GENOVA Scritto
16/07/2024 09:00 GENOVA Scritto
11/09/2024 09:00 GENOVA Scritto