Information updated until 30/06/2026

CODE: 101747
ACADEMIC YEAR: 2026/2027
CREDITS: 6 CFU, year 3, INFORMATICA 8759 (L-31) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR: INF/01
LANGUAGE: Italian
TEACHING LOCATION: GENOVA
SEMESTER: 2nd semester
TEACHING MATERIALS: AULAWEB

OVERVIEW

The course introduces the fundamental principles of Data Science, with particular emphasis on Machine Learning methodologies and algorithms for data analysis and interpretation. After an overview of the main stages of a Data Science project, from data collection and preprocessing to the development and evaluation of predictive models, the course presents some of the most widely used Machine Learning approaches for classification and prediction tasks. Particular attention is devoted to the mathematical and statistical foundations of the methods studied, as well as to their practical implementation using software tools and libraries commonly employed in Data Science. Laboratory sessions and a final project will enable students to apply the acquired knowledge to real-world problems, developing both theoretical understanding and practical skills.

AIMS AND CONTENT

LEARNING OUTCOMES

The course aims to provide students with the theoretical and practical knowledge required to address Data Science problems using Machine Learning techniques. By the end of the course, students will have gained familiarity with the main Machine Learning models and algorithms, understanding their mathematical and statistical foundations, underlying assumptions, limitations, and potential applications. Students will also be able to implement, evaluate, and compare these methodologies using software tools and libraries commonly employed in Data Science and Machine Learning.

AIMS AND LEARNING OUTCOMES

The course introduces the fundamental tools for the formulation and analysis of Data Science problems, with particular emphasis on the mathematical and statistical methodologies that underpin the most widely used Machine Learning techniques. It provides the knowledge required to understand the different stages of a Data Science pipeline, from data collection and preprocessing to the development, validation, and comparison of predictive models. Particular attention is devoted to understanding the theoretical principles underlying the methods studied and to their implementation through computational tools and widely adopted software libraries.

Upon successful completion of the course, students will be able to:
- understand and describe the main stages of a Data Science workflow;
- interpret the probabilistic and statistical foundations of the Machine Learning methods presented in the course;
- formulate basic classification and prediction problems in mathematical and computational terms;
- apply and compare the main Machine Learning algorithms introduced during the course;
- evaluate model performance using appropriate validation and model selection techniques;
- use Python libraries for Data Science and Machine Learning to implement, test, and analyze models on real-world datasets.

PREREQUISITES

Basic knowledge of mathematical analysis, probability and statistics, and linear algebra. Basic programming skills, preferably in Python.

TEACHING METHODS

Lectures will be complemented by laboratory sessions.

SYLLABUS/CONTENT

The course will cover the following topics:

Introduction to Data Science and Machine Learning
Definitions, objectives, and main applications. The data analysis process and the development of predictive models.

Probability and Statistics Review
Probability and conditional probability. Discrete random variables. Mean and variance. Common probability distributions. Joint and conditional distributions. Covariance and correlation.

Statistical Estimation
Sample mean and its properties. Median estimator. Estimation as a prediction problem. Empirical risk and performance evaluation. Training and test set splitting.

Linear Classification Methods
Formulation of classification problems. Least-squares classification. Optimal theoretical estimators and empirical estimates. Logistic regression. Optimization through gradient descent.

Nonlinear Classification Methods
k-Nearest Neighbors algorithms. Parzen window method. Histogram-based classification. Decision trees.

Model Evaluation and Selection
Overfitting and underfitting. Model comparison. Validation strategies. Typical Machine Learning pipelines.

Software Tools for Data Science and Machine Learning
Introduction to Python libraries for data analysis. Use of specialized libraries such as NumPy, Pandas, Matplotlib, and Scikit-learn. Implementation and evaluation of Machine Learning models.

Applied Project
Development of a Data Science and Machine Learning project. Practical application of the methodologies and tools presented during the course. Analysis of results and discussion of modeling choices.

RECOMMENDED READING/BIBLIOGRAPHY

Lecture notes provided by the instructor. Additional supporting material may be made available throughout the course as needed.

TEACHERS AND EXAM BOARD

LUCA CALATRONI
Office hours: By appointment.

LESSONS

LESSONS START

According to the calendar approved by the Degree Program Board: https://corsi.unige.it/en/corsi/8759/studenti-orario

Class schedule

The timetable for this course is available on the EasyAcademy portal.

EXAMS

EXAM DESCRIPTION

The examination consists of an oral assessment divided into two parts:
- discussion of a project on a topic assigned by the instructor;
- questions covering the theoretical contents of the course and the laboratory activities carried out during the semester.

Information for students with certified specific learning disorders (SLD), disabilities, or other special educational needs is available at: https://corsi.unige.it/corsi/8759/studenti-disabilita-dsa

ASSESSMENT METHODS

Students are expected to demonstrate a solid understanding of the fundamental concepts covered in the course, with particular emphasis on the formulation of Data Science problems using Machine Learning techniques, the mathematical and statistical models underlying these methods, and the computational tools employed for their implementation and evaluation. In both components of the examination, assessment will take into account the accuracy and completeness of the acquired knowledge, the ability to apply the presented methods appropriately, the clarity of exposition, the quality of the proposed solutions, and the capacity for critical analysis and independent reasoning.

FURTHER INFORMATION

For further information, please refer to the course’s AulaWeb module or contact the instructor.