CODE 41601
ACADEMIC YEAR 2025/2026
CREDITS
SCIENTIFIC DISCIPLINARY SECTOR SECS-S/01
LANGUAGE English
TEACHING LOCATION
  • GENOVA
SEMESTER 2nd Semester
PREREQUISITES
Entry prerequisites
To take the exam for this course, students must first have passed the following exams:
  • ECONOMICS AND DATA SCIENCE 11267 (cohort 2025/2026)
  • SOFTWARE R 106839 2025
  • ECONOMICS AND DATA SCIENCE 11937 (cohort 2025/2026)
  • SOFTWARE R 106839 2025

OVERVIEW

The course “Statistical learning” covers the fundamental elements of both supervised and unsupervised statistical learning. A broad overview of learning techniques is provided: regression models, generalized linear models, non-parametric regression and classification algorithms, model selection methods, unsupervised techniques (cluster analysis and principal component analysis).

AIMS AND CONTENT

LEARNING OUTCOMES

The course aims to provide a comprehensive overview of the main statistical learning techniques, both supervised and unsupervised. To ensure a solid understanding of the methodology underlying statistical learning techniques, the course begins with an introduction to statistical inference, covering both parametric approaches (likelihood-based) and non-parametric approaches (simulation and bootstrap methods). In the area of supervised learning, the course covers the main techniques for regression and classification, including both parametric methods (regression and generalized linear models) and non-parametric methods. In the area of unsupervised learning, topics include clustering and dimensionality reduction techniques. The topics covered and the examples presented enable students to have a solid knowledge of statistical methodology, to be autonomous in addressing data analysis and forecasting problems, and to apply the techniques learned in various contexts, particularly in the economic sciences.

AIMS AND LEARNING OUTCOMES

The course is divided into three parts:

  1. Elements of parametric and non-parametric inferential statistics: maximum likelihood inference and exponential class models; estimation through Monte Carlo simulation techniques and bootstrap techniques.

  2. Supervised learning: multiple regression, introduction to the theory of generalized linear models (logistic regression and regression for count data), selected non-parametric regression and classification techniques, diagnostic techniques, and model selection methods.

  3. Unsupervised learning: Cluster analysis, principal component analysis.

All topics are accompanied by practical exercises in R, so that students can combine an understanding of the theory with the ability to carry out sound statistical analyses in real contexts and to interpret the output of statistical procedures correctly.

Knowledge and understanding: Students will know the main techniques and the main tools of statistical learning. They must be able to frame these tools in general terms (both theoretical and applied), and to analyze the underlying mathematical and statistical background.

Ability to apply knowledge and understanding: When faced with problems from different contexts, students will be able to identify the appropriate analysis. They will also be able to evaluate the results obtained through statistical software.

Making judgments: Students will become aware of the potential and limits of statistical techniques through the analysis of examples and case studies.

Communication skills: Students must be able to use the correct technical statistical language for the communication of the results and for the description of the techniques.

Learning skills: Students will develop adequate learning skills to continue with further studies on other aspects of the subject and on fields of application other than those illustrated. They must also be able to use the R software in a general context.

PREREQUISITES

The skills typically acquired in introductory courses in Mathematics and Statistics for Economics and Business. Operational skills in the following topics are also required: (a) computation of maxima and minima for functions of several variables; (b) basic matrix algebra and computation of eigenvalues and eigenvectors; (c) basics of the R software.

TEACHING METHODS

Lectures and computer lab tutorials with R. Discussion of case studies. 24 hours (approx. 1/3 of the total) will be held in the computer lab.

Students who have valid certification of physical or learning disabilities and who wish to discuss possible accommodations or other circumstances regarding lectures, coursework and exams should speak with both the instructor and Professor Serena Scotto (scotto@economia.unige.it), the Department’s disability liaison.

SYLLABUS/CONTENT

0. Introduction and review of probability.

1. Likelihood. Maximum likelihood estimation. Information. The exponential family. Examples for discrete and continuous parametric distributions.

2. Multivariate distributions: the multivariate normal.

3. Monte Carlo simulation and bootstrap.

4. Multiple linear regression. Diagnostics for regression models.

5. Generalized linear models: Logistic regression and regression for counts.

6. Non-parametric regression and classification techniques.

7. Model selection.

8. Cluster analysis.

9. Principal component analysis.

RECOMMENDED READING/BIBLIOGRAPHY

Evans, Rosenthal. Probability and Statistics. The Science of Uncertainty, Second edition, 2023 (available from the Authors’ webpage).

James, Witten, Hastie and Tibshirani. An Introduction to Statistical Learning. With Applications in R. Springer, 2023 (available from the Authors’ webpage).

Hastie, Tibshirani, Friedman. The Elements of Statistical Learning. Springer, 2017 (available from the Authors’ webpage).

Additional course materials will be available on AulaWeb.

TEACHERS AND EXAM BOARD

LESSONS

LESSONS START

This class follows the Department calendar for the 2nd semester.

Class schedule

The timetable for this course is available here: Portale EasyAcademy

EXAMS

EXAM DESCRIPTION

Attending students:

The exam consists of written reports on three data analysis exercises in R and a final discussion. One exercise focuses on Monte Carlo and bootstrap methods, one on supervised learning, and one on unsupervised learning.

Non-attending students:

The exam is written and consists of three questions on the exam program, based on the textbooks listed in the bibliography. A syllabus mapping the topics in the program to the textbooks will be made available on AulaWeb before the course starts.

The complete exam rules will be available on the AulaWeb page before the course starts.

ASSESSMENT METHODS

The exercises in R for attending students must demonstrate: a) mastery of the statistical techniques used; b) adequate and correct interpretation of the outputs obtained; c) the ability to frame the statistical analysis within the scope of the proposed application; d) where necessary, the ability to properly use statistical methods not presented during the lessons. During the discussion of the reports, students must demonstrate: a) correct use of the technical language of statistics; b) knowledge of the methodology underlying the techniques used for data analysis.

The questions in the written exam for non-attending students are chosen to cover, as far as possible, all the topics of the exam program. They are intended to evaluate the student's knowledge of the subject, command of the correct technical language, critical capacity, and ability to frame statistical results within the scope of practical data analyses.

FURTHER INFORMATION

Students are invited to regularly check the AulaWeb page of the course, where all the material concerning both the theoretical part and the R lab exercises will be made available.