CODE 52480
SEMESTER 2nd Semester


The course introduces students to the exploratory statistical analysis of multivariate data, highlighting its mathematical aspects and developing the essential skills needed to interpret the data under investigation. Laboratory sessions give students the opportunity to analyse, discuss, and solve real problems.


Descriptive (or exploratory) data analysis consists of techniques for summarizing the main characteristics of a dataset, often by means of graphical tools. It is the first and fundamental step of any statistical analysis. The methods presented, especially the multivariate ones, require mathematical tools, in particular algebraic and geometric ones, which are developed in concurrent courses.

The primary learning goals of the course are to:

  • apply the main methods for the descriptive analysis of univariate and multivariate data;
  • identify the properties of a dataset by selecting and computing the relevant statistical indices;
  • analyze a dataset using suitable, more complex statistical methods, such as Cluster Analysis (CA), Principal Component Analysis (PCA), and the Classical Linear Model (LM).

Cross-cutting skills are also important aims of the course: teamwork, the ability to report on the results of a statistical analysis, and a flexible mindset that allows students to adapt quickly to new conditions. These skills are developed through computer lab activities, where using the R software for data analysis also strengthens students' computer skills.

By taking part in the planned group activities, students will also have acquired the following basic competences by the end of the course:

  • functional literacy (writing reports as a group);
  • planning (choosing the appropriate statistical and mathematical tools and the style in which to present the analysis of the proposed datasets);
  • learning to learn (deepening the understanding of key analytical topics);
  • social competence (called for by the number and variety of decisions the group must make).


The course consists of approximately 50 hours of classroom lessons (theory and exercises) and about 24 hours of computer laboratory sessions. There are also about 8 hours of supervised classroom exercises.

The computer laboratory lessons aim at:

  • learning a statistics-oriented programming language (the R software);
  • applying the statistical methods presented in class;
  • interpreting the data and the results of the analysis.

The lab activities also allow students to check how well they have understood the statistical theory presented in class and how to use it in practice.

During the supervised classroom exercise sessions, students work on assigned exercises in the classroom, individually or in groups; teachers and tutors are present to clarify doubts and help overcome any difficulties.


Exploratory analysis of univariate and bivariate data.
Qualitative/categorical variables. Counts and frequencies, distribution of a variable. Joint and marginal distributions of two variables, conditional distributions (row and column profiles). Independence. Graphical representations. 
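As an illustration only (the course itself uses R), the following Python sketch computes the joint, marginal, and conditional distributions of two categorical variables on made-up data. The helpers `row_profile`, `expected_under_independence`, and `are_independent` are hypothetical names. Independence is checked by comparing each observed joint count with the product of the two marginal counts divided by n, which is equivalent to all row profiles being equal.

```python
import math
from collections import Counter

# Hypothetical observations of two categorical variables on the same 8 units.
x = ["A", "A", "B", "B", "B", "A", "B", "A"]
y = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
n = len(x)

joint = Counter(zip(x, y))   # joint counts n_ij
row_marg = Counter(x)        # marginal counts of x (row totals)
col_marg = Counter(y)        # marginal counts of y (column totals)

def row_profile(level):
    """Conditional relative frequencies of y given x == level (a row profile)."""
    return {c: joint[(level, c)] / row_marg[level] for c in col_marg}

def expected_under_independence(level, c):
    """Count expected in cell (level, c) if x and y were independent."""
    return row_marg[level] * col_marg[c] / n

def are_independent():
    """True when every observed joint count matches its expected count."""
    return all(math.isclose(joint[(a, c)], expected_under_independence(a, c))
               for a in row_marg for c in col_marg)

for level in sorted(row_marg):
    print(level, row_profile(level))
print("independent:", are_independent())
```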
Quantitative variables. Distribution and cumulative distribution functions, quantile function, and their graphical representations. Measures of central tendency and dispersion based on moments and quantiles; their properties with respect to the L1 and L2 metrics. Covariance and correlation between two quantitative variables. Geometric interpretation of variance, covariance, and correlation.

Exploratory analysis of multivariate data.
Cluster analysis. Hierarchical clustering: linkages based on distance and on inertia; dendrogram; induced ultrametric; clustering of variables. K-means clustering: initialization and stopping rule of the algorithm, stable clusters.
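The two clustering procedures named above can be sketched in plain Python on hypothetical 2-D points. This is an illustration, not course material. `single_linkage` implements only one of the distance-based linkages (single linkage); inertia-based linkages such as Ward's are not shown. The merge heights it records are the levels at which a dendrogram joins its branches. `kmeans` is a version of Lloyd's algorithm that picks its initial centres at random among the data points and stops when no point changes cluster. Both function names are hypothetical.

```python
import math
import random

# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (5.0, 5.0), (5.4, 4.8), (4.9, 5.5)]

def single_linkage(data):
    """Agglomerative clustering with single linkage (Euclidean distance).

    Returns the merges as (cluster_a, cluster_b, height); the heights are the
    levels at which a dendrogram joins the corresponding branches."""
    clusters = [frozenset([i]) for i in range(len(data))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(math.dist(data[i], data[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        rest = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters = rest + [clusters[a] | clusters[b]]
    return merges

def kmeans(data, k, seed=0, max_iter=100):
    """Lloyd's k-means with random initial centres chosen among the data points.

    Stops when no point changes cluster (or after max_iter iterations)."""
    rng = random.Random(seed)
    centres = list(rng.sample(data, k))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                      for p in data]
        if new_labels == labels:          # stable assignment: stop
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:                   # an empty cluster keeps its old centre
                centres[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centres

labels, centres = kmeans(points, 2)
print("k-means labels:", labels)
for a, b, h in single_linkage(points):
    print(sorted(a), "+", sorted(b), "at height", round(h, 3))
```

Running `kmeans` with different `seed` values is one simple way to look for stable clusters, that is, groups of points that end up together regardless of the initialization.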
Principal component analysis. "Best" representation of multivariate data (row points of the data matrix) in a lower-dimensional vector space; accuracy of the representation. Change of basis (eigenvectors of the correlation matrix). Properties of principal components. Geometric representation of correlations.
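This minimal NumPy sketch performs PCA on the correlation matrix of a small, hypothetical data matrix. The eigenvectors of the correlation matrix give the new basis, and the scores are the coordinates of the standardized row points in that basis. The eigenvalues add up to the number of variables. Each eigenvalue's share of that total is one common measure of how accurate a lower-dimensional representation is. The function name `pca_correlation` is illustrative.

```python
import numpy as np

# Hypothetical data matrix: rows are units (row points), columns are variables.
X = np.array([
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 0.5],
    [4.0, 7.9, 2.0],
    [5.0, 10.1, 1.5],
    [6.0, 12.0, 3.0],
])

def pca_correlation(X):
    """PCA on standardized variables via the eigenvectors of the correlation matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize (divide by n, ddof=0)
    R = np.corrcoef(X, rowvar=False)             # correlation matrix of the columns
    eigvals, eigvecs = np.linalg.eigh(R)         # R is symmetric; eigh sorts ascending
    order = np.argsort(eigvals)[::-1]            # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                         # row points in the new basis
    explained = eigvals / eigvals.sum()          # share of total variance per component
    return eigvals, eigvecs, scores, explained

eigvals, eigvecs, scores, explained = pca_correlation(X)
print("share of variance per component:", np.round(explained, 3))
print("cumulative share:", np.round(np.cumsum(explained), 3))
```

The variance of each column of `scores` equals the corresponding eigenvalue, and different columns are uncorrelated. This is why keeping the first few components retains the largest possible share of the total variance.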
Multiple regression. Vector space generated by the explanatory variables (column points of the data matrix). Linear least squares method and geometric meaning of residual minimization. Variance decomposition of the response variable. Descriptive goodness of fit: residual plots and the R² index (with its geometric interpretation). One-way ANOVA (analysis of variance) and between/within variance decomposition.
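The following NumPy sketch uses hypothetical data. It fits a multiple regression by least squares, verifies the variance decomposition of the response, computes R², and performs the between/within decomposition of one-way ANOVA for a made-up grouping factor. The fitted values are the orthogonal projection of y onto the space spanned by the columns of the design matrix, including the intercept. As a result, the residuals are orthogonal to every explanatory column.

```python
import numpy as np

# Hypothetical data: response y and two explanatory variables x1, x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Design matrix with an intercept column; fitted values lie in its column space.
A = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least squares coefficients
fitted = A @ beta                              # orthogonal projection of y onto col(A)
resid = y - fitted

tss = np.sum((y - y.mean()) ** 2)              # total sum of squares
ess = np.sum((fitted - y.mean()) ** 2)         # explained (regression) sum of squares
rss = np.sum(resid ** 2)                       # residual sum of squares
r_squared = ess / tss                          # equals 1 - rss / tss with an intercept

# One-way ANOVA on a hypothetical grouping factor: between/within decomposition.
groups = np.array(["a", "a", "b", "b", "c", "c"])
grand_mean = y.mean()
levels = np.unique(groups)
between = sum(np.sum(groups == g) * (y[groups == g].mean() - grand_mean) ** 2
              for g in levels)
within = sum(np.sum((y[groups == g] - y[groups == g].mean()) ** 2) for g in levels)

print("coefficients:", np.round(beta, 3))
print("R^2:", round(float(r_squared), 4))
print("between + within == total:", np.isclose(between + within, tss))
```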

Practical sessions in the lab using the R software


Rogantin M. P. (2016). Statistica descrittiva (available on AulaWeb and at

Maindonald J., Braun W. J. (2010). Data Analysis and Graphics Using R: An Example-Based Approach. 3rd ed. Cambridge University Press.

Jolliffe I. T. (2002). Principal Component Analysis. Springer Series in Statistics.


Exam Board



EVA RICCOMAGNO (Substitute President)

ALBERTO SORRENTINO (Substitute President)



Classes start according to the academic calendar.

Class schedule

The timetable for this course is available on the EasyAcademy portal.



The exam is divided into three parts:

  • written test;
  • two reports on the last two computer laboratory sessions;
  • oral exam.

The maximum score is 20 points for the written test and 10 points for the reports.

Students are admitted to the oral exam if they score at least 12 in the written test and if the sum of the written test and report scores is at least 18.


Written test

The written test assesses the ability to solve exercises and to carry out and interpret a data analysis. Students are also asked to comment on parts of the output produced by the R software. The written test can be replaced by two intermediate tests, provided that both are passed (score ≥ 12). The first intermediate test is scheduled during the lecture period; the second can be taken at the same time as the first final written test (June).

Reports on the last two computer laboratory sessions

The reports assess the ability to carry out descriptive analyses of datasets using the mathematical and statistical methods presented in the course. The assessment takes into account:

  • the appropriate use of these methods;
  • the ability to analyze problems critically;
  • the mastery of presentation techniques, with particular attention to the correct use of language.

The average score of the reports contributes to the final score only if the written test score is at least 12. Attending students must attend at least 80% of the laboratory sessions.
The reports must be submitted to the teachers about ten days after the end of the laboratory activities (the exact dates are posted on AulaWeb).
Students who cannot attend must still write the reports, agreeing on the submission deadlines with the teachers, and must also complete a supplementary assignment whose details are to be agreed with the teachers.

Oral exam
The oral exam assesses the understanding of the topics covered and of the specific methods, including the mastery of the proofs of the results presented.

Exam schedule

Exam date   Time   Location  Exam type
15/01/2024  09:00  GENOVA    Written + Oral
05/02/2024  09:00  GENOVA    Written + Oral
19/06/2024  09:00  GENOVA    Written
21/06/2024  09:00  GENOVA    Oral
08/07/2024  09:00  GENOVA    Written
10/07/2024  09:00  GENOVA    Oral
18/09/2024  09:00  GENOVA    Written
19/09/2024  09:00  GENOVA    Oral

Agenda 2030 - Sustainable Development Goals
Quality education
Gender equality