MULTIVARIATE EXPLORATORY DATA ANALYSIS

CODE 52480
ACADEMIC YEAR 2021/2022
CREDITS
  • 8 cfu during the 1st year of 8766 STATISTICA MATEM. E TRATTAM. INFORMATICO DEI DATI (L-35) - GENOVA
  • 8 cfu during the 1st year of 8760 MATEMATICA (L-35) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR SECS-S/01
LANGUAGE Italian
TEACHING LOCATION GENOVA
SEMESTER 2nd semester
TEACHING MATERIALS AULAWEB

    OVERVIEW

    The course introduces students to the exploratory statistical analysis of multivariate data, highlighting the underlying mathematics and developing the skills needed to interpret the data under investigation. Laboratory sessions give students the opportunity to analyse, discuss, and solve real problems.

    AIMS AND CONTENT

    LEARNING OUTCOMES

    To provide the main concepts and methodologies for the exploratory analysis of univariate and multivariate data.

    SYLLABUS/CONTENT

    Exploratory analysis of uni- and bi-variate data.
    Qualitative/categorical variables. Counts and frequencies, distribution of a variable. Joint and marginal distributions of two variables, conditional distributions (row and column profiles). Independence. Graphical representations.
    Quantitative variables. Distribution and cumulative distribution functions, quantile function, and their graphical representations. Measures of centrality and dispersion based on moments and quantiles; their properties and their connection with the L1 and L2 metrics. Covariance and correlation between two quantitative variables. Geometrical interpretation of variance, covariance and correlation.
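
    As an illustration of the geometrical interpretation mentioned above, the following minimal sketch treats two centred variables as vectors: the variance is the squared norm divided by n, the covariance is the scalar product divided by n, and the correlation is the cosine of the angle between the two vectors. The data, the function names, and the choice of divisor n (rather than n - 1) are assumptions made for this example only; the course labs use R, and Python is used here purely as a stand-in.

```python
import math

def center(values):
    """Subtract the mean, so the variable becomes a centred vector."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def covariance(x, y):
    """Descriptive covariance: scalar product of centred vectors over n (assumed divisor)."""
    return dot(center(x), center(y)) / len(x)

def correlation(x, y):
    """Cosine of the angle between the two centred vectors."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / math.sqrt(dot(xc, xc) * dot(yc, yc))

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

variance_x = covariance(x, x)            # squared norm of centred x, divided by n
r = correlation(x, y)                    # cosine of the angle
angle_degrees = math.degrees(math.acos(r))
print(variance_x, covariance(x, y), r, angle_degrees)
```

    A correlation close to 1 or -1 corresponds to nearly collinear centred vectors, while a correlation of 0 corresponds to orthogonal ones.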

    Exploratory analysis of multivariate data.
    Cluster analysis. Hierarchical clustering: linkages based on distance and inertia; dendrogram; induced ultrametric; variable clustering. K-means clustering: initialization and stopping criterion of the algorithm, stable clusters.
    Principal component analysis. "Best" representation of multivariate data (row points of the data matrix) in a vector space of lower dimension; accuracy of the representation. Change of basis (eigenvectors of the correlation matrix). Properties of principal components. Geometrical representation of correlations.
    Multiple regression. Vector space generated by the explanatory variables (column points of the data matrix). Linear least squares method and geometrical meaning of residual minimization. Variance decomposition of the response variable. Descriptive goodness-of-fit: residual plots and R-squared index (with geometrical interpretation). One-way ANOVA (analysis of variance) and between/within variance decomposition.
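
    The between/within variance decomposition listed above can be sketched as follows, assuming a single grouping factor. The total sum of squares splits into a between-group part (group means around the grand mean) and a within-group part (values around their own group mean); the ratio between/total plays the role of the R-squared index when the response is fitted by the group means. The data and the function names are hypothetical, and Python is used here only as a stand-in for the R software used in the labs.

```python
def mean(values):
    return sum(values) / len(values)

def variance_decomposition(groups):
    """Return (total, between, within) sums of squares.

    `groups` maps a group label to the list of response values in that group.
    """
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(all_values)
    # Total: every value around the grand mean.
    total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between: group means around the grand mean, weighted by group size.
    between = sum(len(vals) * (mean(vals) - grand_mean) ** 2
                  for vals in groups.values())
    # Within: every value around its own group mean.
    within = sum((v - mean(vals)) ** 2
                 for vals in groups.values() for v in vals)
    return total, between, within

# Hypothetical data, chosen only for illustration.
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [7.0, 8.0, 9.0],
}

total, between, within = variance_decomposition(groups)
r_squared = between / total  # share of variability explained by the grouping
print(total, between, within, r_squared)
```

    Dividing each sum of squares by the number of observations gives the corresponding variances, and the identity total = between + within holds for any grouping.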

    Practical lab sessions using the R software.

    RECOMMENDED READING/BIBLIOGRAPHY

    Rogantin M. P. (2016). Statistica descrittiva
    (available on AulaWeb and at http://www.dima.unige.it/~rogantin/StDescrittiva2/StatDescrittiva.pdf)

    Maindonald J., Braun W. J. (2010). Data Analysis and Graphics Using R: An Example-Based Approach. 3rd ed. Cambridge University Press

    Jolliffe I. T. (2002). Principal Component Analysis. Springer Series in Statistics

    TEACHERS AND EXAM BOARD

    Exam Board

    FRANCESCO PORRO (President)

    SARA SOMMARIVA

    ALBERTO SORRENTINO (President Substitute)

    LESSONS

    LESSONS START

    The class will start according to the academic calendar.

    EXAMS

    Exam schedule

    Date        Time   Location  Type            Notes
    19/01/2022  09:00  GENOVA    Written + Oral  Only for students who attended the course in academic year 2020/21 or earlier
    04/02/2022  09:00  GENOVA    Written + Oral  Only for students who attended the course in academic year 2020/21 or earlier
    10/06/2022  09:00  GENOVA    Written
    14/06/2022  09:00  GENOVA    Oral
    13/07/2022  09:00  GENOVA    Written
    15/07/2022  09:00  GENOVA    Oral
    01/09/2022  09:00  GENOVA    Written
    02/09/2022  09:00  GENOVA    Oral