CODE: 52480
ACADEMIC YEAR: 2024/2025
CREDITS:
- 8 CFU, year 1, MATEMATICA 8760 (L-35) - GENOVA
- 8 CFU, year 1, STATISTICA MATEM. E TRATTAM. INFORMATICO DEI DATI 8766 (L-35) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR: SECS-S/01
LANGUAGE: Italian
TEACHING LOCATION: GENOVA
SEMESTER: 2nd semester
TEACHING MATERIALS: AULAWEB

OVERVIEW
The course introduces students to the exploratory statistical analysis of multivariate data, highlighting its mathematical aspects and developing the essential skills for interpreting the data under investigation. Laboratory sessions give students the opportunity to analyse, discuss, and solve real problems.

AIMS AND CONTENT

AIMS AND LEARNING OUTCOMES
Descriptive (or exploratory) data analysis consists of techniques for summarizing the main characteristics of a dataset, frequently with graphical tools. It is the first and fundamental step of any statistical analysis. The proposed methods, especially the multivariate ones, require mathematical tools, in particular algebraic and geometric ones, which are developed in concurrent courses. The primary learning goals of the course are therefore to:
- apply the main methodologies for the analysis of univariate and multivariate data from a descriptive perspective;
- identify the properties of a dataset by selecting and calculating the relevant statistical indices;
- analyze a dataset using suitable, more complex statistical methods, such as Cluster Analysis (CA), Principal Component Analysis (PCA), and the Classical Linear Model (LM).
Cross-cutting skills are also important educational aims: teamwork, the ability to engage with the results of a statistical analysis, and a flexible mindset that allows quick adjustment to new conditions. These competencies are developed through computer lab activities, where using the R software for data analysis also strengthens computer skills. By taking part in the planned group activities, students will also have acquired the following basic skills by the end of the course:
- functional literacy (writing reports as a group);
- creative project design (selection of the appropriate statistical and mathematical tools and of the presentation style for the analysis carried out on the proposed datasets);
- aptitude for learning (deepening the understanding of crucial analytical themes);
- social competence (required by the number and variety of decisions the group must make).

TEACHING METHODS
The course consists of approximately 50 hours of classroom lessons (including theory and exercises) and about 24 hours of computer laboratory practice. There are also approximately 8 hours of supervised classroom exercises.
The purposes of the computer laboratory lessons are:
- learning a programming language oriented to statistics (R software);
- applying the statistical methodologies presented in class;
- interpreting the data and the results of the analysis.
The lab activities should also enable students to check their level of understanding of the statistical theory and of its practical use.
The supervised classroom exercises are sessions in which students work on proposed exercises (alone or in groups) in the classroom; teachers and tutors are present to clarify doubts and help overcome difficulties.

SYLLABUS/CONTENT
Exploratory analysis of uni- and bivariate data.
Qualitative/categorical variables. Counts and frequencies, distribution of a variable.
Joint and marginal distributions of two variables, conditional distributions (row and column profiles). Independence. Graphical representations.
Quantitative variables. Distribution and cumulative distribution functions, quantile function, and their graphical representations. Measures of centrality and dispersion based on moments and quantiles; their properties and the L1 and L2 metrics. Covariance and correlation between two quantitative variables. Geometrical interpretation of variance, covariance and correlation.

Exploratory analysis of multivariate data.
Cluster analysis. Hierarchical clustering: linkages based on distance and inertia; dendrogram; induced ultrametric; variable clustering. K-means clustering: initialization and stopping of the algorithm, stable clusters.
Principal component analysis. "Best" representation of multivariate data (row points of the data matrix) in a vector space of lower dimension; accuracy of the representation. Change of basis (eigenvectors of the correlation matrix). Properties of principal components. Geometrical representation of correlations.
Multiple regression. Vector space generated by the explanatory variables (column points of the data matrix). Linear least squares method and geometrical meaning of residual minimization. Variance decomposition of the response variable. Descriptive goodness-of-fit: residual plots and R-squared index (with geometrical interpretation). One-way ANOVA (analysis of variance) and between/within variance decomposition.

Practical sessions in the lab using the R software.

RECOMMENDED READING/BIBLIOGRAPHY
M. P. Rogantin (2016). Statistica descrittiva (available on AulaWeb and at http://www.dima.unige.it/~rogantin/StDescrittiva2/StatDescrittiva.pdf).
J. Maindonald, W. J. Braun (2010). Data Analysis and Graphics Using R: An Example-Based Approach. 3rd ed. Cambridge University Press.
I. T. Jolliffe (2002). Principal Component Analysis.
Springer Series in Statistics.

TEACHERS AND EXAM BOARD
ALBERTO SORRENTINO
Office hours: By appointment.
FRANCESCO PORRO
CRISTINA CAMPI
Office hours: By appointment via email.

Exam Board
FRANCESCO PORRO (President)
CRISTINA CAMPI
EVA RICCOMAGNO (President Substitute)
ALBERTO SORRENTINO (President Substitute)

LESSONS
LESSONS START
The class will start according to the academic calendar.
Class schedule
The timetable for this course is available here: Portale EasyAcademy

EXAMS
EXAM DESCRIPTION
The exam is divided into three parts:
- written test;
- two reports on the last two computer laboratory sessions;
- oral exam.
The written test is worth at most 20 points and the reports at most 10 points. Students are admitted to the oral exam if the written test score is at least 12 and the sum of the report and written test scores is at least 18.

ASSESSMENT METHODS
Written test
The written test verifies the ability to carry out exercises and to compute and interpret data analyses. Students are also asked to comment on parts of the output obtained with the R software. The written test can be replaced by two intermediate tests if both scores are sufficient (≥ 12). The first intermediate test is scheduled during the lessons; the second can be taken at the same time as the first final written test (June).

Reports of the last two computer laboratory sessions
The reports evaluate the ability to carry out descriptive analyses of datasets using the mathematical-statistical methods presented in the course. The assessment takes into account:
- the appropriate use of such methods;
- the ability to critically analyze problems;
- the mastery of presentation techniques, with particular attention to the appropriate use of language.
The average score of the reports contributes to the final score only if the written test score is at least 12. Attending students must attend at least 80% of the laboratory exercises.
The reports must be submitted to the teachers about ten days after the end of the laboratory activities (the exact dates are shown on AulaWeb). Students who cannot attend must still complete the reports (agreeing the submission dates with the teachers) and must also cover a supplementary topic (the details must be agreed with the teachers).

Oral exam
The oral exam verifies the level of understanding of the topics covered and of the specific methodologies, including mastery of the proofs of the results obtained.

Exam schedule
Date        Time   Location  Type            Notes
13/01/2025  09:00  GENOVA    Written + Oral
03/02/2025  09:00  GENOVA    Written + Oral
19/06/2025  09:00  GENOVA    Written
09/07/2025  09:00  GENOVA    Written
17/09/2025  09:00  GENOVA    Written

FURTHER INFORMATION
Students who have valid certification of physical or learning disabilities on file with the University and who wish to discuss possible accommodations or other circumstances regarding lectures, coursework and exams should speak both with the instructor and with Professor Sergio Di Domizio (sergio.didomizio@unige.it), the Department's disability liaison.

Agenda 2030 - Sustainable Development Goals
Quality education
Gender equality