CODE 52509
ACADEMIC YEAR 2017/2018
CREDITS 6 CFU, year 3 of STATISTICA MATEM. E TRATTAM. INFORMATICO DEI DATI 8766 (L-35)
SCIENTIFIC DISCIPLINARY SECTOR SECS-S/01
TEACHING LOCATION
SEMESTER 2nd semester
TEACHING MATERIALS AULAWEB

OVERVIEW
Experts introduce statistical techniques that they use in their work, or present advances on them, and illustrate their applications through concrete examples.

AIMS AND CONTENT
LEARNING OUTCOMES
Provide statistical tools relevant to specific applications, together with the experience of experts working in the field.

AIMS AND LEARNING OUTCOMES
Pattern recognition and applications (24 hours of in-person lectures)
The module introduces the fundamental concepts and algorithms of statistical pattern recognition, with a focus on their industrial applications (e.g. predictive maintenance, process optimization, quality control…), their development cycle and their performance evaluation. Examples are drawn mainly from computer vision, which often provides the data sets to which pattern recognition methods are applied. The module starts from statistical decision theory and parametric estimation and gives a quick review of the different viewpoints and conceptual approaches to the subject. The common thread is the ability to build industrial systems capable of making statistically optimal decisions on the basis of experience. In light of the Industry 4.0 plan, which should lead to a strong renewal of production processes, it is considered useful to give greater visibility to this area of application of statistical skills.

Measurement models in psychometrics (14 hours of in-person lectures)
The course introduces statistical issues in psychometric theory and the use of statistical software (R) for carrying out basic psychometric analyses.

Demography (4 hours of in-person lectures)
The module illustrates, through a complex example, the issues involved in communicating demographic data to the general population.
Further seminar activities (not assessed) may be organised each year. They are usually presented by data scientists who work in applied contexts such as companies, consumer companies and public bodies.

TEACHING METHODS
Combination of traditional lectures and lab sessions with the software packages Matlab and R.

SYLLABUS/CONTENT
Pattern recognition and applications
After an overview of pattern recognition and the criteria for its application, the following topics are addressed. Bayesian decision theory. Maximum a posteriori probability. Classification and regression. Naive Bayes. Construction of the optimal classifier. Parameter estimation. Performance evaluation. Cross-validation. General statistical classifiers. Gaussian mixtures and the EM algorithm. Outlier detection. Some simple non-parametric techniques. Introduction to Bayesian networks and inference on graphs. Dimensionality reduction. Feature selection. Genetic methods. Linear transformations of the sample space: PCA/LDA/ICA. Non-linear maps (t-SNE). Decision trees. The CART method. Bagging and random forest. Boosting. Statistical modelling with trees. Neural nets for classification. Multi-layer models and learning algorithm. Devising a neural classifier. Neural nets as generalised approximators. Introduction to deep learning (convolutional networks and stacked autoencoders).
Theoretical lectures are interwoven with examples of applications such as:
Optical character recognition. Construction of classifiers with different levels of complexity (from Naive Bayes to convolutional networks) for the recognition of handwritten or printed text.
Automatic counting systems and event detectors. Image analysis for the detection of faces, people, vehicles, … Identification of key features and definition of an optimal binary acceptance test via boosting techniques.
Statistical modelling of complex machinery. Definition of non-linear input-output relationships for the forecasting of a target variable (e.g.
energy consumption) from instrumental data with random forest and neural nets.
Quality control and predictive maintenance. Probability distribution of sensor data and anomaly/outlier detection. Estimation of the residual lifetime of the system (time to failure, TTF).
The applications will be illustrated with the aid of suitable Matlab toolboxes and original data during guided hands-on sessions.

Psychometrics
Classical test theory
Psychological variables or constructs
Definition of the content domain of a construct and its operationalizations
Measurement models in psychology: reflective indicator models and formative indicator models
Item analysis and reliability
Exploratory factor analysis
Confirmatory factor analysis
Structural equation models
Applications in R (packages 'psych', 'lavaan' and 'semPlot') will be shown.

Demography
The course is based on the careful reading and analysis of the volume Tutto quello che non vi hanno mai detto sull'immigrazione (2015, Laterza) by Gianpiero Dalla Zuanna and Stefano Allievi. Collecting, analyzing and presenting data to help society transform opportunities into new realities.

RECOMMENDED READING/BIBLIOGRAPHY
Pattern recognition and applications
Handouts available at the web site http://www.onairweb.com/corsoPR/
Further reading:
R. Duda, P. Hart, D. Stork, Pattern Classification, Wiley (2001)
S. Theodoridis, K. Koutroumbas, Pattern Recognition, Academic Press (2006)
C. Bishop, Pattern Recognition and Machine Learning, Springer (2007)
S. Theodoridis, Machine Learning, a Bayesian and Optimization Perspective, Academic Press (2015)

Psychometrics
J. Rust, S. Golombok, Modern Psychometrics, 3rd ed., Routledge, Hove (2009) (chapters 1, 2, 3, 4 and 7)
Handouts and other teaching material (e.g. R code) will be shared online.

Demography
G. Dalla Zuanna, S. Allievi, Tutto quello che non vi hanno mai detto sull'immigrazione, Laterza (2015)
TEACHERS AND EXAM BOARD
EVA RICCOMAGNO
Office hours: By appointment, arranged by email with Luca Oneto (luca.oneto@unige.it) and Fabrizio Malfanti (fabrizio.malfanti@intelligrate.it). For organizational issues, email Eva Riccomagno (riccomagno@dima.unige.it).

CARLO CHIORRI
Office hours: Tuesdays, 12pm-1pm, Dipartimento di Scienze della Formazione, floor 4, room 4A3, Corso A. Podestà, 2, 16128 Genova. If the teacher is not available, this will be announced as soon as possible on the Aulaweb website and on the student online forum. The teacher cannot guarantee his availability outside office hours. However, students who cannot meet him during office hours can arrange an appointment at another date and time by e-mail. A Skype call can also be scheduled on request by e-mail.
Teacher's contacts
Phone: +39 010 209 53709
E-mail: carlo.chiorri[chioc]unige.it or carlo.chiorri[chioc]gmail.com
Skype: chiorri.psicometria (by appointment only)

ENNIO OTTAVIANI

Exam Board
EVA RICCOMAGNO (President)
MARIA PIERA ROGANTIN (President)
CARLO CHIORRI
ENNIO OTTAVIANI

LESSONS
LESSONS START
The class will start according to the academic calendar.
Class schedule: APPLIED STATISTICS 2

EXAMS
EXAM DESCRIPTION
Pattern recognition and applications: Written exam with multiple-choice questions, followed by a discussion of it.
Psychometrics: Written exam, followed by a discussion of it.
Demography: Written exam with multiple-choice or open questions.
The final mark is the weighted average of the marks of the three parts. The weights are proportional to the hours of classroom lectures.

ASSESSMENT METHODS
Pattern recognition and applications: The exam consists of 25 multiple-choice questions covering all topics discussed during the course. Answers may be numeric or true/false and may require elementary calculations. Lecture notes and other material are not allowed. A pocket calculator may be useful but is not essential. The exam lasts 45 minutes and is marked immediately afterwards.
Students may justify some answers by outlining their reasoning.
Psychometrics: Students will be presented with the R output of some statistical analyses carried out on real data. The written part of the exam tests the students' ability to apply what they have learnt from the lectures and the course materials: they will be asked to interpret and comment on the results and to detect flaws in the statistical analyses. In the oral discussion, issues with the answers to the written exam will be reviewed and discussed, and knowledge of psychometric theory will be tested.
Demography: The exam assesses the acquired skills in identifying, within a complex text, specific information and data, as well as the statistical analysis supporting them.

Exam schedule
Date        Time   Location  Type        Note
14/05/2018  09:00  GENOVA    Laboratory
14/05/2018  09:00  GENOVA    Oral
14/05/2018  09:00  GENOVA    Written
18/06/2018  09:00  GENOVA    Laboratory
18/06/2018  09:00  GENOVA    Oral
18/06/2018  09:00  GENOVA    Written
19/07/2018  09:00  GENOVA    Laboratory
19/07/2018  09:00  GENOVA    Oral
19/07/2018  09:00  GENOVA    Written

FURTHER INFORMATION
Web pages:
http://www.onairweb.com/corsoPR/
https://www.dropbox.com/s/groq642v7rbviha/Lezioni%20SMID%202016.zip?dl=0
Prerequisites: Applied Statistics 1
Attendance is highly recommended.
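
As a hedged illustration of the weighting rule stated under EXAM DESCRIPTION, the sketch below computes the final mark from the marks of the three parts, using the lecture hours given in the module descriptions (24, 14 and 4) as weights. The function name, the dictionary keys and the example marks are hypothetical, and the source does not say how the result is rounded, so no rounding is applied.

```python
# Lecture hours per part, as stated in the module descriptions.
LECTURE_HOURS = {
    "pattern_recognition": 24,
    "psychometrics": 14,
    "demography": 4,
}


def final_mark(marks, hours=LECTURE_HOURS):
    """Weighted average of the part marks, with weights proportional to hours.

    `marks` maps each part name to its mark. Rounding is not specified
    in the course description, so the raw average is returned.
    """
    total_hours = sum(hours.values())
    weighted_sum = sum(marks[part] * hours[part] for part in hours)
    return weighted_sum / total_hours


if __name__ == "__main__":
    # Hypothetical marks, for illustration only.
    example = {"pattern_recognition": 28, "psychometrics": 26, "demography": 30}
    print(final_mark(example))
```

With these weights, pattern recognition counts for 24/42 (about 57%) of the final mark, psychometrics for 14/42 (about 33%) and demography for 4/42 (about 10%).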