CODE 52507
ACADEMIC YEAR 2019/2020
CREDITS
- 6 CFU, year 3, STATISTICA MATEM. E TRATTAM. INFORMATICO DEI DATI 8766 (L-35) - GENOVA
- 6 CFU, year 3, INFORMATICA 8759 (L-31) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR SECS-S/01
LANGUAGE Italian
TEACHING LOCATION GENOVA
SEMESTER 2nd semester
TEACHING MATERIALS AULAWEB

OVERVIEW
The course provides students with the basic skills for extracting knowledge from large data sets.

AIMS AND CONTENT

LEARNING OUTCOMES
Develop the basic skills for extracting knowledge from large data sets, in particular by forming:
- an understanding of the value of data mining in solving real-world problems
- an understanding of the foundational concepts underlying data mining
- an understanding of the algorithms commonly used in data mining tools
- the ability to apply data mining tools to real-world problems

AIMS AND LEARNING OUTCOMES
At the end of the course, students will be able to:
- understand and handle the main concepts and techniques of data mining
- apply the main techniques of data mining autonomously to solve real-world problems
- develop further knowledge about data mining techniques and applications

TEACHING METHODS
Combination of traditional lectures and lab sessions.

SYLLABUS/CONTENT
First part: introduction to data mining and applications in fraud detection
- Introduction to data mining, data science and big data analytics
- Main techniques
- The data mining process: CRISP
- Seven classes of algorithms:
  - Supervised learning: classification
  - Unsupervised learning: clustering
  - Outlier detection
  - Regression
  - Reinforcement learning
  - Ranking
  - Deep learning
- Top ten data mining algorithms
- Examples and applications using WEKA
- Applications to marketing, finance and medicine
- Big Data and Hadoop
- The NoSQL paradigm
Second part: machine learning algorithms for data mining
- Introduction to data mining and machine learning
- Taxonomy of data mining problems
- Statistical inference
- Support Vector Machines (extension to kernels)
- Support Vector Regression (extension to kernels)
- K-means and spectral clustering
- Decision trees and random forests
- Model selection and error estimation

RECOMMENDED READING/BIBLIOGRAPHY
- Aggarwal, C. C. Data Mining: The Textbook. Springer, 2015.
- Shalev-Shwartz, S., and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- Witten, I. H., Frank, E., and Hall, M. A. (2000). Data Mining: Practical Machine Learning Tools and Techniques (The Morgan Kaufmann Series in Data Management Systems). ISBN-13: 978-0123748560. Available at the CSB di Ingegneria (006.312 WIT); also available online at http://www.sciencedirect.com/science/book/9780123748560
- Phua, C., Lee, V., Smith, K., and Gayler, R. (2005). A Comprehensive Survey of Data Mining-based Fraud Detection Research. Computing Research Repository, abs/1009.6119. Available online at http://arxiv.org/abs/1009.6119
- Cristianini, N., and Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2006. Available at ING and ECO.
- Ng, A., Jordan, M., and Weiss, Y. On Spectral Clustering: Analysis and an Algorithm. NIPS 2001. Also available online at http://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf
- Handouts

TEACHERS AND EXAM BOARD
ENNIO OTTAVIANI
FABRIZIO MALFANTI
Exam Board
- FABRIZIO MALFANTI (President)
- ENNIO OTTAVIANI (President)
- EVA RICCOMAGNO (President)

LESSONS

LESSONS START
The class will start according to the academic calendar.
Class schedule: DATA MINING

EXAMS

EXAM DESCRIPTION
To take the exam, you must sign up online.
The examination of the first part consists of the discussion of a group project on a topic agreed with the lecturer, and of a written examination on which the oral examination may be based.
The examination of the second part consists of the discussion of a project on a topic agreed with the lecturer and developed autonomously by the student.
The final mark is the average of the marks of the two parts, weighted by the number of ECTS credits of each part (3 ECTS each).

ASSESSMENT METHODS
The exam checks whether the student has learned the methodologies and techniques for extracting knowledge from a large data set, by means of a small project that requires solving a real-world data mining problem.

Exam schedule
Date        Time   Location  Degree type  Notes
28/05/2020  09:00  GENOVA    Laboratory
18/06/2020  09:00  GENOVA    Laboratory
21/07/2020  09:00  GENOVA    Laboratory

FURTHER INFORMATION
By appointment, arranged by email with Luca Oneto <luca.oneto@unige.it> and Fabrizio Malfanti <fabrizio.malfanti@intelligrate.it>.
For organizational issues, contact Eva Riccomagno by email <riccomagno@dima.unige.it>.
The web page of the second part of the course is https://sites.google.com/view/lucaoneto/teaching/dm-smid