## DATA MINING


CODE | 52507 |
---|---|
ACADEMIC YEAR | 2018/2019 |
CREDITS | 6 credits during the 3rd year of 8766 Mathematical Statistics and Data Management (L-35) GENOVA |
SCIENTIFIC DISCIPLINARY SECTOR | SECS-S/01 |
LANGUAGE | Italian |
TEACHING LOCATION | GENOVA (Mathematical Statistics and Data Management) |
SEMESTER | 2nd semester |
TEACHING MATERIALS | AULAWEB |

Provide the students with the basic skills for extracting knowledge from large data sets.

Develop the basic skills for extracting knowledge from large data sets, in particular by forming:

- an understanding of the value of data mining in solving real-world problems
- an understanding of foundational concepts underlying data mining
- an understanding of algorithms commonly used in data mining tools
- the ability to apply data mining tools to real-world problems

At the end of the course, students will

- be able to understand and handle the main concepts and techniques of data mining
- be able to independently apply the main techniques of data mining to solve real-world problems
- be able to develop further knowledge about data mining techniques and applications

Combination of traditional lectures and lab sessions

**First part: introduction to data mining and applications in fraud detection**

Introduction to Data Mining, Data science and big data analytics

Main techniques

The Data Mining Process - CRISP-DM

Seven classes of algorithms

Supervised Learning – Classification

Unsupervised Learning – Clustering

Outlier detection

Regression

Reinforcement Learning

Ranking

Deep Learning

Top ten data mining algorithms

Examples and applications using WEKA

Applications to marketing, finance and medicine

Big Data and Hadoop

The NoSQL paradigm

**Second part**: **Machine Learning Algorithms for Data mining**

Introduction to Data Mining and Machine Learning

Taxonomy of data mining problems

Statistical Inference

Support Vector Machines (extension to kernels)

Support Vector Regression (extension to kernels)

K-means and Spectral Clustering

Decision Trees and Random Forests

Model Selection and Error Estimation

- Aggarwal, C. C. Data Mining: The Textbook. Springer, 2015.
- Shalev-Shwartz, S., and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- Witten, I. H., Frank, E., and Hall, M. A. (2000). Data Mining: Practical Machine Learning Tools and Techniques (The Morgan Kaufmann Series in Data Management Systems). ISBN-13: 978-0123748560. Available at the CSB di Ingegneria (shelf mark 006.312 WIT) and online at http://www.sciencedirect.com/science/book/9780123748560
- Phua, C., Lee, V., Smith, K., and Gayler, R. (2005). A Comprehensive Survey of Data Mining-based Fraud Detection Research. Computing Research Repository, abs/1009.6119. Available online at http://arxiv.org/abs/1009.6119
- Cristianini, N., and Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2006. Available at ING and ECO.
- Ng, A., Jordan, M., and Weiss, Y. On Spectral Clustering: Analysis and an Algorithm. NIPS 2001. Also available online at http://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf
- Handouts

**Office hours:** By appointment, arranged by email with Luca Oneto <luca.oneto@unige.it> and Fabrizio Malfanti <fabrizio.malfanti@intelligrate.it>.
For organizational issues, contact Eva Riccomagno by email at <riccomagno@dima.unige.it>.

**Exam board:**

FABRIZIO MALFANTI (President)

EVA RICCOMAGNO (President)

LUCA ONETO


The class will start according to the academic calendar.

To take the exam, you must sign up online.

The examination of the first part consists of the discussion of a group project on a topic agreed with the lecturer and of a written examination on which the oral examination can be based.

The examination of the second part consists of the discussion of a project on a topic agreed with the lecturer and developed autonomously by the student.

The final mark is the average of the marks of the two parts, weighted by the number of ECTS credits of each part (3 ECTS each).
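
As a sketch of this rule, write $m_1$ and $m_2$ for the marks of the first and second part (symbols introduced here for illustration only). Because both parts carry 3 ECTS, the weighted average reduces to a plain mean:

```latex
\[
m_{\text{final}} = \frac{3\,m_1 + 3\,m_2}{3 + 3} = \frac{m_1 + m_2}{2}
\]
```

For instance, hypothetical marks of 28 and 26 would give 27. How a fractional result is rounded is not stated here.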

The exam checks whether the student has learned the methodologies and techniques for extracting knowledge from large data sets, by means of a small project that requires solving a real-world data mining problem.

Date | Time | Location | Type | Notes |
---|---|---|---|---|
21/01/2019 | 09:00 | GENOVA | Registration | For students enrolled in the course in academic year 2017/18 or earlier |
30/05/2019 | 09:00 | GENOVA | Laboratory | |
20/06/2019 | 09:00 | GENOVA | Laboratory | |
23/07/2019 | 09:00 | GENOVA | Laboratory | |


The web page of the second part of the course is https://sites.google.com/view/lucaoneto/teaching/dm-smid