BIG DATA TECHNIQUES

CODE	106847
ACADEMIC YEAR	2025/2026
CREDITS	6 CFU, year 2, RELAZIONI INTERNAZIONALI 11162 (LM-52) - GENOVA; 6 CFU, year 2, ECONOMICS AND DATA SCIENCE 11267 (LM-56) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR	MAT/08
LANGUAGE	English
LOCATION	GENOVA
PERIOD	2nd semester
TEACHING MATERIALS	AULAWEB

OVERVIEW

This course helps students understand how to work with large amounts of data. It is designed so that students with no technical background in data science or machine learning can begin to understand the mechanisms behind this field. We will begin with key ideas in data analysis and describe modern techniques used to make predictions, find patterns, and generate new data. A special focus will be given to regularization, a family of methods used to improve the generalization of models and avoid overfitting in machine learning and deep learning. Students will work in an interactive coding environment and explore a new way of programming, and of learning to program, that exploits the ability of LLMs to translate natural language into formal language. This “vibe coding” will enable students to produce working code even without a conventional grounding in programming. Students will therefore be asked to create their own data analysis project, based on real-world data, and present it at the exam.

AIMS AND CONTENT

LEARNING OBJECTIVES

The overall aim of this course is to give students the ability to understand and use the main conceptual and computational tools for interpreting large amounts of data and for using them for predictive purposes.

LEARNING OBJECTIVES (DETAILS) AND LEARNING OUTCOMES

By the end of the course, students will be able to:

· understand the basic principles behind machine learning: how computers can learn from data.

· learn about regularization techniques, especially those based on L2 (Tikhonov) and L1 (sparsity) norms, and how they help improve prediction and pattern discovery.

· get familiar with key types of machine learning: supervised, unsupervised, and deep learning.

· learn to understand the meaning (semantics) of code, recognize how instructions in formal language correspond to what they intended in natural language, and learn by experimenting with and refining their questions and code interactively.

· develop this kind of “semantic literacy”: the ability to understand and guide what the code is doing, even without writing it all by hand. This skill is becoming essential in the age of AI-powered development.

· use formal tools to analyze real datasets, for example by detecting trends, making predictions, or clustering data.

· apply regularization techniques to build better models and understand what they reveal about their data.

· decide which tools and methods are best suited for the kind of data they are working with, small or large, noisy or clean.

· share their work using interactive notebooks and a final presentation that explains their project clearly to both tech-savvy and general audiences.
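
To illustrate the correspondence between a natural-language request and formal code mentioned in the outcomes above, here is a minimal sketch in Python. The request, the function name `trend_slope`, and the data are hypothetical and are not taken from the course material; an ordinary least-squares line is assumed here only as one simple way to detect a trend.

```python
# Natural-language request (what one might type to an AI assistant):
#   "Given monthly values, tell me whether the series is trending up or down."

def trend_slope(values):
    """Least-squares slope of the values against their index 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2          # mean of the indices 0..n-1
    v_mean = sum(values) / n      # mean of the observed values
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

monthly = [10, 12, 11, 15, 16, 18]  # made-up illustrative data
slope = trend_slope(monthly)
direction = "up" if slope > 0 else "down" if slope < 0 else "flat"
print(f"slope = {slope:.2f} per month, trend is {direction}")
```

Reading such a snippet line by line and checking that each step matches the original request is the kind of semantic check the outcomes describe.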

PREREQUISITES

Students do not need to master the formal rules of programming, but general familiarity with a programming language is an advantage.

TEACHING METHODS

- Lectures (24h) on learning, deep learning and regularization theory
- Practical guided lab sessions (24h) on AI-powered coding applied to data analysis and real-world data experiments

Inclusivity: Students with valid certifications for Specific Learning Disorders (SLDs), disabilities or other educational needs are invited to contact the teacher and the School's contact person for disability at the beginning of the course to agree on possible teaching arrangements that, while respecting the teaching objectives, take into account individual learning patterns. Contact details for the teacher and the School's disability contact person can be found at the following link: https://unige.it/commissioni/comitatoperlinclusionedeglistudenticondisabilita

Students who have valid certification of physical or learning disabilities and who wish to discuss possible accommodations or other circumstances regarding lectures, coursework and exams should speak both with the instructor and with Professor Elena Lagomarsino (elena.lagomarsino@unige.it), the Department's disability liaison.

SYLLABUS/CONTENT

Foundations of Learning and Regularization
- Regression, classification, clustering
- Regularized formulation of inverse problems: Tikhonov principle
- L2 (ridge regression) and L1 (lasso, sparsity) penalizations
- Geometric and statistical interpretation of regularization
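
The difference between the L2 and L1 penalties listed above can be sketched on the smallest possible case. The example below assumes a one-feature linear model without intercept, for which both penalized least-squares problems have a closed-form solution; the function names and the data are illustrative assumptions, not course material.

```python
import math

def _sums(x, y):
    """Return sum(x_i * y_i) and sum(x_i ** 2)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy, sxx

def ridge_1d(x, y, lam):
    """Minimiser over w of 0.5 * sum((y_i - w*x_i)**2) + 0.5 * lam * w**2."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + lam)

def lasso_1d(x, y, lam):
    """Minimiser over w of 0.5 * sum((y_i - w*x_i)**2) + lam * abs(w)."""
    sxy, sxx = _sums(x, y)
    shrunk = max(abs(sxy) - lam, 0.0)      # soft-thresholding step
    return math.copysign(shrunk, sxy) / sxx

# Made-up data lying exactly on the line y = x.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0]
for lam in (0.0, 10.0, 40.0):
    print(f"lam={lam:5.1f}  ridge={ridge_1d(x, y, lam):.3f}  "
          f"lasso={lasso_1d(x, y, lam):.3f}")
```

In this sketch the ridge coefficient shrinks towards zero as `lam` grows but never reaches it, whereas the lasso coefficient becomes exactly zero once `lam` exceeds the absolute value of `sum(x_i * y_i)`, which is the sparsity effect associated with the L1 norm.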

Machine Learning and Deep Learning Techniques
- Supervised algorithms: SVM, neural networks
- Validation strategies, overfitting, underfitting, and parameter selection
- Generative techniques (text, image) and modern models (e.g. transformers, GANs)
- Unsupervised learning and dimensionality reduction: PCA, autoencoders
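
One common validation strategy for parameter selection is k-fold cross-validation. The sketch below is an assumption-laden illustration rather than the method used in the course: it applies 5-fold cross-validation to a one-feature ridge model on synthetic data, and the helper names, the candidate values of `lam`, and the noise level are all hypothetical.

```python
import random

def ridge_fit(x, y, lam):
    """One-feature ridge coefficient (no intercept)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

def mse(w, x, y):
    """Mean squared prediction error of the model y = w * x."""
    return sum((b - w * a) ** 2 for a, b in zip(x, y)) / len(x)

def k_fold_cv(x, y, lam, k=5):
    """Average validation error over k folds for a given lam."""
    n = len(x)
    errors = []
    for fold in range(k):
        val_idx = set(range(fold, n, k))   # every k-th point is held out
        x_tr = [x[i] for i in range(n) if i not in val_idx]
        y_tr = [y[i] for i in range(n) if i not in val_idx]
        x_val = [x[i] for i in sorted(val_idx)]
        y_val = [y[i] for i in sorted(val_idx)]
        w = ridge_fit(x_tr, y_tr, lam)     # fit on the training part only
        errors.append(mse(w, x_val, y_val))
    return sum(errors) / k

# Synthetic data: y = 2x plus Gaussian noise (illustrative only).
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(40)]
y = [2.0 * xi + random.gauss(0.0, 0.5) for xi in x]

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: k_fold_cv(x, y, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
print("validation error per lam:", scores)
print("selected lam:", best_lam)
```

A very large `lam` pushes the coefficient towards zero and underfits, which shows up as a larger validation error; the selected value is simply the candidate with the lowest average error on held-out data.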

“AI-powered / vibe coded” project
- Introduction to vibe coding: coding driven by intuition, visualization, and interactivity
- Progressive development of personal (or group) projects
- Presentation of a project using an interactive notebook

TEXTS/BIBLIOGRAPHY

- T. Hastie, R. Tibshirani, J. Friedman, “The Elements of Statistical Learning”, Springer
- C. Bishop, “Pattern Recognition and Machine Learning”, Springer

TEACHERS AND EXAM BOARD

MICHELE PIANA

Office hours: by appointment via email (michele.piana@unige.it)

FEDERICO BENVENUTO

Office hours: by appointment via email.

Exam board

FEDERICO BENVENUTO (President)

MICHELE PIANA (President)

CRISTINA CAMPI

SABRINA GUASTAVINO

LESSONS

LESSONS START

Second semester

Class schedule

The timetable for this course is available at: Portale EasyAcademy

EXAMS

EXAM DESCRIPTION

The exam consists of two parts:

- A “vibe coding” project for data analysis
- An oral presentation of the project

ASSESSMENT METHODS

- Final project and oral presentation (50%)

50% of the final evaluation will be based on an individual or group "AI-powered" project and its presentation, for which each candidate will have a maximum of 20 minutes. The evaluation will consider correctness, creativity, analytical rigor, clarity, and critical thinking. In the case of group projects, each candidate must present a self-consistent part of the project.

- Oral discussion (50%)

During each candidate's oral presentation, theoretical concepts will be discussed and the final project will be analyzed in more depth. Evaluation criteria: linguistic clarity, theoretical mastery, and the connection between theory and practice.

FURTHER INFORMATION

Prerequisites: the only substantive prerequisites are a basic knowledge of Python, of the main data formats, and of the basics of numerical analysis and statistics.

Attendance: in person (strongly recommended)

Exam registration: to be arranged with the teacher