CODE 106847
ACADEMIC YEAR 2025/2026
CREDITS (CFU)
6 credits, year 2, INTERNATIONAL RELATIONS 11162 (LM-52) - GENOVA
6 credits, year 2, ECONOMICS AND DATA SCIENCE 11267 (LM-56) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR
MAT/08
LANGUAGE
English
LOCATION
GENOVA
TEACHING PERIOD
2nd Semester
TEACHING MATERIALS
AULAWEB
OVERVIEW
This course will help students understand how to deal with large amounts of data. It is designed so that students with no technical background in data science or machine learning can begin to understand the mechanisms behind this field. We will start from key ideas in data analysis and then describe modern techniques used to make predictions, find patterns, and generate new data. Special focus will be given to regularization, a family of methods used to improve the performance of models and avoid overfitting in machine learning and deep learning.
Students will work in an interactive coding environment and explore a new way of programming, and of learning to program, that exploits the ability of large language models (LLMs) to translate natural language into formal (programming) language. This "vibe coding" approach will enable students to write working code even without a standard programming background. Students will therefore be asked to create their own data analysis project, based on real-world data, and to present it at the exam.
OBJECTIVES AND CONTENTS
LEARNING OBJECTIVES
The general objective of this course is to provide students with the ability to understand and use the main conceptual and computational tools for interpreting large amounts of data and for using them for predictive purposes.
LEARNING OBJECTIVES (DETAILED) AND LEARNING OUTCOMES
By the end of the course, students will be able to:
· understand the basic principles behind machine learning: how computers can learn from data.
· describe regularization techniques, especially those based on the L2 (Tikhonov) and L1 (sparsity) norms, and explain how they help improve prediction and pattern discovery.
· become familiar with the key types of machine learning: supervised, unsupervised, and deep learning.
· understand the meaning (semantics) of code, recognize how instructions in a formal language correspond to what they intended in natural language, and learn by experimenting and refining their questions and code interactively.
· develop this kind of “semantic literacy”: the ability to understand and guide what the code is doing, even without writing all of it by hand. This skill is becoming essential in the age of AI-powered development.
· use formal tools to analyze real datasets, for example by detecting trends, making predictions, or clustering data.
· apply regularization techniques to build better models and understand what those models reveal about their data.
· decide which tools and methods are best suited to the kind of data they are working with: small or large, noisy or clean.
· share their work using interactive notebooks and a final presentation that explains their project clearly to both tech-savvy and general audiences.
PREREQUISITES
Mastery of the formal rules of programming is not required, but general knowledge of programming languages is an advantage.
TEACHING METHODS
- Lectures (24 h) on learning, deep learning and regularization theory
- Guided practical lab sessions (24 h) on AI-powered coding applied to data analysis and experiments with real-world data
Inclusivity: students with valid certifications for Specific Learning Disorders (SLDs), disabilities or other educational needs are invited to contact the teacher and the School's disability contact person at the beginning of the course to agree on possible teaching arrangements that, while respecting the teaching objectives, take into account individual learning patterns.
Contacts of the teacher and of the School's disability contact person can be found at the following link: https://unige.it/commissioni/comitatoperlinclusionedeglistudenticondisabilita
SYLLABUS/CONTENT
Foundations of Learning and Regularization
- Regression, classification, clustering
- Regularized formulation of inverse problems: the Tikhonov principle
- L2 (ridge regression) and L1 (lasso, sparsity) penalties
- Geometric and statistical interpretation of regularization
Machine Learning and Deep Learning Techniques
- Supervised algorithms: SVMs, neural networks
- Validation strategies, overfitting, underfitting, and parameter selection
- Generative techniques (text, images) and modern models (e.g., transformers, GANs)
- Unsupervised learning and dimensionality reduction: PCA, autoencoders
"AI-powered / vibe coded" project
- Introduction to vibe coding: coding driven by intuition, visualization, and interactivity
- Progressive development of individual (or group) projects
- Presentation of a project using an interactive notebook
TEXTBOOKS/BIBLIOGRAPHY
- T. Hastie, R. Tibshirani, J. Friedman, "The Elements of Statistical Learning", Springer
- C. Bishop, "Pattern Recognition and Machine Learning", Springer
TEACHERS AND EXAM BOARD
MICHELE PIANA
Office hours: by appointment via email (michele.piana@unige.it)
FEDERICO BENVENUTO
Office hours: by appointment via email.
LESSONS
START OF LESSONS
Second semester
Class schedule
The timetable for this course is available on the EasyAcademy Portal.
EXAMS
EXAM DESCRIPTION
The exam consists of two parts:
- A "vibe coding" data analysis project
- An oral presentation of the project
ASSESSMENT METHODS
- Final project and oral presentation (50%)
50% of the final grade will be based on an individual or group "AI-powered" project and its presentation, for which each candidate will have a maximum of 20 minutes. The evaluation will consider correctness, creativity, analytical rigor, clarity, and critical thinking.
In the case of group projects, each candidate must present a self-contained part of the project.
- Oral discussion (50%)
During each candidate's oral presentation, theoretical concepts will be discussed and the final project will be analyzed in more depth. Evaluation criteria: clarity of expression, mastery of the theory, and ability to connect theory and practice.
OTHER INFORMATION
Prerequisites: the only substantial prerequisites are a basic knowledge of Python, of the main data formats, and of the fundamentals of numerical analysis and statistics.
Attendance: in person (strongly recommended).
Exam registration: to be arranged with the teacher.
2030 Agenda
Industry, innovation and infrastructure
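The L2 (ridge regression) and L1 (lasso, sparsity) penalties listed in the course content can be illustrated with a minimal sketch. It is an illustrative assumption, not course material: it uses NumPy only and synthetic data, solves ridge regression through its closed-form regularized normal equations, and uses ISTA (iterative soft-thresholding) as one possible lasso solver. The function names `ridge`, `soft_threshold` and `lasso_ista`, the data sizes and the values of `lam` are all hypothetical choices.

```python
import numpy as np

def ridge(X, y, lam):
    """Hypothetical helper. L2 (Tikhonov) regularized least squares:
    argmin_w ||Xw - y||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    # Setting the gradient to zero gives (X^T X + lam I) w = X^T y.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrinks each entry toward zero
    and sets entries with |v_i| <= t exactly to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=5000):
    """Hypothetical helper. L1 (lasso) regularized least squares:
    argmin_w 0.5 * ||Xw - y||^2 + lam * ||w||_1, solved with ISTA."""
    # Step size 1/L, where L = (largest singular value of X)^2 is the
    # Lipschitz constant of the gradient of 0.5 * ||Xw - y||^2.
    L = np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                    # gradient step on the data term
        w = soft_threshold(w - grad / L, lam / L)   # proximal step on the L1 term
    return w

# Synthetic (hypothetical) data: only 3 of the 20 features influence y.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[[0, 5, 12]] = [3.0, -2.0, 1.5]
y = X @ w_true + 0.1 * rng.standard_normal(50)

w_ridge = ridge(X, y, lam=1.0)
w_lasso = lasso_ista(X, y, lam=5.0)
print("ridge nonzero coefficients:", np.count_nonzero(w_ridge))
print("lasso nonzero coefficients:", np.count_nonzero(w_lasso))
```

On data like this, the ridge solution typically keeps all coefficients nonzero but shrunk, while the lasso sets most of the irrelevant coefficients exactly to zero: this is the sparsity effect of the L1 norm. In practice the regularization parameter `lam` would be chosen with a validation strategy rather than fixed by hand.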