CODE 106847
ACADEMIC YEAR 2025/2026
CREDITS
6 CFU, year 2, RELAZIONI INTERNAZIONALI 11162 (LM-52) - GENOVA
6 CFU, year 2, ECONOMICS AND DATA SCIENCE 11267 (LM-56) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR MAT/08
LANGUAGE English
TEACHING LOCATION GENOVA
SEMESTER 2nd semester
TEACHING MATERIALS AULAWEB

OVERVIEW
This course helps students understand how to deal with large amounts of data. It is designed so that students with no technical background in data science or machine learning can begin to understand the mechanisms behind these fields. We will begin with key ideas in data analysis and describe modern techniques used to make predictions, find patterns, and generate new data. Special focus will be given to regularization, a method used to improve model performance and avoid overfitting in machine learning and deep learning. Students will work in an interactive coding environment and explore a new way of programming and of learning to program, exploiting the ability of LLMs to translate natural language into formal language. This “vibe coding” approach will enable students to program productively even without a conventional grounding in programming. Students will therefore be asked to create their own data analysis project, based on real-world data, and present it at the exam.

AIMS AND CONTENT

LEARNING OUTCOMES
The aim of these lectures is to provide students with a sound understanding of the main conceptual and computational tools for interpreting large amounts of data and for using such data for predictive purposes.

AIMS AND LEARNING OUTCOMES
By the end of the course, students will be able to:
· understand the basic principles behind machine learning: how computers can learn from data.
· learn about regularization techniques, especially those based on L2 (Tikhonov) and L1 (sparsity) norms, and how they help improve prediction and pattern discovery.
· get familiar with key types of machine learning: supervised, unsupervised, and deep learning.
· understand the meaning (semantics) of code, recognize how instructions in formal language correspond to what they intended in natural language, and learn by experimenting with and refining their questions and code interactively.
· develop this kind of “semantic literacy”: the ability to understand and guide what the code is doing, even without writing it all by hand. This skill is becoming essential in the age of AI-powered development.
· use formal tools to analyze real datasets, for example by detecting trends, making predictions, or clustering data.
· apply regularization techniques to build better models and understand what they reveal about their data.
· decide which tools and methods are best suited to the kind of data they are working with, whether small or large, noisy or clean.
· share their work using interactive notebooks and a final presentation that explains their project clearly to both tech-savvy and general audiences.

PREREQUISITES
Students do not need to have mastered the formal rules of programming, but general familiarity with a programming language is an advantage.

TEACHING METHODS
- Lectures (24h) on learning, deep learning and regularization theory
- Practical guided lab sessions (24h) on AI-powered coding applied to data analysis and real-world data experiments

Inclusivity: Students with valid certifications for Specific Learning Disorders (SLDs), disabilities or other educational needs are invited to contact the teacher and the School's contact person for disability at the beginning of the course to agree on possible teaching arrangements that, while respecting the teaching objectives, take into account individual learning patterns.
Contacts of the teacher and the School's disability contact person can be found at the following link: https://unige.it/commissioni/comitatoperlinclusionedeglistudenticondisabilita

SYLLABUS/CONTENT
Foundations of Learning and Regularization
- Regression, classification, clustering
- Regularized formulation of inverse problems: Tikhonov principle
- L2 (ridge regression) and L1 (lasso, sparsity) penalizations
- Geometric and statistical interpretation of regularization
Machine Learning and Deep Learning Techniques
- Supervised algorithms: SVM, neural networks
- Validation strategies, overfitting, underfitting, and parameter selection
- Generative techniques (text, image) and modern models (e.g. transformers, GANs)
- Unsupervised learning and dimensionality reduction: PCA, autoencoders
“AI-powered / vibe coded” project
- Introduction to vibe coding: coding driven by intuition, visualization, and interactivity
- Progressive development of personal (or group) projects
- Presentation of a project using an interactive notebook

RECOMMENDED READING/BIBLIOGRAPHY
- T. Hastie, R. Tibshirani, J. Friedman, “The Elements of Statistical Learning”, Springer
- C. Bishop, “Pattern Recognition and Machine Learning”

TEACHERS AND EXAM BOARD
MICHELE PIANA
Office hours: By appointment via e-mail (michele.piana@unige.it)
FEDERICO BENVENUTO

LESSONS
LESSONS START
Second semester
Class schedule
The timetable for this course is available on the Portale EasyAcademy.

EXAMS
EXAM DESCRIPTION
The exam consists of two parts:
- A “vibe coding” project for data analysis
- An oral presentation of the project

ASSESSMENT METHODS
- Final project and oral presentation (50%)
50% of the final evaluation will be based on an individual or group “AI-powered” project and its presentation, for which each candidate will have a maximum of 20 minutes. The evaluation will consider correctness, creativity, analytical rigor, clarity, and critical thinking.
In the case of group projects, each candidate must present a self-contained part of the project.
- Oral discussion (50%)
During each candidate's oral presentation, theoretical concepts will be discussed and the final project will be examined in more depth. Evaluation criteria: linguistic clarity, theoretical mastery, and the connection between theory and practice.

FURTHER INFORMATION
Prerequisites: The only substantial prerequisites are a basic knowledge of Python, the main data formats, and fundamental concepts of numerical analysis and statistics.
Attendance modality: In-person attendance (strongly recommended)
Exam registration: To be arranged with the instructor

Agenda 2030 - Sustainable Development Goals
Industry, innovation and infrastructure