BIG DATA ANALYSIS FOR LIFE SCIENCES | Corsi di Studio UniGe

CODE	121745
ACADEMIC YEAR	2026/2027
CREDITS	2 CFU, year 2, BIOLOGIA APPLICATA E SPERIMENTALE 11932 (LM-6 R) - GENOVA
SCIENTIFIC DISCIPLINARY SECTOR	BIOS-09/A
TEACHING LOCATION	GENOVA
SEMESTER	2nd Semester

OVERVIEW

The course "Big Data Analysis for Life Sciences" was developed to train modern biologists to manage and interpret the massive amounts of data generated by today's high-throughput technologies. The course aims to bridge the gap between raw computational data and its true biological meaning, providing crucial skills for contemporary research.

AIMS AND CONTENT

LEARNING OUTCOMES

The course aims to provide students with the basic theoretical and practical knowledge to analyze, interpret, and visualize large biological and biomedical datasets, with a focus on omics data from transcriptomics, proteomics, metabolomics, and lipidomics. The course will introduce the main approaches to data management, normalization, statistical analysis, and biological interpretation, including multivariate analysis, clustering, classification, and machine learning methods. Open-access tools and databases useful for analyzing biological processes, building interaction networks, and functional enrichment will also be presented. Upon completion of the course, students will be able to understand the structure of biological big data, apply basic analytical approaches, critically interpret the results, and connect computational evidence to underlying biological processes. The course will adopt a theoretical-practical approach and be based on real datasets from published studies.

AIMS AND LEARNING OUTCOMES

The course aims to provide students with theoretical knowledge and basic application skills for analyzing, interpreting, and visualizing large biological and biomedical datasets. The course will introduce the main approaches to Big Data Analysis in the life sciences, with particular emphasis on data derived from omics technologies, such as transcriptomics, proteomics, metabolomics, and lipidomics. The fundamental concepts of data organization, normalization, exploration, and statistical analysis will be introduced, with a focus on identifying the most relevant variables for distinguishing between experimental groups or biological conditions. Particular emphasis will be placed on multivariate analysis methods, clustering, classification, and machine learning approaches to identify molecular signatures, biomarkers, and biological processes associated with specific experimental or pathological conditions. The course also aims to provide students with the tools necessary to query and use major open-access biological databases and to integrate the results of computational analyses with functional information, biological pathways, gene, protein, and metabolite interaction networks, and functional enrichment analyses.
Upon completion of the course, students should be able to:
• understand the main characteristics of biological and biomedical big data;
• recognize the challenges associated with managing, normalizing, and interpreting complex datasets;
• apply basic statistical and bioinformatic approaches to the analysis of omics data;
• interpret results from clustering, classification, and machine learning analyses;
• use biological databases and open-access tools for data annotation and functional interpretation;
• understand the biological significance of interaction networks, pathways, and enrichment analyses;
• visualize, critically evaluate, and communicate the results obtained from the analysis of complex biological datasets.

The course will use an integrated theoretical-practical approach and, where possible, real datasets from published studies, with the aim of introducing students to concrete data analysis problems in the biological and biomedical fields.

PREREQUISITES

Basic notions of Statistics.

TEACHING METHODS

Lectures (20 hours) will cover the topics listed in the syllabus, using an interactive approach and leaving ample time for practical exercises in the classroom. In the event of an emergency, activities may be conducted online, subject to University regulations.

Students with a physical disability or learning disability certification submitted to the University can find information on support services on the webpage https://unige.it/disabilita-dsa, prepared by the “Office for Inclusion Services for Students with Disabilities and SLD”.

Students may also contact Professor Cristina Carbone (cristina.carbone@unige.it), the DISTAV disability contact person.

SYLLABUS/CONTENT

• INTRODUCTION TO BIG DATA IN THE LIFE SCIENCES
o Definition of Big Data, challenges, and opportunities in contemporary biological and biomedical research.
o Overview of high-throughput technologies.
• OMICS TECHNOLOGIES AND DATA STRUCTURE
o Fundamentals of Transcriptomics (RNA-Seq), Proteomics, Metabolomics, and Lipidomics.
o File formats, expression matrices, and initial exploration of raw data.
• DATA MANAGEMENT AND PRE-PROCESSING
o Data quality control.
o Filtering strategies and normalization techniques for omics data to remove technical biases and batch effects.
• STATISTICAL ANALYSIS AND BIOMARKERS
o Univariate statistical analysis applied to omics data (identification of differentially expressed genes, proteins, or metabolites).
o Correction for multiple testing (FDR, Bonferroni).
• MULTIVARIATE ANALYSIS AND CLUSTERING
o Dimensionality reduction: Principal Component Analysis (PCA).
o Unsupervised clustering methods: hierarchical clustering and k-means applied to molecular profiles.
• CLASSIFICATION AND MACHINE LEARNING FUNDAMENTALS
o Introduction to supervised learning (Machine Learning).
o Classification methods for the discovery of molecular signatures (biomarkers) and prediction of pathological/experimental status.
• SYSTEMS BIOLOGY AND FUNCTIONAL BIOINFORMATICS
o Querying and use of major open-access biological databases (e.g., NCBI, UniProt, KEGG, Reactome).
o Functional enrichment analysis (Gene Ontology, Pathway enrichment analysis).
o Reconstruction and biological interpretation of molecular interaction networks (gene-gene networks, protein-protein interactions).
• DATA VISUALIZATION
o Graphical tools for critically communicating complex biological data (Volcano plot, Heatmap, PCA plot).
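
As an illustration of the "Correction for multiple testing (FDR, Bonferroni)" item above, the following is a minimal sketch and not course material. It assumes that FDR refers to the Benjamini-Hochberg procedure, which is the most common choice but is not stated in the syllabus. The function names and the p-values are hypothetical and chosen only for the example.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed meaning of 'FDR' here)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five features (e.g. genes); not real data.
pvals = [0.001, 0.008, 0.02, 0.03, 0.60]
alpha = 0.05  # illustrative significance threshold

bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

significant_bonf = [p <= alpha for p in bonf]
significant_bh = [p <= alpha for p in bh]

print("Bonferroni:", bonf, "->", sum(significant_bonf), "significant")
print("Benjamini-Hochberg:", bh, "->", sum(significant_bh), "significant")
```

With these example values, Bonferroni retains two features and Benjamini-Hochberg retains four. This reflects the usual trade-off: Bonferroni controls the chance of any false positive and is more conservative, while FDR control tolerates a bounded fraction of false positives among the reported hits.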

TEACHERS AND EXAM BOARD

MAURIZIO BRUSCHI

Office hours: By appointment. Prof. Maurizio Bruschi, IRCCS Istituto Giannina Gaslini, Laboratorio di Nefrologia Molecolare (Padiglione 12, fondi), Via Gerolamo Gaslini, 5 – 16147 Genova (GE). E-mail: maurizio.bruschi@unige.it

LESSONS

LESSONS START

Consult the detailed timetable at the following link: https://easyacademy.unige.it/portalestudenti/

Class schedule

The timetable for this course is available on the Portale EasyAcademy.

EXAMS

EXAM DESCRIPTION

The exam will consist of a written test (a multiple-choice quiz), administered in person or online.

The test is passed with a score of at least 18/30.

ASSESSMENT METHODS

The exam will assess the achievement of the learning objectives. Specifically, it will assess the student's theoretical knowledge of analysis workflows for biological big data (from normalization to functional interpretation) and their ability to critically interpret the output of the main statistical and bioinformatics approaches presented in class, correctly connecting computational evidence to the underlying biological processes.