We will study the principles and algorithms, architectures and technologies, and programming models and frameworks needed to support data-intensive applications.
Students will learn the theoretical, methodological, and technological fundamentals of advanced data processing architectures, large-scale distributed environments, and data-intensive programming, including Docker, HDFS, Hadoop, Spark, and Cloud/IoT platforms.
Labs focus on map-reduce architectures and libraries.
Technology: HDFS, Hadoop, and Spark, using Python and Java/Scala.
Intermediate lab activities (5 or 6 labs).
Final project.
Sequential, Concurrent and Distributed Programming; Database Theory and Practice; Basic notions of Data Analysis.
Lectures and lab sessions.
Distributed Systems and Distributed Programming
Virtualization and containers
Parallel Python
Distributed data systems and shared-nothing architectures
Partitioning
Replication
Fault Tolerance
CAP Theorem
Map/Filter/Reduce and Generators in Python
Map in Multiprocessing
Introduction to Hadoop and HDFS
Map Reduce
Map Reduce: Simple Design Patterns and Relational Algebra Operators
Hadoop Runtime System
Apache Spark
Apache Spark Internals
PySpark, Java/Scala Spark
Streaming Data
Streaming Spark
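As an illustration of the "Map/Filter/Reduce and Generators in Python" topic, the following is a minimal sketch of a word count written in a map-reduce style using only the Python standard library. It is not taken from the course material: the names map_phase, reduce_phase, word_count, STOP_WORDS, and the sample input are all hypothetical, chosen only for this example.

```python
from collections import Counter
from functools import reduce
from itertools import chain

# Hypothetical stop-word list, used only to demonstrate filter().
STOP_WORDS = {"and"}

def map_phase(line):
    """Mapper: a generator that emits one (word, 1) pair per word in a line."""
    for word in line.lower().split():
        yield (word, 1)

def reduce_phase(counts, pair):
    """Reducer: fold one (word, n) pair into the running totals."""
    word, n = pair
    counts[word] += n
    return counts

def word_count(lines):
    # map: apply the mapper to every line, then flatten the lazy generators.
    pairs = chain.from_iterable(map(map_phase, lines))
    # filter: discard pairs whose word is a stop word.
    pairs = filter(lambda pair: pair[0] not in STOP_WORDS, pairs)
    # reduce: aggregate all remaining pairs into a single dictionary.
    return dict(reduce(reduce_phase, pairs, Counter()))

# Hypothetical sample input.
sample_lines = ["spark and hadoop", "hadoop and hdfs", "spark streaming"]
result = word_count(sample_lines)
print(result)
```

Because map, filter, and the generator are lazy, no intermediate list of pairs is built; pairs are produced one at a time as reduce consumes them. Frameworks such as Hadoop and Spark follow the same map and reduce structure but spread the work across many machines.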
Office hours: by appointment, in person or on Teams
Office hours: by appointment, via email or Microsoft Teams. Room: Valle Puggia – 327
GIORGIO DELZANNO (Chair)
GIOVANNA GUERRINI
BARBARA CATANIA (Substitute Chair)
FEDERICO DASSERETO (Substitute)
Final online test with open and closed questions.
Project presentation and discussion.
Bonus for an attendance rate of at least 70% (if in person) and for lab assignments successfully delivered.
The proposed exercises, project, and final test cover both the conceptual and the practical aspects presented in the course.