CS 560: Large-Scale Data Management (HWS 2021)
Organization
- Lecturer: Prof. Dr. Rainer Gemulla
- Tutor: Adrian Kochsiek
- Type of course: Lecture, exercises (6 ECTS points)
- Prerequisites: Database Systems I or equivalent, programming experience
- Registration: Enroll in Portal 2
The lecture will be held online; the tutorial will be offered once online and once in person (when possible). Details will be discussed in the kickoff lecture (Tuesday, Sep 7, 10:15).
Content
This course introduces the fundamental concepts and computational paradigms of large-scale data management and Big Data. This includes methods for storing, updating, querying, and analyzing large datasets as well as for data-intensive computing. The course covers concepts, algorithms, and system issues; accompanying exercises provide hands-on experience. Topics include:
- Parallel and distributed databases
- MapReduce and its ecosystem
- Spark and dataflows
- NoSQL databases
- Stream processing (tentative)
- Graph processing (tentative)
Lecture Notes
Lecture recordings, lecture notes, exercises, and supplementary material can be found in ILIAS.
Literature
- H. Garcia-Molina, J. D. Ullman, J. Widom. Database Systems: The Complete Book. Prentice Hall, 2nd ed., 2008
- M. T. Özsu, P. Valduriez. Principles of Distributed Database Systems. Springer, 4th ed., 2020
- L. Wiese. Advanced Data Management: For SQL, NoSQL, Cloud and Distributed Databases. De Gruyter, 2015
- T. White. Hadoop – The Definitive Guide. O’Reilly, 4th ed., 2015
- J. Lin, C. Dyer. Data-Intensive Text Processing with MapReduce. Morgan and Claypool, 1st ed., 2010
- E. Redmond, J. R. Wilson. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement. Pragmatic Bookshelf, 2nd ed., 2018
- P. J. Sadalage, M. Fowler. NoSQL Distilled. Addison-Wesley, 2012
- C. Strauch. NoSQL Databases. Stuttgart Media University, 2011
- More in lecture notes