Web Data Integration (HWS2019)

Data integration is one of the key challenges in most IT projects, and it is estimated that data scientists spend about 80% of their time on it. Within the enterprise context, data integration problems arise whenever data from separate sources needs to be combined as the basis for new applications or data analysis projects. Within the context of the Web, data integration techniques form the foundation for taking advantage of the ever-growing number of publicly accessible data sources and for enabling applications such as product comparison portals, job portals, location-based mashups, or data search engines.

In the course, students will learn techniques for integrating and cleansing data from large sets of heterogeneous data sources. The course will cover the following topics:

  1. Heterogeneity and Distributedness

  2. The Data Integration Process

  3. Structured Data on the Web

  4. Data Exchange Formats

  5. Schema Mapping and Data Translation

  6. Identity Resolution

  7. Data Quality Assessment

  8. Data Fusion

The course consists of a lecture as well as accompanying practical projects. The lecture (IE670) covers the theory and methods of web data integration and concludes with a written exam (3 ECTS). In the projects (IE683), students will gain experience with web data integration methods by applying them within a real-world use case of their choice. Students will work on their projects in teams and will report their results in the form of a written report as well as an oral presentation (together 3 ECTS). While the lecture and the project can be attended in separate years, it is highly recommended to attend both in the same semester, as the schedules of the lecture and the project are aligned with each other.

Exam Review

  • The exam review will take place on the 28th of February 2020 at 14:00 in room B6 C101. Please register for the review by email with Anna and Ralph.

Time and Location

  • Wednesday, 15:30–17:00. Building: B6, Room: A 101 (Starting: 4.9.2019)
  • Thursday, 10:15–11:45. Building: A5, Room: C 014 (Starting: 5.9.2019)

ECTS

  • 3 ECTS: Lecture with written exam (IE670)
  • 3 ECTS: Project with report and presentation (IE683)

Outline

Week | Wednesday | Thursday
4.9.2019 | Lecture: Introduction to Web Data Integration | Lecture: Structured Data on the Web
11.9.2019 | Lecture: Data Exchange Formats | Lecture: Data Exchange Formats
18.9.2019 | Lecture: Schema Mapping | Lecture: Schema Mapping
25.9.2019 | Project: Introduction to Student Projects | Project: Introduction to MapForce
2.10.2019 | Project: Feedback about Project Outlines | - Holiday -
9.10.2019 | Project Work: Data Translation | Lecture: Identity Resolution
16.10.2019 | Lecture: Identity Resolution | Project: Identity Resolution
23.10.2019 | Project Work: Identity Resolution | Project Work: Identity Resolution
30.10.2019 | Project Work: Identity Resolution | - Holiday -
6.11.2019 | Lecture: Data Quality and Data Fusion | Lecture: Data Quality and Data Fusion
13.11.2019 | Project: Data Fusion | Project Work: Data Fusion
20.11.2019 | Project Work: Data Fusion | Project Work: Data Fusion
27.11.2019 | Project Work: Data Fusion | Project Work: Data Fusion
4.12.2019 | Presentation of project results | Presentation of project results