Focus Group: Web-based Systems

(Prof. Bizer)

The Web-based Systems group conducts research on methods for integrating data from large numbers of data sources in the context of the open Web and in corporate data lakes. Our research includes areas such as entity matching, schema matching, table annotation, information extraction, and data discovery. Our current work focuses on utilizing large language models and LLM-based agents for data integration tasks. We apply the developed methods to integrate product data from large numbers of e-shops and to construct knowledge graphs such as DBpedia. The empirical research of the group includes monitoring the adoption of schema.org annotations on the public Web by regularly extracting structured data from large Web corpora.

People

Current Team:

Alumni:

Awards

Publications

2025

  • Peeters, R., Steiner, A. and Bizer, C. (2025). Entity matching using large language models. In Proceedings 28th International Conference on Extending Database Technology (EDBT 2025), Barcelona, Spain, March 25 – March 28 (pp. 529–541). OpenProceedings, OpenProceedings.org: Konstanz.

2023

  • Bizer, C. (2023). GPT-4 versus BERT: Which foundation model is more suitable for integrating data from the web? WEBIST 2023, 19th International Conference on Web Information Systems and Technologies, Roma, Italy.
  • Bizer, C., Heath, T. and Berners-Lee, T. (2023). Linked data – the story so far. In Linking the world’s information: Essays on Tim Berners-Lee’s Invention of the World Wide Web (pp. 115–143). New York: ACM Digital Library.
  • Brinkmann, A. (2023). Neural data search for table augmentation. In Proceedings of the Workshops of the EDBT/ICDT 2023 Joint Conference, Ioannina, Greece, March 28, 2023 (pp. 1–4). CEUR Workshop Proceedings, RWTH Aachen: Aachen, Germany.
  • Brinkmann, A., Primpeli, A. and Bizer, C. (2023). The Web Data Commons Schema.Org Data Set Series. In The ACM Web Conference: Companion of the World Wide Web Conference WWW 2023 (pp. 136–139). Association for Computing Machinery: New York, NY.
  • Hassanzadeh, O., Abdelmageed, N., Efthymiou, V., Chen, J., Cutrona, V., Hulsebos, M., Jiménez-Ruiz, E., Khatiwada, A., Korini, K., Kruit, B., Sequeda, J. and Srinivas, K. (2023). Results of SemTab 2023. In Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, SemTab 2023, co-located with the 22nd International Semantic Web Conference, ISWC 2023, Athens, Greece, November 6–10, 2023 (pp. 1–14). CEUR Workshop Proceedings, RWTH Aachen: Aachen, Germany.
  • Korini, K. and Bizer, C. (2023). Column type annotation using ChatGPT. In Joint proceedings of workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, Canada, August 28 – September 1, 2023, VLDBW 2023 (pp. 1–12). CEUR Workshop Proceedings, RWTH Aachen: Aachen, Germany.
  • Peeters, R. and Bizer, C. (2023). Using ChatGPT for Entity Matching. In New Trends in Database and Information Systems: ADBIS 2023 short papers, doctoral consortium and workshops: AIDMA, DOING, K-Gals, MADEISD, PeRS, Barcelona, Spain, September 4–7, 2023, Proceedings (pp. 221–230). Communications in Computer and Information Science, Springer: Cham.
  • Peeters, R., Der, R. C. and Bizer, C. (2023). WDC products: A multi-dimensional entity matching benchmark. In Proceedings 27th International Conference on Extending Database Technology (EDBT 2024), Paestum, Italy, March 25 – March 28 (pp. 22–33). OpenProceedings, OpenProceedings.org: Konstanz.

2018

  • Bizer, C., Vidal, M.-E. and Skaf-Molli, H. (2018). Linked Open Data. In Encyclopedia of Database Systems (pp. 2096–2101). New York, NY: Springer.
  • Bizer, C., Vidal, M.-E. and Weiss, M. (2018). RDF Technology. In Encyclopedia of Database Systems (pp. 3106–3109). New York, NY: Springer.
  • Bizer, C., Vidal, M.-E. and Weiss, M. (2018). Resource Description Framework. In Encyclopedia of Database Systems (pp. 3221–3224). New York, NY: Springer.
  • Kleppmann, B., Bizer, C., Yaqub, E., Temme, F., Schlunder, P., Arnu, D. and Klinkenberg, R. (2018). Density- and correlation-based table extension. In LWDA 2018: Proceedings of the Conference “Lernen, Wissen, Daten, Analysen” Mannheim, Germany, August 22–24, 2018 (pp. 191–194). CEUR Workshop Proceedings, RWTH Aachen: Aachen, Germany.
  • Ristoski, P., Petrovski, P., Mika, P. and Paulheim, H. (2018). A machine learning approach for product matching and categorization. Semantic Web, 9, 707–728.
