WDC JSON-LD/Microdata/RDFa Data Corpus and WDC Schema.org Table Corpus 2023 published
We are happy to announce the 2023 release of the WebDataCommons Microdata, JSON-LD and RDFa Data Corpus as well as the release of the WebDataCommons schema.org Table Corpus.
Prof. Bizer gives keynote at WEBIST conference comparing GPT4 and BERT for Web Data Integration
Prof. Christian Bizer gave a keynote talk comparing the utility of GPT4 and BERT for Web Data Integration at the 19th International Conference on Web Information Systems and Technologies (WEBIST) in Rome.
Best Paper Award at VLDB Tabular Data Analysis Workshop
We are happy to announce that the paper “Column Type Annotation using ChatGPT” by Keti Korini and Christian Bizer has won the best paper award of the Tabular Data Analysis (TaDa) workshop at VLDB 2023 in Vancouver, Canada.
Paper accepted at EDBT 2024
The paper “WDC Products: A Multi-Dimensional Entity Matching Benchmark” by Ralph Peeters, Reng Chiz Der and Christian Bizer has been accepted at EDBT 2024.
WDC Block: A large Blocking Benchmark released
We are happy to announce the release of Web Data Commons Block (WDC Block), a large blocking benchmark. WDC Block is based on product data that was extracted in 2020 from 3,259 e-shops that marked up product offers within their HTML pages using the schema.org vocabulary. The benchmark is ...
Paper accepted at ADBIS 2023
The paper “Using ChatGPT for Entity Matching” by Ralph Peeters and Christian Bizer was accepted at ADBIS 2023.
WDC Products: Multi-Dimensional Entity Matching Benchmark released
We are happy to announce the release of the multi-dimensional WDC Products Benchmark for entity matching. WDC Products is based on product data that was extracted in 2020 from 3,259 e-shops that mark up product offers within their HTML pages using the schema.org vocabulary. It contains overall ...
WebDataCommons releases 86.4 billion quads of Microdata, Embedded JSON-LD, RDFa, and Microformat data originating from 14.2 million websites
The DWS group is happy to announce the new release of the WebDataCommons Microdata, JSON-LD, RDFa and Microformat data corpus.
SOTAB wins Dataset Track of SemTab Challenge at ISWC 2022
We are happy to announce that the Web Data Commons – Schema.org Table Annotation Benchmark (WDC SOTAB) has won the Dataset Track of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) at the International Semantic Web Conference 2022.
Anna Primpeli has successfully defended her PhD Thesis
Anna Primpeli has successfully defended her PhD thesis titled “Reducing the Labeling Effort for Entity Resolution using Distant Supervision and Active Learning” today.