Paper accepted at ISWC 2021
The paper „Graph-boosted Active Learning for Multi-Source Entity Resolution“ by Anna Primpeli and Christian Bizer was accepted at ISWC 2021.
Paper accepted for PVLDB 2021
The paper „Dual-Objective Fine-Tuning of BERT for Entity Matching“ by Ralph Peeters and Christian Bizer has been accepted for publication in the Proceedings of the VLDB Endowment (PVLDB) 2021. The paper will be presented at the VLDB 2021 conference in Copenhagen, Denmark, in August.
WDC Schema.org Table Corpus released
We are happy to announce the release of the WDC Schema.org Table Corpus.
WInte.r Web Data Integration Framework Version 1.4 released
We are happy to announce the release of Version 1.4 of the Web Data Integration Framework (WInte.r).
WebDataCommons releases 86.3 billion quads of Microdata, Embedded JSON-LD, RDFa, and Microformat data originating from 15.3 million websites
The DWS group is happy to announce the new release of the WebDataCommons Microdata, JSON-LD, RDFa and Microformat data corpus.
Paper accepted for DI2KG
The paper „Intermediate Training of BERT for Product Matching“ by Ralph Peeters, Christian Bizer and Goran Glavaš has been accepted for the 2nd International Workshop on Challenges and Experiences from Data Integration to Knowledge Graphs (DI2KG) held in conjunction with VLDB 2020.
Paper accepted for CIKM
The paper „Profiling Entity Matching Benchmark Tasks“ by Anna Primpeli and Christian Bizer has been accepted for the 29th International Conference on Information and Knowledge Management (CIKM), which will be held online this year.
Yaser Oulabi has successfully defended his PhD thesis
Yaser Oulabi has successfully defended his PhD thesis on „Augmenting Cross-Domain Knowledge Bases Using Web Tables“ today.
CfP: Benchmark Competition on Product Data Integration at ISWC 2020
Together with the University of Sheffield, we are organizing a benchmark competition on product data integration at the 19th International Semantic Web Conference (ISWC 2020). The competition consists of two tasks: Product Offer Matching and Product Classification. Submissions to both tasks are ...
44.2 billion quads of Microdata, Embedded JSON-LD, RDFa, and Microformat data originating from 11.9 million websites published
The DWS group is happy to announce the new release of the WebDataCommons Microdata, JSON-LD, RDFa and Microformat data corpus. The data has been extracted from the November 2019 version of the Common Crawl covering 2.4 billion HTML pages which originate from 32 million websites (pay-level domains).