The DWS group is happy to announce the new release of the WebDataCommons Microdata, JSON-LD, RDFa and Microformat data corpus. The data has been extracted from the November 2019 version of the Common Crawl covering 2.4 billion HTML pages which originate from 32 million websites (pay-level domains).
Professor Bizer was invited to give the keynote speech at the 9th Joint International Semantic Technology Conference (JIST2019) in Hangzhou, China.
We are happy to announce that Professor Christian Bizer has received the SWSA Ten-Year Award at the 18th International Semantic Web Conference (ISWC2019) in Auckland, New Zealand.
We are happy to announce the release of Version 2.0 of the Web Data Commons Product Data Corpus and Gold Standard for Large-Scale Product Matching. The product data corpus consists of 26 million product offers (16 million English language offers) originating from 79 thousand different e-shops. The ...
Oliver Lehmberg has successfully defended his PhD thesis on „Web Table Integration and Profiling for Knowledge Base Augmentation“ today.
The paper „Robust Active Learning of Expressive Linkage Rules“ by Anna Primpeli and Christian Bizer has won the best paper award at the 9th International Conference on Web Intelligence, Mining and Semantics (WIMS) in South Korea.
Prof. Christian Bizer gave the keynote speech at the Language Data and Knowledge (LDK 2019) conference in Leipzig, Germany.
Anna Primpeli today presented the Web Data Commons - Training Dataset and Gold Standard for Large-Scale Product Matching at the Workshop on e-Commerce and NLP held at The Web Conference (WWW2019) in San Francisco.
A current research question in the area of entity resolution (also ...
The article „Using the Semantic Web as a Source of Training Data“ by Christian Bizer, Anna Primpeli, and Ralph Peeters has been accepted for the upcoming special issue on „Data and Repeatability“ of Datenbank Spektrum.
The DWS group is happy to announce the new release of the WebDataCommons Microdata, JSON-LD, RDFa and Microformat data corpus. The data has been extracted from the November 2018 version of the Common Crawl covering 2.5 billion HTML pages which originate from 32 million websites (pay-level domains).