Abstract: Millions of websites have started to annotate data describing products, local businesses, events, jobs, places, recipes, and reviews within their HTML pages using the schema.org vocabulary. Search engines widely use these annotations to render rich snippets within search results. Surprisingly, the research community has hardly used them. In this talk, Christian Bizer investigates the potential of schema.org annotations as training data for tasks such as entity matching, information extraction, and sentiment analysis. Web pages that offer semantic annotations often also contain additional structured data in the form of HTML tables. In the second part of the talk, Christian Bizer discusses the interplay of semantic annotations and web tables for information extraction, as well as the general potential of relational HTML tables for complementing knowledge bases such as DBpedia, focusing on the discovery of previously unknown long-tail entities and the extraction of n-ary relations.
More information about the LDK2019 conference.