OPIEC is an Open Information Extraction (OIE) corpus constructed from the entire English Wikipedia. It contains more than 341M triples. Each triple in the corpus comes with rich meta-data: every token of the subject / object / relation along with its NLP annotations (POS tag, NER tag, ...), the provenance sentence (along with its dependency parse and its position within the article), the original (golden) links contained in the Wikipedia articles, space / time annotations, etc. (for a more detailed explanation of the meta-data, see here).
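To make the meta-data description above more concrete, here is a minimal Python sketch of how a single triple could be modelled in memory. The class names, field names, and the sample sentence are all hypothetical and chosen for illustration only; they are not the corpus's actual schema, which is documented in the meta-data explanation linked above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    """One token with its NLP annotations (hypothetical field names)."""
    word: str
    pos: str  # part-of-speech tag
    ner: str  # named-entity tag


@dataclass
class Triple:
    """Illustrative container for one OIE triple and its meta-data."""
    subject: List[Token]
    relation: List[Token]
    object: List[Token]
    sentence: str          # provenance sentence
    sentence_number: int   # position of the sentence within the article
    wiki_links: List[str] = field(default_factory=list)  # original (golden) links
    space: Optional[str] = None  # spatial annotation, if any
    time: Optional[str] = None   # temporal annotation, if any


def surface(tokens: List[Token]) -> str:
    """Join the token words back into a plain string."""
    return " ".join(t.word for t in tokens)


# A made-up example record, not taken from the corpus.
example = Triple(
    subject=[Token("Albert", "NNP", "PERSON"), Token("Einstein", "NNP", "PERSON")],
    relation=[Token("was", "VBD", "O"), Token("born", "VBN", "O"), Token("in", "IN", "O")],
    object=[Token("Ulm", "NNP", "LOCATION")],
    sentence="Albert Einstein was born in Ulm.",
    sentence_number=1,
    wiki_links=["Albert_Einstein", "Ulm"],
)

triple_text = (surface(example.subject), surface(example.relation), surface(example.object))
```

Keeping the annotations on each token, rather than on the triple as a whole, mirrors the description above, where POS and NER tags are attached to every token of the subject, relation, and object.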
Links for downloading the OPIEC corpus:
As a bonus corpus, we offer WikipediaNLP: the entire English Wikipedia with NLP annotations (dependency parse, POS tags, NER tags, ...).