RapidMiner Linked Open Data Extension

Winner of the Semantic Web Challenge 2014

The RapidMiner Linked Open Data Extension is an extension to the open source data mining software RapidMiner. It allows using data from Linked Open Data both as input for data mining and for enriching existing datasets with background knowledge. The extension is based on the earlier FeGeLOD framework, which is now discontinued.

Unlike many related approaches, the RapidMiner Linked Open Data Extension can work in a completely unsupervised fashion, which means that almost no knowledge about the data source used, or about technologies such as RDF and SPARQL, is required to use it.

Download

The RapidMiner Linked Open Data Extension is available from the RapidMiner marketplace.

To install the extension, go to the “Help”->“Updates and Extensions” menu in RapidMiner, and search for “Linked Open Data”.

Operators

The extension provides three main categories of operators:

Data importers that load data from Linked Open Data into RapidMiner for further processing
Linkers that create links from a given dataset to a dataset in Linked Open Data (e.g., linking a CSV file to DBpedia)
Generators that gather data from Linked Open Data and add it as attributes in the data set at hand
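To illustrate what a linker does conceptually, the following Python sketch derives candidate DBpedia URIs from a label column. It assumes the common DBpedia convention that a resource URI is the English Wikipedia page title with spaces replaced by underscores. This is not the extension's implementation: the helper name `link_to_dbpedia` and the sample rows are hypothetical, and a real linker would still need to check that each candidate URI actually exists.

```python
from urllib.parse import quote

# Assumed base namespace for DBpedia resources.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def link_to_dbpedia(label: str) -> str:
    """Build a candidate DBpedia URI from a plain-text label (hypothetical helper)."""
    # DBpedia resource names follow Wikipedia titles: spaces become underscores.
    name = label.strip().replace(" ", "_")
    # Percent-encode characters that are not safe in a URI path.
    return DBPEDIA_RESOURCE + quote(name, safe="_(),'")


# Sample rows standing in for records read from a CSV file.
rows = [{"city": "Mannheim"}, {"city": "New York City"}]
for row in rows:
    row["city_uri"] = link_to_dbpedia(row["city"])
```

After this step each row carries a `city_uri` attribute that later steps can use to look up background knowledge.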

There are different kinds of generators in the extension, such as:

Adding data attributes, such as population
Adding types, such as “G20 country”
Adding aggregated relations, such as number of companies located in a city
Adding arbitrary data using customizable SPARQL statements

The operators provided by the Linked Open Data Extension can be used in conjunction with built-in RapidMiner operators as well as other extensions to build powerful data mining processes.
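As a rough illustration of the generator types listed above, the following Python sketch builds the kind of SPARQL queries that could add a data attribute (population) or an aggregated relation (number of companies located in a city) for an already linked entity. It only constructs query strings and does not contact any endpoint. The function names are hypothetical, and the use of the DBpedia ontology terms `dbo:populationTotal`, `dbo:Company`, and `dbo:locationCity` is an assumption made for this example, not something taken from the extension.

```python
# Assumed prefix declaration for the DBpedia ontology namespace.
PREFIX = "PREFIX dbo: <http://dbpedia.org/ontology/>\n"


def population_query(uri: str) -> str:
    """Query for a single data attribute of the linked entity (hypothetical helper)."""
    return PREFIX + (
        f"SELECT ?population WHERE {{ <{uri}> dbo:populationTotal ?population }}"
    )


def company_count_query(uri: str) -> str:
    """Query for an aggregated relation: companies located in the entity."""
    return PREFIX + (
        "SELECT (COUNT(DISTINCT ?company) AS ?companies) WHERE { "
        f"?company a dbo:Company ; dbo:locationCity <{uri}> }}"
    )


# Example entity URI, as a linker might have produced it.
city = "http://dbpedia.org/resource/Mannheim"
queries = [population_query(city), company_count_query(city)]
```

Each query returns one value per linked entity, which a generator would then attach to the dataset as a new attribute.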

Documentation

All operators, as well as example workflows, are described in the user manual (PDF, 995 kB).

Publications

The extension itself, as well as the underlying algorithms, are described in:

Petar Ristoski. Towards Linked Open Data enabled Data Mining: Strategies for Feature Generation, Propositionalization, Selection, and Consolidation (PDF, 362 kB). In: Extended Semantic Web Conference, 2015.
Heiko Paulheim, Petar Ristoski, Evgeny Mitichkin, and Christian Bizer. Data Mining with Background Knowledge from the Web (PDF). In: RapidMiner World, 2014.
Heiko Paulheim. Exploiting Linked Open Data as Background Knowledge in Data Mining (PDF). In: CEUR workshop proceedings DMoLD 2013 : Proceedings of the International Workshop on Data Mining on Linked Data, with Linked Data Mining Challenge collocated with ECMLPKDD 2013; 1–10. RWTH, Aachen, 2013.
Heiko Paulheim and Johannes Fürnkranz. Unsupervised Generation of Data Mining Features from Linked Open Data (PDF). In: International Conference on Web Intelligence, Mining, and Semantics (WIMS), 2012.

The following publications discuss various applications that use the RapidMiner LOD Extension (or its predecessor FeGeLOD):

Identifying wrong links in Linked Open Data: Heiko Paulheim. Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection (PDF). In: Third International Workshop on Debugging Ontologies and Ontology Mappings (WoDOOM 2014).
Explaining statistical data:
- Heiko Paulheim. Generating Possible Interpretations for Statistics from Linked Open Data (PDF). In: 9th Extended Semantic Web Conference, ESWC 2012; 560–574. Springer, Berlin [u. a.], 2012.
- Petar Ristoski and Heiko Paulheim. Analyzing Statistics with Background Knowledge from Linked Open Data (PDF). In: First International Workshop on Semantic Statistics (SemStats 2013).
- Petar Ristoski and Heiko Paulheim. Visual Analysis of Statistical Data on Maps using Linked Open Data (PDF, 896 kB). In: 12th Extended Semantic Web Conference, ESWC 2015; Posters and Demos.
Classifying Tweets: Axel Schulz, Petar Ristoski and Heiko Paulheim. I See a Car Crash: Real-time Detection of Small Scale Incidents in Microblogs (PDF). In: Workshop on Social Media and Linked Data for Emergency Response (SMILE), 2013.
Predicting the location of Twitter users: Axel Schulz, Aristotelis Hadjakos, Heiko Paulheim, Johannes Nachtwey, Max Mühlhäuser. A Multi-Indicator Approach for Geolocalization of Tweets (PDF). In: International AAAI Conference on Weblogs and Social Media (ICWSM 2013).
Classifying event information extracted from Wikipedia: Daniel Hienert, Dennis Wegener and Heiko Paulheim. Automatic Classification and Relationship Extraction for Multi-Lingual and Multi-Granular Events from Wikipedia (PDF). In: Proceedings of the Workshop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE 2012); 1–10. RWTH, Aachen, 2012.
Schema Matching for Linked Data: Frederik Janssen, Faraz Fallahi, Jan Nößner and Heiko Paulheim. Towards Rule Learning Approaches to Instance-based Ontology Matching (PDF). In: Proceedings of the First International Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data; 13–18. RWTH, Aachen, 2012.

Support & Community

If you use the RapidMiner LOD extension, you may want to join the Google Group at

https://groups.google.com/forum/#!forum/rmlod

or contact the user community via its mailing list:

Team

Project lead:

Heiko Paulheim

Current team:

Christian Bizer
Evgeny Mitichkin
Petar Ristoski

Past contributors:

Raad Bahmani
Johannes Fürnkranz
Alexander Gabriel
Simon Holthausen