Photo credit: Anna Logue

WInte.r Web Data Integration Framework Version 1.4 released

We are happy to announce the release of Version 1.4 of the Web Data Integration Framework (WInte.r).

WInte.r is a Java framework for end-to-end data integration. It implements a wide variety of methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation. The methods can be customized by exchanging pre-defined building blocks such as blockers, matching rules, similarity functions, and conflict resolution functions.

The following features have been added to the framework in the new release:

  • Additional Similarity Measures: A new TF-IDF cosine similarity measure calculates the similarity of strings. A new geo-coordinate similarity measure calculates the distance between two coordinates using the haversine formula.
  • Missing Value Enabled Matching Rule: A new linear combination matching rule adapts its similarity calculation when it detects a missing value.
  • Improved Debug Reports: The new release adds record-level debugging of the data fusion step and extended event logging.
  • Step-by-Step Tutorial: All new features are explained in detail in the WInte.r Wiki. A step-by-step tutorial describes how to use WInte.r for identity resolution and data fusion, and how to debug and fine-tune the individual steps of the integration process.
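
To give a rough idea of what two of the new features compute, here is a minimal, self-contained Java sketch. It does not use the WInte.r API: the class `SimilaritySketch` and all method names are hypothetical and exist only for illustration. The haversine part follows the standard formula. The cut-off `maxKm` used to turn a distance into a similarity, and the weight renormalization used to handle missing values, are assumptions about one plausible approach, not a description of how WInte.r implements these features.

```java
public class SimilaritySketch {

    // Mean Earth radius in kilometres (assumed constant for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Great-circle distance in kilometres between two points, using the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp to 1.0 to guard against rounding errors before taking asin.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Hypothetical mapping from distance to a similarity in [0, 1]:
     * 1.0 at zero distance, falling linearly to 0.0 at maxKm and beyond.
     */
    static double geoSimilarity(double lat1, double lon1, double lat2, double lon2, double maxKm) {
        double distance = haversineKm(lat1, lon1, lat2, lon2);
        return Math.max(0.0, 1.0 - distance / maxKm);
    }

    /**
     * Weighted linear combination that skips missing values (null entries)
     * and renormalizes by the weights of the values that are present.
     * Returns 0.0 if every value is missing.
     */
    static double combineSkippingMissing(double[] weights, Double[] similarities) {
        double weightedSum = 0.0;
        double usedWeight = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (similarities[i] == null) {
                continue; // missing value: this comparator does not contribute
            }
            weightedSum += weights[i] * similarities[i];
            usedWeight += weights[i];
        }
        return usedWeight == 0.0 ? 0.0 : weightedSum / usedWeight;
    }

    public static void main(String[] args) {
        // Approximate coordinates of Mannheim and Berlin.
        double sim = geoSimilarity(49.4875, 8.4660, 52.5200, 13.4050, 1000.0);
        System.out.println("geo similarity: " + sim);

        // The second attribute is missing, so only the first one counts.
        double combined = combineSkippingMissing(
                new double[] {0.5, 0.5}, new Double[] {0.8, null});
        System.out.println("combined similarity: " + combined);
    }
}
```

With renormalization, a record pair is not penalized merely because one attribute is empty. Whether that is the right behavior depends on the data, which is why it is marked as an assumption here.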

Detailed information about the WInte.r framework can be found at

https://github.com/olehmberg/winter

The WInte.r framework can be downloaded from the same website. It can be used under the terms of the Apache 2.0 License.

Many thanks to Alexander Brinkmann, Anna Primpeli, and Oliver Lehmberg for their work on the new release and on the extended documentation in the WInte.r Wiki.
