A free, open-source power tool for working with messy data and improving it

OpenRefine is an established data cleaning tool, popular in a broad range of communities: journalism, digital humanities, libraries, linked open data, and many more. Its design has influenced many other tools, not counting forks and rebranded versions.

Scope of the tool

The tool is used to perform data transformations on small- to medium-scale datasets by interactively building workflows that mix automated transforms and human review. The transformations are reproducible: they can be replayed on datasets in the same format, with updated data. The focus is on usability through a web UI, reducing the need to learn a programming or query language.

Stack

OpenRefine is built in Java (server-side) and uses a web UI (jQuery). It is easy to work on isolated parts of the code without being familiar with the entire architecture. We try to maintain good quality standards by testing all our changes, but remain flexible and unopinionated.

Technologies

  • java
  • javascript

OpenRefine 2020 Projects

  • Lu Liu
    Enhancements for the Wikidata extension
    Add OAuth support for the Wikidata extension (#1612). Extend the Wikidata extension to support arbitrary Wikibase instances (#1640).
  • Lisa Chandra
    Replace row pagination by infinite scrolling
    OpenRefine is a powerful Open Source tool that provides its users the ability to work with and clean messy data. It currently uses a pagination...