Scripts · May 23, 2026 · 3 min read

OpenRefine — Data Cleaning Power Tool for Messy Datasets

A desktop application for exploring, transforming, and reconciling messy data. Handles CSV, TSV, JSON, XML, and spreadsheet files with powerful clustering, faceting, and batch-editing capabilities.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: OpenRefine Overview
Universal CLI command:
npx tokrepo install 21166506-563e-11f1-9bc6-00163e2b0d79

Introduction

OpenRefine (formerly Google Refine) is a desktop application for cleaning and transforming data. It runs a local server and provides a browser-based interface where you can explore datasets, find inconsistencies, and apply bulk transformations. It is widely used in data journalism, library science, and data engineering to prepare messy data for analysis or import into databases.

What OpenRefine Does

  • Imports data from CSV, TSV, JSON, XML, Excel, and Google Sheets
  • Facets and filters data to quickly identify patterns, outliers, and errors
  • Clusters similar values using multiple algorithms to merge inconsistent entries
  • Applies GREL, Jython, or Clojure expressions for custom cell transformations
  • Reconciles records against external knowledge bases like Wikidata
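As a rough illustration of what a text facet computes, the sketch below counts how often each distinct value appears in a column. It is plain Python rather than OpenRefine's own code, and the function name `text_facet` and the sample values are assumptions made for the example.

```python
from collections import Counter


def text_facet(values):
    """Return (value, count) pairs, most frequent first, like a text facet listing."""
    return Counter(values).most_common()


# Hypothetical column with case and whitespace inconsistencies
cities = ["Paris", "paris", "Paris ", "Lyon", "Paris"]

for value, count in text_facet(cities):
    # repr() makes trailing whitespace visible
    print(repr(value), count)
```

Seeing "Paris", "paris", and "Paris " listed as separate entries is the kind of inconsistency a facet makes visible before any cleanup is applied.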

Architecture Overview

OpenRefine is a Java application that runs an embedded Jetty web server. The frontend is a single-page application served to the browser. Data is loaded into an in-memory model with a full operation history that supports unlimited undo. All transformations are recorded as a reproducible JSON operation log that can be exported and replayed on other datasets, provided they use the column names the operations refer to.
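To make the operation log more concrete, here is a minimal sketch that parses an exported history and lists its steps. The JSON is a hand-written sample modeled on typical exports; the exact fields can vary by operation type and OpenRefine version, and `summarize_operations` is a hypothetical helper, not part of OpenRefine.

```python
import json

# Hand-written sample modeled on an exported operation history (assumption:
# real exports may carry additional or different fields per operation).
OPERATIONS_JSON = """
[
  {
    "op": "core/text-transform",
    "engineConfig": {"facets": [], "mode": "row-based"},
    "columnName": "city",
    "expression": "grel:value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Text transform on cells in column city using expression grel:value.trim()"
  },
  {
    "op": "core/column-rename",
    "oldColumnName": "city",
    "newColumnName": "City",
    "description": "Rename column city to City"
  }
]
"""


def summarize_operations(text):
    """Return (operation id, description) pairs in the order they were recorded."""
    operations = json.loads(text)
    return [(step["op"], step.get("description", "")) for step in operations]


for op, description in summarize_operations(OPERATIONS_JSON):
    print(op, "->", description)
```

Because the log is ordinary JSON, it can be kept under version control and reviewed like any other script.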

Self-Hosting & Configuration

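  • Runs on Windows, macOS, and Linux with Java 11+ (a Java runtime is bundled in some downloads, such as the macOS package and the Windows package with embedded Java; on Linux, install Java separately)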
  • No installation required: unpack the archive and run the executable
  • Configure memory allocation by editing the REFINE_MEMORY setting in the refine.ini file (default is 1400M, about 1.4 GB)
  • Data is stored locally in the workspace directory; no external database needed
  • Extend functionality with community extensions for RDF export, NER, and more
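The memory setting mentioned above can also be changed with a small script. The sketch below assumes the setting appears as an uncommented `REFINE_MEMORY=...` line in refine.ini; the helper name `set_refine_memory`, the sample file contents, and the `OTHER_SETTING` line are hypothetical.

```python
import re


def set_refine_memory(ini_text, new_value):
    """Return ini_text with REFINE_MEMORY set to new_value, appending it if absent."""
    line = f"REFINE_MEMORY={new_value}"
    pattern = re.compile(r"^REFINE_MEMORY\s*=.*$", re.MULTILINE)
    if pattern.search(ini_text):
        # A callable replacement avoids backslash interpretation in new_value
        return pattern.sub(lambda match: line, ini_text, count=1)
    return ini_text.rstrip("\n") + "\n" + line + "\n"


# Hypothetical excerpt of a refine.ini file
sample = "# Memory settings\nREFINE_MEMORY=1400M\nOTHER_SETTING=1\n"
print(set_refine_memory(sample, "4096M"))
```

In practice you would read refine.ini from the OpenRefine directory, pass its text through the helper, write it back, and restart OpenRefine for the change to take effect.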

Key Features

  • Clustering algorithms (key collision, nearest neighbor) for fuzzy deduplication
  • Full undo/redo history with exportable operation logs for reproducibility
  • Wikidata reconciliation service for entity matching and enrichment
  • GREL expression language for flexible cell-level transformations
  • Extension system supporting RDF/Linked Data, named entity extraction, and more
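Key collision clustering can be sketched in a few lines. The example below approximates a fingerprint-style keying function (trim, lowercase, fold accents, strip punctuation, then sort and deduplicate tokens) and groups values that share a key. It is a simplified illustration, not OpenRefine's implementation, and the names `fingerprint` and `cluster` are chosen for this example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key so that superficially different spellings collide."""
    text = value.strip().lower()
    # Decompose accented characters, then drop the combining marks
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort and deduplicate whitespace-separated tokens
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by fingerprint and keep groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(set(members)) > 1]


names = ["Acme Inc.", "acme inc", "Inc. Acme", "Café Noir", "Cafe noir", "Globex"]
for members in cluster(names):
    print(members)
```

Nearest neighbor methods work differently: they compare values pairwise with a distance function, which catches typos that a key function misses but costs more on large columns.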

Comparison with Similar Tools

  • pandas (Python) — code-based; OpenRefine provides an interactive visual interface for non-programmers
  • Trifacta / Alteryx — commercial data wrangling tools; OpenRefine is free and open source
  • Excel Power Query — tied to the Microsoft ecosystem; OpenRefine is cross-platform and handles larger datasets
  • csvkit — CLI toolkit for CSV; OpenRefine offers a richer visual exploration experience

FAQ

Q: How large a dataset can OpenRefine handle? A: OpenRefine works well with datasets up to a few hundred thousand rows. For millions of rows, increase the Java heap size or consider splitting the dataset.
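If splitting is the route you take, a minimal sketch like the following can cut a CSV into smaller pieces before import. It repeats the header row in every chunk; the function name `iter_chunks` and the chunk size are assumptions for illustration.

```python
import csv


def iter_chunks(lines, rows_per_chunk):
    """Yield lists of rows, each starting with the header, with at most rows_per_chunk data rows."""
    reader = csv.reader(lines)
    header = next(reader)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == rows_per_chunk:
            yield [header] + chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk
        yield [header] + chunk


# Small in-memory sample; a file object opened with newline="" works the same way
sample_lines = ["id,name\n", "1,a\n", "2,b\n", "3,c\n"]
for part in iter_chunks(sample_lines, 2):
    print(part)
```

Each yielded chunk can be written out with `csv.writer` as its own file and imported as a separate project.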

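Q: Is my data uploaded to any server? A: Not by default. OpenRefine runs on your local machine, and the browser interface connects to localhost only. Features that call external services, such as reconciliation against Wikidata, do send the relevant cell values to those services.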

Q: Can I automate transformations? A: Yes. You can export the operation history as JSON and apply it to new datasets programmatically through OpenRefine's HTTP API, either directly or with a third-party command-line client.
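A hedged sketch of what applying an exported history over HTTP might look like follows. It only builds the request and does not send it. The endpoint path `/command/core/apply-operations`, the `project`, `csrf_token`, and `operations` parameters, and the default address `127.0.0.1:3333` reflect commonly documented behavior but should be treated as assumptions to verify against your OpenRefine version; `build_apply_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:3333"  # assumed default local address


def build_apply_request(project_id, operations, csrf_token, base_url=BASE_URL):
    """Build (but do not send) a POST request that applies an operation list to a project."""
    query = urllib.parse.urlencode({"project": project_id, "csrf_token": csrf_token})
    # The operation history is sent as a JSON string in a form field
    body = urllib.parse.urlencode({"operations": json.dumps(operations)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/command/core/apply-operations?{query}",
        data=body,
        method="POST",
    )


# Hypothetical project id, operation list, and token
request = build_apply_request("123", [{"op": "core/column-rename"}], "tok")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would require a running OpenRefine instance and a valid CSRF token obtained from the server beforehand.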

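Q: Does OpenRefine support databases directly? A: OpenRefine works primarily with files. Recent versions also include a database extension that can import query results from databases such as PostgreSQL, MySQL, and MariaDB, and an SQL exporter that writes cleaned data as SQL statements. Otherwise, export the cleaned data as CSV and load it with a database import tool.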
