Scripts · May 23, 2026 · 3 min read

doccano — Open-Source Text Annotation Tool for Machine Learning

A web-based annotation platform for creating labeled datasets for NLP tasks including text classification, sequence labeling, and sequence-to-sequence problems.

Universal CLI command
npx tokrepo install 51dcb118-563e-11f1-9bc6-00163e2b0d79

Introduction

doccano is a self-hosted annotation tool for building labeled datasets for natural language processing. It supports text classification (sentiment, topic), sequence labeling (NER, POS tagging), and sequence-to-sequence tasks (translation, summarization). Teams can collaborate on annotation projects with built-in user management and inter-annotator agreement metrics.

What doccano Does

  • Annotates text documents for classification, named entity recognition, and seq-to-seq tasks
  • Supports multi-label and multi-class annotation with customizable label sets
  • Provides keyboard shortcuts for fast annotation workflows
  • Imports data from JSON, JSONL, CSV, TSV, and CoNLL formats
  • Exports labeled datasets in formats compatible with spaCy, Hugging Face, and other ML frameworks
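As a minimal sketch of preparing data for a JSONL import, the helpers below write one JSON object per line. The key names "text" and "label" and the [start, end, label] span layout are assumptions about the import format; check them against the import dialog of your doccano version. The helper names are hypothetical.

```python
import json

def to_classification_record(text, labels):
    """One record for a text-classification import (assumed keys: text, label)."""
    return {"text": text, "label": list(labels)}

def to_sequence_record(text, spans):
    """One record for sequence labeling; spans are (start, end, label) character offsets."""
    for start, end, _ in spans:
        # Reject spans that are empty or fall outside the text.
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span ({start}, {end}) is outside the text")
    return {"text": text, "label": [[s, e, name] for s, e, name in spans]}

def dumps_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"

records = [
    to_sequence_record("Ada Lovelace lived in London.", [(0, 12, "PER"), (22, 28, "LOC")]),
]
jsonl_text = dumps_jsonl(records)
```

Validating offsets before upload catches off-by-one spans early, which is cheaper than fixing them in the annotation interface.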

Architecture Overview

doccano is a Python application built with Django on the backend and Vue.js on the frontend. It uses PostgreSQL (or SQLite for small deployments) for storing projects, documents, and annotations. The web server runs alongside a Celery worker, which handles background tasks like data import and export. The REST API enables programmatic access to annotation operations such as project creation, data upload, and annotation retrieval.
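To illustrate what programmatic access could look like, here is a hedged sketch that builds a request for the project list using only the standard library. The "/v1/projects" path, the "Token" authorization scheme, the base URL, and the token value are all assumptions for illustration; confirm the real endpoints and authentication method against your own instance before relying on them.

```python
import json
import urllib.request

def build_list_projects_request(base_url, token):
    """Build (but do not send) a GET request for the project list.

    ASSUMPTIONS: the '/v1/projects' path and the 'Token' authorization
    scheme are illustrative, not confirmed by this article.
    """
    url = base_url.rstrip("/") + "/v1/projects"
    headers = {"Accept": "application/json", "Authorization": f"Token {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")

def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Hypothetical base URL and token; nothing is sent until fetch_json is called.
req = build_list_projects_request("http://localhost:8000/", "hypothetical-token")
```

Separating request construction from sending keeps the URL and header logic easy to check without a live server.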

Self-Hosting & Configuration

  • Install with pip (pip install doccano) or run with Docker Compose (docker compose up, pointed at a compose file from the project repository)
  • SQLite works for evaluation; use PostgreSQL for production multi-user setups
  • Configure authentication backends including LDAP and social login providers
  • Set up role-based access control with project admin, annotator, and annotation approver roles
  • Back up the database and media directory for data persistence
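The backup step above can be sketched for a SQLite evaluation setup as follows. The file and directory names are hypothetical, since the actual locations depend on how the instance was installed. For PostgreSQL, a dump tool such as pg_dump would take the place of the SQLite copy.

```python
import shutil
import sqlite3
from pathlib import Path

def backup_sqlite(db_path, dest_path):
    """Copy a SQLite database consistently using sqlite3's online backup API."""
    source = sqlite3.connect(str(db_path))
    target = sqlite3.connect(str(dest_path))
    try:
        source.backup(target)
    finally:
        target.close()
        source.close()

def backup_instance(db_path, media_dir, backup_root, stamp):
    """Write a database copy and a media archive under backup_root/stamp."""
    out_dir = Path(backup_root) / stamp
    out_dir.mkdir(parents=True, exist_ok=True)
    db_copy = out_dir / "db.sqlite3"
    backup_sqlite(db_path, db_copy)
    # make_archive appends the ".tar.gz" suffix and returns the archive path.
    archive = shutil.make_archive(str(out_dir / "media"), "gztar", root_dir=str(media_dir))
    return db_copy, Path(archive)
```

Using the backup API rather than a plain file copy avoids capturing a half-written database while the server is running.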

Key Features

  • Three annotation modes: text classification, sequence labeling, and seq-to-seq
  • Auto-labeling integration for pre-annotating documents with ML models
  • Inter-annotator agreement metrics to measure label consistency across team members
  • REST API for programmatic project creation, data upload, and annotation retrieval
  • Collaborative features with user assignment, annotation review, and commenting
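To make the idea of inter-annotator agreement concrete, the sketch below computes Cohen's kappa for two annotators who labeled the same documents. This is one common agreement measure, shown for illustration; the article does not say which metric doccano itself reports, so treat this as an independent calculation over exported labels.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators need the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items where both chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

annotator_a = ["pos", "pos", "neg", "neg"]
annotator_b = ["pos", "neg", "neg", "neg"]
kappa = cohens_kappa(annotator_a, annotator_b)  # observed 0.75, expected 0.5, kappa 0.5
```

Kappa corrects raw agreement for chance, so a value near 0 signals that the annotators agree no more often than random labeling would.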

Comparison with Similar Tools

  • Label Studio — supports more data types (images, audio, video); doccano focuses exclusively on text
  • Prodigy — commercial tool by the spaCy team; doccano is free and open source
  • CVAT — specializes in computer vision annotation; doccano handles text-only tasks
  • Argilla — newer tool with tighter Hugging Face integration; doccano has a simpler setup

FAQ

Q: Can doccano handle multiple annotators on the same dataset?
A: Yes. You can assign documents to specific annotators and measure inter-annotator agreement to identify labeling inconsistencies.

Q: Does doccano support pre-annotation?
A: Yes. The auto-labeling feature lets you connect ML models to generate initial annotations that human annotators can then correct.

Q: What export formats are available?
A: doccano exports in JSONL, CSV, and CoNLL formats. JSONL exports can be loaded by Hugging Face datasets and converted to spaCy training data with a short script.
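As a rough sketch of working with a JSONL export, the function below turns sequence-labeling records into (text, {"entities": [...]}) pairs, a shape commonly used when preparing spaCy training examples. The "text" and "label" keys and the sample record are assumptions about the export layout, which can differ between doccano versions.

```python
import json

def parse_export(jsonl_text):
    """Convert exported JSONL records into (text, {"entities": [...]}) pairs."""
    examples = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        # Assumed span layout: [start_offset, end_offset, label_name].
        entities = [(int(s), int(e), str(name)) for s, e, name in record.get("label", [])]
        examples.append((record["text"], {"entities": entities}))
    return examples

# Hypothetical export line for illustration.
sample = '{"id": 1, "text": "Grace Hopper worked at Harvard.", "label": [[0, 12, "PER"], [23, 30, "ORG"]]}\n'
examples = parse_export(sample)
```

Keeping the conversion in plain Python means the same parsed pairs can feed either framework without importing it at export time.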

Q: How does doccano compare to commercial annotation platforms?
A: doccano covers the core annotation workflow well for small to medium teams. Commercial platforms may offer more advanced features like active learning and workforce management.
