
Skills · Apr 16, 2026 · 3 min read

Apache NiFi — Visual Dataflow Automation & Integration Platform

Apache NiFi is a dataflow management system that lets you design, control, and monitor data pipelines through a drag-and-drop web interface. It is built for enterprise data routing, transformation, and system mediation, with provenance tracking and guaranteed delivery.

Apache Software Foundation · Community

Agent-ready

Install with prior review

This asset requires review. The copied prompt requests a dry run, lists the writes, and continues only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm

Agent surface

Any MCP/CLI agent

Type

Skill

Installation

Single

Trust

Trust: Community

Entry point

Apache NiFi Overview

Command with prior review

npx -y tokrepo@latest install 45f70684-39ec-11f1-9bc6-00163e2b0d79 --target codex

Run a dry run first, confirm the writes, then run this command.

TL;DR

Apache NiFi provides a visual drag-and-drop interface for designing enterprise data pipelines.

§01

What it is

Apache NiFi was originally developed by the NSA and donated to the Apache Software Foundation in 2014. It automates the movement of data between disparate systems through a visual, flow-based programming interface. NiFi is strongest in complex enterprise integration scenarios where data provenance, backpressure, and guaranteed delivery are hard requirements.

NiFi provides a web-based drag-and-drop interface for designing dataflow pipelines, routes and transforms data between hundreds of source and destination systems, and tracks full data provenance from origin to destination.

§02

How it saves time or tokens

NiFi removes much of the custom ETL code that data integration tasks usually require. Instead of coding data pipelines in Python or Java, you drag processors onto a canvas, connect them, and configure routing rules through the UI. Pipeline changes apply without restarting NiFi; you only stop and restart the processor you are editing. Backpressure handling means downstream slowdowns are managed automatically, and data provenance tracking provides an audit trail for compliance.

§03

How to use

Download and start NiFi:

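# NiFi 2.x requires Java 21
# Older releases move from downloads.apache.org to https://archive.apache.org/dist/nifi/
wget https://downloads.apache.org/nifi/2.1.0/nifi-2.1.0-bin.zip
unzip nifi-2.1.0-bin.zip && cd nifi-2.1.0
./bin/nifi.sh start
# Access the UI at https://localhost:8443/nifi (self-signed certificate by default)
# Find the generated single-user credentials:
grep Generated logs/nifi-app.log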
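If you want to pick the generated credentials out of the log programmatically, a minimal sketch could look like the following. It assumes the single-user login lines have the form `Generated Username [...]` and `Generated Password [...]`; the sample log text and the function name are made up for illustration, so check the pattern against your own logs/nifi-app.log.

```python
import re

# Hypothetical sample; real log lines carry timestamps and logger names as well.
SAMPLE_LOG = """\
some unrelated startup line
Generated Username [demo-user-1234]
Generated Password [demo-password-5678]
"""

# Assumed line format for NiFi's generated single-user credentials.
CREDENTIAL_PATTERN = re.compile(r"Generated (Username|Password) \[([^\]]+)\]")


def extract_generated_credentials(log_text):
    """Return the last generated username/password pair found in log_text."""
    found = {}
    for kind, value in CREDENTIAL_PATTERN.findall(log_text):
        found[kind.lower()] = value  # later entries overwrite earlier ones
    return found


credentials = extract_generated_credentials(SAMPLE_LOG)
print(sorted(credentials))
```

To run it against a real installation, read the file with `open("logs/nifi-app.log").read()` and pass the text to the function.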

Create your first dataflow by dragging processors onto the canvas.

Common pipeline pattern:

GetFile -> SplitText -> EvaluateJsonPath -> PutDatabaseRecord
# Reads files, splits them into line-level records, extracts JSON fields, writes to a database
# (assumes newline-delimited JSON input)
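To make the role of each stage concrete, here is a rough plain-Python analogy of that pattern. It is not NiFi code: the function names mirror the processors only for readability, the input is a hypothetical newline-delimited JSON file, and an in-memory SQLite table (`users`, an assumed name) stands in for the target database.

```python
import json
import sqlite3


def split_text(file_content):
    """SplitText stand-in: one non-empty line per record."""
    return [line for line in file_content.splitlines() if line.strip()]


def evaluate_json_path(line):
    """EvaluateJsonPath stand-in: keep two top-level fields ($.id, $.name)."""
    doc = json.loads(line)
    return {"id": doc["id"], "name": doc["name"]}


def put_database_record(conn, records):
    """PutDatabaseRecord stand-in: insert the extracted fields."""
    conn.executemany("INSERT INTO users (id, name) VALUES (:id, :name)", records)
    conn.commit()
    return len(records)


# GetFile stand-in: the content of a hypothetical input file.
file_content = '{"id": 1, "name": "ada", "extra": true}\n{"id": 2, "name": "grace"}\n'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
records = [evaluate_json_path(line) for line in split_text(file_content)]
inserted = put_database_record(conn, records)
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(inserted, rows)
```

In NiFi itself, each of these steps is a configured processor, and the queues between them add buffering, backpressure, and provenance that this sketch leaves out.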

§04

Example

An illustrative XML-style outline of a flow that fetches and transforms API data. This is a simplified sketch, not an importable flow definition (NiFi 2.x stores and exports flows as JSON), and property names can differ between NiFi versions:

<!-- NiFi flow sketch: API to Database -->
<processors>
  <processor>
    <name>Fetch API Data</name>
    <type>InvokeHTTP</type>
    <!-- Run Schedule is a scheduling setting, not a processor property -->
    <schedule>5 min</schedule>
    <config>
      <property name="HTTP Method">GET</property>
      <property name="Remote URL">https://api.example.com/data</property>
    </config>
  </processor>
  <processor>
    <name>Transform JSON</name>
    <type>JoltTransformJSON</type>
  </processor>
  <processor>
    <name>Write to PostgreSQL</name>
    <type>PutDatabaseRecord</type>
  </processor>
</processors>

§05

Related on TokRepo

Automation tools — More data pipeline and automation tools on TokRepo.
Database tools — Browse database integration tools.

§06

Common pitfalls

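Running NiFi with the default heap settings can cause OutOfMemory errors under load. Raise the -Xms and -Xmx JVM arguments (the java.arg.* entries) in conf/bootstrap.conf to match your data volume.
Leaving backpressure thresholds at their defaults (10,000 FlowFiles or 1 GB per connection) can let queues grow larger than your heap and disk comfortably handle. Review the object and size thresholds on every connection, and never disable them on high-volume paths.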
NiFi's default single-user authentication is not suitable for production. Configure LDAP, OpenID Connect, or client certificate authentication before deploying.
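For the heap pitfall, a small check like the one below can flag an undersized configuration before deployment. It is a sketch under assumptions: it expects bootstrap.conf entries of the form `java.arg.<name>=-Xms...` and `java.arg.<name>=-Xmx...`, the sample text and the 2g/4g values are hypothetical, and `heap_settings_mb` is an illustrative helper, not part of NiFi.

```python
import re

# Hypothetical excerpt of conf/bootstrap.conf.
SAMPLE_BOOTSTRAP = """\
# JVM memory settings
java.arg.2=-Xms2g
java.arg.3=-Xmx4g
java.arg.4=-Djava.net.preferIPv4Stack=true
"""

HEAP_ARG = re.compile(r"-(Xms|Xmx)(\d+)([kKmMgG])")
UNIT_TO_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def heap_settings_mb(bootstrap_conf_text):
    """Return {'Xms': MB, 'Xmx': MB} for the heap arguments that are present."""
    settings = {}
    for line in bootstrap_conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or not line.startswith("java.arg."):
            continue  # skip comments and unrelated properties
        _, _, value = line.partition("=")
        match = HEAP_ARG.fullmatch(value.strip())
        if match:
            flag, amount, unit = match.groups()
            settings[flag] = int(amount) * UNIT_TO_MB[unit.lower()]
    return settings


heap = heap_settings_mb(SAMPLE_BOOTSTRAP)
print(heap)
```

A deployment script could compare `heap["Xmx"]` against a minimum you choose for your workload and fail early if it is too small.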

Frequently asked questions

What is data provenance in NiFi?

NiFi tracks every event that happens to every piece of data (FlowFile): creation, modification, routing, cloning, and delivery. You can trace any byte from its origin to its final destination, which is critical for compliance and debugging.
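The idea can be sketched as a toy event log. This is a simplified model, not NiFi's provenance repository: the class names are illustrative, the event names are borrowed from NiFi's provenance event types, and which FlowFile an event is attached to is simplified here.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)


@dataclass
class FlowFile:
    content: bytes
    parent_id: Optional[int] = None
    id: int = field(default_factory=lambda: next(_ids))


class ProvenanceLog:
    def __init__(self):
        self.events = []   # (flowfile id, event type, component)
        self.parents = {}  # flowfile id -> parent id or None

    def record(self, flowfile, event_type, component):
        self.events.append((flowfile.id, event_type, component))
        self.parents.setdefault(flowfile.id, flowfile.parent_id)

    def lineage(self, flowfile_id):
        """Events for a FlowFile and its ancestors, oldest ancestor first."""
        chain = []
        current = flowfile_id
        while current is not None:
            chain.append(current)
            current = self.parents.get(current)
        chain.reverse()
        return [e for fid in chain for e in self.events if e[0] == fid]


log = ProvenanceLog()
source = FlowFile(b"a\nb")
log.record(source, "CREATE", "GetFile")
child = FlowFile(b"a", parent_id=source.id)
log.record(child, "FORK", "SplitText")
log.record(child, "SEND", "PutDatabaseRecord")
trace = log.lineage(child.id)
print(trace)
```

Tracing the child walks back to the source file's CREATE event, which is the kind of origin-to-destination view the provenance UI gives you.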

How does NiFi handle backpressure?

Each connection between processors has configurable thresholds for object count and data size. When a queue reaches either threshold, NiFi stops scheduling the upstream component until the queue drains, which keeps a slow downstream processor from causing unbounded queue growth.
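A toy simulation shows the effect of an object-count threshold. It is a simplified model, not NiFi's scheduler: the `Connection` class, the tick loop, and the numbers are all assumptions chosen for illustration.

```python
from collections import deque


class Connection:
    """Toy queue between two processors with an object-count threshold."""

    def __init__(self, object_threshold):
        self.object_threshold = object_threshold
        self.queue = deque()

    def is_full(self):
        return len(self.queue) >= self.object_threshold


def run(ticks, produce_per_tick, consume_per_tick, object_threshold):
    conn = Connection(object_threshold)
    skipped_upstream_runs = 0
    peak = 0
    next_item = 0
    for _tick in range(ticks):
        # Upstream only runs while the connection is below its threshold.
        if conn.is_full():
            skipped_upstream_runs += 1
        else:
            for _ in range(produce_per_tick):
                conn.queue.append(next_item)
                next_item += 1
        peak = max(peak, len(conn.queue))
        # Downstream drains at its own, slower rate.
        for _ in range(min(consume_per_tick, len(conn.queue))):
            conn.queue.popleft()
    return peak, skipped_upstream_runs


peak, skipped = run(ticks=20, produce_per_tick=5, consume_per_tick=2, object_threshold=10)
print(peak, skipped)
```

The queue can overshoot the threshold by up to one upstream batch, because the check happens before a run rather than per object. NiFi's thresholds behave as soft limits in a similar way.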

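Does NiFi support clustering?

Yes. NiFi supports clustering for horizontal scaling and high availability. Cluster coordination is typically handled by ZooKeeper (NiFi 2.x can also use Kubernetes-based leader election), and the flow design is replicated across all nodes. Each node processes its own data; to spread queued data across nodes for parallel processing, enable load balancing on the relevant connections.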

How many data sources and destinations does NiFi support?

NiFi ships with 300+ processors covering HDFS, S3, Kafka, databases (JDBC), HTTP APIs, FTP, SFTP, email, Elasticsearch, Solr, and many more. Custom processors can be built in Java.

Is NiFi suitable for real-time streaming?

NiFi handles both batch and streaming data. It processes FlowFiles as they arrive, making it suitable for near-real-time use cases. For true event streaming with strict ordering, pair NiFi with Apache Kafka.


