Skills · Apr 16, 2026 · 3 min read

Apache NiFi — Visual Dataflow Automation & Integration Platform

Apache NiFi is a dataflow management system that lets you design, control, and monitor data pipelines through a drag-and-drop web interface. It is built for enterprise data routing, transformation, and system mediation, with provenance tracking and guaranteed delivery.

Agent-ready

Install with prior review

This asset requires review. The copied prompt asks for a dry run, shows the writes, and continues only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Community
Entry: Apache NiFi Overview
Command (review first):
npx -y tokrepo@latest install 45f70684-39ec-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run first, confirm the writes, and then run this command.

TL;DR
Apache NiFi provides a visual drag-and-drop interface for designing enterprise data pipelines.
§01

What it is

Apache NiFi was originally developed by the NSA and donated to the Apache Software Foundation. It automates the movement of data between disparate systems through a visual, flow-based programming interface. NiFi excels in complex enterprise integration scenarios where data provenance, backpressure, and guaranteed delivery are non-negotiable.

NiFi provides a web-based drag-and-drop interface for designing dataflow pipelines, routes and transforms data between hundreds of source and destination systems, and tracks full data provenance from origin to destination.

§02

How it saves time or tokens

NiFi reduces the need to write custom ETL code for many data integration tasks. Instead of coding data pipelines in Python or Java, you drag processors onto a canvas, connect them, and configure routing rules through the UI. Changes to a pipeline take effect immediately, without restarting NiFi. Backpressure handling means downstream slowdowns are managed automatically, and data provenance tracking provides a complete audit trail for compliance.

§03

How to use

  1. Download and start NiFi:
     wget https://downloads.apache.org/nifi/2.1.0/nifi-2.1.0-bin.zip
     unzip nifi-2.1.0-bin.zip && cd nifi-2.1.0
     ./bin/nifi.sh start
     # Access the UI at https://localhost:8443/nifi
     # The generated single-user credentials are written to logs/nifi-app.log
  2. Create your first dataflow by dragging processors onto the canvas.
  3. Common pipeline pattern:
     GetFile -> SplitText -> EvaluateJsonPath -> PutDatabaseRecord
     # Reads files, splits into records, extracts fields, writes to database
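
For comparison, the stages of the common pipeline pattern above can be approximated in plain Python. This is only a rough analog under stated assumptions, not how NiFi implements these processors: the input is assumed to be newline-delimited JSON, the field names `id` and `name` and the `items` table are hypothetical, and an in-memory SQLite database stands in for the target database.

```python
import json
import sqlite3


def split_text(blob: str) -> list[str]:
    """Split a text blob into one record per non-empty line (the SplitText stage)."""
    return [line for line in blob.splitlines() if line.strip()]


def extract_fields(record: str) -> tuple[int, str]:
    """Pull two fields out of a JSON record (the EvaluateJsonPath stage).

    The field names "id" and "name" are hypothetical.
    """
    doc = json.loads(record)
    return doc["id"], doc["name"]


def write_rows(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    """Insert the extracted rows (the PutDatabaseRecord stage)."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(blob: str, conn: sqlite3.Connection) -> int:
    """Run split -> extract -> write and return the number of rows written."""
    rows = [extract_fields(record) for record in split_text(blob)]
    write_rows(conn, rows)
    return len(rows)


# Stand-in for the content of a file picked up by GetFile.
SAMPLE = '{"id": 1, "name": "alpha"}\n{"id": 2, "name": "beta"}\n'

conn = sqlite3.connect(":memory:")
count = run_pipeline(SAMPLE, conn)
print(count, conn.execute("SELECT name FROM items ORDER BY id").fetchall())
```

Each function here corresponds to one processor on the canvas. In NiFi the same wiring is done visually, and the framework adds queuing, retries, and provenance between the stages.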
§04

Example

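A simplified, illustrative XML sketch of a NiFi pipeline that fetches and transforms API data. It is not NiFi's actual flow definition format, and in NiFi the run schedule is set in the processor's scheduling settings rather than as a property: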

<!-- NiFi flow snippet: API to Database -->
<processors>
  <processor>
    <name>Fetch API Data</name>
    <type>InvokeHTTP</type>
    <config>
      <property name="HTTP Method">GET</property>
      <property name="Remote URL">https://api.example.com/data</property>
      <property name="Schedule">5 min</property>
    </config>
  </processor>
  <processor>
    <name>Transform JSON</name>
    <type>JoltTransformJSON</type>
  </processor>
  <processor>
    <name>Write to PostgreSQL</name>
    <type>PutDatabaseRecord</type>
  </processor>
</processors>
§06

Common pitfalls

  • Running NiFi with the default heap settings can cause OutOfMemoryError under load. Raise the -Xms and -Xmx JVM arguments (the java.arg.* entries) in conf/bootstrap.conf to match your data volume.
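  • Leaving backpressure thresholds unreviewed can let queues grow until resources are exhausted. New connections default to 10,000 FlowFiles or 1 GB; check the object-count and data-size thresholds on every connection against your workload.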
  • NiFi's default single-user authentication is not suitable for production. Configure LDAP, OpenID Connect, or client certificate authentication before deploying.
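
The heap pitfall above lends itself to a quick automated check. The following is a minimal sketch that scans the text of a bootstrap.conf for -Xms and -Xmx entries. The sample excerpt, the `java.arg.N` numbering in it, and the 2048 MiB warning threshold are assumptions for illustration and may not match your NiFi version or workload.

```python
import re

# Illustrative excerpt only; the numbering and values are assumptions and
# may differ from the conf/bootstrap.conf shipped with your NiFi version.
SAMPLE_BOOTSTRAP = """\
# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m
"""

_HEAP_RE = re.compile(r"^\s*java\.arg\.[\w.]+\s*=\s*-(Xms|Xmx)(\d+)([kKmMgG])\s*$")
_UNIT_TO_MIB = {"k": 1 / 1024, "m": 1, "g": 1024}


def heap_settings_mb(text: str) -> dict:
    """Return the heap flags found in bootstrap.conf text, in MiB."""
    found = {}
    for line in text.splitlines():
        match = _HEAP_RE.match(line)
        if match:
            flag, amount, unit = match.groups()
            found[flag] = int(amount) * _UNIT_TO_MIB[unit.lower()]
    return found


def heap_warnings(text: str, min_max_heap_mb: float = 2048) -> list:
    """Flag a missing or small -Xmx. The threshold is a hypothetical default."""
    heap = heap_settings_mb(text)
    warnings = []
    if "Xmx" not in heap:
        warnings.append("no -Xmx entry found")
    elif heap["Xmx"] < min_max_heap_mb:
        warnings.append(
            f"-Xmx is {heap['Xmx']:g} MiB, below {min_max_heap_mb:g} MiB"
        )
    return warnings


print(heap_settings_mb(SAMPLE_BOOTSTRAP))
print(heap_warnings(SAMPLE_BOOTSTRAP))
```

To check a real installation, read the file with `open("conf/bootstrap.conf").read()` and pass the text to `heap_warnings`, choosing a threshold that reflects your own data volume.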

Frequently asked questions

What is data provenance in NiFi?

NiFi tracks every event that happens to every piece of data (FlowFile): creation, modification, routing, cloning, and delivery. You can trace any byte from its origin to its final destination, which is critical for compliance and debugging.
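
As a rough illustration of the idea, the sketch below keeps an append-only event log and walks clone links backwards to reconstruct a FlowFile's history. It is a toy model, not NiFi's provenance repository or API: the class names, the integer FlowFile ids, and the way a clone points at its parent are all assumptions made for this example, and the event type names are only loosely modeled on NiFi's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProvenanceEvent:
    event_type: str                  # e.g. CREATE, CONTENT_MODIFIED, CLONE, SEND
    flowfile_id: int                 # hypothetical integer id for the FlowFile
    component: str                   # processor that emitted the event
    parent_id: Optional[int] = None  # set on CLONE to link back to the source


class ProvenanceLog:
    """Append-only event log with a simple lineage query."""

    def __init__(self) -> None:
        self.events: list = []

    def record(self, event_type, flowfile_id, component, parent_id=None) -> None:
        self.events.append(
            ProvenanceEvent(event_type, flowfile_id, component, parent_id)
        )

    def lineage(self, flowfile_id: int) -> list:
        """Return the events leading to a FlowFile, oldest first.

        For a clone, only the parent's events recorded before the clone
        was made are included.
        """
        chain = []
        current, cutoff = flowfile_id, len(self.events)
        while True:
            own = [
                (i, e)
                for i, e in enumerate(self.events[:cutoff])
                if e.flowfile_id == current
            ]
            chain = [e for _, e in own] + chain
            link = next(
                ((i, e.parent_id) for i, e in own if e.parent_id is not None),
                None,
            )
            if link is None:
                return chain
            cutoff, current = link


log = ProvenanceLog()
log.record("CREATE", 1, "GetFile")
log.record("CONTENT_MODIFIED", 1, "SplitText")
log.record("CLONE", 2, "RouteOnAttribute", parent_id=1)
log.record("SEND", 1, "PutFile")
log.record("SEND", 2, "PutDatabaseRecord")

for event in log.lineage(2):
    print(event.event_type, event.component)
```

Tracing FlowFile 2 returns the parent's events up to the clone, followed by the clone's own events, which is the kind of origin-to-destination trail the answer above describes.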

How does NiFi handle backpressure?

Each connection between processors has configurable thresholds for object count and data size. When a downstream processor falls behind, NiFi stops the upstream processor from sending more data, preventing memory exhaustion.
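
The threshold behavior can be modeled in a few lines. This is a simplified sketch, assuming a single in-memory queue and a hypothetical `Connection` class. It is not NiFi code, and the threshold values are arbitrary.

```python
from collections import deque


class Connection:
    """Toy queue between two processors with object and size thresholds."""

    def __init__(self, max_objects: int, max_bytes: int) -> None:
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self.queue = deque()
        self.queued_bytes = 0

    def backpressure_engaged(self) -> bool:
        """True once either threshold has been reached."""
        return (
            len(self.queue) >= self.max_objects
            or self.queued_bytes >= self.max_bytes
        )

    def offer(self, flowfile: bytes) -> bool:
        """Upstream tries to enqueue; refused while backpressure is engaged."""
        if self.backpressure_engaged():
            return False
        self.queue.append(flowfile)
        self.queued_bytes += len(flowfile)
        return True

    def poll(self):
        """Downstream takes the oldest item, or None if the queue is empty."""
        if not self.queue:
            return None
        flowfile = self.queue.popleft()
        self.queued_bytes -= len(flowfile)
        return flowfile


conn = Connection(max_objects=3, max_bytes=1024)

# Upstream produces five items while downstream is stalled.
accepted = [conn.offer(b"x" * 10) for _ in range(5)]

# Downstream drains one item, which lets upstream resume.
conn.poll()
resumed = conn.offer(b"x" * 10)

print(accepted, resumed)
```

The upstream side is refused only while the queue sits at a threshold, so work resumes as soon as the downstream side catches up.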

Does NiFi support clustering?

Yes. NiFi supports clustering for horizontal scaling and high availability. ZooKeeper manages cluster coordination, and the flow design is replicated across all nodes. Data is distributed across the cluster for parallel processing.

How many data sources and destinations does NiFi support?

NiFi ships with 300+ processors covering HDFS, S3, Kafka, databases (JDBC), HTTP APIs, FTP, SFTP, email, Elasticsearch, Solr, and many more. Custom processors can be built in Java.

Is NiFi suitable for real-time streaming?

NiFi handles both batch and streaming data. It processes FlowFiles as they arrive, making it suitable for near-real-time use cases. For true event streaming with strict ordering, pair NiFi with Apache Kafka.
