Course Outline

Fundamentals of NiFi and Data Flow

  • Data in motion vs data at rest: concepts and challenges
  • NiFi architecture: web server, flow controller, extensions, and the FlowFile, content, and provenance repositories
  • Key components: FlowFiles, processors, connections, process groups, and controller services

Big Data Context and Integration

  • Role of NiFi in Big Data ecosystems (Hadoop, Kafka, cloud storage)
  • Overview of HDFS, MapReduce, and modern alternatives
  • Use cases: stream ingestion, log shipping, event pipelines

Installation, Configuration & Cluster Setup

  • Installing NiFi in single-node and cluster modes
  • Cluster configuration: node roles, ZooKeeper coordination, and load balancing (see the configuration sketch after this list)
  • Orchestrating NiFi deployments: using Ansible, Docker, or Helm
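
As a concrete reference for the cluster settings above, here is a minimal Python sketch that patches the relevant nifi.properties keys; the hostname, protocol port, and ZooKeeper connect string are placeholder assumptions for illustration:

    # patch_cluster_props.py - switch a NiFi node into cluster mode by
    # rewriting the relevant keys in nifi.properties.
    CLUSTER_PROPS = {
        "nifi.cluster.is.node": "true",
        "nifi.cluster.node.address": "nifi-node-1.example.com",  # assumed hostname
        "nifi.cluster.node.protocol.port": "11443",              # assumed protocol port
        "nifi.cluster.load.balance.port": "6342",                # NiFi's default
        "nifi.zookeeper.connect.string": "zk1:2181,zk2:2181,zk3:2181",  # assumed ensemble
    }

    def patch(path="./conf/nifi.properties"):
        with open(path) as f:
            lines = f.readlines()
        seen = set()
        out = []
        for line in lines:
            key = line.split("=", 1)[0].strip()
            if key in CLUSTER_PROPS:
                out.append("%s=%s\n" % (key, CLUSTER_PROPS[key]))
                seen.add(key)
            else:
                out.append(line)
        for key in CLUSTER_PROPS:
            if key not in seen:  # append keys the file did not contain
                out.append("%s=%s\n" % (key, CLUSTER_PROPS[key]))
        with open(path, "w") as f:
            f.writelines(out)

    if __name__ == "__main__":
        patch()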

Designing and Managing Dataflows

  • Routing, filtering, splitting, and merging flows
  • Processor configuration (InvokeHTTP, QueryRecord, PutDatabaseRecord, etc.)
  • Handling schema management, enrichment, and transformation operations (see the scripting sketch after this list)
  • Error handling, retry relationships, and backpressure
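
To make the transformation topic concrete, the following is a minimal Jython body for NiFi's ExecuteScript processor: it parses a JSON FlowFile, adds one enrichment field, and routes the result to success. The "enriched" field name is a hypothetical example:

    # ExecuteScript (Jython) body: read a JSON FlowFile, add one
    # enrichment field, and route to success.
    import json
    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import StreamCallback

    class Enrich(StreamCallback):
        def process(self, inputStream, outputStream):
            record = json.loads(IOUtils.toString(inputStream, StandardCharsets.UTF_8))
            record["enriched"] = True  # hypothetical enrichment field
            outputStream.write(bytearray(json.dumps(record).encode("utf-8")))

    flowFile = session.get()
    if flowFile is not None:
        flowFile = session.write(flowFile, Enrich())
        session.transfer(flowFile, REL_SUCCESS)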

Integration Scenarios

  • Connecting to databases, messaging systems, and REST APIs (see the client sketch after this list)
  • Streaming to analytics systems: Kafka, Elasticsearch, or cloud storage
  • Integrating with Splunk, Prometheus, or logging pipelines
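
As a small client-side sketch of REST-based ingestion, the script below posts a JSON event to a ListenHTTP processor; "contentListener" is ListenHTTP's default base path, while the host, port, and payload are assumptions:

    # post_event.py - send one JSON event to a ListenHTTP processor.
    import json
    import urllib.request

    event = {"id": 42, "status": "ok"}  # sample payload (assumed)
    req = urllib.request.Request(
        "http://nifi-host:8081/contentListener",  # assumed host/port
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)  # 200 means NiFi accepted the FlowFile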

Monitoring, Recovery & Provenance

  • Using the NiFi UI, component metrics, and data provenance views (see the API sketch after this list)
  • Designing automated recovery and graceful failure handling
  • Backup, flow versioning, and change management
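
A minimal sketch of pulling headline metrics over the NiFi REST API, useful alongside the UI; the endpoints shown are standard, the address is an assumption, and a secured instance would additionally require a token or client certificate:

    # flow_status.py - pull headline metrics from the NiFi REST API.
    import json
    import urllib.request

    BASE = "http://nifi-host:8080/nifi-api"  # assumed address

    def get(path):
        with urllib.request.urlopen(BASE + path) as resp:
            return json.load(resp)

    status = get("/flow/status")["controllerStatus"]
    print("active threads:", status["activeThreadCount"])
    print("queued:", status["queued"])  # e.g. "12 / 4.5 MB"

    diag = get("/system-diagnostics")["systemDiagnostics"]["aggregateSnapshot"]
    print("heap used:", diag["heapUtilization"])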

Performance Tuning & Optimization

  • Tuning the JVM heap, thread pools, and clustering parameters (see the sketch after this list)
  • Optimizing flow design to reduce bottlenecks
  • Resource isolation, flow prioritization, and throughput control
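
As an illustration of the JVM tuning step, this sketch rewrites the default heap arguments (java.arg.2/java.arg.3) in conf/bootstrap.conf; the 4 GB size is an assumption, and keeping -Xms equal to -Xmx avoids heap-resizing pauses:

    # tune_heap.py - rewrite the default JVM heap arguments in
    # conf/bootstrap.conf (java.arg.2 = -Xms, java.arg.3 = -Xmx).
    import re

    def set_heap(path="./conf/bootstrap.conf", xms="4g", xmx="4g"):
        with open(path) as f:
            text = f.read()
        text = re.sub(r"^java\.arg\.2=-Xms\S+", "java.arg.2=-Xms" + xms, text, flags=re.M)
        text = re.sub(r"^java\.arg\.3=-Xmx\S+", "java.arg.3=-Xmx" + xmx, text, flags=re.M)
        with open(path, "w") as f:
            f.write(text)

    if __name__ == "__main__":
        set_heap()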

Best Practices & Governance

  • Flow documentation, naming standards, modular design
  • Security: TLS, authentication, access control, and data encryption (see the login sketch after this list)
  • Change control, versioning, role-based access, audit trails
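
For the authentication topic, a minimal sketch of username/password login against a secured instance via the standard /access/token endpoint; the address and credentials are assumptions, and a self-signed server certificate would first need to be trusted by the client:

    # get_token.py - log in to a secured NiFi instance and call a
    # protected endpoint with the returned bearer token.
    import urllib.parse
    import urllib.request

    BASE = "https://nifi-host:8443/nifi-api"  # assumed address

    creds = urllib.parse.urlencode(
        {"username": "admin", "password": "changeme"}  # assumed credentials
    ).encode("ascii")
    with urllib.request.urlopen(BASE + "/access/token", data=creds) as resp:
        token = resp.read().decode("ascii")  # response body is a JWT

    req = urllib.request.Request(
        BASE + "/flow/status",
        headers={"Authorization": "Bearer " + token},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)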

Troubleshooting & Incident Response

  • Common issues: deadlocks, memory leaks, processor errors
  • Log analysis, error diagnostics, and root-cause investigation (see the sketch after this list)
  • Recovery strategies and flow rollback
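
A small sketch of the log-analysis step: counting ERROR lines in logs/nifi-app.log by logger name. The parsing assumes the default logback layout, so treat the regular expression as an assumption:

    # scan_log.py - count ERROR lines in nifi-app.log per logger.
    import collections
    import re

    counts = collections.Counter()
    pattern = re.compile(r" ERROR \[[^\]]*\] (\S+)")  # default logback layout (assumed)
    with open("./logs/nifi-app.log") as f:
        for line in f:
            m = pattern.search(line)
            if m:
                counts[m.group(1)] += 1

    for logger, n in counts.most_common(10):
        print(n, logger)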

Hands-on Lab: Realistic Data Pipeline Implementation

  • Building an end-to-end flow: ingestion, transformation, and delivery
  • Implementing error handling, backpressure, and scaling
  • Performance testing and tuning the pipeline (see the load-test sketch after this list)
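
As a starting point for the performance-test exercise, a rough Python load generator that pushes JSON events at a ListenHTTP ingest endpoint and reports throughput; the URL and batch size are assumptions, and this is a sanity check rather than a benchmark:

    # load_test.py - push a batch of JSON events at a ListenHTTP
    # endpoint and report rough throughput.
    import json
    import time
    import urllib.request

    URL = "http://nifi-host:8081/contentListener"  # assumed ingest endpoint
    N = 1000  # assumed batch size

    start = time.time()
    for i in range(N):
        req = urllib.request.Request(
            URL,
            data=json.dumps({"seq": i}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req).read()
    elapsed = time.time() - start
    print("%d events in %.1fs (%.0f events/s)" % (N, elapsed, N / elapsed))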

Summary and Next Steps

Requirements

  • Experience with Linux command line
  • Basic understanding of networking and data systems
  • Exposure to data streaming or ETL concepts

Audience

  • System administrators
  • Data engineers
  • Developers
  • DevOps professionals

Duration: 21 hours

Delivery Options

Private Group Training

Our identity is rooted in delivering exactly what our clients need.

  • Pre-course call with your trainer
  • Customisation of the learning experience to achieve your goals:
    • Bespoke outlines
    • Practical hands-on exercises containing data / scenarios recognisable to the learners
  • Training scheduled on a date of your choice
  • Delivered online, onsite/classroom or hybrid by experts sharing real world experience

Private Group Prices: RRP from €6840 for online delivery, based on a group of 2 delegates; €2160 per additional delegate (excludes any certification/exam costs). We recommend a maximum group size of 12 for most learning events.

Contact us for an exact quote and to hear about our latest promotions.

Public Training

Please see our public courses.

Provisional Upcoming Courses (Contact Us For More Information)
