Nitin Magdum

Data Engineer

Real-Time Streaming Pipeline with Kafka & Spark

Apache Kafka · Spark Streaming · Delta Lake · Python · Databricks

Designed and deployed a real-time event streaming pipeline using Apache Kafka for message brokering and Spark Structured Streaming for micro-batch processing. Streams transaction events into Delta Lake with exactly-once semantics and sub-second latency.

A fault-tolerant, low-latency streaming pipeline that processes 10,000+ events/second with exactly-once delivery guarantees.
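As a rough illustration of the Kafka-to-Delta path described above, the following is a minimal sketch rather than the project's actual code. The event schema, topic name, broker address, and paths are all hypothetical placeholders. It assumes a Spark session that has the Kafka source and Delta Lake available, as a Databricks cluster would.

```python
from typing import Dict


def kafka_source_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Build the options for Spark's Kafka source (values are placeholders)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


# Assumed event layout; the real transaction schema is not given in the source.
EVENT_SCHEMA_DDL = (
    "transaction_id STRING, account_id STRING, "
    "amount DOUBLE, event_time TIMESTAMP"
)


def start_transactions_stream(spark, bootstrap_servers, topic,
                              delta_path, checkpoint_path):
    """Read JSON events from Kafka and append them to a Delta table."""
    # Imported here so the module can be loaded without PySpark installed.
    from pyspark.sql.functions import col, from_json

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka delivers the payload as bytes; cast to string, then parse the JSON.
    events = raw.select(
        from_json(col("value").cast("string"), EVENT_SCHEMA_DDL).alias("e")
    ).select("e.*")

    # The checkpoint records consumed offsets; together with Delta's
    # transactional commits, a restarted query resumes without duplicating rows.
    return (
        events.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .start(delta_path)
    )
```

The checkpoint location is what ties the pieces together: if it is deleted or changed, the query loses its record of committed offsets and the exactly-once behavior no longer holds across restarts.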

Stack

  • Producer: Python Kafka producer simulating transaction events at 10K msg/s
  • Broker: 3-node Kafka cluster with replication factor 3 and topic partitioning
  • Consumer: Spark Structured Streaming with Delta Lake sink (checkpointing)
  • Monitoring: Grafana dashboards for consumer lag and throughput metrics
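
The producer side of the stack above could be sketched roughly as follows. This is an assumption-laden illustration: the event fields, the `make_transaction` and `produce` helpers, and the injected `send` callback are all hypothetical, and the source does not say which Kafka client library was used. The callback keeps the sketch runnable without a broker.

```python
import json
import random
import time
import uuid
from datetime import datetime, timezone
from typing import Callable


def make_transaction(rng: random.Random) -> dict:
    """Generate one simulated transaction event (field names are assumed)."""
    return {
        "transaction_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "account_id": f"acct-{rng.randint(1, 1000):04d}",
        "amount": round(rng.uniform(1.0, 500.0), 2),
        "event_time": datetime.now(timezone.utc).isoformat(),
    }


def produce(send: Callable[[bytes, bytes], None], total: int,
            rate_per_sec: int, seed: int = 0) -> int:
    """Send `total` events at roughly `rate_per_sec`, returning the count sent."""
    rng = random.Random(seed)
    interval = 1.0 / rate_per_sec
    next_at = time.monotonic()
    sent = 0
    for _ in range(total):
        event = make_transaction(rng)
        # Keying by account keeps each account's events in one partition,
        # which preserves their relative order.
        key = event["account_id"].encode("utf-8")
        value = json.dumps(event).encode("utf-8")
        send(key, value)
        sent += 1
        # Pace against a fixed schedule so small delays do not accumulate.
        next_at += interval
        delay = next_at - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return sent
```

With a real client, `send` would wrap the library's publish call; for example, with kafka-python it might be `lambda k, v: producer.send("transactions", key=k, value=v)`, where the topic name is again a placeholder.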
GitHub