Real-Time Streaming Pipeline with Kafka & Spark
Apache Kafka · Spark Streaming · Delta Lake · Python · Databricks
Designed and deployed a real-time event streaming pipeline using Apache Kafka for message brokering and Spark Structured Streaming for micro-batch processing. Streams transaction events into Delta Lake with exactly-once semantics and sub-second latency.
A fault-tolerant, low-latency streaming pipeline that processes 10,000+ events/second with exactly-once delivery guarantees.
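The consumer side described above can be sketched as a Spark Structured Streaming job writing to a Delta sink. This is a minimal sketch under assumptions: the broker addresses, topic name `transactions`, event schema, and checkpoint/table paths are all illustrative, not taken from the actual deployment. Exactly-once delivery here rests on the checkpoint location plus Delta's transactional commits.

```python
# Sketch of the Spark Structured Streaming consumer with a Delta Lake sink.
# Broker addresses, topic, schema, and paths are hypothetical placeholders.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "broker1:9092,broker2:9092,broker3:9092",
    "subscribe": "transactions",
    "startingOffsets": "latest",
}
CHECKPOINT = "/delta/checkpoints/transactions"   # enables exactly-once restarts
TABLE_PATH = "/delta/tables/transactions"


def start_stream(spark):
    """Wire up the streaming graph; requires a live SparkSession with
    Delta Lake support (imports are deferred so the sketch loads anywhere)."""
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import (DoubleType, StringType, StructField,
                                   StructType)

    # Illustrative transaction schema parsed out of the Kafka message value.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("user_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("ts", DoubleType()),
    ])

    events = (
        spark.readStream.format("kafka")
        .options(**KAFKA_OPTIONS)
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    return (
        events.writeStream.format("delta")
        .option("checkpointLocation", CHECKPOINT)
        .outputMode("append")
        .start(TABLE_PATH)
    )
```

On Databricks the same job would typically target a managed table rather than a raw path, but the checkpoint-plus-Delta pattern for exactly-once is the same.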
Stack
- Producer: Python Kafka producer simulating transaction events at 10K msg/s
- Broker: 3-node Kafka cluster with replication factor 3 and topic partitioning
- Consumer: Spark Structured Streaming with Delta Lake sink (checkpointing)
- Monitoring: Grafana dashboards for consumer lag and throughput metrics
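The producer in the stack above can be sketched as an event generator plus a pluggable send callback. This is an illustrative sketch, not the deployed code: the event schema is hypothetical, and in the real pipeline `send` would be a `KafkaProducer` (kafka-python) publish bound to the topic, paced to hit the 10K msg/s target.

```python
# Sketch of the transaction-event producer. The event fields are assumed,
# and `send` stands in for a real KafkaProducer publish to the topic.
import json
import random
import time
import uuid


def make_event() -> dict:
    """Build one synthetic transaction event (hypothetical schema)."""
    return {
        "event_id": str(uuid.uuid4()),
        "user_id": random.randint(1, 100_000),
        "amount": round(random.uniform(1.0, 500.0), 2),
        "ts": time.time(),
    }


def produce(rate_per_sec: int = 10_000, seconds: int = 1, send=None):
    """Generate rate_per_sec * seconds events. The real producer paces
    sends (e.g. with time.sleep) and passes a KafkaProducer-backed `send`;
    here events are also returned so the generator is easy to inspect."""
    events = []
    for _ in range(rate_per_sec * seconds):
        event = make_event()
        if send is not None:
            send(json.dumps(event).encode("utf-8"))  # serialize for the broker
        events.append(event)
    return events
```

Decoupling event generation from the transport makes the throughput simulation testable without a running broker.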