Azure End-to-End Data Pipeline — IPL Analytics
Azure Data Factory, Databricks, PySpark, Delta Lake, Azure Synapse
Built a production-grade Azure Medallion Architecture pipeline that ingests IPL cricket data from GitHub via Azure Data Factory into Blob Storage, transforms it through Bronze → Silver → Gold layers in Databricks using PySpark and Delta Lake, and serves the Gold tables for SQL analytics through Azure Synapse.
An end-to-end cloud data pipeline built on Azure, structured around the Medallion Architecture for reliable, high-quality analytics.
Architecture
- Ingestion: an ADF pipeline extracts IPL JSON from the GitHub REST API → Azure Blob Storage (Bronze)
- Transform: Databricks PySpark jobs clean, validate, and enrich the data (Silver layer)
- Serve: Delta Lake Gold tables exposed via Azure Synapse SQL pools
- Governance: Delta Lake time travel for auditing, plus schema evolution support
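The layer responsibilities above can be sketched in a few lines. This is a minimal illustration in plain Python, standing in for the PySpark DataFrame operations so the logic can be read and run without a cluster; it is not the project's actual job code. The field names (`match_id`, `season`, `winner`), the sample rows, and the function names `to_silver` and `to_gold` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical Bronze rows: raw, string-typed, possibly duplicated or incomplete.
# Field names and values are illustrative, not taken from the real IPL dataset.
BRONZE_ROWS = [
    {"match_id": "1", "season": "2023", "winner": " Team A "},
    {"match_id": "1", "season": "2023", "winner": " Team A "},  # duplicate row
    {"match_id": "2", "season": "2023", "winner": "Team B"},
    {"match_id": "3", "season": "2023", "winner": "Team A"},
    {"match_id": None, "season": "2023", "winner": "Team B"},   # missing key
    {"match_id": "4", "season": "n/a", "winner": "Team B"},     # invalid season
]


def to_silver(rows):
    """Silver step: validate, de-duplicate on match_id, trim strings, cast types."""
    seen_ids = set()
    silver = []
    for row in rows:
        match_id = str(row.get("match_id"))
        season = str(row.get("season"))
        winner = (row.get("winner") or "").strip()
        # Reject rows whose key or season is not numeric, or whose winner is empty.
        if not match_id.isdigit() or not season.isdigit() or not winner:
            continue
        # Keep only the first occurrence of each match_id.
        if match_id in seen_ids:
            continue
        seen_ids.add(match_id)
        silver.append(
            {"match_id": int(match_id), "season": int(season), "winner": winner}
        )
    return silver


def to_gold(silver_rows):
    """Gold step: aggregate cleaned rows into wins per (season, team)."""
    return dict(Counter((row["season"], row["winner"]) for row in silver_rows))


if __name__ == "__main__":
    silver = to_silver(BRONZE_ROWS)
    gold = to_gold(silver)
    print(len(silver), "silver rows")
    print(gold)
```

In a Databricks job the same steps would typically be expressed as DataFrame filters, de-duplication, and a grouped aggregation, with each layer written out as a Delta table.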
Outcomes
Processing time dropped by 40% compared with manual ETL. The pipeline runs daily on a schedule via ADF triggers, with email alerting on failure.