Nitin Magdum

Data Engineer

Azure End-to-End Data Pipeline — IPL Analytics

Azure Data Factory · Databricks · PySpark · Delta Lake · Azure Synapse

Built a production-grade Azure Medallion Architecture pipeline that ingests IPL cricket data from GitHub via Azure Data Factory into Blob Storage, transforms it through Bronze → Silver → Gold layers in Databricks using PySpark and Delta Lake, and surfaces insights in SQL Analytics.

An end-to-end cloud data pipeline built on Azure, structured around the Medallion Architecture for reliable, high-quality analytics.

Architecture

›Ingestion: ADF pipeline extracts IPL JSON from the GitHub REST API → Azure Blob Storage (Bronze)
›Transform: Databricks PySpark jobs clean, validate, and enrich data (Silver layer)
›Serve: Delta Lake Gold tables exposed via Azure Synapse SQL pools
›Governance: Delta Lake time travel for auditing, plus schema evolution support
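
The layer responsibilities above can be illustrated with a minimal sketch. It uses plain Python in place of PySpark so it runs without a cluster, and it is not the project's actual job code. The field names (match_id, season, winner), the cleaning rules, and the "wins per team per season" Gold aggregate are all assumptions made for illustration.

```python
from collections import Counter

def to_silver(bronze_rows):
    """Silver step: drop incomplete rows, normalise types, de-duplicate on match_id."""
    seen = set()
    silver = []
    for row in bronze_rows:
        match_id = row.get("match_id")
        season = row.get("season")
        winner = (row.get("winner") or "").strip()
        # Validation: skip rows missing any required (hypothetical) field.
        if match_id is None or season is None or not winner:
            continue
        # De-duplication: keep only the first row seen for each match.
        if match_id in seen:
            continue
        seen.add(match_id)
        silver.append({"match_id": match_id, "season": int(season), "winner": winner})
    return silver

def to_gold(silver_rows):
    """Gold step: aggregate cleaned rows into wins per (season, team)."""
    return Counter((row["season"], row["winner"]) for row in silver_rows)

# Bronze: raw records exactly as ingested, including duplicates and gaps.
bronze = [
    {"match_id": 1, "season": "2023", "winner": " Team A "},
    {"match_id": 1, "season": "2023", "winner": "Team A"},  # duplicate match
    {"match_id": 2, "season": "2023", "winner": "Team B"},
    {"match_id": 3, "season": "2023", "winner": None},      # no result recorded
    {"match_id": 4, "season": 2023, "winner": "Team A"},
]

silver = to_silver(bronze)
gold = to_gold(silver)
```

In a Databricks job the same steps would typically be DataFrame operations written to Delta tables; the point of the sketch is only the split of concerns: Bronze keeps raw data untouched, Silver enforces quality, Gold holds query-ready aggregates.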

Outcomes

Reduced processing time by 40% compared with manual ETL. The pipeline runs daily on an ADF schedule trigger, with email alerts on failure.
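
As a rough illustration of the daily schedule, the sketch below builds a Data Factory schedule trigger definition as a Python dictionary and prints it as JSON. The pipeline name, run hour, and start time are hypothetical placeholders, not values from this project, and email alerting is not covered here.

```python
import json

def daily_trigger(pipeline_name, hour_utc, start_time):
    """Build a schedule trigger definition that runs one pipeline once a day."""
    return {
        "name": f"{pipeline_name}_daily",  # hypothetical naming convention
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": start_time,
                    "timeZone": "UTC",
                    # Fire at hour_utc:00 on each scheduled day.
                    "schedule": {"hours": [hour_utc], "minutes": [0]},
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }

# Assumed values for illustration only.
trigger = daily_trigger("ipl_ingest_pipeline", 2, "2024-01-01T00:00:00Z")
print(json.dumps(trigger, indent=2))
```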