Nitin Magdum

Data Engineer

ML Churn Prediction API with FastAPI & PySpark

PySpark, FastAPI, scikit-learn, Python, Docker

Built an end-to-end ML pipeline for customer churn prediction: PySpark for feature engineering on 1M+ rows, scikit-learn for training an XGBoost classifier (89% accuracy), and a FastAPI REST API serving real-time predictions in under 100 ms.

The system predicts each customer's churn probability in real time, enabling proactive retention campaigns.
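
Accuracy and AUC-ROC, the two metrics quoted for the model, measure different things: accuracy depends on a decision threshold, while AUC-ROC measures how well the scores rank churners above non-churners. The sketch below computes both in plain Python on a tiny made-up sample. The labels, scores, and the 0.5 threshold are illustrative assumptions, not data or settings from the project.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of customers whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def auc_roc(y_true, y_prob):
    """Probability that a random churner is scored above a random non-churner.

    Ties count as half a win (the rank-based definition of AUC-ROC).
    """
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = churned) and predicted churn probabilities.
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]

print(round(accuracy(y_true, y_prob), 3))  # 0.667
print(auc_roc(y_true, y_prob))             # 0.875
```

Churn datasets are usually imbalanced, so a threshold-free ranking metric such as AUC-ROC is a useful complement to accuracy.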

Pipeline

  • Feature engineering on 1M+ customer records with PySpark (20+ features)
  • Model: XGBoost classifier with 89% accuracy and 0.91 AUC-ROC
  • FastAPI endpoint: POST /predict returns churn probability in under 100 ms
  • Dockerised with Docker Compose for dev/prod parity
  • Retrained weekly via Databricks Workflow cron trigger
GitHub