ML Churn Prediction API with FastAPI & PySpark
PySpark, FastAPI, scikit-learn, Python, Docker
Built an end-to-end ML pipeline for customer churn prediction: PySpark for feature engineering on 1M+ rows, scikit-learn for model training (XGBoost classifier, 89% accuracy, 0.91 AUC-ROC), and a FastAPI REST API serving real-time predictions in under 100 ms.
An ML system that predicts customer churn probability in real time, enabling proactive retention campaigns.
Pipeline
- Feature engineering on 1M+ customer records with PySpark (20+ features)
- Model: XGBoost classifier with 89% accuracy and 0.91 AUC-ROC
- FastAPI endpoint: POST /predict returns the churn probability in under 100 ms
- Dockerised with Docker Compose for dev/prod parity
- Retrained weekly via a cron-triggered Databricks Workflow
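
The accuracy and AUC-ROC figures in the list measure different things. Accuracy depends on a decision threshold, while AUC-ROC measures how well the model ranks churners above non-churners regardless of threshold. That matters on churn data, where churners are usually the minority class. A minimal pure-Python sketch of both metrics follows. The labels, scores and the 0.5 threshold are made-up assumptions for illustration, not the project's data or evaluation code.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of customers whose thresholded score matches the true label."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random churner is scored above a random non-churner.

    Ties count as half a win (the Mann-Whitney U formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one example of each class")
    # Compare every churner score with every non-churner score.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = churned) and predicted churn probabilities.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.90]

print(f"accuracy = {accuracy(labels, scores):.2f}")  # 0.75
print(f"AUC-ROC  = {auc_roc(labels, scores):.2f}")   # 0.92
```

The pairwise comparison is quadratic in the number of rows, so it only suits small samples. At the 1M-row scale described above, a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.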