Nitin Magdum

Data Engineer

Paintings & Museum Analytics — SQL, PySpark & Pandas

SQL, PySpark, Pandas, Python

Solved 20+ competitive data modeling challenges using SQL, PySpark, and Pandas. Built JOIN-heavy queries, window functions, and aggregation pipelines on a multi-table museum dataset — reducing query execution time by 35% through indexing and DataFrame caching.

The challenges were drawn from a Famous Paintings & Museum dataset and worked in SQL, PySpark, and Pandas, so that the strengths of each tool could be compared on the same problems.

Key Techniques

  • INNER / LEFT / FULL OUTER JOINs across 8 tables
  • Window functions: ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD
  • Aggregations with GROUP BY + HAVING filters
  • String operations: UPPER, TRIM, SUBSTRING, REGEXP
  • Data cleaning: NULL handling, deduplication, type casting
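
A few of the techniques listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the artist and work tables, their columns, and the sample rows are hypothetical stand-ins, not the project's actual schema or data. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical two-table schema standing in for the museum dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE work (
        work_id INTEGER PRIMARY KEY,
        artist_id INTEGER,
        name TEXT,
        sale_price REAL
    );
    INSERT INTO artist VALUES
        (1, '  claude monet '),
        (2, 'Vincent van Gogh'),
        (3, 'Unknown Artist');
    INSERT INTO work VALUES
        (10, 1, 'Water Lilies', 500.0),
        (11, 1, 'Impression, Sunrise', 300.0),
        (12, 2, 'Starry Night', 900.0),
        (13, 2, 'Sunflowers', 900.0),
        (14, 2, 'Irises', 400.0);
""")

# INNER JOIN + GROUP BY + HAVING, with TRIM/UPPER string cleanup:
# artists that have at least three works.
prolific = conn.execute("""
    SELECT UPPER(TRIM(a.full_name)) AS artist, COUNT(*) AS n_works
    FROM artist a
    INNER JOIN work w ON w.artist_id = a.artist_id
    GROUP BY a.artist_id
    HAVING COUNT(*) >= 3
""").fetchall()

# LEFT JOIN + NULL check: artists with no works at all.
no_works = conn.execute("""
    SELECT a.full_name
    FROM artist a
    LEFT JOIN work w ON w.artist_id = a.artist_id
    WHERE w.work_id IS NULL
""").fetchall()

# ROW_NUMBER vs RANK vs DENSE_RANK on tied prices for one artist.
# ROW_NUMBER gets work_id as a tie-breaker so its output is deterministic.
ranked = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY sale_price DESC, work_id) AS row_num,
           RANK()       OVER (ORDER BY sale_price DESC)          AS rnk,
           DENSE_RANK() OVER (ORDER BY sale_price DESC)          AS dense_rnk
    FROM work
    WHERE artist_id = 2
    ORDER BY row_num
""").fetchall()

print(prolific)
print(no_works)
print(ranked)
```

The two works tied at the top price share rank 1 under both RANK and DENSE_RANK. The next work is ranked 3 by RANK, which leaves a gap, and 2 by DENSE_RANK, which does not. ROW_NUMBER always numbers rows 1, 2, 3.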

Impact

Reduced average query time by 35% through strategic indexing and PySpark DataFrame caching. Produced a reusable comparison guide adopted by 3 team members.
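
The indexing half of that optimization can be illustrated with a minimal sketch. It assumes SQLite as the engine, since the database actually used is not stated here, and the work table, the museum_id column, and the index name are hypothetical. It does not reproduce the 35% figure and does not cover PySpark caching; it only shows how a query plan changes from a full scan to an index search once a filter column is indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work (work_id INTEGER PRIMARY KEY, museum_id INTEGER, name TEXT)"
)
# Synthetic rows: 1000 works spread evenly over 50 hypothetical museums.
conn.executemany(
    "INSERT INTO work (museum_id, name) VALUES (?, ?)",
    [(i % 50, f"work {i}") for i in range(1000)],
)


def plan(sql: str) -> str:
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)


query = "SELECT name FROM work WHERE museum_id = 7"

before = plan(query)  # no index yet: the planner scans the whole table
conn.execute("CREATE INDEX idx_work_museum ON work (museum_id)")
after = plan(query)   # with the index: the planner searches via idx_work_museum

print("before:", before)
print("after: ", after)
```

The exact plan wording varies between SQLite versions, so the useful check is whether the index name appears in the plan rather than the literal text.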

GitHub