A side-by-side guide covering SELECT, filtering, aggregation, string operations, joins, window functions, and data cleaning across SQL, PySpark, and Pandas. Data engineering teams use it as an onboarding reference.

A practical, engineer-first guide comparing three data processing paradigms, aimed at teams choosing the right tool for each workload type.

Topics Covered

›SELECT & projection: SQL vs df.select() vs df[col]
›Filtering: WHERE/HAVING vs .filter() vs boolean indexing
›String ops: UPPER/TRIM vs pyspark.sql.functions vs .str accessor
›Sorting: ORDER BY vs .orderBy() vs .sort_values()
›Window functions: OVER(PARTITION BY) vs Window spec vs .groupby().transform() (partitioned) and .rolling() (moving frames)
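
To make the comparison concrete, here is a minimal sketch that runs one small query through several of the topics above (projection, filtering, string ops, sorting, and a partitioned window) in SQL and Pandas, with a rough PySpark equivalent in comments. The employees table, its columns, and the sample rows are hypothetical and not taken from the guide. The SQL side uses Python's built-in sqlite3 module and assumes a SQLite build with window function support (3.25 or later). The PySpark lines are not executed and assume an existing Spark DataFrame named sdf.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; names carry stray whitespace on purpose.
rows = [
    ("  alice ", "eng", 120),
    ("bob", "eng", 95),
    ("carol ", "ops", 110),
    ("dave", "ops", 70),
]

# --- SQL (SQLite) ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
sql_result = conn.execute(
    """
    SELECT UPPER(TRIM(name)) AS name,
           dept,
           salary,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
    FROM employees
    WHERE salary > 80
    ORDER BY dept, salary DESC
    """
).fetchall()
conn.close()

# --- Pandas ---
df = pd.DataFrame(rows, columns=["name", "dept", "salary"])
out = (
    # Boolean indexing (WHERE) plus column selection (SELECT)
    df.loc[df["salary"] > 80, ["name", "dept", "salary"]]
    # .str accessor in place of UPPER(TRIM(...))
    .assign(name=lambda d: d["name"].str.strip().str.upper())
    # ORDER BY dept, salary DESC
    .sort_values(["dept", "salary"], ascending=[True, False])
)
# Per-group rank in place of ROW_NUMBER() OVER (PARTITION BY ...)
out["dept_rank"] = (
    out.groupby("dept")["salary"]
    .rank(method="first", ascending=False)
    .astype(int)
)
pandas_result = list(out.itertuples(index=False, name=None))

# --- PySpark (sketch only, not executed; assumes a Spark DataFrame `sdf`) ---
# from pyspark.sql import Window, functions as F
# w = Window.partitionBy("dept").orderBy(F.col("salary").desc())
# spark_result = (
#     sdf.filter(F.col("salary") > 80)
#        .select(F.upper(F.trim(F.col("name"))).alias("name"), "dept", "salary")
#        .withColumn("dept_rank", F.row_number().over(w))
#        .orderBy("dept", F.col("salary").desc())
# )

print(sql_result)
print(pandas_result)
```

Both engines filter before ranking, so dept_rank is computed only over the rows that pass the salary condition. The Pandas side ranks with groupby rather than .rolling() because the example needs a per-partition row number, not a moving aggregate.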
