PySpark is the Python API for Apache Spark.

PySpark is completely open source and free to use.

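PySpark's DataFrame API resembles the Python pandas library, and much of the syntax is similar. Unlike pandas, PySpark evaluates operations lazily and distributes the work across a cluster.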

The primary data type used in PySpark is the Spark DataFrame. The typed Dataset API available in Scala and Java is not exposed in PySpark.

PySpark also includes libraries for Spark SQL and MLlib. Graph processing is available through GraphFrames, which is distributed as a separate package.

PySpark is a popular choice among data scientists, as most of them are already familiar with Python.

Debugging Spark applications written in PySpark is often more difficult than debugging an application written in Scala, because errors can surface as a mix of Python and JVM stack traces.