PySpark is the Python API for Apache Spark.
PySpark is completely open source and free to use.
PySpark's DataFrame API resembles the Python pandas library, and the two share similar syntax for common operations such as selecting, filtering, and grouping.
The primary data type used in PySpark is the Spark DataFrame. PySpark has no Dataset type; the typed Dataset API is available only in Scala and Java.
PySpark also provides libraries for Spark SQL and MLlib, and it can use GraphFrames, which is distributed as a separate package.
PySpark is often the preferred choice for data scientists, as most of them are already familiar with Python.
Debugging Spark applications written in PySpark is considerably harder than debugging applications written in Scala.
Advantages of Apache Spark