Welcome to Spark Python API Docs!
Contents:
- pyspark package
- pyspark.sql module
- pyspark.streaming module
- pyspark.ml package
  - ML Pipeline APIs
  - pyspark.ml.param module
  - pyspark.ml.feature module
  - pyspark.ml.classification module
  - pyspark.ml.clustering module
  - pyspark.ml.linalg module
  - pyspark.ml.recommendation module
  - pyspark.ml.regression module
  - pyspark.ml.stat module
  - pyspark.ml.tuning module
  - pyspark.ml.evaluation module
  - pyspark.ml.fpm module
  - pyspark.ml.util module
- pyspark.mllib package
  - pyspark.mllib.classification module
  - pyspark.mllib.clustering module
  - pyspark.mllib.evaluation module
  - pyspark.mllib.feature module
  - pyspark.mllib.fpm module
  - pyspark.mllib.linalg module
  - pyspark.mllib.linalg.distributed module
  - pyspark.mllib.random module
  - pyspark.mllib.recommendation module
  - pyspark.mllib.regression module
  - pyspark.mllib.stat module
  - pyspark.mllib.tree module
  - pyspark.mllib.util module
Core classes:
pyspark.SparkContext
Main entry point for Spark functionality.
pyspark.RDD
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
pyspark.streaming.StreamingContext
Main entry point for Spark Streaming functionality.
pyspark.streaming.DStream
A Discretized Stream (DStream), the basic abstraction in Spark Streaming.
pyspark.sql.SQLContext
Main entry point for DataFrame and SQL functionality.
pyspark.sql.DataFrame
A distributed collection of data grouped into named columns.