Pandas API on Spark
This page gives an overview of the public pandas API on Spark.
Input/Output
Data Generator
Spark Metastore Table
Delta Lake
Parquet
ORC
Generic Spark I/O
Flat File / CSV
Clipboard
Excel
JSON
HTML
SQL
General functions
Working with options
Data manipulations and SQL
Top-level missing data
Top-level dealing with numeric data
Top-level dealing with datetimelike data
Series
Constructor
Attributes
Conversion
Indexing, iteration
Binary operator functions
Function application, GroupBy & Window
Computations / Descriptive Stats
Reindexing / Selection / Label manipulation
Missing data handling
Reshaping, sorting, transposing
Combining / joining / merging
Time series-related
Spark-related
Accessors
Date Time Handling
String Handling
Categorical accessor
Plotting
Serialization / IO / Conversion
Pandas-on-Spark specific
DataFrame
Constructor
Attributes and underlying data
Conversion
Indexing, iteration
Binary operator functions
Function application, GroupBy & Window
Computations / Descriptive Stats
Reindexing / Selection / Label manipulation
Missing data handling
Reshaping, sorting, transposing
Combining / joining / merging
Time series-related
Serialization / IO / Conversion
Spark-related
Plotting
Pandas-on-Spark specific
Index objects
Index
Spark-related
Numeric Index
CategoricalIndex
MultiIndex
MultiIndex Spark-related
DatetimeIndex
TimedeltaIndex
Window
Standard moving window functions
Standard expanding window functions
GroupBy
Indexing, iteration
Function application
Computations / Descriptive Stats
Machine Learning utilities
MLflow
Extensions
Accessors