Spark SQL
This page gives an overview of all public Spark SQL APIs.
Core Classes
pyspark.sql.SparkSession
pyspark.sql.Catalog
pyspark.sql.DataFrame
pyspark.sql.Column
pyspark.sql.Observation
pyspark.sql.Row
pyspark.sql.GroupedData
pyspark.sql.PandasCogroupedOps
pyspark.sql.DataFrameNaFunctions
pyspark.sql.DataFrameStatFunctions
pyspark.sql.Window
pyspark.sql.DataFrameReader
pyspark.sql.DataFrameWriter
Spark Session
pyspark.sql.SparkSession.builder.appName
pyspark.sql.SparkSession.builder.config
pyspark.sql.SparkSession.builder.enableHiveSupport
pyspark.sql.SparkSession.builder.getOrCreate
pyspark.sql.SparkSession.builder.master
pyspark.sql.SparkSession.catalog
pyspark.sql.SparkSession.conf
pyspark.sql.SparkSession.createDataFrame
pyspark.sql.SparkSession.getActiveSession
pyspark.sql.SparkSession.newSession
pyspark.sql.SparkSession.range
pyspark.sql.SparkSession.read
pyspark.sql.SparkSession.readStream
pyspark.sql.SparkSession.sparkContext
pyspark.sql.SparkSession.sql
pyspark.sql.SparkSession.stop
pyspark.sql.SparkSession.streams
pyspark.sql.SparkSession.table
pyspark.sql.SparkSession.udf
pyspark.sql.SparkSession.version
Configuration
pyspark.sql.conf.RuntimeConfig
Input/Output
pyspark.sql.DataFrameReader.csv
pyspark.sql.DataFrameReader.format
pyspark.sql.DataFrameReader.jdbc
pyspark.sql.DataFrameReader.json
pyspark.sql.DataFrameReader.load
pyspark.sql.DataFrameReader.option
pyspark.sql.DataFrameReader.options
pyspark.sql.DataFrameReader.orc
pyspark.sql.DataFrameReader.parquet
pyspark.sql.DataFrameReader.schema
pyspark.sql.DataFrameReader.table
pyspark.sql.DataFrameReader.text
pyspark.sql.DataFrameWriter.bucketBy
pyspark.sql.DataFrameWriter.csv
pyspark.sql.DataFrameWriter.format
pyspark.sql.DataFrameWriter.insertInto
pyspark.sql.DataFrameWriter.jdbc
pyspark.sql.DataFrameWriter.json
pyspark.sql.DataFrameWriter.mode
pyspark.sql.DataFrameWriter.option
pyspark.sql.DataFrameWriter.options
pyspark.sql.DataFrameWriter.orc
pyspark.sql.DataFrameWriter.parquet
pyspark.sql.DataFrameWriter.partitionBy
pyspark.sql.DataFrameWriter.save
pyspark.sql.DataFrameWriter.saveAsTable
pyspark.sql.DataFrameWriter.sortBy
pyspark.sql.DataFrameWriter.text
DataFrame
pyspark.sql.DataFrame.agg
pyspark.sql.DataFrame.alias
pyspark.sql.DataFrame.approxQuantile
pyspark.sql.DataFrame.cache
pyspark.sql.DataFrame.checkpoint
pyspark.sql.DataFrame.coalesce
pyspark.sql.DataFrame.colRegex
pyspark.sql.DataFrame.collect
pyspark.sql.DataFrame.columns
pyspark.sql.DataFrame.corr
pyspark.sql.DataFrame.count
pyspark.sql.DataFrame.cov
pyspark.sql.DataFrame.createGlobalTempView
pyspark.sql.DataFrame.createOrReplaceGlobalTempView
pyspark.sql.DataFrame.createOrReplaceTempView
pyspark.sql.DataFrame.createTempView
pyspark.sql.DataFrame.crossJoin
pyspark.sql.DataFrame.crosstab
pyspark.sql.DataFrame.cube
pyspark.sql.DataFrame.describe
pyspark.sql.DataFrame.distinct
pyspark.sql.DataFrame.drop
pyspark.sql.DataFrame.dropDuplicates
pyspark.sql.DataFrame.drop_duplicates
pyspark.sql.DataFrame.dropna
pyspark.sql.DataFrame.dtypes
pyspark.sql.DataFrame.exceptAll
pyspark.sql.DataFrame.explain
pyspark.sql.DataFrame.fillna
pyspark.sql.DataFrame.filter
pyspark.sql.DataFrame.first
pyspark.sql.DataFrame.foreach
pyspark.sql.DataFrame.foreachPartition
pyspark.sql.DataFrame.freqItems
pyspark.sql.DataFrame.groupBy
pyspark.sql.DataFrame.head
pyspark.sql.DataFrame.hint
pyspark.sql.DataFrame.inputFiles
pyspark.sql.DataFrame.intersect
pyspark.sql.DataFrame.intersectAll
pyspark.sql.DataFrame.isEmpty
pyspark.sql.DataFrame.isLocal
pyspark.sql.DataFrame.isStreaming
pyspark.sql.DataFrame.join
pyspark.sql.DataFrame.limit
pyspark.sql.DataFrame.localCheckpoint
pyspark.sql.DataFrame.mapInPandas
pyspark.sql.DataFrame.mapInArrow
pyspark.sql.DataFrame.na
pyspark.sql.DataFrame.observe
pyspark.sql.DataFrame.orderBy
pyspark.sql.DataFrame.persist
pyspark.sql.DataFrame.printSchema
pyspark.sql.DataFrame.randomSplit
pyspark.sql.DataFrame.rdd
pyspark.sql.DataFrame.registerTempTable
pyspark.sql.DataFrame.repartition
pyspark.sql.DataFrame.repartitionByRange
pyspark.sql.DataFrame.replace
pyspark.sql.DataFrame.rollup
pyspark.sql.DataFrame.sameSemantics
pyspark.sql.DataFrame.sample
pyspark.sql.DataFrame.sampleBy
pyspark.sql.DataFrame.schema
pyspark.sql.DataFrame.select
pyspark.sql.DataFrame.selectExpr
pyspark.sql.DataFrame.semanticHash
pyspark.sql.DataFrame.show
pyspark.sql.DataFrame.sort
pyspark.sql.DataFrame.sortWithinPartitions
pyspark.sql.DataFrame.sparkSession
pyspark.sql.DataFrame.stat
pyspark.sql.DataFrame.storageLevel
pyspark.sql.DataFrame.subtract
pyspark.sql.DataFrame.summary
pyspark.sql.DataFrame.tail
pyspark.sql.DataFrame.take
pyspark.sql.DataFrame.toDF
pyspark.sql.DataFrame.toJSON
pyspark.sql.DataFrame.toLocalIterator
pyspark.sql.DataFrame.toPandas
pyspark.sql.DataFrame.to_pandas_on_spark
pyspark.sql.DataFrame.transform
pyspark.sql.DataFrame.union
pyspark.sql.DataFrame.unionAll
pyspark.sql.DataFrame.unionByName
pyspark.sql.DataFrame.unpersist
pyspark.sql.DataFrame.where
pyspark.sql.DataFrame.withColumn
pyspark.sql.DataFrame.withColumns
pyspark.sql.DataFrame.withColumnRenamed
pyspark.sql.DataFrame.withMetadata
pyspark.sql.DataFrame.withWatermark
pyspark.sql.DataFrame.write
pyspark.sql.DataFrame.writeStream
pyspark.sql.DataFrame.writeTo
pyspark.sql.DataFrame.pandas_api
pyspark.sql.DataFrameNaFunctions.drop
pyspark.sql.DataFrameNaFunctions.fill
pyspark.sql.DataFrameNaFunctions.replace
pyspark.sql.DataFrameStatFunctions.approxQuantile
pyspark.sql.DataFrameStatFunctions.corr
pyspark.sql.DataFrameStatFunctions.cov
pyspark.sql.DataFrameStatFunctions.crosstab
pyspark.sql.DataFrameStatFunctions.freqItems
pyspark.sql.DataFrameStatFunctions.sampleBy
Column
pyspark.sql.Column.alias
pyspark.sql.Column.asc
pyspark.sql.Column.asc_nulls_first
pyspark.sql.Column.asc_nulls_last
pyspark.sql.Column.astype
pyspark.sql.Column.between
pyspark.sql.Column.bitwiseAND
pyspark.sql.Column.bitwiseOR
pyspark.sql.Column.bitwiseXOR
pyspark.sql.Column.cast
pyspark.sql.Column.contains
pyspark.sql.Column.desc
pyspark.sql.Column.desc_nulls_first
pyspark.sql.Column.desc_nulls_last
pyspark.sql.Column.dropFields
pyspark.sql.Column.endswith
pyspark.sql.Column.eqNullSafe
pyspark.sql.Column.getField
pyspark.sql.Column.getItem
pyspark.sql.Column.ilike
pyspark.sql.Column.isNotNull
pyspark.sql.Column.isNull
pyspark.sql.Column.isin
pyspark.sql.Column.like
pyspark.sql.Column.name
pyspark.sql.Column.otherwise
pyspark.sql.Column.over
pyspark.sql.Column.rlike
pyspark.sql.Column.startswith
pyspark.sql.Column.substr
pyspark.sql.Column.when
pyspark.sql.Column.withField
Data Types
ArrayType
BinaryType
BooleanType
ByteType
DataType
DateType
DecimalType
DoubleType
FloatType
IntegerType
LongType
MapType
NullType
ShortType
StringType
StructField
StructType
TimestampType
DayTimeIntervalType
Row
pyspark.sql.Row.asDict
Functions
Normal Functions
Math Functions
Datetime Functions
Collection Functions
Partition Transformation Functions
Aggregate Functions
Window Functions
Sort Functions
String Functions
UDF
Misc Functions
Window
pyspark.sql.Window.currentRow
pyspark.sql.Window.orderBy
pyspark.sql.Window.partitionBy
pyspark.sql.Window.rangeBetween
pyspark.sql.Window.rowsBetween
pyspark.sql.Window.unboundedFollowing
pyspark.sql.Window.unboundedPreceding
pyspark.sql.WindowSpec.orderBy
pyspark.sql.WindowSpec.partitionBy
pyspark.sql.WindowSpec.rangeBetween
pyspark.sql.WindowSpec.rowsBetween
Grouping
pyspark.sql.GroupedData.agg
pyspark.sql.GroupedData.apply
pyspark.sql.GroupedData.applyInPandas
pyspark.sql.GroupedData.avg
pyspark.sql.GroupedData.cogroup
pyspark.sql.GroupedData.count
pyspark.sql.GroupedData.max
pyspark.sql.GroupedData.mean
pyspark.sql.GroupedData.min
pyspark.sql.GroupedData.pivot
pyspark.sql.GroupedData.sum
pyspark.sql.PandasCogroupedOps.applyInPandas
Catalog
pyspark.sql.Catalog.cacheTable
pyspark.sql.Catalog.clearCache
pyspark.sql.Catalog.createExternalTable
pyspark.sql.Catalog.createTable
pyspark.sql.Catalog.currentDatabase
pyspark.sql.Catalog.databaseExists
pyspark.sql.Catalog.dropGlobalTempView
pyspark.sql.Catalog.dropTempView
pyspark.sql.Catalog.functionExists
pyspark.sql.Catalog.isCached
pyspark.sql.Catalog.listColumns
pyspark.sql.Catalog.listDatabases
pyspark.sql.Catalog.listFunctions
pyspark.sql.Catalog.listTables
pyspark.sql.Catalog.recoverPartitions
pyspark.sql.Catalog.refreshByPath
pyspark.sql.Catalog.refreshTable
pyspark.sql.Catalog.registerFunction
pyspark.sql.Catalog.setCurrentDatabase
pyspark.sql.Catalog.tableExists
pyspark.sql.Catalog.uncacheTable
Observation
pyspark.sql.Observation.get
Avro
pyspark.sql.avro.functions.from_avro
pyspark.sql.avro.functions.to_avro