pyspark.sql.functions.percentile

pyspark.sql.functions.percentile(col: ColumnOrName, percentage: Union[pyspark.sql.column.Column, float, List[float], Tuple[float]], frequency: Union[pyspark.sql.column.Column, int] = 1) → pyspark.sql.column.Column

Returns the exact percentile(s) of the numeric column col at the given percentage(s); each percentage must be in the range [0.0, 1.0].

New in version 3.5.0.

Parameters

col : Column or str
    input column.
percentage : Column, float, list of floats or tuple of floats
    percentage in decimal (must be between 0.0 and 1.0).
frequency : Column or int
    a positive integral literal that gives the frequency (weight) of each value.

Returns

Column
    the exact percentile(s) of the numeric column.

Examples

>>> from pyspark.sql.functions import col, lit, percentile, randn
>>> key = (col("id") % 3).alias("key")
>>> value = (randn(42) + key * 10).alias("value")
>>> df = spark.range(0, 1000, 1, 1).select(key, value)
>>> df.select(
...     percentile("value", [0.25, 0.5, 0.75], lit(1)).alias("quantiles")
... ).show()
+--------------------+
|           quantiles|
+--------------------+
|[0.74419914941216...|
+--------------------+
>>> df.groupBy("key").agg(
...     percentile("value", 0.5, lit(1)).alias("median")
... ).show()
+---+--------------------+
|key|              median|
+---+--------------------+
|  0|-0.03449962216667901|
|  1|   9.990389751837329|
|  2|  19.967859769284075|
+---+--------------------+
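Conceptually, the exact percentile computed above follows the common linear-interpolation definition: the result sits at fractional rank p * (n - 1) in the sorted values, interpolating between the two closest ranks, with frequency acting as a per-value repeat count. A minimal pure-Python sketch of these semantics (the function name and signature are illustrative, not part of PySpark):

```python
import math
from typing import List, Sequence, Union


def exact_percentile(
    values: Sequence[float],
    percentages: Union[float, List[float]],
    frequency: int = 1,
) -> Union[float, List[float]]:
    """Exact percentile with linear interpolation between closest ranks."""
    if frequency < 1:
        raise ValueError("frequency must be a positive integer")
    # Expand each value by its frequency, then sort -- a frequency of k
    # makes each observation count k times.
    data = sorted(v for v in values for _ in range(frequency))
    n = len(data)

    def one(p: float) -> float:
        if not 0.0 <= p <= 1.0:
            raise ValueError("percentage must be between 0.0 and 1.0")
        pos = p * (n - 1)            # fractional rank into the sorted data
        lo, hi = math.floor(pos), math.ceil(pos)
        # Interpolate linearly between the two surrounding values.
        return data[lo] + (pos - lo) * (data[hi] - data[lo])

    if isinstance(percentages, (list, tuple)):
        return [one(p) for p in percentages]
    return one(percentages)


print(exact_percentile([1, 2, 3, 4], 0.5))                  # 2.5
print(exact_percentile([1, 2, 3, 4], [0.25, 0.5, 0.75]))    # [1.75, 2.5, 3.25]
```

Unlike percentile_approx, which trades accuracy for bounded memory, this exact definition requires all values to be collected and sorted, which is why the exact variant is costlier on large groups.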