pyspark.sql.functions.percentile
pyspark.sql.functions.percentile(col: ColumnOrName, percentage: Union[pyspark.sql.column.Column, float, List[float], Tuple[float]], frequency: Union[pyspark.sql.column.Column, int] = 1) → pyspark.sql.column.Column

Returns the exact percentile(s) of the numeric column col at the given percentage(s). Each percentage must lie in the range [0.0, 1.0].
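As a worked formula, the exact percentile can be read as linear interpolation between the two closest ranks. This is an assumption based on how Spark's exact percentile is commonly described, not something this page states. Let x_(0) ≤ … ≤ x_(N−1) be the sorted non-null values, with each value repeated frequency times, so that N is the total weighted count, and let p be the percentage:

```latex
h = (N - 1)\,p, \qquad
\operatorname{percentile}(p) = x_{(\lfloor h \rfloor)}
  + \bigl(h - \lfloor h \rfloor\bigr)\bigl(x_{(\lceil h \rceil)} - x_{(\lfloor h \rfloor)}\bigr)
```

For the values 1, 2, 3, 4 and p = 0.25: h = 3 × 0.25 = 0.75, so the result is 1 + 0.75 × (2 − 1) = 1.75.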
New in version 3.5.0.
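Parameters
col : Column or str
    input column.
percentage : Column, float, list of floats or tuple of floats
    percentage in decimal (must be between 0.0 and 1.0).
frequency : Column or int
    a positive integer that sets how many times each row is counted; defaults to 1.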
Returns
Column
    the exact percentile of the numeric column, or an array of percentiles when percentage is a list or tuple.
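The effect of frequency and of passing a list of percentages can be illustrated outside Spark. The following is a minimal pure-Python sketch, not Spark's implementation. It assumes linear interpolation between closest ranks and treats frequency as a row-repetition count. The helper name exact_percentile and its (value, frequency) input format are hypothetical.

```python
import bisect
import math
from typing import Iterable, List, Optional, Sequence, Tuple, Union


def exact_percentile(
    rows: Iterable[Tuple[Optional[float], int]],
    percentage: Union[float, Sequence[float]],
) -> Union[None, float, List[float]]:
    """Hypothetical helper: exact percentile(s) of (value, frequency) rows.

    Assumption: linear interpolation between the two closest ranks.
    """
    # Aggregate frequencies per distinct value; null values are ignored.
    counts = {}
    for value, freq in rows:
        if value is None:
            continue
        if freq < 0:
            raise ValueError("frequency must not be negative")
        if freq > 0:  # a zero frequency contributes nothing in this sketch
            counts[value] = counts.get(value, 0) + freq
    if not counts:
        return None  # no rows: no percentile (SQL would return NULL)

    keys = sorted(counts)
    # cumulative[i] = number of (expanded) rows whose value is <= keys[i]
    cumulative = []
    running = 0
    for key in keys:
        running += counts[key]
        cumulative.append(running)
    total = running

    def value_at(rank: int) -> float:
        # Value at a 0-based rank in the expanded, sorted data:
        # the first key whose cumulative count exceeds the rank.
        return keys[bisect.bisect_right(cumulative, rank)]

    def single(p: float) -> float:
        if not 0.0 <= p <= 1.0:
            raise ValueError("percentage must be in [0.0, 1.0]")
        h = (total - 1) * p
        lo, hi = math.floor(h), math.ceil(h)
        lo_val, hi_val = value_at(lo), value_at(hi)
        # Linear interpolation between the two closest ranks.
        return float(lo_val + (h - lo) * (hi_val - lo_val))

    # A list or tuple of percentages yields a list, mirroring the array result.
    if isinstance(percentage, (list, tuple)):
        return [single(p) for p in percentage]
    return single(percentage)


# Frequency 3 on value 1 behaves like three separate rows of 1.
print(exact_percentile([(1, 3), (10, 1)], 0.5))  # → 1.0
print(exact_percentile([(1, 1), (2, 1), (3, 1), (4, 1)], [0.25, 0.5, 0.75]))  # → [1.75, 2.5, 3.25]
```

This sketch expands frequencies through cumulative counts instead of materializing repeated rows. As a result, large frequencies cost no extra memory here.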
Examples
>>> from pyspark.sql.functions import col, lit, percentile, randn
>>> key = (col("id") % 3).alias("key")
>>> value = (randn(42) + key * 10).alias("value")
>>> df = spark.range(0, 1000, 1, 1).select(key, value)
>>> df.select(
...     percentile("value", [0.25, 0.5, 0.75], lit(1)).alias("quantiles")
... ).show()
+--------------------+
|           quantiles|
+--------------------+
|[0.74419914941216...|
+--------------------+
>>> df.groupBy("key").agg(
...     percentile("value", 0.5, lit(1)).alias("median")
... ).show()
+---+--------------------+
|key|              median|
+---+--------------------+
|  0|-0.03449962216667901|
|  1|   9.990389751837329|
|  2|  19.967859769284075|
+---+--------------------+