pyspark.sql.DataFrame.unpivot¶
-
DataFrame.
unpivot
(ids: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, …]], values: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, …], None], variableColumnName: str, valueColumnName: str) → DataFrame[source]¶ Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns set. This is the reverse to groupBy(…).pivot(…).agg(…), except for the aggregation, which cannot be reversed.
This function is useful to massage a DataFrame into a format where some columns are identifier columns (“ids”), while all other columns (“values”) are “unpivoted” to the rows, leaving just two non-id columns, named as given by variableColumnName and valueColumnName.
When no “id” columns are given, the unpivoted DataFrame consists of only the “variable” and “value” columns.
The values columns must not be empty so at least one value must be given to be unpivoted. When values is None, all non-id columns will be unpivoted.
All “value” columns must share a least common data type. Unless they are the same data type, all “value” columns are cast to the nearest common data type. For instance, types IntegerType and LongType are cast to LongType, while IntegerType and StringType do not have a common data type and unpivot fails.
New in version 3.4.0.
- Parameters
- idsstr, Column, tuple, list
Column(s) to use as identifiers. Can be a single column or column name, or a list or tuple for multiple columns.
- valuesstr, Column, tuple, list, optional
Column(s) to unpivot. Can be a single column or column name, or a list or tuple for multiple columns. If specified, must not be empty. If not specified, uses all columns that are not set as ids.
- variableColumnNamestr
Name of the variable column.
- valueColumnNamestr
Name of the value column.
- Returns
DataFrame
Unpivoted DataFrame.
See also
Notes
Supports Spark Connect.
Examples
>>> df = spark.createDataFrame( ... [(1, 11, 1.1), (2, 12, 1.2)], ... ["id", "int", "double"], ... ) >>> df.show() +---+---+------+ | id|int|double| +---+---+------+ | 1| 11| 1.1| | 2| 12| 1.2| +---+---+------+
>>> df.unpivot("id", ["int", "double"], "var", "val").show() +---+------+----+ | id| var| val| +---+------+----+ | 1| int|11.0| | 1|double| 1.1| | 2| int|12.0| | 2|double| 1.2| +---+------+----+