pyspark.sql.functions.array_union

pyspark.sql.functions.array_union(col1: ColumnOrName, col2: ColumnOrName) → pyspark.sql.column.Column

Collection function: returns an array of the elements in the union of col1 and col2, without duplicates.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col1 : Column or str

first column, or name of the column, containing an array

col2 : Column or str

second column, or name of the column, containing an array

Returns
Column

an array containing the distinct values from the union of the two arrays.

Examples

>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import array_union
>>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["c", "d", "a", "f"])])
>>> df.select(array_union(df.c1, df.c2)).collect()
[Row(array_union(c1, c2)=['b', 'a', 'c', 'd', 'f'])]
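The result above keeps each element at the position of its first occurrence, reading col1 and then col2. The following plain-Python sketch mirrors that observable behaviour for ordinary lists. It is only an illustration, not how Spark implements the function: the helper name array_union_sketch is hypothetical, elements are assumed to be hashable, and Spark's handling of null values is not modelled.

```python
from typing import Hashable, List, Sequence


def array_union_sketch(left: Sequence[Hashable], right: Sequence[Hashable]) -> List[Hashable]:
    """Hypothetical helper: order-preserving union of two sequences without duplicates."""
    seen = set()   # values already emitted
    result = []    # output, in first-occurrence order
    for value in list(left) + list(right):
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Same input as the DataFrame example above.
print(array_union_sketch(["b", "a", "c"], ["c", "d", "a", "f"]))
# ['b', 'a', 'c', 'd', 'f']
```

Duplicates inside a single input are dropped as well in this sketch, so array_union_sketch([1, 1, 2], [2, 3]) gives [1, 2, 3].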