SparkContext([master, appName, sparkHome, …])
Main entry point for Spark functionality.
RDD(jrdd, ctx[, jrdd_deserializer])
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
Broadcast([sc, value, pickle_registry, …])
A broadcast variable created with SparkContext.broadcast().
Accumulator(aid, value, accum_param)
A shared variable that can be accumulated, i.e., has a commutative and associative “add” operation.
AccumulatorParam
Helper object that defines how to accumulate values of a given type.
SparkConf([loadDefaults, _jvm, _jconf])
Configuration for a Spark application.
SparkFiles()
Resolves paths to files added through SparkContext.addFile().
StorageLevel(useDisk, useMemory, useOffHeap, …)
Flags for controlling the storage of an RDD.
TaskContext
Contextual information about a task which can be read or mutated during execution.
RDDBarrier(rdd)
Wraps an RDD in a barrier stage, which forces Spark to launch tasks of this stage together.
BarrierTaskContext
A TaskContext with extra contextual info and tooling for tasks in a barrier stage.
BarrierTaskInfo(address)
Carries all task infos of a barrier task.
InheritableThread(target, *args, **kwargs)
Thread that is recommended to be used in PySpark instead of threading.Thread when the pinned thread mode is enabled.
util.VersionUtils
Provides utility methods for determining the Spark version from a given version string.
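Taken together, these core classes cover most day-to-day use of the RDD API. A minimal sketch of how they fit together, assuming a local Spark installation (the master URL, app name, and sample data below are illustrative):

    from pyspark import SparkConf, SparkContext

    # Build a configuration and create (or reuse) the singleton SparkContext.
    conf = SparkConf().setAppName("demo").setMaster("local[2]")
    sc = SparkContext.getOrCreate(conf)

    # Ship a read-only lookup set to executors as a broadcast variable and
    # count matches with an accumulator.
    words = sc.broadcast({"a", "b"})
    matched = sc.accumulator(0)

    sc.parallelize(["a", "b", "c", "a"]).foreach(
        lambda w: matched.add(1) if w in words.value else None)
    print(matched.value)  # 3

    sc.stop()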
SparkContext.PACKAGE_EXTENSIONS
SparkContext.accumulator(value[, accum_param])
Create an Accumulator with the given initial value, using a given AccumulatorParam helper object to define how to add values of the data type, if provided.
SparkContext.addArchive(path)
Add an archive to be downloaded with this Spark job on every node.
SparkContext.addFile(path[, recursive])
Add a file to be downloaded with this Spark job on every node.
SparkContext.addPyFile(path)
Add a .py or .zip dependency for all tasks to be executed on this SparkContext in the future.
SparkContext.applicationId
A unique identifier for the Spark application.
SparkContext.binaryFiles(path[, minPartitions])
Read a directory of binary files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI as a byte array.
SparkContext.binaryRecords(path, recordLength)
Load data from a flat binary file, assuming each record is a set of numbers with the specified numerical format (see ByteBuffer), and the number of bytes per record is constant.
SparkContext.broadcast(value)
Broadcast a read-only variable to the cluster, returning a Broadcast object for reading it in distributed functions.
SparkContext.cancelAllJobs()
Cancel all jobs that have been scheduled or are running.
SparkContext.cancelJobGroup(groupId)
Cancel active jobs for the specified group.
SparkContext.defaultMinPartitions
Default minimum number of partitions for Hadoop RDDs when not given by the user.
SparkContext.defaultParallelism
Default level of parallelism to use when not given by the user.
SparkContext.dump_profiles(path)
Dump the profile stats into the given directory path.
SparkContext.emptyRDD()
Create an RDD that has no partitions or elements.
SparkContext.getCheckpointDir()
Return the directory where RDDs are checkpointed.
SparkContext.getConf()
Return a copy of this SparkContext’s configuration (SparkConf).
SparkContext.getLocalProperty(key)
Get a local property set in this thread, or None if it is missing.
SparkContext.getOrCreate([conf])
Get or instantiate a SparkContext and register it as a singleton object.
SparkContext.hadoopFile(path, …[, …])
Read an ‘old’ Hadoop InputFormat with arbitrary key and value class from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI.
SparkContext.hadoopRDD(inputFormatClass, …)
Read an ‘old’ Hadoop InputFormat with arbitrary key and value class, from an arbitrary Hadoop configuration, which is passed in as a Python dict.
SparkContext.newAPIHadoopFile(path, …[, …])
Read a ‘new API’ Hadoop InputFormat with arbitrary key and value class from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI.
SparkContext.newAPIHadoopRDD(…[, …])
Read a ‘new API’ Hadoop InputFormat with arbitrary key and value class, from an arbitrary Hadoop configuration, which is passed in as a Python dict.
SparkContext.parallelize(c[, numSlices])
Distribute a local Python collection to form an RDD.
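parallelize is the usual way to build a small RDD for experimentation. A quick sketch of a few transformations and actions, assuming an existing SparkContext sc:

    nums = sc.parallelize(range(1, 11), numSlices=4)   # distribute 1..10 over 4 partitions
    evens = nums.filter(lambda x: x % 2 == 0)          # transformation (lazy)
    print(evens.collect())                             # action: [2, 4, 6, 8, 10]
    print(nums.count(), nums.sum())                    # 10 55
    print(nums.take(3))                                # [1, 2, 3]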
SparkContext.pickleFile(name[, minPartitions])
Load an RDD previously saved using the RDD.saveAsPickleFile() method.
SparkContext.range(start[, end, step, numSlices])
Create a new RDD of int containing elements from start to end (exclusive), increased by step every element.
SparkContext.resources
SparkContext.runJob(rdd, partitionFunc[, …])
Executes the given partitionFunc on the specified set of partitions, returning the result as an array of elements.
SparkContext.sequenceFile(path[, keyClass, …])
Read a Hadoop SequenceFile with arbitrary key and value Writable class from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI.
SparkContext.setCheckpointDir(dirName)
Set the directory under which RDDs are going to be checkpointed.
SparkContext.setJobDescription(value)
Set a human-readable description of the current job.
SparkContext.setJobGroup(groupId, description)
Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.
SparkContext.setLocalProperty(key, value)
Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool.
SparkContext.setLogLevel(logLevel)
Set the log level; this overrides any user-defined log settings.
SparkContext.setSystemProperty(key, value)
Set a Java system property, such as spark.executor.memory.
SparkContext.show_profiles()
Print the profile stats to stdout.
SparkContext.sparkUser()
Get SPARK_USER for the user who is running the SparkContext.
SparkContext.startTime
Return the epoch time when the Spark Context was started.
SparkContext.statusTracker()
Return a StatusTracker object.
SparkContext.stop()
Shut down the SparkContext.
SparkContext.textFile(name[, minPartitions, …])
Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of strings.
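The classic word-count pipeline is a compact illustration of textFile together with flatMap, map, and reduceByKey; a sketch, assuming an existing SparkContext sc and hypothetical input/output paths:

    lines = sc.textFile("input.txt")                   # hypothetical input path
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    print(counts.take(5))                              # a few (word, count) pairs
    counts.saveAsTextFile("wordcounts")                # hypothetical output directory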
SparkContext.uiWebUrl
Return the URL of the SparkUI instance started by this SparkContext.
SparkContext.union(rdds)
Build the union of a list of RDDs.
SparkContext.version
The version of Spark on which this application is running.
SparkContext.wholeTextFiles(path[, …])
Read a directory of text files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI.
RDD.aggregate(zeroValue, seqOp, combOp)
Aggregate the elements of each partition, and then the results for all the partitions, using the given combine functions and a neutral “zero value”.
RDD.aggregateByKey(zeroValue, seqFunc, combFunc)
Aggregate the values of each key, using the given combine functions and a neutral “zero value”.
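A small sketch of aggregate: compute a (sum, count) pair in one pass and derive the mean, assuming an existing SparkContext sc:

    nums = sc.parallelize([1.0, 2.0, 3.0, 4.0])
    total, n = nums.aggregate(
        (0.0, 0),                                  # neutral "zero value"
        lambda acc, x: (acc[0] + x, acc[1] + 1),   # seqOp: fold one value into a partition accumulator
        lambda a, b: (a[0] + b[0], a[1] + b[1]))   # combOp: merge partition accumulators
    print(total / n)                               # 2.5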
RDD.barrier()
Marks the current stage as a barrier stage, where Spark must launch all tasks together.
RDD.cache()
Persist this RDD with the default storage level (MEMORY_ONLY).
RDD.cartesian(other)
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in self and b is in other.
RDD.checkpoint()
Mark this RDD for checkpointing.
RDD.cleanShuffleDependencies([blocking])
Removes an RDD’s shuffles and its non-persisted ancestors.
RDD.coalesce(numPartitions[, shuffle])
Return a new RDD that is reduced into numPartitions partitions.
RDD.cogroup(other[, numPartitions])
For each key k in self or other, return a resulting RDD that contains a tuple with the list of values for that key in self as well as other.
RDD.collect()
Return a list that contains all of the elements in this RDD.
RDD.collectAsMap()
Return the key-value pairs in this RDD to the master as a dictionary.
RDD.collectWithJobGroup(groupId, description)
When collecting the RDD, use this method to specify a job group.
RDD.combineByKey(createCombiner, mergeValue, …)
Generic function to combine the elements for each key using a custom set of aggregation functions.
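The usual illustration of combineByKey is a per-key average, which exercises all three of its functions; a sketch, assuming an existing SparkContext sc:

    pairs = sc.parallelize([("a", 1), ("a", 3), ("b", 5)])
    sums = pairs.combineByKey(
        lambda v: (v, 1),                                   # createCombiner
        lambda c, v: (c[0] + v, c[1] + 1),                  # mergeValue
        lambda c1, c2: (c1[0] + c2[0], c1[1] + c2[1]))      # mergeCombiners
    print(sums.mapValues(lambda c: c[0] / c[1]).collectAsMap())   # {'a': 2.0, 'b': 5.0}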
RDD.context
The SparkContext that this RDD was created on.
RDD.count()
Return the number of elements in this RDD.
RDD.countApprox(timeout[, confidence])
Approximate version of count() that returns a potentially incomplete result within a timeout, even if not all tasks have finished.
RDD.countApproxDistinct([relativeSD])
Return the approximate number of distinct elements in the RDD.
RDD.countByKey()
Count the number of elements for each key, and return the result to the master as a dictionary.
RDD.countByValue()
Return the count of each unique value in this RDD as a dictionary of (value, count) pairs.
RDD.distinct([numPartitions])
Return a new RDD containing the distinct elements in this RDD.
RDD.filter(f)
Return a new RDD containing only the elements that satisfy a predicate.
RDD.first()
Return the first element in this RDD.
RDD.flatMap(f[, preservesPartitioning])
Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.
RDD.flatMapValues(f)
Pass each value in the key-value pair RDD through a flatMap function without changing the keys; this also retains the original RDD’s partitioning.
RDD.fold(zeroValue, op)
Aggregate the elements of each partition, and then the results for all the partitions, using a given associative function and a neutral “zero value.”
RDD.foldByKey(zeroValue, func[, …])
Merge the values for each key using an associative function “func” and a neutral “zeroValue” which may be added to the result an arbitrary number of times, and must not change the result (e.g., 0 for addition, or 1 for multiplication).
RDD.foreach(f)
Applies a function to all elements of this RDD.
RDD.foreachPartition(f)
Applies a function to each partition of this RDD.
RDD.fullOuterJoin(other[, numPartitions])
Perform a full outer join of self and other.
RDD.getCheckpointFile()
Gets the name of the file to which this RDD was checkpointed.
RDD.getNumPartitions()
Returns the number of partitions of this RDD.
RDD.getResourceProfile()
Get the pyspark.resource.ResourceProfile specified with this RDD or None if it wasn’t specified.
RDD.getStorageLevel()
Get the RDD’s current storage level.
RDD.glom()
Return an RDD created by coalescing all elements within each partition into a list.
RDD.groupBy(f[, numPartitions, partitionFunc])
Return an RDD of grouped items.
RDD.groupByKey([numPartitions, partitionFunc])
Group the values for each key in the RDD into a single sequence.
RDD.groupWith(other, *others)
Alias for cogroup but with support for multiple RDDs.
RDD.histogram(buckets)
Compute a histogram using the provided buckets.
RDD.id()
A unique ID for this RDD (within its SparkContext).
RDD.intersection(other)
Return the intersection of this RDD and another one.
RDD.isCheckpointed()
Return whether this RDD is checkpointed and materialized, either reliably or locally.
RDD.isEmpty()
Returns true if and only if the RDD contains no elements at all.
RDD.isLocallyCheckpointed()
Return whether this RDD is marked for local checkpointing.
RDD.join(other[, numPartitions])
Return an RDD containing all pairs of elements with matching keys in self and other.
RDD.keyBy(f)
Creates tuples of the elements in this RDD by applying f.
RDD.keys()
Return an RDD with the keys of each tuple.
RDD.leftOuterJoin(other[, numPartitions])
Perform a left outer join of self and other.
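A sketch of the join family on pair RDDs (assuming an existing SparkContext sc; the collect output order may vary):

    users = sc.parallelize([("a", 1), ("b", 2)])
    emails = sc.parallelize([("a", "x@example.com"), ("c", "y@example.com")])
    print(users.join(emails).collect())            # [('a', (1, 'x@example.com'))]
    print(users.leftOuterJoin(emails).collect())   # also includes ('b', (2, None))
    print(users.fullOuterJoin(emails).collect())   # also includes ('c', (None, 'y@example.com'))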
RDD.localCheckpoint()
Mark this RDD for local checkpointing using Spark’s existing caching layer.
RDD.lookup(key)
Return the list of values in the RDD for key key.
RDD.map(f[, preservesPartitioning])
Return a new RDD by applying a function to each element of this RDD.
RDD.mapPartitions(f[, preservesPartitioning])
Return a new RDD by applying a function to each partition of this RDD.
RDD.mapPartitionsWithIndex(f[, …])
Return a new RDD by applying a function to each partition of this RDD, while tracking the index of the original partition.
RDD.mapPartitionsWithSplit(f[, …])
Deprecated: use RDD.mapPartitionsWithIndex() instead.
RDD.mapValues(f)
Pass each value in the key-value pair RDD through a map function without changing the keys; this also retains the original RDD’s partitioning.
RDD.max([key])
Find the maximum item in this RDD.
RDD.mean()
Compute the mean of this RDD’s elements.
RDD.meanApprox(timeout[, confidence])
Approximate operation to return the mean within a timeout or meet the confidence.
RDD.min([key])
Find the minimum item in this RDD.
RDD.name()
Return the name of this RDD.
RDD.partitionBy(numPartitions[, partitionFunc])
Return a copy of the RDD partitioned using the specified partitioner.
RDD.persist([storageLevel])
Set this RDD’s storage level to persist its values across operations after the first time it is computed.
RDD.pipe(command[, env, checkCode])
Return an RDD created by piping elements to a forked external process.
RDD.randomSplit(weights[, seed])
Randomly splits this RDD with the provided weights.
RDD.reduce(f)
Reduces the elements of this RDD using the specified commutative and associative binary operator.
RDD.reduceByKey(func[, numPartitions, …])
Merge the values for each key using an associative and commutative reduce function.
RDD.reduceByKeyLocally(func)
Merge the values for each key using an associative and commutative reduce function, but return the results immediately to the master as a dictionary.
RDD.repartition(numPartitions)
Return a new RDD that has exactly numPartitions partitions.
RDD.repartitionAndSortWithinPartitions([…])
Repartition the RDD according to the given partitioner and, within each resulting partition, sort records by their keys.
RDD.rightOuterJoin(other[, numPartitions])
Perform a right outer join of self and other.
RDD.sample(withReplacement, fraction[, seed])
Return a sampled subset of this RDD.
RDD.sampleByKey(withReplacement, fractions)
Return a subset of this RDD sampled by key (via stratified sampling).
RDD.sampleStdev()
Compute the sample standard deviation of this RDD’s elements (which corrects for bias in estimating the standard deviation by dividing by N-1 instead of N).
RDD.sampleVariance()
Compute the sample variance of this RDD’s elements (which corrects for bias in estimating the variance by dividing by N-1 instead of N).
RDD.saveAsHadoopDataset(conf[, …])
Output a Python RDD of key-value pairs (of form RDD[(K, V)]) to any Hadoop file system, using the old Hadoop OutputFormat API (mapred package).
RDD.saveAsHadoopFile(path, outputFormatClass)
RDD.saveAsNewAPIHadoopDataset(conf[, …])
Output a Python RDD of key-value pairs (of form RDD[(K, V)]) to any Hadoop file system, using the new Hadoop OutputFormat API (mapreduce package).
RDD.saveAsNewAPIHadoopFile(path, …[, …])
RDD.saveAsPickleFile(path[, batchSize])
Save this RDD as a SequenceFile of serialized objects.
RDD.saveAsSequenceFile(path[, …])
Output a Python RDD of key-value pairs (of form RDD[(K, V)]) to any Hadoop file system, using the “org.apache.hadoop.io.Writable” types that we convert from the RDD’s key and value types.
RDD.saveAsTextFile(path[, compressionCodecClass])
Save this RDD as a text file, using string representations of elements.
RDD.setName(name)
Assign a name to this RDD.
RDD.sortBy(keyfunc[, ascending, numPartitions])
Sorts this RDD by the given keyfunc.
RDD.sortByKey([ascending, numPartitions, …])
Sorts this RDD, which is assumed to consist of (key, value) pairs.
RDD.stats()
Return a StatCounter object that captures the mean, variance and count of the RDD’s elements in one operation.
RDD.stdev()
Compute the standard deviation of this RDD’s elements.
RDD.subtract(other[, numPartitions])
Return each value in self that is not contained in other.
RDD.subtractByKey(other[, numPartitions])
Return each (key, value) pair in self that has no pair with matching key in other.
RDD.sum()
Add up the elements in this RDD.
RDD.sumApprox(timeout[, confidence])
Approximate operation to return the sum within a timeout or meet the confidence.
RDD.take(num)
Take the first num elements of the RDD.
RDD.takeOrdered(num[, key])
Get the N elements from an RDD ordered in ascending order or as specified by the optional key function.
RDD.takeSample(withReplacement, num[, seed])
Return a fixed-size sampled subset of this RDD.
RDD.toDebugString()
A description of this RDD and its recursive dependencies for debugging.
RDD.toLocalIterator([prefetchPartitions])
Return an iterator that contains all of the elements in this RDD.
RDD.top(num[, key])
Get the top N elements from an RDD.
RDD.treeAggregate(zeroValue, seqOp, combOp)
Aggregates the elements of this RDD in a multi-level tree pattern.
RDD.treeReduce(f[, depth])
Reduces the elements of this RDD in a multi-level tree pattern.
RDD.union(other)
Return the union of this RDD and another one.
RDD.unpersist([blocking])
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
RDD.values()
Return an RDD with the values of each tuple.
RDD.variance()
Compute the variance of this RDD’s elements.
RDD.withResources(profile)
Specify a pyspark.resource.ResourceProfile to use when calculating this RDD.
RDD.zip(other)
Zips this RDD with another one, returning key-value pairs with the first element in each RDD, second element in each RDD, etc.
RDD.zipWithIndex()
Zips this RDD with its element indices.
RDD.zipWithUniqueId()
Zips this RDD with generated unique Long ids.
Broadcast.destroy([blocking])
Destroy all data and metadata related to this broadcast variable.
Broadcast.dump(value, f)
Broadcast.load(file)
Broadcast.load_from_path(path)
Broadcast.unpersist([blocking])
Delete cached copies of this broadcast on the executors.
Broadcast.value
Return the broadcasted value.
Accumulator.add(term)
Adds a term to this accumulator’s value.
Accumulator.value
Get the accumulator’s value; only usable in the driver program.
AccumulatorParam.addInPlace(value1, value2)
Add two values of the accumulator’s data type, returning a new value; for efficiency, can also update value1 in place and return it.
AccumulatorParam.zero(value)
Provide a “zero value” for the type, compatible in dimensions with the provided value (e.g., a zero vector).
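A sketch of a custom AccumulatorParam that sums Python lists element-wise; the class name ListParam is hypothetical, and sc is assumed to be an existing SparkContext:

    from pyspark import AccumulatorParam

    class ListParam(AccumulatorParam):
        # zero(): an all-zeros list shaped like the initial value
        def zero(self, value):
            return [0.0] * len(value)
        # addInPlace(): element-wise sum of two accumulated lists
        def addInPlace(self, v1, v2):
            return [a + b for a, b in zip(v1, v2)]

    vec = sc.accumulator([0.0, 0.0], ListParam())
    sc.parallelize([[1.0, 2.0], [3.0, 4.0]]).foreach(lambda x: vec.add(x))
    print(vec.value)   # [4.0, 6.0]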
inheritable_thread_target(f)
Return a thread target wrapper which is recommended to be used in PySpark when the pinned thread mode is enabled.
SparkConf.contains(key)
Does this configuration contain a given key?
SparkConf.get(key[, defaultValue])
Get the configured value for some key, or return a default otherwise.
SparkConf.getAll()
Get all values as a list of key-value pairs.
SparkConf.set(key, value)
Set a configuration property.
SparkConf.setAll(pairs)
Set multiple parameters, passed as a list of key-value pairs.
SparkConf.setAppName(value)
Set the application name.
SparkConf.setExecutorEnv([key, value, pairs])
Set an environment variable to be passed to executors.
SparkConf.setIfMissing(key, value)
Set a configuration property, if not already set.
SparkConf.setMaster(value)
Set the master URL to connect to.
SparkConf.setSparkHome(value)
Set the path where Spark is installed on worker nodes.
SparkConf.toDebugString()
Returns a printable version of the configuration, as a list of key=value pairs, one per line.
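A small sketch of building and inspecting a configuration before creating a context (the master URL and property values are illustrative):

    from pyspark import SparkConf

    conf = (SparkConf()
            .setAppName("demo")
            .setMaster("local[4]")
            .set("spark.executor.memory", "2g")
            .setIfMissing("spark.ui.showConsoleProgress", "false"))
    print(conf.get("spark.executor.memory"))   # 2g
    print(conf.toDebugString())                # one key=value pair per line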
SparkFiles.get(filename)
Get the absolute path of a file added through SparkContext.addFile().
SparkFiles.getRootDirectory()
Get the root directory that contains files added through SparkContext.addFile().
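SparkFiles is the executor-side counterpart of SparkContext.addFile(); a sketch, assuming an existing SparkContext sc and a hypothetical side file lookup.csv:

    from pyspark import SparkFiles

    sc.addFile("lookup.csv")                   # hypothetical local file or URL

    def first_line(_):
        path = SparkFiles.get("lookup.csv")    # absolute path on the worker
        with open(path) as fh:
            yield fh.readline().strip()

    print(sc.parallelize([0], 1).mapPartitions(first_line).collect())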
StorageLevel.DISK_ONLY
StorageLevel.DISK_ONLY_2
StorageLevel.DISK_ONLY_3
StorageLevel.MEMORY_AND_DISK
StorageLevel.MEMORY_AND_DISK_2
StorageLevel.MEMORY_ONLY
StorageLevel.MEMORY_ONLY_2
StorageLevel.OFF_HEAP
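These constants are passed to RDD.persist() to pick where cached partitions live; a sketch with a hypothetical input file, assuming an existing SparkContext sc:

    from pyspark import StorageLevel

    logs = sc.textFile("events.txt")               # hypothetical input
    errors = logs.filter(lambda line: "ERROR" in line)
    errors.persist(StorageLevel.MEMORY_AND_DISK)   # reuse across the two actions below
    print(errors.count())
    print(errors.take(5))
    errors.unpersist()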
TaskContext.attemptNumber()
How many times this task has been attempted; the first task attempt is assigned attemptNumber = 0.
TaskContext.get()
Return the currently active TaskContext.
TaskContext.getLocalProperty(key)
Get a local property set upstream in the driver, or None if it is missing.
TaskContext.partitionId()
The ID of the RDD partition that is computed by this task.
TaskContext.resources()
Resources allocated to the task.
TaskContext.stageId()
The ID of the stage that this task belongs to.
TaskContext.taskAttemptId()
An ID that is unique to this task attempt (within the same SparkContext, no two task attempts will share the same attempt ID).
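TaskContext is only meaningful inside a running task; a sketch that tags each element with its partition and stage IDs, assuming an existing SparkContext sc:

    from pyspark import TaskContext

    def tag_partition(it):
        ctx = TaskContext.get()        # valid only on executors, inside a task
        for x in it:
            yield (ctx.partitionId(), ctx.stageId(), x)

    print(sc.parallelize(range(4), 2).mapPartitions(tag_partition).collect())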
RDDBarrier.mapPartitions(f[, …])
Returns a new RDD by applying a function to each partition of the wrapped RDD, where tasks are launched together in a barrier stage.
RDDBarrier.mapPartitionsWithIndex(f[, …])
Returns a new RDD by applying a function to each partition of the wrapped RDD, while tracking the index of the original partition.
BarrierTaskContext.allGather([message])
This function blocks until all tasks in the same stage have reached this routine.
BarrierTaskContext.attemptNumber()
BarrierTaskContext.barrier()
Sets a global barrier and waits until all tasks in this stage hit this barrier.
BarrierTaskContext.get()
Return the currently active BarrierTaskContext.
BarrierTaskContext.getLocalProperty(key)
BarrierTaskContext.getTaskInfos()
Returns BarrierTaskInfo for all tasks in this barrier stage, ordered by partition ID.
BarrierTaskContext.partitionId()
BarrierTaskContext.resources()
BarrierTaskContext.stageId()
BarrierTaskContext.taskAttemptId()
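A sketch of a barrier stage, assuming an existing SparkContext sc with enough slots to run every task of the stage concurrently (barrier mode requires that): each task synchronizes once and then looks up the addresses of its peers.

    from pyspark import BarrierTaskContext

    def sync_then_report(it):
        ctx = BarrierTaskContext.get()
        ctx.barrier()                              # wait for every task in the stage
        peers = [info.address for info in ctx.getTaskInfos()]
        yield (ctx.partitionId(), peers)

    print(sc.parallelize(range(4), 2).barrier().mapPartitions(sync_then_report).collect())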
util.VersionUtils.majorMinorVersion(sparkVersion)
Given a Spark version string, return the (major version number, minor version number).
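A sketch of majorMinorVersion; the import path pyspark.util is assumed here, and the version string is illustrative:

    from pyspark.util import VersionUtils

    major, minor = VersionUtils.majorMinorVersion("3.3.1")
    print(major, minor)   # 3 3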