StreamingContext(sparkContext[, …])
StreamingContext
Main entry point for Spark Streaming functionality.
DStream(jdstream, ssc, jrdd_deserializer)
DStream
A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see RDD in the Spark core documentation for more details on RDDs).
RDD
StreamingContext.addStreamingListener(…)
StreamingContext.addStreamingListener
Add a [[org.apache.spark.streaming.scheduler.StreamingListener]] object for receiving system events related to streaming.
StreamingContext.awaitTermination([timeout])
StreamingContext.awaitTermination
Wait for the execution to stop.
StreamingContext.awaitTerminationOrTimeout(timeout)
StreamingContext.awaitTerminationOrTimeout
StreamingContext.checkpoint(directory)
StreamingContext.checkpoint
Sets the context to periodically checkpoint the DStream operations for master fault-tolerance.
StreamingContext.getActive()
StreamingContext.getActive
Return either the currently active StreamingContext (i.e., if there is a context started but not stopped) or None.
StreamingContext.getActiveOrCreate(…)
StreamingContext.getActiveOrCreate
Either return the active StreamingContext (i.e.
StreamingContext.getOrCreate(checkpointPath, …)
StreamingContext.getOrCreate
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
StreamingContext.remember(duration)
StreamingContext.remember
Set each DStreams in this context to remember RDDs it generated in the last given duration.
StreamingContext.sparkContext
Return SparkContext which is associated with this StreamingContext.
StreamingContext.start()
StreamingContext.start
Start the execution of the streams.
StreamingContext.stop([stopSparkContext, …])
StreamingContext.stop
Stop the execution of the streams, with option of ensuring all received data has been processed.
StreamingContext.transform(dstreams, …)
StreamingContext.transform
Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
StreamingContext.union(*dstreams)
StreamingContext.union
Create a unified DStream from multiple DStreams of the same type and same slide duration.
StreamingContext.binaryRecordsStream(…)
StreamingContext.binaryRecordsStream
Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as flat binary files with records of fixed length.
StreamingContext.queueStream(rdds[, …])
StreamingContext.queueStream
Create an input stream from a queue of RDDs or list.
StreamingContext.socketTextStream(hostname, port)
StreamingContext.socketTextStream
Create an input from TCP source hostname:port.
StreamingContext.textFileStream(directory)
StreamingContext.textFileStream
Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as text files.
DStream.pprint([num])
DStream.pprint
Print the first num elements of each RDD generated in this DStream.
DStream.saveAsTextFiles(prefix[, suffix])
DStream.saveAsTextFiles
Save each RDD in this DStream as at text file, using string representation of elements.
DStream.cache()
DStream.cache
Persist the RDDs of this DStream with the default storage level (MEMORY_ONLY).
DStream.checkpoint(interval)
DStream.checkpoint
Enable periodic checkpointing of RDDs of this DStream
DStream.cogroup(other[, numPartitions])
DStream.cogroup
Return a new DStream by applying ‘cogroup’ between RDDs of this DStream and other DStream.
DStream.combineByKey(createCombiner, …[, …])
DStream.combineByKey
Return a new DStream by applying combineByKey to each RDD.
DStream.context()
DStream.context
Return the StreamingContext associated with this DStream
DStream.count()
DStream.count
Return a new DStream in which each RDD has a single element generated by counting each RDD of this DStream.
DStream.countByValue()
DStream.countByValue
Return a new DStream in which each RDD contains the counts of each distinct value in each RDD of this DStream.
DStream.countByValueAndWindow(…[, …])
DStream.countByValueAndWindow
Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream.
DStream.countByWindow(windowDuration, …)
DStream.countByWindow
Return a new DStream in which each RDD has a single element generated by counting the number of elements in a window over this DStream.
DStream.filter(f)
DStream.filter
Return a new DStream containing only the elements that satisfy predicate.
DStream.flatMap(f[, preservesPartitioning])
DStream.flatMap
Return a new DStream by applying a function to all elements of this DStream, and then flattening the results
DStream.flatMapValues(f)
DStream.flatMapValues
Return a new DStream by applying a flatmap function to the value of each key-value pairs in this DStream without changing the key.
DStream.foreachRDD(func)
DStream.foreachRDD
Apply a function to each RDD in this DStream.
DStream.fullOuterJoin(other[, numPartitions])
DStream.fullOuterJoin
Return a new DStream by applying ‘full outer join’ between RDDs of this DStream and other DStream.
DStream.glom()
DStream.glom
Return a new DStream in which RDD is generated by applying glom() to RDD of this DStream.
DStream.groupByKey([numPartitions])
DStream.groupByKey
Return a new DStream by applying groupByKey on each RDD.
DStream.groupByKeyAndWindow(windowDuration, …)
DStream.groupByKeyAndWindow
Return a new DStream by applying groupByKey over a sliding window.
DStream.join(other[, numPartitions])
DStream.join
Return a new DStream by applying ‘join’ between RDDs of this DStream and other DStream.
DStream.leftOuterJoin(other[, numPartitions])
DStream.leftOuterJoin
Return a new DStream by applying ‘left outer join’ between RDDs of this DStream and other DStream.
DStream.map(f[, preservesPartitioning])
DStream.map
Return a new DStream by applying a function to each element of DStream.
DStream.mapPartitions(f[, preservesPartitioning])
DStream.mapPartitions
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDDs of this DStream.
DStream.mapPartitionsWithIndex(f[, …])
DStream.mapPartitionsWithIndex
Return a new DStream in which each RDD is generated by applying mapPartitionsWithIndex() to each RDDs of this DStream.
DStream.mapValues(f)
DStream.mapValues
Return a new DStream by applying a map function to the value of each key-value pairs in this DStream without changing the key.
DStream.partitionBy(numPartitions[, …])
DStream.partitionBy
Return a copy of the DStream in which each RDD are partitioned using the specified partitioner.
DStream.persist(storageLevel)
DStream.persist
Persist the RDDs of this DStream with the given storage level
DStream.reduce(func)
DStream.reduce
Return a new DStream in which each RDD has a single element generated by reducing each RDD of this DStream.
DStream.reduceByKey(func[, numPartitions])
DStream.reduceByKey
Return a new DStream by applying reduceByKey to each RDD.
DStream.reduceByKeyAndWindow(func, invFunc, …)
DStream.reduceByKeyAndWindow
Return a new DStream by applying incremental reduceByKey over a sliding window.
DStream.reduceByWindow(reduceFunc, …)
DStream.reduceByWindow
Return a new DStream in which each RDD has a single element generated by reducing all elements in a sliding window over this DStream.
DStream.repartition(numPartitions)
DStream.repartition
Return a new DStream with an increased or decreased level of parallelism.
DStream.rightOuterJoin(other[, numPartitions])
DStream.rightOuterJoin
Return a new DStream by applying ‘right outer join’ between RDDs of this DStream and other DStream.
DStream.slice(begin, end)
DStream.slice
Return all the RDDs between ‘begin’ to ‘end’ (both included)
DStream.transform(func)
DStream.transform
Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream.
DStream.transformWith(func, other[, …])
DStream.transformWith
Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream and ‘other’ DStream.
DStream.union(other)
DStream.union
Return a new DStream by unifying data of another DStream with this DStream.
DStream.updateStateByKey(updateFunc[, …])
DStream.updateStateByKey
Return a new “state” DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values of the key.
DStream.window(windowDuration[, slideDuration])
DStream.window
Return a new DStream in which each RDD contains all the elements in seen in a sliding window of time over this DStream.
KinesisUtils.createStream(ssc, …[, …])
KinesisUtils.createStream
Create an input stream that pulls messages from a Kinesis stream.
InitialPositionInStream.LATEST
InitialPositionInStream.TRIM_HORIZON