The underlying JVM object is a SchemaRDD, not a PythonRDD, so we can
utilize the relational query API exposed by Spark SQL.
This class receives raw tuples from Java but assigns a class to them in
all of its data-collection methods (mapPartitionsWithIndex, collect, take,
etc.) so that PySpark sees them as Row objects with named fields.
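The named-field wrapping described above can be sketched in plain Python with collections.namedtuple. The `Row` and `wrap_tuples` names below are local stand-ins for illustration, not PySpark's actual implementation:

```python
from collections import namedtuple

# Stand-in for PySpark's Row: a tuple whose fields are addressable by name.
Row = namedtuple("Row", ["name", "age"])

def wrap_tuples(raw_tuples):
    """Assign the Row class to raw tuples, as the data-collection methods do."""
    return [Row(*t) for t in raw_tuples]

rows = wrap_tuples([("Alice", 1), ("Bob", 2)])
print(rows[0].name)  # prints "Alice": fields are accessible by name
print(rows[1].age)   # prints 2
```

Because a namedtuple is still a tuple, downstream code that expects plain tuples keeps working, which mirrors why the wrapping is applied only at collection time.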
__init__(self, jschema_rdd, sql_ctx)
    x.__init__(...) initializes x; see help(type(x)) for signature
insertInto(self, tableName, overwrite=False)
    Inserts the contents of this SchemaRDD into the specified table.
saveAsTable(self, tableName)
    Creates a new table with the contents of this SchemaRDD.
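insertInto and saveAsTable only make sense against a SQL context, but their semantics (append vs. overwrite an existing table) can be mimicked with the stdlib sqlite3 module. The table name, columns, and `insert_into` helper below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.execute("INSERT INTO people VALUES ('Alice', 1)")

def insert_into(rows, table_name, overwrite=False):
    # overwrite=True replaces the table's existing contents, analogous to
    # insertInto(tableName, overwrite=True); overwrite=False appends.
    if overwrite:
        conn.execute("DELETE FROM %s" % table_name)
    conn.executemany("INSERT INTO %s VALUES (?, ?)" % table_name, rows)

insert_into([("Bob", 2)], "people")  # append: table now has 2 rows
print(conn.execute("SELECT COUNT(*) FROM people").fetchone()[0])  # prints 2
insert_into([("Carol", 3)], "people", overwrite=True)  # replace all rows
print(conn.execute("SELECT name FROM people").fetchall())  # [('Carol',)]
```

saveAsTable differs in that it creates the target table itself (the CREATE TABLE step above) rather than requiring it to exist.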
schemaString(self)
    Returns the output schema in tree format.
printSchema(self)
    Prints out the schema in tree format.
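The tree format the two methods above refer to is a `root` node with one `|--` line per field. A minimal pure-Python formatter sketch (the field list is hypothetical, and real Spark output carries extra detail such as nullability):

```python
def schema_string(fields):
    """Render a flat (name, type) field list in a printSchema-like tree."""
    lines = ["root"]
    for name, dtype in fields:
        lines.append(" |-- %s: %s" % (name, dtype))
    return "\n".join(lines)

print(schema_string([("name", "string"), ("age", "integer")]))
# root
#  |-- name: string
#  |-- age: integer
```

printSchema is simply the printing counterpart of schemaString: it writes the same tree to stdout instead of returning it.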
mapPartitionsWithIndex(self, f, preservesPartitioning=False)
    Return a new RDD by applying a function to each partition of this
    RDD, while tracking the index of the original partition.
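Ignoring distribution, the behaviour of mapPartitionsWithIndex can be sketched over a list of in-memory partitions. The function f receives the partition index and an iterator over that partition's elements; the helper names here are invented for the sketch:

```python
def map_partitions_with_index(partitions, f):
    """Pure-Python sketch: apply f(index, iterator) to each partition."""
    return [list(f(i, iter(part))) for i, part in enumerate(partitions)]

# Example f: tag every element with the index of its source partition.
def tag(index, iterator):
    return ((index, x) for x in iterator)

parts = [[1, 2], [3], [4, 5]]
print(map_partitions_with_index(parts, tag))
# [[(0, 1), (0, 2)], [(1, 3)], [(2, 4), (2, 5)]]
```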
cache(self)
    Persist this RDD with the default storage level (MEMORY_ONLY_SER).
persist(self, storageLevel)
    Set this RDD's storage level to persist its values across operations
    after the first time it is computed.
unpersist(self, blocking=True)
    Mark the RDD as non-persistent, and remove all blocks for it from
    memory and disk.
coalesce(self, numPartitions, shuffle=False)
    Return a new RDD that is reduced into `numPartitions` partitions.
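With shuffle=False, coalescing merges existing partitions rather than redistributing individual elements. A pure-Python sketch of that merging (the round-robin assignment below is for illustration; Spark's actual grouping of old partitions differs):

```python
def coalesce(partitions, num_partitions):
    """Merge old partitions down to at most num_partitions new ones."""
    out = [[] for _ in range(min(num_partitions, len(partitions)))]
    for i, part in enumerate(partitions):
        # Assign each old partition wholesale to a new partition.
        out[i % len(out)].extend(part)
    return out

parts = [[1], [2, 3], [4], [5, 6]]
print(coalesce(parts, 2))  # [[1, 4], [2, 3, 5, 6]]
```

Note that asking for more partitions than currently exist cannot increase the count without a shuffle, which the `min(...)` above reflects.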
subtract(self, other, numPartitions=None)
    Return each value in self that is not contained in other.
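subtract removes every occurrence of a value that appears anywhere in other (it is not a multiset difference). A pure-Python sketch of that semantics:

```python
def subtract(self_values, other_values):
    """Keep each value in self_values that appears nowhere in other_values."""
    other = set(other_values)
    return [x for x in self_values if x not in other]

print(subtract([1, 2, 2, 3, 4], [2, 4]))  # [1, 3]
```

Both occurrences of 2 are dropped because membership in other is all that matters, not how many times a value appears on either side.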
Inherited from rdd.RDD:
    __add__, __repr__, aggregate, aggregateByKey, cartesian, cogroup,
    collectAsMap, combineByKey, context, countByKey, countByValue, filter,
    first, flatMap, flatMapValues, fold, foldByKey, foreach,
    foreachPartition, getNumPartitions, getStorageLevel, glom, groupBy,
    groupByKey, groupWith, histogram, id, join, keyBy, keys, leftOuterJoin,
    map, mapPartitions, mapPartitionsWithSplit, mapValues, max, mean, min,
    name, partitionBy, pipe, reduce, reduceByKey, reduceByKeyLocally,
    rightOuterJoin, sample, sampleByKey, sampleStdev, sampleVariance,
    saveAsHadoopDataset, saveAsHadoopFile, saveAsNewAPIHadoopDataset,
    saveAsNewAPIHadoopFile, saveAsPickleFile, saveAsSequenceFile,
    saveAsTextFile, setName, sortBy, sortByKey, stats, stdev, subtractByKey,
    sum, take, takeOrdered, takeSample, toDebugString, top, union, values,
    variance, zip, zipWithIndex, zipWithUniqueId
Inherited from object:
    __delattr__, __format__, __getattribute__, __hash__, __new__,
    __reduce__, __reduce_ex__, __setattr__, __sizeof__, __str__,
    __subclasshook__