RDD:
Resilient Distributed Datasets :
Can be created by loading a dataset from external storage or by calling parallelize on an existing collection in the driver program.
- Immutable
- Partitioned
- Fault tolerant
- Created by coarse-grained operations
- Lazily evaluated
- Can be persisted
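The properties above can be illustrated with a small single-process toy model. This is only a sketch under stated assumptions: ToyRDD and its methods are hypothetical names written for this note, not Spark's API, and the model ignores distribution and persistence entirely. It shows that transformations return a new object and only record what to do, while an action such as collect actually runs the recorded steps.

```python
class ToyRDD:
    """Toy, single-process stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, partitions, lineage=()):
        # Partitioned and immutable: data is held as a tuple of tuples.
        self._partitions = tuple(tuple(p) for p in partitions)
        # Lineage: the coarse-grained operations recorded so far, not yet run.
        # In Spark, lineage is what allows a lost partition to be recomputed.
        self._lineage = tuple(lineage)

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        """Split a local collection into roughly equal partitions."""
        data = list(data)
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls(data[i:i + size] for i in range(0, len(data), size))

    def map(self, fn):
        # Transformation: returns a new ToyRDD, the original is unchanged.
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, pred):
        # Transformation: also lazy, only extends the lineage.
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def _compute(self, partition):
        # Replay the recorded operations over one partition.
        items = iter(partition)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def collect(self):
        # Action: triggers the computation and returns the results.
        return [x for p in self._partitions for x in self._compute(p)]

    def count(self):
        # Action: number of elements after all transformations.
        return len(self.collect())


calls = []  # records when the mapped function really runs


def double(x):
    calls.append(x)
    return x * 2


numbers = ToyRDD.parallelize([1, 2, 3, 4, 5])
big = numbers.map(double).filter(lambda x: x > 4)
calls_before_action = list(calls)  # still empty: nothing has run yet
result = big.collect()             # the action runs map and filter
```

Here calls_before_action stays empty because map and filter only extend the lineage, and numbers itself still holds the original five elements after big has been collected.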
Transformations :
Which create a new dataset from an existing one. For example, map is a transformation that passes each dataset element through a function and returns a new RDD representing the results.
Eg :
map
flatMap
filter
sample
distinct
union
intersection
groupByKey
reduceByKey
aggregateByKey
sortByKey
cartesian
pipe
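To make a few of these transformations concrete, the sketch below mimics what flatMap, map, groupByKey, reduceByKey and sortByKey compute, using plain Python lists in place of RDDs. The helper names (flat_map, group_by_key, reduce_by_key, sort_by_key) are hypothetical stand-ins written for this note, not Spark functions, and everything runs eagerly on one machine. In Spark the same steps would be lazy and distributed.

```python
from collections import defaultdict
from functools import reduce


def flat_map(fn, items):
    """Like flatMap: each input element may produce zero or more outputs."""
    return [out for item in items for out in fn(item)]


def group_by_key(pairs):
    """Like groupByKey: (key, value) pairs become (key, [values])."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups.items())


def reduce_by_key(fn, pairs):
    """Like reduceByKey: merge all values of each key with fn."""
    return [(key, reduce(fn, values)) for key, values in group_by_key(pairs)]


def sort_by_key(pairs):
    """Like sortByKey: order the pairs by key, ascending."""
    return sorted(pairs, key=lambda kv: kv[0])


# Word count, the usual first example for these operations.
lines = ["to be or", "not to be"]
words = flat_map(str.split, lines)        # flatMap: lines to words
pairs = [(word, 1) for word in words]     # map: word to (word, 1)
counts = sort_by_key(reduce_by_key(lambda a, b: a + b, pairs))
```

With these two input lines, counts ends up as one (word, total) pair per distinct word, ordered alphabetically.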
Actions :
Which return a value to the driver program after running a computation on the dataset. For example, reduce is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program (although there is also a parallel reduceByKey that returns a distributed dataset).
Eg :
reduce
collect
count
first
saveAsTextFile
countByKey
foreach
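As a rough illustration of what some of these actions return, the sketch below computes the plain Python equivalents of reduce, count, first, countByKey and foreach on small local lists. It assumes a single machine and ordinary lists, so it only shows the values an action hands back to the driver, not how Spark computes them across partitions. The variable names are made up for this example.

```python
from collections import Counter
from functools import reduce

data = [3, 1, 4, 1, 5]
pairs = [("a", 1), ("b", 2), ("a", 3)]

# reduce: aggregate all elements with a two-argument function.
total = reduce(lambda a, b: a + b, data)

# count: number of elements in the dataset.
n = len(data)

# first: the first element of the dataset.
head = data[0]

# countByKey: how many elements exist for each key.
per_key = dict(Counter(key for key, _ in pairs))

# foreach: run a function on each element for its side effect only.
seen = []
for x in data:
    seen.append(x)
```

Each result here is an ordinary local value, which mirrors the point that actions return something to the driver program rather than another dataset.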