RDD:

Resilient Distributed Datasets :

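An RDD can be created by loading a dataset from external storage or by calling parallelize on an existing collection in the driver program. Its main properties are: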
  • Immutable
  • Partitioned
  • Fault tolerant
  • Created by coarse-grained operations
  • Lazily evaluated
  • Can be persisted
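
The immutability and lazy-evaluation properties above can be illustrated with a small toy model in plain Python. This is only a sketch and not Spark's implementation or API: the class name LazyDataset and its methods are hypothetical, chosen to mirror the idea that transformations only record what to do, while an action triggers the computation.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical; not part of Spark)."""

    def __init__(self, source, steps=()):
        self._source = tuple(source)   # the data is never modified in place
        self._steps = tuple(steps)     # recorded transformations (the "lineage")

    def map(self, fn):
        # Transformation: returns a NEW dataset and runs nothing yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: replays the recorded steps and returns a plain list.
        data = list(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data


calls = []

def double(x):
    calls.append(x)      # lets us observe when the function really runs
    return x * 2

base = LazyDataset([1, 2, 3, 4])
doubled = base.map(double).filter(lambda x: x > 4)

ran_before_action = len(calls)   # 0: nothing has been computed so far
result = doubled.collect()       # the action triggers the computation: [6, 8]
```

Because the steps are recorded rather than applied immediately, the result can always be rebuilt from the source data, which is the idea behind recovering a lost partition from its lineage.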
Transformations :
Transformations create a new dataset from an existing one.

For example, map is a transformation that passes each dataset element through a function and returns a new RDD representing the results.

Eg :
map
flatMap
filter
sample
distinct
union
intersection
groupByKey
reduceByKey
aggregateByKey
sortByKey
cartesian
pipe
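
As a rough sketch of what a few of these transformations compute, the following plain Python mimics their results on small local lists. It does not use Spark: the helper names (flat_map, reduce_by_key) and the sample data are hypothetical, and in Spark the corresponding calls would be methods on an RDD, such as rdd.flatMap(...) or rdd.reduceByKey(...).

```python
from itertools import chain

lines = ["a b", "b c c"]          # sample input (made up for illustration)

def flat_map(fn, data):
    # Like flatMap: each element may yield zero or more output elements.
    return list(chain.from_iterable(fn(x) for x in data))

def reduce_by_key(fn, pairs):
    # Like reduceByKey: merge the values of each key with fn.
    merged = {}
    for key, value in pairs:
        merged[key] = fn(merged[key], value) if key in merged else value
    return sorted(merged.items())  # sorted only to make the output stable

lengths = [len(line) for line in lines]                  # map: one output per input
words = flat_map(str.split, lines)                       # flatMap: lines to words
long_lines = [line for line in lines if len(line) > 3]   # filter: keep matching elements
unique_words = sorted(set(words))                        # distinct (Spark guarantees no order)
pairs = [(w, 1) for w in words]                          # map to (key, value) pairs
word_counts = reduce_by_key(lambda a, b: a + b, pairs)   # [("a", 1), ("b", 2), ("c", 2)]
```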



Actions :
Actions return a value to the driver program after running a computation on the dataset.

For example, reduce is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program (although there is also a parallel reduceByKey that returns a distributed dataset).

Eg :
reduce
collect
count
first
saveAsTextFile
countByKey
foreach
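
What some of these actions hand back to the driver can be sketched with plain Python on a local list, again without Spark. Here functools.reduce plays the role of the reduce action and returns a single value, while the per-key counting mirrors the kind of result countByKey produces; the sample data is made up.

```python
from collections import Counter
from functools import reduce

pairs = [("a", 1), ("b", 2), ("a", 3)]   # sample (key, value) data

values = [v for _, v in pairs]

total = reduce(lambda x, y: x + y, values)    # like reduce: a single value, 6
n = len(pairs)                                # like count: 3
head = pairs[0]                               # like first: ("a", 1)
per_key = dict(Counter(k for k, _ in pairs))  # like countByKey: {"a": 2, "b": 1}
```

Each result here is an ordinary local value rather than a new dataset, which is what separates an action from a transformation.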