Shared Variables in Spark
Broadcast Variable :
Broadcasting means sending the same data to all recipients, here every machine in the cluster. A broadcast variable is a read-only value that Spark sends to each machine once and caches there, instead of shipping a new copy with every task. Because all tasks on a machine reuse the cached copy, broadcast variables reduce network and I/O costs when the data is retrieved.
Broadcast variables are very useful when the same data, such as a lookup table, is needed by multiple tasks across different stages.
Eg :
val broadcastVar = sc.broadcast(Array("Indians are cultural", "People"))
The values "Indians are cultural" and "People" are made available on all machines, where tasks read them through broadcastVar.value.
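A slightly larger sketch shows the typical pattern: broadcast a small lookup table, then read it through .value inside a transformation. It assumes Spark 2.x or later running in local mode; the object name BroadcastLookupSketch, the helper labelVisits and the country-code data are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object BroadcastLookupSketch {
  // Hypothetical helper: replaces each country code with a name taken from a
  // broadcast lookup table, using "Unknown" for codes that are not in it.
  def labelVisits(
      sc: SparkContext,
      visits: Seq[(String, Int)],
      names: Map[String, String]): Array[(String, Int)] = {
    // Ship the map to each executor once; all tasks there share that copy.
    val namesBc = sc.broadcast(names)
    try {
      sc.parallelize(visits)
        .map { case (code, n) => (namesBc.value.getOrElse(code, "Unknown"), n) }
        .collect() // the action runs here, before the broadcast is destroyed
    } finally {
      // Free the cached copies once no further job needs them.
      namesBc.destroy()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BroadcastLookupSketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    val countryNames = Map("IN" -> "India", "FR" -> "France", "JP" -> "Japan")
    val visits = Seq(("IN", 3), ("JP", 5), ("XX", 1))
    labelVisits(spark.sparkContext, visits, countryNames).foreach(println)
    spark.stop()
  }
}
```

The tasks never modify namesBc.value; broadcast variables are meant to be read-only, so any change made on one machine would not be seen by the others.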
Accumulator Variable :
A data item whose value changes over time is a variable. An accumulator is a shared variable that tasks can only add to, so its value accumulates (for example, a count or a sum) as tasks run on different machines. Only the driver program can read the accumulated value, usually after an action has finished.
Example :
>>val accum = sc.longAccumulator("Count of all elements")
>>sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum.add(x))
>>accum.value
res2: Long = 10
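As a further sketch, assuming Spark 2.x or later in local mode, two accumulators can collect a sum and an error count in a single pass over the data. The object name AccumulatorSketch, the helper sumAndCountBad and the sample input are hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success, Try}

object AccumulatorSketch {
  // Hypothetical helper: sums the lines that parse as integers and counts
  // the lines that do not, using two accumulators updated by the tasks.
  def sumAndCountBad(sc: SparkContext, lines: Seq[String]): (Long, Long) = {
    val total = sc.longAccumulator("Sum of parsed values")
    val badLines = sc.longAccumulator("Malformed lines")

    // foreach is an action, so Spark applies each task's updates only once,
    // even if a task has to be retried.
    sc.parallelize(lines).foreach { line =>
      Try(line.trim.toLong) match {
        case Success(v) => total.add(v)
        case Failure(_) => badLines.add(1L)
      }
    }

    // Tasks can only add; the driver reads the merged results via .value.
    (total.value.longValue, badLines.value.longValue)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AccumulatorSketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    val (sum, bad) = sumAndCountBad(spark.sparkContext, Seq("1", "2", "oops", "4"))
    println(s"sum = $sum, malformed lines = $bad")
    spark.stop()
  }
}
```

If the same add calls were placed inside a transformation such as map, they would only run once an action forces evaluation, and a retried task could apply its updates more than once.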