@backstreetbrogrammer
--------------------------------------------------------------------------------
Chapter 16 - Spark RDD - Broadcast variables
--------------------------------------------------------------------------------
Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.
Spark attempts to distribute broadcast variables using efficient broadcast algorithms to reduce communication cost.
Spark actions are executed through a set of stages, separated by distributed "shuffle" operations.
Spark automatically broadcasts the common data needed by tasks within each stage.
The data broadcast this way is cached in serialized form and deserialized before running each task.
This means that explicitly creating broadcast variables is only useful when tasks across multiple stages need the same data or when caching the data in deserialized form is important.
Broadcast variables are created from a variable m by calling SparkContext.broadcast(m). The broadcast variable is a wrapper around m, and its value can be accessed by calling the value() method.
// m = new int[] {1, 2, 3, 4, 5}
Broadcast<int[]> broadcastVar = sc.broadcast(new int[] {1, 2, 3, 4, 5});
broadcastVar.value(); // returns [1, 2, 3, 4, 5]
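A typical use is a small lookup table that every task reads. The sketch below is only an illustration: the class name BroadcastLookupExample, the helper resolveCountryNames and the country-code data are all made up for this example, and it assumes Spark running in local mode with the Spark core library on the classpath.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class BroadcastLookupExample {

    // Hypothetical helper: maps country codes to names using a broadcast lookup table.
    static List<String> resolveCountryNames(JavaSparkContext sc, List<String> codes) {
        // Small read-only table created on the driver (sample data, not from the video).
        Map<String, String> countryNames = new HashMap<>();
        countryNames.put("IN", "India");
        countryNames.put("JP", "Japan");
        countryNames.put("US", "United States");

        // Wrap the table in a broadcast variable so each executor caches it once
        // instead of receiving a copy with every task.
        Broadcast<Map<String, String>> lookup = sc.broadcast(countryNames);

        // Tasks read the cached copy through value().
        return sc.parallelize(codes)
                 .map(code -> lookup.value().getOrDefault(code, "Unknown"))
                 .collect();
    }

    public static void main(String[] args) {
        // Assumption: local mode, using all available cores.
        SparkConf conf = new SparkConf()
                .setAppName("BroadcastLookupExample")
                .setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            System.out.println(resolveCountryNames(sc, Arrays.asList("IN", "US", "XX", "JP")));
        }
    }
}
```

The lambda captures only the Broadcast handle, not the map itself, so the table is not serialized into every task closure.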
To release the resources that the broadcast variable copied onto executors, call unpersist(). If the broadcast is used again afterward, it will be re-broadcast.
To permanently release all resources used by the broadcast variable, call destroy(). The broadcast variable can't be used after that.
Note that these methods do not block by default. To block until the resources are freed, pass true for the blocking parameter when calling them, e.g. unpersist(true) or destroy(true).
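The lifecycle described above could look roughly like this in Java. This is a sketch, not a definitive pattern: the class BroadcastLifecycleExample and its two helpers are hypothetical names, and it assumes Spark 3.0 or later, where both unpersist and destroy accept a boolean blocking argument.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class BroadcastLifecycleExample {

    // Hypothetical helper: sums the broadcast array's five elements inside Spark tasks.
    static int sumOnExecutors(JavaSparkContext sc, Broadcast<int[]> broadcastVar) {
        return sc.parallelize(Arrays.asList(0, 1, 2, 3, 4))
                 .map(i -> broadcastVar.value()[i])
                 .reduce(Integer::sum);
    }

    // Hypothetical helper: walks through unpersist and destroy, then reports
    // whether reading the broadcast variable after destroy fails.
    static boolean failsAfterDestroy(JavaSparkContext sc) {
        Broadcast<int[]> broadcastVar = sc.broadcast(new int[] {1, 2, 3, 4, 5});

        sumOnExecutors(sc, broadcastVar);  // first use: data is sent to executors

        // Drop the cached copies on executors and wait until that is done.
        broadcastVar.unpersist(true);

        sumOnExecutors(sc, broadcastVar);  // still usable: data is re-broadcast

        // Permanently release everything and wait until that is done.
        broadcastVar.destroy(true);

        try {
            broadcastVar.value();          // using a destroyed broadcast throws
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumption: local mode, using all available cores.
        SparkConf conf = new SparkConf()
                .setAppName("BroadcastLifecycleExample")
                .setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            System.out.println("Fails after destroy: " + failsAfterDestroy(sc));
        }
    }
}
```

Calling unpersist() and destroy() with no argument returns immediately and frees the resources in the background.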
Github: github.com/backstreetbrogramm...
- Apache Spark for Java Developers Playlist: • Apache Spark for Java ...
- Upgrade to Java 21 Playlist: • Upgrade to Java 21
- Top Java Coding Interview Problems Playlist: • Top Java Coding Interv...
- Java Serialization Playlist: • Java Serialization
- Dynamic Programming Playlist: • Dynamic Programming
#java #javadevelopers #javaprogramming #apachespark #spark