@backstreetbrogrammer
--------------------------------------------------------------------------------
Chapter 03 - Apache Spark Web UI - Stages, Storage and Environment tabs
--------------------------------------------------------------------------------
While a Spark application is running, its Web UI can be viewed in a browser on the default port 4040:
localhost:4040/
- Stages
We can navigate to the Stages tab in two ways:
. Click the Description of the respective Spark job
. Select the Stages tab at the top of the Spark Web UI
The Stages tab displays a summary page that shows the current state of all stages of all Spark jobs in the Spark application.
The number of tasks shown for a stage equals the number of partitions Spark will process in that stage. Every task in a stage performs the same work, but on a different partition of the data.
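The relationship between partitions and tasks can be sketched in plain Java. This is only a model and does not use Spark's API: the class PartitionTaskSketch and its methods slice and runStage are hypothetical names, and the contiguous slicing is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java model (NOT Spark's API): one "task" per partition,
// each task applying the same function to its own slice of the data.
class PartitionTaskSketch {

    // Split the data into numPartitions contiguous slices
    // (slice sizes differ by at most one element).
    static <T> List<List<T>> slice(List<T> data, int numPartitions) {
        List<List<T>> partitions = new ArrayList<>();
        int n = data.size();
        for (int i = 0; i < numPartitions; i++) {
            int start = (int) ((long) i * n / numPartitions);
            int end = (int) ((long) (i + 1) * n / numPartitions);
            partitions.add(new ArrayList<>(data.subList(start, end)));
        }
        return partitions;
    }

    // Run the same work once per partition; the result has one entry per task.
    static <T, R> List<R> runStage(List<List<T>> partitions,
                                   Function<List<T>, R> work) {
        List<R> results = new ArrayList<>();
        for (List<T> partition : partitions) {
            results.add(work.apply(partition)); // one task per partition
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<List<Integer>> partitions = slice(data, 4);
        // Same work (a sum) for every task, each on a different partition.
        List<Integer> partialSums = runStage(partitions,
                p -> p.stream().mapToInt(Integer::intValue).sum());
        System.out.println("tasks = " + partialSums.size());
        System.out.println("partial sums = " + partialSums);
    }
}
```

With 4 partitions the sketch runs 4 tasks, which mirrors the task count the Stages tab would show for a stage over 4 partitions.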
DAG Visualization: Displays the Directed Acyclic Graph (DAG) of this stage, where vertices represent RDDs or DataFrames and edges represent the operations applied to them.
A ParallelCollectionRDD is created when we create an RDD from a collection object, for example with parallelize().
- Storage
The Storage tab displays the persisted RDDs and DataFrames, if any, in the application. The Summary page shows the storage levels, sizes and partitions of all RDDs, and the Details page shows the size of each partition of an RDD or DataFrame and the executors holding it.
- Environment
The Environment tab displays the values for the different environment and configuration variables, including JVM, Spark, and system properties. It is a useful place to check whether our Spark application properties have been set correctly.
The Environment tab has six parts:
. Runtime Information: contains runtime properties such as the versions of Java and Scala
. Spark Properties: lists the application properties like spark.app.name and spark.driver.extraJavaOptions
. Resource Profiles: gives details about the resources, such as CPU cores and memory, requested for executors and tasks
. Hadoop Properties: displays very detailed properties related to Hadoop, HDFS and YARN
. System Properties: shows more details about the JVM
. Classpath Entries: lists the classes loaded from different sources, which is very useful for resolving class conflicts
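The JVM-related values in the list above are ordinary Java system properties, so the same kind of information can be read directly in plain Java. The sketch below does not use Spark: the class JvmEnvironmentSketch and the method runtimeInfo are hypothetical names, and running it standalone prints the values of the local JVM, not those of a Spark driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch (NOT Spark's API): reads a few standard JVM system
// properties of the kind the Environment tab displays for the driver JVM.
class JvmEnvironmentSketch {

    // Collect a few standard JVM properties in a fixed order.
    static Map<String, String> runtimeInfo() {
        Map<String, String> info = new LinkedHashMap<>();
        String[] keys = {"java.version", "java.home", "java.class.path"};
        for (String key : keys) {
            info.put(key, System.getProperty(key));
        }
        return info;
    }

    public static void main(String[] args) {
        // Print each property as "key = value", one per line.
        runtimeInfo().forEach((key, value) ->
                System.out.println(key + " = " + value));
    }
}
```

Comparing such locally printed values with the ones in the Environment tab is one way to confirm that the application runs on the JVM and classpath we expect.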
GitHub: github.com/backstreetbrogramm...
- Apache Spark for Java Developers Playlist: • Apache Spark for Java ...
- Java Serialization Playlist: • Java Serialization
- Dynamic Programming Playlist: • Dynamic Programming
#java #javadevelopers #javaprogramming #apachespark #spark