Spark Monitoring with Graphite & Telegraf

ConsoleSink: Logs metrics information to the console.
CSVSink: Exports metrics data to CSV files at regular intervals.
JmxSink: Registers metrics for viewing in a JMX console.
MetricsServlet: Adds a servlet within the existing Spark UI to serve metrics data as JSON data.
GraphiteSink: Sends metrics to a Graphite node.
Slf4jSink: Sends metrics to slf4j as log entries.
StatsdSink: Sends metrics to a StatsD node.
# spark.metrics.conf
# Enable JvmSource for instance master, worker, driver and executor
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host={{ master_ip_address }}
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=my.spark.metrics.computation
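
The same settings can also be supplied without a separate file: Spark's monitoring documentation describes passing metrics options as ordinary configuration parameters prefixed with `spark.metrics.conf.`. The sketch below converts properties lines like the ones above into that form. The helper name `to_spark_conf` and the sample values are hypothetical, and the behaviour is worth checking against your Spark version.

```python
def to_spark_conf(properties_text, prefix="spark.metrics.conf."):
    """Turn metrics properties lines into prefixed Spark configuration keys.

    Hypothetical helper for illustration: blank lines and comment lines
    are skipped, and every remaining key is prefixed.
    """
    conf = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[prefix + key.strip()] = value.strip()
    return conf


if __name__ == "__main__":
    sample = """
    # Graphite sink for every instance
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.port=2003
    """
    for key, value in to_spark_conf(sample).items():
        # Each pair can be passed to spark-submit as: --conf "key=value"
        print(f'--conf "{key}={value}"')
```

Each printed pair can be passed to `spark-submit`, which keeps the metrics setup next to the rest of the job configuration.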
[[inputs.socket_listener]]
## Address and port to listen on for incoming Graphite metrics (TCP)
service_address = "tcp://:2003"

data_format = "graphite"

## This string will be used to join the matched values.
separator = "_"

templates = [
"my.*.*.*.* .prefix.prefix.prefix.app_name.node.measurement*",
"my.*.*.*.*.driver.* .prefix.prefix.prefix.app_name.node.measurement*",
"my.*.*.*.*.driver.*.*.* .prefix.prefix.prefix.app_name.node.measurement.measurement.type_instance",
"my.*.*.*.*.driver.*.*.*.* .prefix.prefix.prefix.app_name.node.measurement.measurement.measurement.type_instance",
"my.*.*.*.*.driver.*.StreamingMetrics.streaming.* .prefix.prefix.prefix.app_name.node..measurement.measurement.type_instance",
"my.*.*.*.*.*.*.*.* .prefix.prefix.prefix.app_name.node.measurement.measurement.type_instance",
"my.*.*.*.*.*.*.*.*.* .prefix.prefix.prefix.app_name.node.measurement.measurement.measurement.type_instance"
]
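
To see what these templates do to a metric path, here is a minimal Python sketch of the mapping. It is a simplified re-implementation for illustration, not Telegraf's own code: it ignores filter selection and the `field` keyword, and the function name `apply_template` and the application ID `application_1` are made up. The assumptions are that an empty template part skips the corresponding path segment, that repeated `measurement` parts and repeated tag names are joined with the separator, and that `measurement*` consumes the rest of the path.

```python
def apply_template(metric_path, template, separator="_"):
    """Map a dotted Graphite path onto a measurement name and tags."""
    fields = metric_path.split(".")
    parts = template.split(".")
    measurement = []
    tags = {}
    for i, part in enumerate(parts):
        if i >= len(fields):
            break  # the template is longer than the metric path
        if part == "":
            continue  # an empty part skips this path segment
        if part == "measurement":
            measurement.append(fields[i])
        elif part == "measurement*":
            measurement.extend(fields[i:])  # greedy: take all remaining segments
            break
        else:
            tags.setdefault(part, []).append(fields[i])
    joined_tags = {name: separator.join(values) for name, values in tags.items()}
    return separator.join(measurement), joined_tags


if __name__ == "__main__":
    # Executor metric, handled by the greedy "measurement*" template.
    print(apply_template(
        "my.spark.metrics.computation.application_1.1.executor.jvmCpuTime",
        ".prefix.prefix.prefix.app_name.node.measurement*",
    ))
    # Driver metric, where the last segment becomes the type_instance tag.
    print(apply_template(
        "my.spark.metrics.computation.application_1.driver.BlockManager.disk.diskSpaceUsed_MB",
        ".prefix.prefix.prefix.app_name.node.measurement.measurement.type_instance",
    ))
```

Under these assumptions the first path becomes the measurement `executor_jvmCpuTime` with the tags `prefix=spark_metrics_computation`, `app_name=application_1` and `node=1`. The second becomes `BlockManager_disk` with `node=driver` and `type_instance=diskSpaceUsed_MB`.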
----
BlockManager_disk
BlockManager_memory
CodeGenerator_compilationTime
CodeGenerator_compilationTime_count
CodeGenerator_compilationTime_max
CodeGenerator_compilationTime_mean
CodeGenerator_compilationTime_min
CodeGenerator_compilationTime_p50
CodeGenerator_compilationTime_p75
CodeGenerator_compilationTime_p95
CodeGenerator_compilationTime_p98
CodeGenerator_compilationTime_p99
CodeGenerator_compilationTime_p999
CodeGenerator_compilationTime_stddev
CodeGenerator_generatedClassSize
CodeGenerator_generatedClassSize_count
CodeGenerator_generatedClassSize_max
CodeGenerator_generatedClassSize_mean
CodeGenerator_generatedClassSize_min
CodeGenerator_generatedClassSize_p50
CodeGenerator_generatedClassSize_p75
CodeGenerator_generatedClassSize_p95
CodeGenerator_generatedClassSize_p98
CodeGenerator_generatedClassSize_p99
CodeGenerator_generatedClassSize_p999
CodeGenerator_generatedClassSize_stddev
CodeGenerator_generatedMethodSize
CodeGenerator_generatedMethodSize_count
CodeGenerator_generatedMethodSize_max
CodeGenerator_generatedMethodSize_mean
CodeGenerator_generatedMethodSize_min
CodeGenerator_generatedMethodSize_p50
CodeGenerator_generatedMethodSize_p75
CodeGenerator_generatedMethodSize_p95
CodeGenerator_generatedMethodSize_p98
CodeGenerator_generatedMethodSize_p99
CodeGenerator_generatedMethodSize_p999
CodeGenerator_generatedMethodSize_stddev
CodeGenerator_sourceCodeSize
CodeGenerator_sourceCodeSize_count
CodeGenerator_sourceCodeSize_max
CodeGenerator_sourceCodeSize_mean
CodeGenerator_sourceCodeSize_min
CodeGenerator_sourceCodeSize_p50
CodeGenerator_sourceCodeSize_p75
CodeGenerator_sourceCodeSize_p95
CodeGenerator_sourceCodeSize_p98
CodeGenerator_sourceCodeSize_p99
CodeGenerator_sourceCodeSize_p999
CodeGenerator_sourceCodeSize_stddev
DAGScheduler_job
DAGScheduler_messageProcessingTime
DAGScheduler_stage
ExecutorAllocationManager_executors
ExternalShuffle_shuffle-client
ExternalShuffle_shuffle-client_usedDirectMemory
ExternalShuffle_shuffle-client_usedHeapMemory
HiveExternalCatalog_fileCacheHits
HiveExternalCatalog_fileCacheHits_count
HiveExternalCatalog_filesDiscovered
HiveExternalCatalog_filesDiscovered_count
HiveExternalCatalog_hiveClientCalls
HiveExternalCatalog_hiveClientCalls_count
HiveExternalCatalog_parallelListingJobCount
HiveExternalCatalog_parallelListingJobCount_count
HiveExternalCatalog_partitionsFetched
HiveExternalCatalog_partitionsFetched_count
LiveListenerBus_listenerProcessingTime_org
LiveListenerBus_numEventsPosted
LiveListenerBus_queue_appStatus
LiveListenerBus_queue_eventLog
LiveListenerBus_queue_executorManagement
count
cpu
disk
diskio
executor_bytesRead
executor_bytesRead_count
executor_bytesWritten
executor_bytesWritten_count
executor_cpuTime
executor_cpuTime_count
executor_deserializeCpuTime
executor_deserializeCpuTime_count
executor_deserializeTime
executor_deserializeTime_count
executor_diskBytesSpilled
executor_diskBytesSpilled_count
executor_filesystem_file
executor_filesystem_file_largeRead_ops
executor_filesystem_file_read_bytes
executor_filesystem_file_read_ops
executor_filesystem_file_write_bytes
executor_filesystem_file_write_ops
executor_filesystem_hdfs
executor_filesystem_hdfs_largeRead_ops
executor_filesystem_hdfs_read_bytes
executor_filesystem_hdfs_read_ops
executor_filesystem_hdfs_write_bytes
executor_filesystem_hdfs_write_ops
executor_jvmCpuTime
executor_jvmGCTime
executor_jvmGCTime_count
executor_memoryBytesSpilled
executor_memoryBytesSpilled_count
executor_recordsRead
executor_recordsRead_count
executor_recordsWritten
executor_recordsWritten_count
executor_resultSerializationTime
executor_resultSerializationTime_count
executor_resultSize
executor_resultSize_count
executor_runTime
executor_runTime_count
executor_shuffleBytesWritten
executor_shuffleBytesWritten_count
executor_shuffleFetchWaitTime
executor_shuffleFetchWaitTime_count
executor_shuffleLocalBlocksFetched
executor_shuffleLocalBlocksFetched_count
executor_shuffleLocalBytesRead
executor_shuffleLocalBytesRead_count
executor_shuffleRecordsRead
executor_shuffleRecordsRead_count
executor_shuffleRecordsWritten
executor_shuffleRecordsWritten_count
executor_shuffleRemoteBlocksFetched
executor_shuffleRemoteBlocksFetched_count
executor_shuffleRemoteBytesRead
executor_shuffleRemoteBytesReadToDisk
executor_shuffleRemoteBytesReadToDisk_count
executor_shuffleRemoteBytesRead_count
executor_shuffleTotalBytesRead
executor_shuffleTotalBytesRead_count
executor_shuffleWriteTime
executor_shuffleWriteTime_count
executor_threadpool
executor_threadpool_activeTasks
executor_threadpool_completeTasks
executor_threadpool_currentPool_size
executor_threadpool_maxPool_size
jvm_ConcurrentMarkSweep
jvm_ConcurrentMarkSweep_count
jvm_ConcurrentMarkSweep_time
jvm_ParNew
jvm_ParNew_count
jvm_ParNew_time
jvm_direct
jvm_direct_capacity
jvm_direct_count
jvm_direct_used
jvm_heap
jvm_heap_committed
jvm_heap_init
jvm_heap_max
jvm_heap_usage
jvm_heap_used
jvm_mapped
jvm_mapped_capacity
jvm_mapped_count
jvm_mapped_used
jvm_non-heap
jvm_non-heap_committed
jvm_non-heap_init
jvm_non-heap_max
jvm_non-heap_usage
jvm_non-heap_used
jvm_pools_CMS-Old-Gen
jvm_pools_CMS-Old-Gen_committed
jvm_pools_CMS-Old-Gen_init
jvm_pools_CMS-Old-Gen_max
jvm_pools_CMS-Old-Gen_usage
jvm_pools_CMS-Old-Gen_used
jvm_pools_Code-Cache
jvm_pools_Code-Cache_committed
jvm_pools_Code-Cache_init
jvm_pools_Code-Cache_max
jvm_pools_Code-Cache_usage
jvm_pools_Code-Cache_used
jvm_pools_Compressed-Class-Space
jvm_pools_Compressed-Class-Space_committed
jvm_pools_Compressed-Class-Space_init
jvm_pools_Compressed-Class-Space_max
jvm_pools_Compressed-Class-Space_usage
jvm_pools_Compressed-Class-Space_used
jvm_pools_Metaspace
jvm_pools_Metaspace_committed
jvm_pools_Metaspace_init
jvm_pools_Metaspace_max
jvm_pools_Metaspace_usage
jvm_pools_Metaspace_used
jvm_pools_Par-Eden-Space
jvm_pools_Par-Eden-Space_committed
jvm_pools_Par-Eden-Space_init
jvm_pools_Par-Eden-Space_max
jvm_pools_Par-Eden-Space_usage
jvm_pools_Par-Eden-Space_used
jvm_pools_Par-Survivor-Space
jvm_pools_Par-Survivor-Space_committed
jvm_pools_Par-Survivor-Space_init
jvm_pools_Par-Survivor-Space_max
jvm_pools_Par-Survivor-Space_usage
jvm_pools_Par-Survivor-Space_used
jvm_total
jvm_total_committed
jvm_total_init
jvm_total_max
jvm_total_used
kernel
max
mean
mem
min
numContainersPendingAllocate
numExecutorsFailed
numExecutorsRunning
numLocalityAwareTasks
numReleasedContainers
p50
p75
p95
p98
p99
p999
processes
stddev
swap
system
my_spark_metrics_computation_CodeGenerator_compilationTime_count
my_spark_metrics_computation_CodeGenerator_compilationTime_max
my_spark_metrics_computation_CodeGenerator_compilationTime_mean
my_spark_metrics_computation_CodeGenerator_compilationTime_min
my_spark_metrics_computation_CodeGenerator_compilationTime_p50
my_spark_metrics_computation_CodeGenerator_compilationTime_p75
my_spark_metrics_computation_CodeGenerator_compilationTime_p95
my_spark_metrics_computation_CodeGenerator_compilationTime_p98
my_spark_metrics_computation_CodeGenerator_compilationTime_p99
my_spark_metrics_computation_CodeGenerator_compilationTime_p999
my_spark_metrics_computation_CodeGenerator_compilationTime_stddev
my_spark_metrics_computation_CodeGenerator_generatedClassSize_count
my_spark_metrics_computation_CodeGenerator_generatedClassSize_max
my_spark_metrics_computation_CodeGenerator_generatedClassSize_mean
my_spark_metrics_computation_CodeGenerator_generatedClassSize_min
my_spark_metrics_computation_CodeGenerator_generatedClassSize_p50
my_spark_metrics_computation_CodeGenerator_generatedClassSize_p75
my_spark_metrics_computation_CodeGenerator_generatedClassSize_p95
my_spark_metrics_computation_CodeGenerator_generatedClassSize_p98
my_spark_metrics_computation_CodeGenerator_generatedClassSize_p99
my_spark_metrics_computation_CodeGenerator_generatedClassSize_p999
my_spark_metrics_computation_CodeGenerator_generatedClassSize_stddev
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_count
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_max
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_mean
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_min
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_p50
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_p75
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_p95
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_p98
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_p99
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_p999
my_spark_metrics_computation_CodeGenerator_generatedMethodSize_stddev
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_count
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_max
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_mean
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_min
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_p50
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_p75
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_p95
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_p98
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_p99
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_p999
my_spark_metrics_computation_CodeGenerator_sourceCodeSize_stddev
my_spark_metrics_computation_HiveExternalCatalog_fileCacheHits_count
my_spark_metrics_computation_HiveExternalCatalog_filesDiscovered_count
my_spark_metrics_computation_HiveExternalCatalog_hiveClientCalls_count
my_spark_metrics_computation_HiveExternalCatalog_parallelListingJobCount_count
my_spark_metrics_computation_HiveExternalCatalog_partitionsFetched_count
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_1_executor_jvmCpuTime
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_2_executor_jvmCpuTime
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_3_executor_jvmCpuTime
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_4_executor_jvmCpuTime
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_5_executor_jvmCpuTime
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_6_executor_jvmCpuTime
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_applicationMaster_numContainersPendingAllocate
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_applicationMaster_numExecutorsFailed
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_applicationMaster_numExecutorsRunning
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_applicationMaster_numLocalityAwareTasks
my_spark_metrics_computation_application_XXXXXXXXXXXXX_12478_applicationMaster_numReleasedContainers
"my.*.*.*.*.*.*.*.* .prefix.prefix.prefix.app_name.node.measurement.measurement.type_instance",
"my.*.*.*.*.*.*.*.*.* .prefix.prefix.prefix.app_name.node.measurement.measurement.measurement.type_instance"

Software engineer at Expedia Group. Passionate about data-science and Big Data technologies.
