About
Connect AI agents and engineers to Apache Spark History Server for intelligent job analysis, performance monitoring, and investigation. This MCP server exposes 19 tools that allow agents to query Spark History Server data using natural language—conducting multi-step investigations, synthesizing findings across tools, and answering questions about Spark applications.
Features
- 19 comprehensive tools spanning application info, job/stage analysis, executor & resource monitoring, SQL query analysis, performance bottlenecks, and comparative analysis
- Multi-server support: Route queries to production, staging, or development Spark History Servers
- Dual transport modes: stdio (for Claude Desktop, Amazon Q CLI) and streamable-http (for Kiro, LangGraph, Strands Agents)
- AWS integration: Works with AWS Glue and Amazon EMR Persistent UI
- Kubernetes-ready: Helm chart for production deployment with autoscaling
Tools
Application Information
- list_applications: List applications with optional status, date, and limit filters
- get_application: Get application detail (status, resources, duration, attempts)
Job Analysis
- list_jobs: List jobs with status filtering
- list_slowest_jobs: Top N slowest jobs
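To make "Top N slowest jobs" concrete, here is a minimal sketch of that kind of ranking. It assumes job records shaped like those returned by the Spark History Server REST API (`jobId`, `submissionTime`, `completionTime`, with timestamps such as `2024-05-06T13:03:00.893GMT`). The helper names `job_duration_seconds` and `slowest_jobs` and the sample data are hypothetical, and this is not necessarily how the tool itself is implemented.

```python
from datetime import datetime

# Assumed timestamp layout of Spark REST API job records,
# e.g. "2024-05-06T13:03:00.893GMT".
SPARK_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fGMT"


def job_duration_seconds(job):
    """Return a job's wall-clock duration, or None if it has not completed."""
    if "submissionTime" not in job or "completionTime" not in job:
        return None
    start = datetime.strptime(job["submissionTime"], SPARK_TS_FORMAT)
    end = datetime.strptime(job["completionTime"], SPARK_TS_FORMAT)
    return (end - start).total_seconds()


def slowest_jobs(jobs, n=5):
    """Hypothetical helper: rank completed jobs by duration, slowest first."""
    timed = [(job_duration_seconds(job), job) for job in jobs]
    finished = [(duration, job) for duration, job in timed if duration is not None]
    finished.sort(key=lambda pair: pair[0], reverse=True)
    return [job for _, job in finished[:n]]


# Illustrative sample data, not real output.
sample_jobs = [
    {
        "jobId": 0,
        "submissionTime": "2024-05-06T13:03:00.000GMT",
        "completionTime": "2024-05-06T13:03:05.500GMT",
    },
    {
        "jobId": 1,
        "submissionTime": "2024-05-06T13:04:00.000GMT",
        "completionTime": "2024-05-06T13:06:00.000GMT",
    },
    # No completionTime: treated as still running and skipped.
    {"jobId": 2, "submissionTime": "2024-05-06T13:07:00.000GMT"},
]

top_two = slowest_jobs(sample_jobs, n=2)
```

Jobs without a completion time are left out of the ranking here; a real tool might instead measure them against the current time.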
Stage Analysis
- list_stages: List stages with status filtering
- list_slowest_stages: Top N slowest stages
- get_stage: Stage detail with attempt and summary metrics
- get_stage_task_summary: Task metric distributions (execution time, memory, I/O, spill)
Executor & Resource Analysis
- list_executors: List executors (active and optionally inactive)
- get_executor: Executor detail (resources, task stats, performance)
- get_executor_summary: Aggregate metrics across all executors
- get_resource_usage_timeline: Chronological executor add/remove with resource totals
Configuration & Environment
- get_environment: Spark config, JVM info, system properties, classpath
SQL & Query Analysis
- list_slowest_sql_queries: Top N slowest SQL executions with metrics
- get_sql_execution: SQL execution detail with optional plan and node metrics
- compare_sql_execution_plans: Compare SQL plans and metrics between two jobs
Performance & Bottleneck Analysis
- get_job_bottlenecks: Identify bottlenecks across stages, tasks, and executors
Comparative Analysis
- compare_job_environments: Diff Spark configs between two applications
- compare_job_performance: Diff performance metrics between two applications
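As an illustration of what "diffing Spark configs between two applications" can mean, the sketch below compares two plain dictionaries of Spark properties. The function name `diff_spark_configs`, the result layout, and the sample values are assumptions made for this example; the actual output of compare_job_environments may look different.

```python
def diff_spark_configs(left, right):
    """Hypothetical helper: classify properties as changed, only-left, or only-right."""
    changed = {
        key: (left[key], right[key])
        for key in left.keys() & right.keys()
        if left[key] != right[key]
    }
    only_left = {key: left[key] for key in left.keys() - right.keys()}
    only_right = {key: right[key] for key in right.keys() - left.keys()}
    return {"changed": changed, "only_left": only_left, "only_right": only_right}


# Illustrative property sets for two runs of the same job.
yesterday = {
    "spark.executor.memory": "8g",
    "spark.sql.shuffle.partitions": "200",
    "spark.executor.cores": "4",
}
today = {
    "spark.executor.memory": "4g",
    "spark.sql.shuffle.partitions": "200",
    "spark.dynamicAllocation.enabled": "true",
}

result = diff_spark_configs(yesterday, today)
```

Separating changed values from properties present on only one side tends to make a regression such as a halved executor memory setting easy to spot.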
Usage Examples
- "Why is my ETL job running slower than yesterday?" → Uses get_job_bottlenecks, list_slowest_stages, and compare_job_performance
- "What caused job 42 to fail?" → Uses list_jobs, get_stage, and get_stage_task_summary
- "Compare today's batch with yesterday's run" → Uses compare_job_performance and compare_job_environments
- "Find my slowest SQL queries and explain why" → Uses list_slowest_sql_queries, get_sql_execution, and compare_sql_execution_plans
Configuration
Single Server
servers:
  local:
    default: true
    url: "http://your-spark-history-server:18080"
    auth:
      username: "user"
      password: "pass"
    include_plan_description: false
mcp:
  transports:
    - streamable-http
  port: "18888"
  debug: false
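A quick sanity check of a configuration like the one above could look like the following sketch. It works on the parsed configuration as a plain dictionary so it stays dependency-free. The function `validate_config` and its rules (every server needs a `url`, at most one server is marked `default`, transports are limited to the two modes named under Features) are assumptions made for illustration, not the server's own validation logic.

```python
# Transport names taken from the Features list; treating them as the only
# valid values is an assumption.
ALLOWED_TRANSPORTS = {"stdio", "streamable-http"}


def validate_config(config):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    servers = config.get("servers") or {}
    if not servers:
        problems.append("no servers defined")
    for name, server in servers.items():
        if not server.get("url"):
            problems.append(f"server '{name}' has no url")
    defaults = [name for name, server in servers.items() if server.get("default")]
    if len(defaults) > 1:
        problems.append(f"multiple default servers: {sorted(defaults)}")
    for transport in (config.get("mcp") or {}).get("transports", []):
        if transport not in ALLOWED_TRANSPORTS:
            problems.append(f"unknown transport '{transport}'")
    return problems


# The single-server configuration, written out as the dictionary a YAML
# parser would produce.
config = {
    "servers": {
        "local": {
            "default": True,
            "url": "http://your-spark-history-server:18080",
            "auth": {"username": "user", "password": "pass"},
            "include_plan_description": False,
        }
    },
    "mcp": {"transports": ["streamable-http"], "port": "18888", "debug": False},
}

problems = validate_config(config)
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a single pass.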
Multi-Server Setup
servers:
  production:
    default: true
    url: "http://prod-spark-history:18080"
    auth:
      username: "user"
      password: "pass"
  staging:
    url: "http://staging-spark-history:18080"
Agents can then target specific servers: "Get application <app_id> from the production server"
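The routing idea can be sketched as a lookup that prefers an explicitly named server and otherwise falls back to the one marked `default: true`. The function `resolve_server` and its error handling are hypothetical. The source does not describe how the server resolves names internally, so this only illustrates the behavior the configuration suggests.

```python
def resolve_server(servers, name=None):
    """Hypothetical helper: return (name, settings) for the requested or default server."""
    if name is not None:
        if name not in servers:
            raise KeyError(f"unknown server '{name}'; known: {sorted(servers)}")
        return name, servers[name]
    # No explicit name: fall back to the first server flagged as default.
    for candidate, settings in servers.items():
        if settings.get("default"):
            return candidate, settings
    raise LookupError("no server name given and no default server configured")


# The multi-server configuration as a parsed dictionary.
servers = {
    "production": {
        "default": True,
        "url": "http://prod-spark-history:18080",
        "auth": {"username": "user", "password": "pass"},
    },
    "staging": {"url": "http://staging-spark-history:18080"},
}

default_name, default_settings = resolve_server(servers)
staging_name, staging_settings = resolve_server(servers, "staging")
```

With this configuration, a request that names no server would go to production, while "from the staging server" would select the staging URL.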
Integrations
- Claude Desktop (stdio): Setup guide
- Amazon Q CLI (stdio): Setup guide
- Kiro (streamable-http): Setup guide
- LangGraph (streamable-http): Setup guide
- Strands Agents (streamable-http): Setup guide
Kubernetes Deployment
Deploy using Helm:
helm install spark-history-mcp ./deploy/kubernetes/helm/mcp-apache-spark-history-server/

# Production with autoscaling
helm install spark-history-mcp ./deploy/kubernetes/helm/mcp-apache-spark-history-server/ \
  --set replicaCount=3 \
  --set autoscaling.enabled=true
AWS Integration
- AWS Glue — Connect to Glue Spark History Server
- Amazon EMR — Use EMR Persistent UI for Spark analysis