Apache Spark History Server

kubeflow

About

Connect AI agents and engineers to Apache Spark History Server for job analysis, performance monitoring, and investigation. This MCP server exposes 19 tools that let agents answer natural-language questions about Spark applications by querying Spark History Server data, running multi-step investigations, and synthesizing findings across tools.

Features

  • 19 tools spanning application info, job and stage analysis, executor and resource monitoring, SQL query analysis, performance bottlenecks, and comparative analysis
  • Multi-server support: Route queries to production, staging, or development Spark History Servers
  • Dual transport modes: stdio (for Claude Desktop, Amazon Q CLI) and streamable-http (for Kiro, LangGraph, Strands Agents)
  • AWS integration: Works with AWS Glue and Amazon EMR Persistent UI
  • Kubernetes-ready: Helm chart for production deployment with autoscaling

Tools

Application Information

  • list_applications — List applications with optional status, date, and limit filters
  • get_application — Get application detail: status, resources, duration, attempts
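
Spark History Server serves application data through its REST API under /api/v1/applications, which accepts status, minDate, maxDate, and limit query parameters. As a rough sketch of how the list_applications filters could translate into such a request (the helper name build_applications_url is hypothetical, and how this MCP server actually issues the call is an assumption):

```python
from urllib.parse import urlencode


def build_applications_url(base_url, status=None, min_date=None, limit=None):
    """Build a Spark History Server REST URL for listing applications.

    Hypothetical helper: only filters that are set are added to the query.
    """
    params = {}
    if status is not None:
        params["status"] = status      # e.g. "completed" or "running"
    if min_date is not None:
        params["minDate"] = min_date   # e.g. "2024-01-01"
    if limit is not None:
        params["limit"] = limit
    url = base_url.rstrip("/") + "/api/v1/applications"
    query = urlencode(params)
    return url + "?" + query if query else url


url = build_applications_url(
    "http://your-spark-history-server:18080", status="completed", limit=5
)
print(url)
```

Omitting all filters yields the bare /api/v1/applications endpoint.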

Job Analysis

  • list_jobs — List jobs with status filtering
  • list_slowest_jobs — Top N slowest jobs

Stage Analysis

  • list_stages — List stages with status filtering
  • list_slowest_stages — Top N slowest stages
  • get_stage — Stage detail with attempt and summary metrics
  • get_stage_task_summary — Task metric distributions (execution time, memory, I/O, spill)
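
The "Top N slowest" tools for jobs and stages amount to a ranking by duration. A minimal sketch of that idea, assuming each record carries a duration_ms field (the field name and the sample records are made up for illustration, not taken from the server's output):

```python
import heapq


def slowest(items, n=5, key="duration_ms"):
    """Return the n records with the largest duration, slowest first."""
    return heapq.nlargest(n, items, key=lambda item: item[key])


# Illustrative stage records; real records would come from the History Server.
stages = [
    {"stage_id": 0, "duration_ms": 1200},
    {"stage_id": 1, "duration_ms": 98000},
    {"stage_id": 2, "duration_ms": 4500},
    {"stage_id": 3, "duration_ms": 31000},
]

top_two = slowest(stages, n=2)
print([s["stage_id"] for s in top_two])
```

heapq.nlargest avoids sorting the full list, which matters little here but keeps the cost low for applications with thousands of stages.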

Executor & Resource Analysis

  • list_executors — List executors (active and optionally inactive)
  • get_executor — Executor detail: resources, task stats, performance
  • get_executor_summary — Aggregate metrics across all executors
  • get_resource_usage_timeline — Chronological executor add/remove with resource totals
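
A chronological add/remove timeline with running resource totals, as described for get_resource_usage_timeline, can be pictured as follows. This is only a sketch under assumed inputs: the record fields (id, cores, add_time, remove_time) and integer timestamps are hypothetical and simplified.

```python
def resource_timeline(executors):
    """Turn executor records into time-ordered events with a running core total."""
    events = []
    for ex in executors:
        events.append((ex["add_time"], ex["cores"], "add", ex["id"]))
        if ex.get("remove_time") is not None:
            events.append((ex["remove_time"], -ex["cores"], "remove", ex["id"]))

    # Sort by timestamp only; Python's sort is stable, so ties keep input order.
    events.sort(key=lambda event: event[0])

    total_cores = 0
    timeline = []
    for time, delta, kind, executor_id in events:
        total_cores += delta
        timeline.append(
            {"time": time, "event": kind, "executor": executor_id,
             "total_cores": total_cores}
        )
    return timeline


executors = [
    {"id": "1", "cores": 4, "add_time": 0, "remove_time": 30},
    {"id": "2", "cores": 4, "add_time": 10, "remove_time": None},
]
for entry in resource_timeline(executors):
    print(entry)
```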

Configuration & Environment

  • get_environment — Spark config, JVM info, system properties, classpath

SQL & Query Analysis

  • list_slowest_sql_queries — Top N slowest SQL executions with metrics
  • get_sql_execution — SQL execution detail with optional plan and node metrics
  • compare_sql_execution_plans — Compare SQL plans and metrics between two jobs

Performance & Bottleneck Analysis

  • get_job_bottlenecks — Identify bottlenecks across stages, tasks, and executors

Comparative Analysis

  • compare_job_environments — Diff Spark configs between two applications
  • compare_job_performance — Diff performance metrics between two applications
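
Diffing Spark configs between two applications comes down to comparing two key/value maps. A minimal sketch, assuming each application's configuration is already available as a dictionary (the function name diff_configs and the sample values are illustrative, not the server's implementation):

```python
def diff_configs(left, right):
    """Return {key: (left_value, right_value)} for every key that differs.

    A key present on only one side is reported with None for the other.
    """
    diff = {}
    for key in sorted(set(left) | set(right)):
        a, b = left.get(key), right.get(key)
        if a != b:
            diff[key] = (a, b)
    return diff


yesterday = {
    "spark.executor.memory": "4g",
    "spark.sql.shuffle.partitions": "200",
}
today = {
    "spark.executor.memory": "8g",
    "spark.sql.shuffle.partitions": "200",
    "spark.dynamicAllocation.enabled": "true",
}

changes = diff_configs(yesterday, today)
for key, (old, new) in changes.items():
    print(f"{key}: {old} -> {new}")
```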

Usage Examples

  • "Why is my ETL job running slower than yesterday?" → Uses get_job_bottlenecks, list_slowest_stages, and compare_job_performance
  • "What caused job 42 to fail?" → Uses list_jobs, get_stage, and get_stage_task_summary
  • "Compare today's batch with yesterday's run" → Uses compare_job_performance and compare_job_environments
  • "Find my slowest SQL queries and explain why" → Uses list_slowest_sql_queries, get_sql_execution, and compare_sql_execution_plans
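
Behind each of these questions, the agent issues MCP tool calls. In the Model Context Protocol, a tool invocation is a JSON-RPC 2.0 request with method tools/call and the tool name plus arguments in params. The sketch below builds such a request; the argument name app_id and the application ID are assumptions for illustration, since the tools' parameter schemas are not listed here.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


# "app_id" is a hypothetical argument name; check the tool's schema first.
message = tool_call_request(1, "get_job_bottlenecks", {"app_id": "app-123"})
print(message)
```

In practice an MCP client library builds these messages, so agents rarely construct them by hand.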

Configuration

Single Server

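servers:
  local:
    default: true   # presumably the server used when a request names none
    url: "http://your-spark-history-server:18080"
    auth:           # placeholder credentials; replace with your own
      username: "user"
      password: "pass"
    include_plan_description: false
mcp:
  transports:
    - streamable-http   # stdio is the other transport listed under Features
  port: "18888"
  debug: false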

Multi-Server Setup

servers:
  production:
    default: true
    url: "http://prod-spark-history:18080"
    auth:
      username: "user"
      password: "pass"
  staging:
    url: "http://staging-spark-history:18080"

Agents can then target a specific server by name, for example: "Get application <app_id> from the production server".
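
One way to picture the routing is a lookup by name with a fallback to the entry marked default: true. The sketch below mirrors the multi-server config as a Python dictionary; resolve_server is a hypothetical helper, and the fallback behavior is an assumption inferred from the default flag rather than documented behavior.

```python
def resolve_server(servers, name=None):
    """Pick a server config by name, or the default one when no name is given."""
    if name is not None:
        if name not in servers:
            raise KeyError(f"unknown server: {name}")
        return servers[name]
    for config in servers.values():
        if config.get("default"):
            return config
    raise ValueError("no default server configured")


# Mirrors the YAML above (auth omitted for brevity).
servers = {
    "production": {"default": True, "url": "http://prod-spark-history:18080"},
    "staging": {"url": "http://staging-spark-history:18080"},
}

print(resolve_server(servers, "staging")["url"])
print(resolve_server(servers)["url"])
```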

Integrations

Kubernetes Deployment

Deploy using Helm:

helm install spark-history-mcp ./deploy/kubernetes/helm/mcp-apache-spark-history-server/

# Production with autoscaling
helm install spark-history-mcp ./deploy/kubernetes/helm/mcp-apache-spark-history-server/ \
  --set replicaCount=3 \
  --set autoscaling.enabled=true

AWS Integration

  • AWS Glue — Connect to Glue Spark History Server
  • Amazon EMR — Use EMR Persistent UI for Spark analysis

This server runs through your single 1Server connection. No extra config required.

Categories

Analytics, DevOps, Monitoring

Tags

Official