Control plane vs. data plane, again
spark-submit
│ parse args, set cluster manager from master URL
▼
DRIVER (your SparkContext)
│ registers the application, requests executors
▼
CLUSTER MANAGER (Standalone Master / YARN RM / K8s API)
│ picks nodes with free resources
▼
WORKER NODES launch EXECUTOR JVMs
│ each executor connects back to the DRIVER
▼
DRIVER schedules tasks straight onto executors (cluster manager steps aside)
The key idea: the cluster manager only handles process placement and lifecycle. Once executors register with the driver, all task scheduling flows directly between driver and executors, exactly as described on the scheduling page.
spark-submit — the universal launcher
SparkSubmit parses arguments, prepares the classpath, and dispatches based on the action (submit, kill, request status). prepareSubmitEnvironment picks the cluster manager from the master URL (spark:// for Standalone, yarn, k8s://, or local) and resolves the deploy mode. In client mode it runs your main class in the submitting JVM; in cluster mode it asks the cluster manager to launch the driver on a cluster node.
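The master-URL dispatch can be sketched in a few lines. This is a simplified Python model, not Spark's own code (SparkSubmit is written in Scala and handles more cases). The function name `cluster_manager_for`, the returned labels, and the example host names are hypothetical.

```python
def cluster_manager_for(master: str) -> str:
    """Map a master URL to a cluster-manager label (illustrative only)."""
    if master == "yarn":
        return "YARN"
    if master.startswith("spark://"):
        return "STANDALONE"
    if master.startswith("k8s://"):
        return "KUBERNETES"
    if master == "local" or master.startswith("local["):
        return "LOCAL"
    raise ValueError(f"unrecognized master URL: {master!r}")


# Placeholder URLs; substitute the address of a real cluster.
for url in ["spark://host:7077", "yarn", "k8s://https://host:6443", "local[*]"]:
    print(url, "->", cluster_manager_for(url))
```

The point of the sketch is that the choice is made once, from a single string, before any executor exists.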
Client mode vs. cluster mode
In client mode the driver runs wherever you typed spark-submit (your laptop, an edge node); executors run on the cluster and connect back to it. This is great for interactive shells but ties the application's life to your local process. In cluster mode the driver itself is shipped to and run on a cluster node, so the application survives your client disconnecting — the right choice for production jobs.
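As a rough illustration, the two modes differ only in one flag on the command line. The helper below assembles a spark-submit argument list. `--master`, `--deploy-mode`, and `--class` are standard spark-submit options, while the helper name `spark_submit_argv`, the class `com.example.MyApp`, and the jar `my-app.jar` are made up for the example.

```python
import shlex


def spark_submit_argv(master, deploy_mode, main_class, app_jar):
    """Build a spark-submit command line as a list of arguments."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"deploy mode must be 'client' or 'cluster', got {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
    ]


# Driver stays in the submitting process (interactive work, debugging).
interactive = spark_submit_argv("spark://host:7077", "client", "com.example.MyApp", "my-app.jar")
# Driver is launched inside the cluster (long-running production job).
production = spark_submit_argv("yarn", "cluster", "com.example.MyApp", "my-app.jar")

print(shlex.join(interactive))
print(shlex.join(production))
```

The resulting lists could be passed to `subprocess.run` on a machine where spark-submit is installed; the sketch only prints them.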
Standalone Master — the resource broker
The Standalone Master is an RPC endpoint that tracks registered workers, applications, and drivers. When a driver's StandaloneAppClient sends RegisterApplication, the master records it and runs schedule(). startExecutorsOnWorkers is a simple FIFO scheduler that spreads executors across workers with free cores and memory, then launchExecutor sends a LaunchExecutor message to the chosen worker.
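The FIFO, spread-out behaviour can be modelled with a toy scheduler. This is a simplified sketch under stated assumptions, not the Master's actual algorithm: applications are served in arrival order, and executors are handed out one at a time in round-robin over workers that still have enough free cores and memory. The `Worker` and `App` classes and their field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    free_cores: int
    free_mem_mb: int


@dataclass
class App:
    name: str
    executors_wanted: int
    cores_per_executor: int
    mem_per_executor_mb: int


def schedule(apps, workers):
    """Return (app, worker) placements: FIFO over apps, round-robin over workers."""
    placements = []
    for app in apps:  # FIFO: earlier applications are served first
        remaining = app.executors_wanted
        while remaining > 0:
            placed_this_pass = False
            for w in workers:
                if remaining == 0:
                    break
                fits = (w.free_cores >= app.cores_per_executor
                        and w.free_mem_mb >= app.mem_per_executor_mb)
                if fits:
                    # Reserve the resources and record the placement.
                    w.free_cores -= app.cores_per_executor
                    w.free_mem_mb -= app.mem_per_executor_mb
                    placements.append((app.name, w.name))
                    remaining -= 1
                    placed_this_pass = True
            if not placed_this_pass:
                break  # no worker can host another executor; the app waits
    return placements


workers = [Worker("w1", 4, 8192), Worker("w2", 4, 8192)]
apps = [App("A", 3, 2, 2048), App("B", 2, 2, 2048)]
for app_name, worker_name in schedule(apps, workers):
    print(f"launch executor for {app_name} on {worker_name}")
```

In this run application A takes three of the four available two-core slots, so B, which arrived later, gets only one executor and waits for the rest.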
Standalone Worker — the process launcher
A Worker registers its capacity with the master and waits. On LaunchExecutor it creates a work directory and an ExecutorRunner that spawns a separate CoarseGrainedExecutorBackend JVM; on LaunchDriver (cluster mode) it spawns the driver via a DriverRunner. The newly started executor connects back to the driver, not to the master.
How an executor joins the driver
After the master sends LaunchExecutor, it notifies the driver with ExecutorAdded. The executor process starts, registers with the driver's CoarseGrainedSchedulerBackend, and from then on receives LaunchTask messages directly. This is the seam where deployment hands off to scheduling: the same CoarseGrainedExecutorBackend is used regardless of whether YARN, Kubernetes, or Standalone launched it.
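The hand-off can be made concrete with a small simulation of the message order. This is an illustrative model only: it records (sender, receiver, message) tuples rather than performing any RPC. The function name `simulate` is hypothetical, and `RegisterExecutor` is used here as a label for the registration step, which the text above does not name.

```python
def simulate(tasks):
    """Return the ordered (sender, receiver, message) events for one executor."""
    events = []

    # Deployment phase: the cluster manager places the executor process.
    events.append(("driver", "master", "RegisterApplication"))
    events.append(("master", "worker", "LaunchExecutor"))
    events.append(("master", "driver", "ExecutorAdded"))

    # Hand-off: the new executor connects back to the driver, not the master.
    events.append(("executor", "driver", "RegisterExecutor"))

    # Scheduling phase: tasks flow directly from driver to executor.
    for task in tasks:
        events.append(("driver", "executor", f"LaunchTask({task})"))
    return events


events = simulate(["task-0", "task-1"])
for sender, receiver, message in events:
    print(f"{sender:>8} -> {receiver:<8} {message}")
```

Once the registration event has been recorded, the master no longer appears as sender or receiver, which is the "steps aside" behaviour in the diagram at the top of the page.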
The cluster managers compared
| Manager | Master URL | How executors are placed |
|---|---|---|
| Standalone | spark://host:7077 | Spark's own Master/Worker daemons; FIFO across workers. |
| YARN | yarn | Executors run as YARN containers requested from the ResourceManager. |
| Kubernetes | k8s://https://... | Driver and executors run as pods created via the K8s API. |
| Local | local[*] | Driver and "executors" are threads in one JVM — for development. |
All four present the same SchedulerBackend interface to the rest of Spark, which is why a job written and tested with local[*] runs unchanged on a thousand-node YARN or Kubernetes cluster.
Key takeaways
- spark-submit chooses the cluster manager from the master URL.
- Client mode runs the driver locally; cluster mode ships it onto the cluster.
- The cluster manager only places processes; the driver schedules tasks directly.
- Executors are always CoarseGrainedExecutorBackend JVMs, regardless of manager.