kube-controller-manager — Data Flow

Hosts all control-plane controllers: each reconciles actual state toward desired state

Startup

main() → Run() → BuildControllers() → RunControllers()
The controller manager starts a single process that hosts all built-in controllers. It elects a leader (if HA is configured) and then starts every enabled controller with a shared ControllerContext.
  1. main() — entry point
    // cmd/kube-controller-manager/controller-manager.go L36
    func main() {
        command := app.NewControllerManagerCommand()
        code := cli.Run(command)
        os.Exit(code)
    }
    cmd/kube-controller-manager/controller-manager.go
  2. Run() — leader election → start controllers
    Sets up health checks, starts the HTTPS server, and wraps the actual start work in a leader election lease. Only the current leader runs controllers.
    // cmd/kube-controller-manager/app/controllermanager.go L199
    func Run(ctx context.Context, c *config.CompletedConfig) error {
        // Start HTTP health / metrics server
        // Set up leader election ...
        run := func(ctx context.Context, ...) {
            controllerContext, _ := CreateControllerContext(ctx, c, ...)  // L269
            controllers, _ := BuildControllers(ctx, controllerContext, descriptors, ...)
            RunControllers(ctx, controllerContext, controllers, ...)
        }
        leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
            Callbacks: leaderelection.LeaderCallbacks{OnStartedLeading: run},
        })
    }
    controllermanager.go L199
  3. CreateControllerContext() — shared informers & clients
    Builds one SharedInformerFactory shared by all controllers. Each controller gets a dedicated client to avoid cross-controller rate-limit interference.
    // cmd/kube-controller-manager/app/controllermanager.go L530
    func CreateControllerContext(ctx, s, ...) (ControllerContext, error) {
        versionedClient, _ := clientBuilder.Client("shared-informers")
        sharedInformers := informers.NewSharedInformerFactory(versionedClient, resyncPeriod)
        return ControllerContext{
            ClientBuilder:      clientBuilder,
            InformerFactory:    sharedInformers,
            // ...
        }, nil
    }
    controllermanager.go L530
  4. BuildControllers() + RunControllers()
    Instantiates each enabled controller by calling its constructor, then starts them concurrently. Each controller runs in its own goroutine(s).
    // cmd/kube-controller-manager/app/controllermanager.go L626, L710
    func BuildControllers(ctx, controllerCtx, controllerDescriptors, ...) ([]Controller, error) {
        for name, descriptor := range controllerDescriptors {
            if !controllerCtx.IsControllerEnabled(descriptor) { continue }
            ctrl, err := descriptor.constructor(ctx, controllerCtx, name)
            controllers = append(controllers, ctrl)
        }
        return controllers, nil
    }
    
    func RunControllers(ctx, controllerCtx, controllers, ...) {
        for _, ctrl := range controllers {
            go ctrl.Run(ctx, workers)  // each controller runs its own reconcile loop
        }
        sharedInformers.Start(stopCh)  // begin watching API server
        sharedInformers.WaitForCacheSync(stopCh)
    }
    controllermanager.go L626 controllermanager.go L710

Controller Pattern

The reconcile loop — inform → enqueue → reconcile
Every controller follows the same three-step pattern. This design decouples event delivery from reconciliation: the queue absorbs bursts, applies back-pressure, and provides automatic retries with exponential backoff.
Informer
Watches API server.
Caches objects locally.
Fires event handlers.
Work Queue
Rate-limited, deduplicating.
Holds changed object keys.
Provides retry + backoff.
Reconcile
Reads current state.
Computes diff.
Calls API server to fix.
// Generic controller skeleton (pattern used by all controllers)

type Controller struct {
    informer cache.SharedIndexInformer  // watches resource type
    queue    workqueue.RateLimitingInterface
    client   kubernetes.Interface
}

func (c *Controller) Run(ctx context.Context, workers int) {
    // Register event handlers: AddFunc/UpdateFunc/DeleteFunc → c.queue.Add(key)
    c.informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    func(obj interface{}) { c.enqueue(obj) },
        UpdateFunc: func(old, new interface{}) { c.enqueue(new) },
        DeleteFunc: func(obj interface{}) { c.enqueue(obj) },
    })
    // Wait for informer cache to fill before starting workers
    cache.WaitForCacheSync(ctx.Done(), c.informer.HasSynced)
    // Start worker goroutines
    for i := 0; i < workers; i++ {
        go wait.UntilWithContext(ctx, c.worker, time.Second)
    }
}

func (c *Controller) worker(ctx context.Context) {
    for c.processNextItem(ctx) { }
}

func (c *Controller) processNextItem(ctx context.Context) bool {
    key, quit := c.queue.Get()     // blocks until item available
    defer c.queue.Done(key)
    if err := c.reconcile(ctx, key.(string)); err != nil {
        c.queue.AddRateLimited(key)  // requeue with backoff on error
    }
    return true
}

Key Controllers

Built-in controllers — what each one does
ReplicaSet

Ensures pod count matches .spec.replicas. Creates or deletes pods to correct the count.

pkg/controller/replicaset/replica_set.go
Deployment

Manages rolling updates by creating new ReplicaSets and scaling old ones down.

pkg/controller/deployment/
StatefulSet

Manages ordered, sticky pod identities and per-pod PVCs. Handles scale up/down with ordering guarantees.

pkg/controller/statefulset/
Node Lifecycle

Marks nodes as NotReady, evicts pods from unhealthy nodes, manages node taints.

pkg/controller/nodelifecycle/
Endpoint Slice

Watches Services and Pods; populates EndpointSlice objects that kube-proxy uses to program iptables/eBPF.

pkg/controller/endpointslice/
Garbage Collector

Deletes orphaned objects by following ownerReferences. Handles cross-namespace and cross-type ownership chains.

pkg/controller/garbagecollector/
Resource Quota

Admits or rejects resource creation based on namespace-level quota objects.

pkg/controller/resourcequota/
ServiceAccount

Creates default ServiceAccounts in new namespaces and rotates their token secrets.

pkg/controller/serviceaccount/
ReplicaSet controller — worked example
The ReplicaSet controller is the canonical example. It watches both ReplicaSets and Pods, and reconciles by counting pods that match the selector.
// pkg/controller/replicaset/replica_set.go (condensed)
func (rsc *ReplicaSetController) syncReplicaSet(ctx context.Context, key string) error {
    ns, name, _ := cache.SplitMetaNamespaceKey(key)
    rs, _ := rsc.rsLister.ReplicaSets(ns).Get(name)     // read from local cache

    // List pods owned by this ReplicaSet
    allPods, _ := rsc.podLister.Pods(rs.Namespace).List(labels.Everything())
    filteredPods := controller.FilterActivePods(allPods)

    // Compute diff
    diff := len(filteredPods) - int(*(rs.Spec.Replicas))
    if diff < 0 {
        rsc.slowStartBatch(...)  // create -diff pods via API server
    } else if diff > 0 {
        rsc.deletePods(ctx, rs, filteredPods[:diff])  // delete excess pods
    }

    // Update .status.readyReplicas, .status.availableReplicas
    return rsc.updateReplicaSetStatus(ctx, rs, filteredPods)
}
pkg/controller/replicaset/replica_set.go

Leader Election

Leader election — high availability without split-brain
In HA setups, multiple controller-manager pods run simultaneously but only one holds the leader lease. The lease is a coordination.k8s.io/v1 Lease object in the kube-system namespace, renewed every few seconds.
// pkg/client-go/tools/leaderelection/leaderelection.go (staging)
// The leader periodically PATCHes:
//   leases/kube-controller-manager in kube-system
//
// If a leader dies, its lease expires (leaseDuration ~15s).
// A standby then acquires the lease and calls OnStartedLeading.
//
// This prevents two controller managers from simultaneously
// creating/deleting pods and causing a split-brain.
staging/.../tools/leaderelection/leaderelection.go