Controller Manager — Kubernetes Data Flow

Startup

main() → Run() → BuildControllers() → RunControllers()

                        The controller manager starts a single process that
                        hosts all built-in controllers. It elects a leader (if
                        HA is configured) and then starts every enabled
                        controller with a shared ControllerContext.
                    

main() — entry point

// cmd/kube-controller-manager/controller-manager.go L36
func main() {
    command := app.NewControllerManagerCommand()
    code := cli.Run(command)
    os.Exit(code)
}

cmd/kube-controller-manager/controller-manager.go

Run() — leader election → start controllers

Sets up health checks, starts the HTTPS server, and wraps the actual start work in a leader election lease. Only the current leader runs controllers.

// cmd/kube-controller-manager/app/controllermanager.go L199
func Run(ctx context.Context, c *config.CompletedConfig) error {
    // Start HTTP health / metrics server
    // Set up leader election ...
    run := func(ctx context.Context, ...) {
        controllerContext, _ := CreateControllerContext(ctx, c, ...)  // L269
        controllers, _ := BuildControllers(ctx, controllerContext, descriptors, ...)
        RunControllers(ctx, controllerContext, controllers, ...)
    }
    leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
        Callbacks: leaderelection.LeaderCallbacks{OnStartedLeading: run},
    })
}

controllermanager.go L199

CreateControllerContext() — shared informers & clients

Builds one SharedInformerFactory shared by all controllers. Each controller gets a dedicated client to avoid cross-controller rate-limit interference.

// cmd/kube-controller-manager/app/controllermanager.go L530
func CreateControllerContext(ctx, s, ...) (ControllerContext, error) {
    versionedClient, _ := clientBuilder.Client("shared-informers")
    sharedInformers := informers.NewSharedInformerFactory(versionedClient, resyncPeriod)
    return ControllerContext{
        ClientBuilder:      clientBuilder,
        InformerFactory:    sharedInformers,
        // ...
    }, nil
}

controllermanager.go L530

BuildControllers() + RunControllers()

Instantiates each enabled controller by calling its constructor, then starts them concurrently. Each controller runs in its own goroutine(s).

// cmd/kube-controller-manager/app/controllermanager.go L626, L710
func BuildControllers(ctx, controllerCtx, controllerDescriptors, ...) ([]Controller, error) {
    for name, descriptor := range controllerDescriptors {
        if !controllerCtx.IsControllerEnabled(descriptor) { continue }
        ctrl, err := descriptor.constructor(ctx, controllerCtx, name)
        controllers = append(controllers, ctrl)
    }
    return controllers, nil
}

func RunControllers(ctx, controllerCtx, controllers, ...) {
    for _, ctrl := range controllers {
        go ctrl.Run(ctx, workers)  // each controller runs its own reconcile loop
    }
    sharedInformers.Start(stopCh)  // begin watching API server
    sharedInformers.WaitForCacheSync(stopCh)
}

controllermanager.go L626 controllermanager.go L710

Controller Pattern

The reconcile loop — inform → enqueue → reconcile

                        Every controller follows the same three-step pattern.
                        This design decouples event delivery from
                        reconciliation: the queue absorbs bursts, applies
                        back-pressure, and provides automatic retries with
                        exponential backoff.
                    

Informer
Watches API server.
Caches objects locally.
Fires event handlers.

→

Work Queue
Rate-limited, deduplicating.
Holds changed object keys.
Provides retry + backoff.

→

Reconcile
Reads current state.
Computes diff.
Calls API server to fix.

// Generic controller skeleton (pattern used by all controllers)

type Controller struct {
    informer cache.SharedIndexInformer  // watches resource type
    queue    workqueue.RateLimitingInterface
    client   kubernetes.Interface
}

func (c *Controller) Run(ctx context.Context, workers int) {
    // Register event handlers: AddFunc/UpdateFunc/DeleteFunc → c.queue.Add(key)
    c.informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    func(obj interface{}) { c.enqueue(obj) },
        UpdateFunc: func(old, new interface{}) { c.enqueue(new) },
        DeleteFunc: func(obj interface{}) { c.enqueue(obj) },
    })
    // Wait for informer cache to fill before starting workers
    cache.WaitForCacheSync(ctx.Done(), c.informer.HasSynced)
    // Start worker goroutines
    for i := 0; i < workers; i++ {
        go wait.UntilWithContext(ctx, c.worker, time.Second)
    }
}

func (c *Controller) worker(ctx context.Context) {
    for c.processNextItem(ctx) { }
}

func (c *Controller) processNextItem(ctx context.Context) bool {
    key, quit := c.queue.Get()     // blocks until item available
    defer c.queue.Done(key)
    if err := c.reconcile(ctx, key.(string)); err != nil {
        c.queue.AddRateLimited(key)  // requeue with backoff on error
    }
    return true
}

Key Controllers

Built-in controllers — what each one does

ReplicaSet

Ensures pod count matches .spec.replicas. Creates or deletes pods to correct the count.

pkg/controller/replicaset/replica_set.go

Deployment

Manages rolling updates by creating new ReplicaSets and scaling old ones down.

pkg/controller/deployment/

StatefulSet

Manages ordered, sticky pod identities and per-pod PVCs. Handles scale up/down with ordering guarantees.

pkg/controller/statefulset/

Node Lifecycle

Marks nodes as NotReady, evicts pods from unhealthy nodes, manages node taints.

pkg/controller/nodelifecycle/

Endpoint Slice

Watches Services and Pods; populates EndpointSlice objects that kube-proxy uses to program iptables/eBPF.

pkg/controller/endpointslice/

Garbage Collector

Deletes orphaned objects by following ownerReferences. Handles cross-namespace and cross-type ownership chains.

pkg/controller/garbagecollector/

Resource Quota

Admits or rejects resource creation based on namespace-level quota objects.

pkg/controller/resourcequota/

ServiceAccount

Creates default ServiceAccounts in new namespaces and rotates their token secrets.

pkg/controller/serviceaccount/

ReplicaSet controller — worked example

                        The ReplicaSet controller is the canonical example. It
                        watches both ReplicaSets and Pods, and reconciles by
                        counting pods that match the selector.
                    

// pkg/controller/replicaset/replica_set.go (condensed)
func (rsc *ReplicaSetController) syncReplicaSet(ctx context.Context, key string) error {
    ns, name, _ := cache.SplitMetaNamespaceKey(key)
    rs, _ := rsc.rsLister.ReplicaSets(ns).Get(name)     // read from local cache

    // List pods owned by this ReplicaSet
    allPods, _ := rsc.podLister.Pods(rs.Namespace).List(labels.Everything())
    filteredPods := controller.FilterActivePods(allPods)

    // Compute diff
    diff := len(filteredPods) - int(*(rs.Spec.Replicas))
    if diff < 0 {
        rsc.slowStartBatch(...)  // create -diff pods via API server
    } else if diff > 0 {
        rsc.deletePods(ctx, rs, filteredPods[:diff])  // delete excess pods
    }

    // Update .status.readyReplicas, .status.availableReplicas
    return rsc.updateReplicaSetStatus(ctx, rs, filteredPods)
}

pkg/controller/replicaset/replica_set.go

Leader Election

Leader election — high availability without split-brain

                        In HA setups, multiple controller-manager pods run
                        simultaneously but only one holds the leader lease. The
                        lease is a
                        coordination.k8s.io/v1 Lease object in the
                        kube-system namespace, renewed every few
                        seconds.
                    

// pkg/client-go/tools/leaderelection/leaderelection.go (staging)
// The leader periodically PATCHes:
//   leases/kube-controller-manager in kube-system
//
// If a leader dies, its lease expires (leaseDuration ~15s).
// A standby then acquires the lease and calls OnStartedLeading.
//
// This prevents two controller managers from simultaneously
// creating/deleting pods and causing a split-brain.

staging/.../tools/leaderelection/leaderelection.go