Startup
main() → Run() → BuildControllers() → RunControllers()
ControllerContext.
-
main() — entry point
// cmd/kube-controller-manager/controller-manager.go L36 func main() { command := app.NewControllerManagerCommand() code := cli.Run(command) os.Exit(code) }cmd/kube-controller-manager/controller-manager.go -
Run() — leader election → start controllersSets up health checks, starts the HTTPS server, and wraps the actual start work in a leader election lease. Only the current leader runs controllers.
// cmd/kube-controller-manager/app/controllermanager.go L199 func Run(ctx context.Context, c *config.CompletedConfig) error { // Start HTTP health / metrics server // Set up leader election ... run := func(ctx context.Context, ...) { controllerContext, _ := CreateControllerContext(ctx, c, ...) // L269 controllers, _ := BuildControllers(ctx, controllerContext, descriptors, ...) RunControllers(ctx, controllerContext, controllers, ...) } leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{ Callbacks: leaderelection.LeaderCallbacks{OnStartedLeading: run}, }) }controllermanager.go L199 -
CreateControllerContext() — shared informers & clientsBuilds one
SharedInformerFactoryshared by all controllers. Each controller gets a dedicated client to avoid cross-controller rate-limit interference.// cmd/kube-controller-manager/app/controllermanager.go L530 func CreateControllerContext(ctx, s, ...) (ControllerContext, error) { versionedClient, _ := clientBuilder.Client("shared-informers") sharedInformers := informers.NewSharedInformerFactory(versionedClient, resyncPeriod) return ControllerContext{ ClientBuilder: clientBuilder, InformerFactory: sharedInformers, // ... }, nil }controllermanager.go L530 -
BuildControllers() + RunControllers()Instantiates each enabled controller by calling its constructor, then starts them concurrently. Each controller runs in its own goroutine(s).
// cmd/kube-controller-manager/app/controllermanager.go L626, L710 func BuildControllers(ctx, controllerCtx, controllerDescriptors, ...) ([]Controller, error) { for name, descriptor := range controllerDescriptors { if !controllerCtx.IsControllerEnabled(descriptor) { continue } ctrl, err := descriptor.constructor(ctx, controllerCtx, name) controllers = append(controllers, ctrl) } return controllers, nil } func RunControllers(ctx, controllerCtx, controllers, ...) { for _, ctrl := range controllers { go ctrl.Run(ctx, workers) // each controller runs its own reconcile loop } sharedInformers.Start(stopCh) // begin watching API server sharedInformers.WaitForCacheSync(stopCh) }controllermanager.go L626 controllermanager.go L710
Controller Pattern
The reconcile loop — inform → enqueue → reconcile
Watches API server.
Caches objects locally.
Fires event handlers.
Rate-limited, deduplicating.
Holds changed object keys.
Provides retry + backoff.
Reads current state.
Computes diff.
Calls API server to fix.
// Generic controller skeleton (pattern used by all controllers)
type Controller struct {
informer cache.SharedIndexInformer // watches resource type
queue workqueue.RateLimitingInterface
client kubernetes.Interface
}
func (c *Controller) Run(ctx context.Context, workers int) {
// Register event handlers: AddFunc/UpdateFunc/DeleteFunc → c.queue.Add(key)
c.informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: func(obj interface{}) { c.enqueue(obj) },
UpdateFunc: func(old, new interface{}) { c.enqueue(new) },
DeleteFunc: func(obj interface{}) { c.enqueue(obj) },
})
// Wait for informer cache to fill before starting workers
cache.WaitForCacheSync(ctx.Done(), c.informer.HasSynced)
// Start worker goroutines
for i := 0; i < workers; i++ {
go wait.UntilWithContext(ctx, c.worker, time.Second)
}
}
func (c *Controller) worker(ctx context.Context) {
for c.processNextItem(ctx) { }
}
func (c *Controller) processNextItem(ctx context.Context) bool {
key, quit := c.queue.Get() // blocks until item available
defer c.queue.Done(key)
if err := c.reconcile(ctx, key.(string)); err != nil {
c.queue.AddRateLimited(key) // requeue with backoff on error
}
return true
}
Key Controllers
Built-in controllers — what each one does
Ensures pod count matches
.spec.replicas. Creates or deletes
pods to correct the count.
Manages rolling updates by creating new ReplicaSets and scaling old ones down.
pkg/controller/deployment/Manages ordered, sticky pod identities and per-pod PVCs. Handles scale up/down with ordering guarantees.
pkg/controller/statefulset/Marks nodes as NotReady, evicts pods from unhealthy nodes, manages node taints.
pkg/controller/nodelifecycle/Watches Services and Pods; populates EndpointSlice objects that kube-proxy uses to program iptables/eBPF.
pkg/controller/endpointslice/
Deletes orphaned objects by following
ownerReferences. Handles
cross-namespace and cross-type ownership chains.
Admits or rejects resource creation based on namespace-level quota objects.
pkg/controller/resourcequota/Creates default ServiceAccounts in new namespaces and rotates their token secrets.
pkg/controller/serviceaccount/ReplicaSet controller — worked example
// pkg/controller/replicaset/replica_set.go (condensed)
func (rsc *ReplicaSetController) syncReplicaSet(ctx context.Context, key string) error {
ns, name, _ := cache.SplitMetaNamespaceKey(key)
rs, _ := rsc.rsLister.ReplicaSets(ns).Get(name) // read from local cache
// List pods owned by this ReplicaSet
allPods, _ := rsc.podLister.Pods(rs.Namespace).List(labels.Everything())
filteredPods := controller.FilterActivePods(allPods)
// Compute diff
diff := len(filteredPods) - int(*(rs.Spec.Replicas))
if diff < 0 {
rsc.slowStartBatch(...) // create -diff pods via API server
} else if diff > 0 {
rsc.deletePods(ctx, rs, filteredPods[:diff]) // delete excess pods
}
// Update .status.readyReplicas, .status.availableReplicas
return rsc.updateReplicaSetStatus(ctx, rs, filteredPods)
}
pkg/controller/replicaset/replica_set.go
Leader Election
Leader election — high availability without split-brain
coordination.k8s.io/v1 Lease object in the
kube-system namespace, renewed every few
seconds.
// pkg/client-go/tools/leaderelection/leaderelection.go (staging) // The leader periodically PATCHes: // leases/kube-controller-manager in kube-system // // If a leader dies, its lease expires (leaseDuration ~15s). // A standby then acquires the lease and calls OnStartedLeading. // // This prevents two controller managers from simultaneously // creating/deleting pods and causing a split-brain.staging/.../tools/leaderelection/leaderelection.go