8. Error handling and edge cases
Exception hierarchy
Most failures extend ElasticsearchException, which carries an HTTP status via status(). The REST layer (RestController together with ElasticsearchException serialization) maps exceptions to JSON error bodies containing type, reason, and stack_trace (if enabled).
- ElasticsearchStatusException: explicit REST status
- ClusterBlockException: index/cluster blocked (read/write/metadata)
- IndexNotFoundException, DocumentMissingException
- VersionConflictEngineException: optimistic concurrency control (OCC) failure at the engine
- CircuitBreakingException: JVM heap / request breaker tripped
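The idea of an exception that carries its own HTTP status, and of a REST layer that turns it into an error body, can be sketched as follows. This is a minimal illustration, not the Elasticsearch implementation: StatusCarryingException, VersionConflictSketchException and ErrorBodySketch are hypothetical names, and the fallback to 500 is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an exception base class that carries an HTTP status.
class StatusCarryingException extends RuntimeException {
    private final int status;

    StatusCarryingException(String reason, int status) {
        super(reason);
        this.status = status;
    }

    int status() {
        return status;
    }
}

// Hypothetical subclass: a version conflict reported as HTTP 409 (Conflict).
class VersionConflictSketchException extends StatusCarryingException {
    VersionConflictSketchException(String reason) {
        super(reason, 409);
    }
}

class ErrorBodySketch {
    // Builds the fields of a JSON-style error body from any exception.
    static Map<String, Object> toErrorBody(Exception e) {
        // Assumption of this sketch: exceptions without an explicit status map to 500.
        int status = 500;
        if (e instanceof StatusCarryingException) {
            status = ((StatusCarryingException) e).status();
        }
        Map<String, Object> error = new LinkedHashMap<>();
        error.put("type", e.getClass().getSimpleName());
        error.put("reason", e.getMessage());
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("error", error);
        body.put("status", status);
        return body;
    }
}
```

Calling ErrorBodySketch.toErrorBody(new VersionConflictSketchException("version conflict")) yields a map whose status entry is 409 and whose error entry holds the type and reason.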
ActionListener pattern
Async throughout: transport actions take an ActionListener<Response>. Listeners are composed via ActionListener#delegateFailure and SubscribableListener. Failures propagate to the listener without blocking threads.
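A stripped-down version of the listener pattern shows how failure delegation works. SketchListener is a hypothetical interface written for this example; the real ActionListener has a much larger API, and catching handler exceptions as shown is a choice of the sketch.

```java
import java.util.function.BiConsumer;

// Hypothetical, simplified listener; not the real ActionListener.
interface SketchListener<T> {
    void onResponse(T response);

    void onFailure(Exception e);

    // Wraps this listener: failures pass straight through to it,
    // while responses are handed to `handler` together with this listener.
    default <R> SketchListener<R> delegateFailure(BiConsumer<SketchListener<T>, R> handler) {
        SketchListener<T> outer = this;
        return new SketchListener<R>() {
            @Override
            public void onResponse(R response) {
                try {
                    handler.accept(outer, response);
                } catch (RuntimeException e) {
                    // Sketch choice: a throwing handler is reported as a failure.
                    outer.onFailure(e);
                }
            }

            @Override
            public void onFailure(Exception e) {
                outer.onFailure(e);
            }
        };
    }
}
```

A caller that needs an intermediate result writes only the success path, for example outer.delegateFailure((l, text) -> l.onResponse(text.length())), and any failure still reaches outer unchanged.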
Validation layers
- REST param parsing (RestRequest)
- ActionRequest#validate() → ActionRequestValidationException
- Cluster blocks check (master actions, search, bulk)
- Shard-level engine exceptions (per bulk item)
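The request-level validation layer follows a "collect every problem, return null when valid" convention. A minimal sketch, assuming a hypothetical IndexRequestSketch with two illustrative fields (neither class below is the real Elasticsearch type):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception that lists every validation problem at once.
class ValidationSketchException extends IllegalArgumentException {
    final List<String> errors;

    ValidationSketchException(List<String> errors) {
        super("Validation Failed: " + String.join("; ", errors));
        this.errors = List.copyOf(errors);
    }
}

// Hypothetical request type; field names are illustrative only.
class IndexRequestSketch {
    final String index;
    final String source;

    IndexRequestSketch(String index, String source) {
        this.index = index;
        this.source = source;
    }

    // Returns null when the request is valid, otherwise an exception
    // describing all problems found (not just the first one).
    ValidationSketchException validate() {
        List<String> errors = new ArrayList<>();
        if (index == null || index.isEmpty()) {
            errors.add("index is missing");
        }
        if (source == null) {
            errors.add("source is missing");
        }
        return errors.isEmpty() ? null : new ValidationSketchException(errors);
    }
}
```

Returning the exception instead of throwing it lets the caller decide how to report it, typically by failing the listener before any cluster-level check runs.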
Retries
- Clients expected to retry 429/503 with backoff (not automatic in server)
- Internal: peer recovery retries, snapshot retries, repository blob retries
- Master task resubmission on NotMasterException in some paths
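Because the server does not retry 429/503 on the client's behalf, a client needs its own loop. The sketch below shows capped exponential backoff; RetrySketch, its method names and the parameter choices are hypothetical, and the request is modeled as a function that simply returns an HTTP status code.

```java
import java.util.function.IntSupplier;

class RetrySketch {
    // Only "too many requests" and "service unavailable" are retried.
    static boolean isRetryable(int status) {
        return status == 429 || status == 503;
    }

    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at capMillis.
    // The shift is bounded; the sketch assumes a modest baseMillis.
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, capMillis);
    }

    // Calls `request` until it returns a non-retryable status or maxAttempts calls were made.
    static int callWithRetry(IntSupplier request, int maxAttempts, long baseMillis, long capMillis) {
        int status = request.getAsInt();
        for (int attempt = 0; attempt + 1 < maxAttempts && isRetryable(status); attempt++) {
            try {
                Thread.sleep(backoffMillis(attempt, baseMillis, capMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop retrying
                return status;
            }
            status = request.getAsInt();
        }
        return status;
    }
}
```

Production clients usually add random jitter to the delay so that many clients rejected at the same moment do not retry in lockstep; that is omitted here to keep the arithmetic visible.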
Circuit breakers
CircuitBreakerService tracks estimated memory use per breaker (fielddata, request, in-flight requests, accounting). RestController checks the in-flight requests breaker before dispatch. Search aggregations charge large data structures to a breaker as they grow.
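The core of a breaker is a reserve-or-reject counter. The following is a minimal sketch under stated assumptions: BreakerSketch, reserve and release are hypothetical names, there is a single breaker with a fixed byte limit, and the parent-breaker and overhead logic of the real service is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified breaker with a fixed limit.
class BreakerSketch {
    private final String name;
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    BreakerSketch(String name, long limitBytes) {
        this.name = name;
        this.limitBytes = limitBytes;
    }

    // Reserves `bytes` against the limit, or throws without keeping the reservation.
    void reserve(long bytes) {
        long newUsed = usedBytes.addAndGet(bytes);
        if (newUsed > limitBytes) {
            usedBytes.addAndGet(-bytes); // roll back so a rejected request holds no budget
            throw new IllegalStateException(
                "[" + name + "] would use " + newUsed + " bytes, limit is " + limitBytes);
        }
    }

    // Must be called when the reserved memory is no longer in use.
    void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    long used() {
        return usedBytes.get();
    }
}
```

The important property is that a rejected reservation leaves the counter unchanged, so one oversized request cannot starve later, smaller ones.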
9. Concurrency and lifecycle
Thread pools
| Pool | Typical work |
|---|---|
| write | Indexing, bulk shard ops |
| search | Query/fetch phases |
| search_coordination | Search coordinator merge |
| management | Cluster state tasks (master) |
| generic | Misc async work |
| snapshot | Snapshot threads |
| refresh | Scheduled refresh |
Rule: TransportService threads must not execute blocking Lucene IO; shard operations use Engine with internal locking / AsyncIOProcessor for translog fsync.
Locking & consistency
- Primary shard: single writer; seq_no ordering
- Engine: KeyedLock per id for concurrent index/delete of the same doc
- Cluster state: single-threaded master service; immutable published states
- IndexShard: ReentrantLock for engine lifecycle
- Translog: fsync serialisation via AsyncIOProcessor
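Per-id locking lets operations on different documents proceed in parallel while operations on the same document are serialized. A minimal sketch of the idea, assuming a hypothetical KeyedLockSketch class; unlike a production keyed lock it never removes entries from the map, so it would grow without bound on a large id space.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical keyed lock: one ReentrantLock per key, created on first use.
class KeyedLockSketch<K> {
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs `action` while holding the lock for `key` and returns its result.
    <T> T withLock(K key, Supplier<T> action) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock(); // always released, even if the action throws
        }
    }
}
```

Two threads calling withLock("doc-1", ...) run one after the other, while a third thread calling withLock("doc-2", ...) is not blocked by either.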
Cancellation
TaskManager registers tasks with parent links. Search uses CancellableTask; HTTP channel close triggers cancel. SearchTaskWatchdog kills long searches.
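Parent links are what allow one cancel call to reach every child task. The sketch below models that propagation with a hypothetical CancellableTaskSketch class; registration in a task manager, ban propagation across nodes and the HTTP channel hook are all omitted.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical task with a parent link and cooperative cancellation.
class CancellableTaskSketch {
    private final String description;
    private final List<CancellableTaskSketch> children = new CopyOnWriteArrayList<>();
    private volatile String cancelReason; // null while the task is still live

    CancellableTaskSketch(String description, CancellableTaskSketch parent) {
        this.description = description;
        if (parent != null) {
            parent.children.add(this);
            // A child created under an already-cancelled parent starts cancelled.
            String parentReason = parent.cancelReason;
            if (parentReason != null) {
                cancel(parentReason);
            }
        }
    }

    // Marks this task and, recursively, all of its children as cancelled.
    void cancel(String reason) {
        if (cancelReason != null) {
            return; // already cancelled; keep the first reason
        }
        cancelReason = reason;
        for (CancellableTaskSketch child : children) {
            child.cancel(reason);
        }
    }

    boolean isCancelled() {
        return cancelReason != null;
    }

    // Long-running work calls this between units of work to stop promptly.
    void ensureNotCancelled() {
        String reason = cancelReason;
        if (reason != null) {
            throw new IllegalStateException("task [" + description + "] cancelled: " + reason);
        }
    }
}
```

Cancellation here is cooperative: nothing is interrupted, and work only stops at the points where it calls ensureNotCancelled().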
Shutdown
Node.close() → reverse start order:
- HTTP stop → cluster leave → indices close → thread pool shutdown
- Plugins: LifecycleComponent.stop() → NodeEnvironment.close()
Shard flush on close attempts to persist translog + Lucene commit.
Security-sensitive paths
- x-pack Security plugin: ActionFilter for authz before actions
- EntitlementBootstrap: limits plugin native/file/network access (JDK 24+)
- SecureSettings: keystore secrets never logged
- Scripting sandbox: Painless whitelist; other langs restricted
10. Important code walkthroughs
Walkthrough 1 — TransportAction.execute
Purpose: Universal entry for all named actions (index, search, admin).
- execute(task, request, listener) calls handleExecution.
- Registers task with TaskManager (for cancellation, headers).
- Runs each ActionFilter in order (security, logging).
- Dispatches doExecute on the action's Executor thread pool.
- Listener receives response or exception; resources released via Releasables.
Why it matters: Adding a new API requires a new TransportAction subclass + registration in ActionModule + optional RestHandler. Forgetting registration → "No handler for action".
Break risk: Running doExecute on wrong executor can block transport threads and stall the cluster.
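The filter-then-dispatch flow of this walkthrough can be sketched in a few lines. Everything here is a simplification made for illustration: ActionSketch and its Filter interface are hypothetical, filters are synchronous and reject by throwing, and task registration is left out.

```java
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical action: run filters in order, then execute on a chosen executor.
class ActionSketch<Req, Resp> {
    // A filter inspects the request and throws to reject it (e.g. failed authorization).
    interface Filter<R> {
        void apply(R request);
    }

    private final List<Filter<Req>> filters;
    private final Executor executor;
    private final Function<Req, Resp> doExecute;

    ActionSketch(List<Filter<Req>> filters, Executor executor, Function<Req, Resp> doExecute) {
        this.filters = filters;
        this.executor = executor;
        this.doExecute = doExecute;
    }

    void execute(Req request, Consumer<Resp> onResponse, Consumer<Exception> onFailure) {
        try {
            for (Filter<Req> filter : filters) {
                filter.apply(request);
            }
        } catch (RuntimeException e) {
            onFailure.accept(e); // a rejecting filter means doExecute never runs
            return;
        }
        // The actual work runs on the action's executor, not on the calling thread.
        executor.execute(() -> {
            Resp response;
            try {
                response = doExecute.apply(request);
            } catch (RuntimeException e) {
                onFailure.accept(e);
                return;
            }
            onResponse.accept(response);
        });
    }
}
```

Passing a real thread pool as the executor is what keeps the calling (transport) thread free; passing Runnable::run runs the work inline, which is convenient in tests and is exactly the mistake the break risk above describes in production.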
Walkthrough 2 — InternalEngine.index
Purpose: Apply a single index operation on the primary with Lucene + translog semantics.
- indexingStrategyForOperation decides: index, update, duplicate, skip (soft delete).
- On primary origin: assign seqNo via generateSeqNoForOperationOnPrimary.
- Update Lucene via IndexWriter (add/update document).
- Add the operation to the translog for durability, after the Lucene write.
- Return IndexResult with version, seqNo, failure.
Edge cases: Version conflicts return failure without indexing; soft deletes use SoftDeletesRetentionMergePolicy; append-only mode skips update path.
Break risk: Incorrect seq_no assignment corrupts replica consistency; translog/Lucene ordering bugs lose durability.
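A toy engine makes the version-conflict and sequence-number behaviour concrete. This is a sketch under heavy assumptions: EngineSketch, MATCH_ANY and Result are hypothetical, a HashMap stands in for the Lucene index, a list stands in for the translog, and one coarse lock replaces the per-id locking of a real engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-shard primary engine.
class EngineSketch {
    static final long MATCH_ANY = -1L; // caller does not care about the current version

    static final class Result {
        final long version;
        final long seqNo;     // -1 when the operation was rejected
        final String failure; // null on success

        Result(long version, long seqNo, String failure) {
            this.version = version;
            this.seqNo = seqNo;
            this.failure = failure;
        }
    }

    private final Map<String, Long> versions = new HashMap<>(); // stands in for the index
    private final List<String> translog = new ArrayList<>();    // stands in for the translog
    private long nextSeqNo = 0;

    synchronized Result index(String id, long expectedVersion) {
        long current = versions.getOrDefault(id, 0L);
        if (expectedVersion != MATCH_ANY && expectedVersion != current) {
            // Conflict: nothing is indexed and no sequence number is consumed.
            return new Result(current, -1L,
                "version conflict: expected " + expectedVersion + " but was " + current);
        }
        long seqNo = nextSeqNo++; // the primary is the only place seq numbers are assigned
        long newVersion = current + 1;
        versions.put(id, newVersion);
        translog.add(seqNo + ":" + id + ":" + newVersion);
        return new Result(newVersion, seqNo, null);
    }

    synchronized int translogSize() {
        return translog.size();
    }
}
```

Indexing the same id twice with MATCH_ANY produces versions 1 and 2 with sequence numbers 0 and 1; a third call that expects version 1 is rejected and leaves both the map and the translog untouched.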
Walkthrough 3 — MasterService.executeAndPublishBatch
Purpose: Atomically run queued cluster-state tasks and publish one new state.
- Drain task batch from priority queue.
- Call executor's execute(BatchExecutionContext): tasks mutate builder starting from previous state.
- patchVersions bumps cluster state version and term metadata.
- If changed, call publishClusterStateUpdate → Coordinator.publish.
- On success, run task listeners; on failure, onBatchFailure.
Why batched: Amortizes publication cost; related tasks (e.g. multiple mapping updates) merge into one state version.
Break risk: Non-deterministic executor → divergent states across master re-elections; must be pure function of inputs + previous state.
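Batching means several tasks produce a single new state with a single version bump, and an unchanged result produces no publication at all. A minimal sketch, assuming a hypothetical MasterBatchSketch in which the cluster state is reduced to a version and a list of index names and each task is a pure function over that list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class MasterBatchSketch {
    // Immutable state: every change produces a new instance.
    static final class State {
        final long version;
        final List<String> indices;

        State(long version, List<String> indices) {
            this.version = version;
            this.indices = List.copyOf(indices);
        }
    }

    // Hypothetical task: adds an index name if it is not present yet.
    static UnaryOperator<List<String>> createIndex(String name) {
        return indices -> {
            if (indices.contains(name)) {
                return indices; // no-op task
            }
            List<String> next = new ArrayList<>(indices);
            next.add(name);
            return next;
        };
    }

    // Applies all tasks in order; returns `previous` itself when nothing changed,
    // otherwise one new state whose version is bumped exactly once.
    static State executeBatch(State previous, List<UnaryOperator<List<String>>> tasks) {
        List<String> indices = previous.indices;
        for (UnaryOperator<List<String>> task : tasks) {
            indices = task.apply(indices);
        }
        if (indices.equals(previous.indices)) {
            return previous; // unchanged: nothing to publish
        }
        return new State(previous.version + 1, indices);
    }
}
```

Because executeBatch depends only on the previous state and the task list, running it again with the same inputs gives the same result, which is the determinism property the break risk above relies on.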
Walkthrough 4 — RestController.dispatchRequest
Purpose: Route HTTP request to handler with filters and error handling.
- Resolve route from PathTrie by method + path.
- Apply RestInterceptor chain (security, product checks).
- Parse RestApiVersion from headers/params.
- Call handler.prepareRequest → returns RestChannelConsumer.
- Consumer runs with NodeClient, writes to RestChannel.
- Uncaught exceptions → JSON error response with appropriate status.
Break risk: Consuming request body twice; missing ref-count release on chunked responses.
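Route resolution and the catch-all error handling can be illustrated with a small dispatcher. Note the substitutions: this sketch uses a linear scan over path templates rather than a trie, RestDispatchSketch and its types are hypothetical, interceptors and API versioning are omitted, and the status codes chosen for the error cases are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class RestDispatchSketch {
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    private static final class Route {
        final String method;
        final String[] segments;
        final Function<Map<String, String>, String> handler;

        Route(String method, String[] segments, Function<Map<String, String>, String> handler) {
            this.method = method;
            this.segments = segments;
            this.handler = handler;
        }
    }

    private final List<Route> routes = new ArrayList<>();

    // Templates use {name} for path parameters, e.g. "/{index}/_doc/{id}".
    void register(String method, String template, Function<Map<String, String>, String> handler) {
        routes.add(new Route(method, split(template), handler));
    }

    Response dispatch(String method, String path) {
        String[] parts = split(path);
        for (Route route : routes) {
            Map<String, String> params = match(route, method, parts);
            if (params == null) {
                continue;
            }
            try {
                return new Response(200, route.handler.apply(params));
            } catch (RuntimeException e) {
                // Uncaught handler exceptions become an error response, not a crash.
                return new Response(500, e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        return new Response(400, "no handler found for " + method + " " + path);
    }

    private static String[] split(String path) {
        return path.replaceAll("^/+|/+$", "").split("/");
    }

    // Returns the captured {name} parameters, or null when the route does not match.
    private static Map<String, String> match(Route route, String method, String[] parts) {
        if (!route.method.equals(method) || route.segments.length != parts.length) {
            return null;
        }
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < parts.length; i++) {
            String segment = route.segments[i];
            if (segment.startsWith("{") && segment.endsWith("}")) {
                params.put(segment.substring(1, segment.length() - 1), parts[i]);
            } else if (!segment.equals(parts[i])) {
                return null;
            }
        }
        return params;
    }
}
```

A trie gives the same answers with lookup cost proportional to the path depth instead of the number of registered routes, which matters once hundreds of handlers are registered.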
11. Non-obvious insights
Hidden coupling & conventions
- Everything is an action. Even single-doc index goes through bulk machinery (TransportIndexAction → TransportBulkAction).
- Cluster state is the source of truth for routing; data nodes cache it via applier thread, so the stale cache window is bounded by publication latency.
- Guice injector is built once; services are singletons. Testing uses NodeConstruction with test overrides.
- NamedWriteable registry must list every custom type serialized over transport; forgetting breaks wire compatibility.
- TransportVersion vs Version: wire protocol evolves separately from release version.
- Project metadata: ES 9.x multi-project work adds ProjectMetadata layer inside ClusterState (@FixForMultiProject annotations mark migration).
- System indices: hard-coded descriptors; Kibana/ML depend on exact names and mappings.
- RandomizedTesting: tests run with random seeds, time zones, Lucene codecs; failures need -Dtests.seed to reproduce.
- Entitlements: new plugin sandbox; policy patches loaded from es.entitlements.policy.* resources in phase 2 bootstrap.
- Performance hot paths: avoid object allocation in bulk indexing; Lucene merge scheduling affects write latency; search uses IndexSearcher thread pool per shard.
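The registry point in the list above is easiest to see in miniature: a reader can only deserialize a named type if that name was registered up front. The sketch below assumes a hypothetical NamedRegistrySketch and a made-up "name:payload" text format in place of the real binary wire protocol.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry mapping a type name to the function that reads it.
class NamedRegistrySketch {
    private final Map<String, Function<String, Object>> readers = new HashMap<>();

    void register(String name, Function<String, Object> reader) {
        if (readers.putIfAbsent(name, reader) != null) {
            throw new IllegalArgumentException("reader for [" + name + "] already registered");
        }
    }

    // Wire format in this sketch: "<name>:<payload>".
    Object read(String wire) {
        int separator = wire.indexOf(':');
        if (separator < 0) {
            throw new IllegalArgumentException("malformed message [" + wire + "]");
        }
        String name = wire.substring(0, separator);
        Function<String, Object> reader = readers.get(name);
        if (reader == null) {
            // This is the failure a forgotten registration produces on the receiving node.
            throw new IllegalArgumentException("unknown named type [" + name + "]");
        }
        return reader.apply(wire.substring(separator + 1));
    }
}
```

The sender can always write a type it knows about; the failure only appears on a receiver whose registry lacks the name, which is why a missing entry tends to surface as a cross-node error rather than a local one.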