6. Data model and persistence
Domain entities
| Entity | Representation | Storage |
|---|---|---|
| Index | IndexMetadata in cluster state | Persisted metadata + on-disk shard dirs |
| Shard | IndexShard + ShardRouting | {data}/nodes/{id}/indices/{uuid}/{shard_id}/ |
| Document | Lucene docs + _source stored field | Segments under index/ |
| Translog | Translog / TranslogWriter | translog/ directory per shard |
| Cluster state | ClusterState object | In memory on every node; persisted on master |
| Snapshot | RepositoryData in blob store | S3/GCS/fs repository |
On-disk shard layout
{data.path}/nodes/{nodeId}/indices/{indexUUID}/{shardId}/
- translog/ ← write-ahead log (durability before flush)
- index/ ← Lucene segments (_0.si, _0.cfs, …)
- _state/ ← shard state (seq_no, retention leases)
Write path durability
- Operation appended to translog (configurable durability: request|async)
- Indexed into Lucene buffer (in-memory)
- Refresh makes visible to search (new searcher on ReferenceManager)
- Flush commits Lucene segments; translog generation trimmed per retention policy
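The write path above can be modeled with a small toy class. This is a minimal sketch, not Elasticsearch code: `ToyShard` and all of its fields are hypothetical names, an "fsync" is modeled as a counter, and flush simply drops the whole translog rather than applying a retention policy. The sketch follows the order in which the steps are listed here; the real engine may sequence them differently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of one shard's write path (hypothetical, for illustration)."""
    durability: str = "request"                      # "request" or "async"
    translog: list = field(default_factory=list)     # ops since the last flush
    synced_ops: int = 0                              # translog ops counted as fsynced
    buffer: dict = field(default_factory=dict)       # indexed, not yet searchable
    searchable: dict = field(default_factory=dict)   # visible after refresh
    committed: dict = field(default_factory=dict)    # contents of the last commit

    def index(self, doc_id, source):
        self.translog.append((doc_id, source))
        if self.durability == "request":
            # "request" durability: sync the translog before acknowledging.
            self.synced_ops = len(self.translog)
        self.buffer[doc_id] = source

    def sync_translog(self):
        # With "async" durability this would run periodically in the background.
        self.synced_ops = len(self.translog)

    def refresh(self):
        # Buffered documents become visible to search.
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def flush(self):
        # Commit everything indexed so far, then drop the translog (simplified).
        self.committed = {**self.searchable, **self.buffer}
        self.translog.clear()
        self.synced_ops = 0

    def recover_after_crash(self):
        # Recoverable state: last commit plus the synced part of the translog.
        recovered = dict(self.committed)
        for doc_id, source in self.translog[: self.synced_ops]:
            recovered[doc_id] = source
        return recovered
```

In this model a document indexed with `durability="request"` survives a crash even before any flush, while one indexed with `durability="async"` is lost unless `sync_translog()` ran first.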
Sequence numbers & versioning
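Each write is assigned a monotonically increasing _seq_no, paired with the current _primary_term, on the primary. Replicas apply operations in order. Optimistic concurrency uses if_seq_no / if_primary_term: a conditional write is rejected when the document's current values differ from the ones supplied. The global checkpoint tracks replication progress: it is the highest sequence number that all in-sync shard copies have processed.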
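To make the conditional-write check concrete, here is a minimal sketch. `ToyPrimary`, `VersionConflict`, and `global_checkpoint` are hypothetical names, and the global checkpoint is simplified to the minimum of the local checkpoints reported by the in-sync copies.

```python
class VersionConflict(Exception):
    """Raised when if_seq_no / if_primary_term do not match the stored doc."""


class ToyPrimary:
    """Toy primary shard that assigns sequence numbers (hypothetical)."""

    def __init__(self, primary_term=1):
        self.primary_term = primary_term
        self.next_seq_no = 0
        self.docs = {}  # doc_id -> (seq_no, primary_term, source)

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if if_seq_no is not None or if_primary_term is not None:
            current = self.docs.get(doc_id)
            expected = (if_seq_no, if_primary_term)
            if current is None or (current[0], current[1]) != expected:
                raise VersionConflict(doc_id)
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (seq_no, self.primary_term, source)
        return seq_no, self.primary_term


def global_checkpoint(local_checkpoints):
    # Simplification: every in-sync copy has processed at least this seq_no.
    return min(local_checkpoints.values())
```

A client would read a document, remember the returned `(seq_no, primary_term)` pair, and pass it back on the next write; a concurrent update in between changes the stored pair and causes the conflict.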
Cluster state serialization
ClusterState implements NamedWriteable — binary wire format for publication. Metadata also serialized to XContent for APIs. PersistedClusterStateService writes incremental metadata commits (similar in spirit to Lucene commits).
Mapping & document parsing
JSON source → XContentParser → DocumentParser → ParsedDocument → Lucene Document. Field types define indexing (inverted index, doc values, norms). Dynamic templates control auto-mapping.
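As an illustration of auto-mapping, the sketch below infers field types from a JSON source. It is not the DocumentParser logic: `infer_mapping` and `infer_field_type` are hypothetical helpers, only the basic JSON-type defaults are modeled (boolean, long, float, text), and date detection, multi-fields, and dynamic templates are ignored.

```python
import json


def infer_field_type(value):
    """Map one JSON scalar to a field type name (simplified defaults)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    raise TypeError(f"unsupported JSON value: {value!r}")


def infer_mapping(source, prefix=""):
    """Return {dotted.field.path: type} for a parsed JSON object."""
    mapping = {}
    for name, value in source.items():
        path = f"{prefix}{name}"
        # Arrays take the type of their first non-null element.
        while isinstance(value, list):
            value = next((v for v in value if v is not None), None)
        if value is None:
            continue  # nulls add no field
        if isinstance(value, dict):
            mapping.update(infer_mapping(value, prefix=path + "."))
        else:
            mapping[path] = infer_field_type(value)
    return mapping


doc = json.loads('{"title": "x", "views": 3, "author": {"name": "n"}}')
print(infer_mapping(doc))
```

Running it prints one inferred type per leaf field, with object fields flattened into dotted paths.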
Configuration
| Source | Format | Loaded into |
|---|---|---|
| config/elasticsearch.yml | YAML | Settings |
| config/jvm.options | JVM flags | ServerCli launcher |
| config/log4j2.properties | Log4j | LogConfigurator |
| Keystore | Binary secrets | SecureSettings |
| Env vars | ES_* (via settings) | Some settings allow env substitution |
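The env substitution mentioned in the last row can be sketched as follows, assuming `${NAME}` placeholders inside setting values. `substitute_env` is a hypothetical helper operating on an already-parsed flat settings dict; it is not how Settings is implemented.

```python
import re

# Assumed placeholder syntax: ${NAME}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def substitute_env(settings, env):
    """Replace ${NAME} placeholders in string values with values from env."""

    def resolve(value):
        if not isinstance(value, str):
            return value  # numbers, booleans, etc. pass through unchanged

        def replace(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"undefined environment variable: {name}")
            return env[name]

        return _PLACEHOLDER.sub(replace, value)

    return {key: resolve(value) for key, value in settings.items()}
```

Passing `os.environ` as `env` would resolve placeholders against the real process environment; failing on an undefined variable is a design choice of this sketch.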
7. External interfaces
REST API
Spec-driven: rest-api-spec/src/main/resources/rest-api-spec/api/*.json defines paths, methods, params. Handlers implement RestHandler with Route declarations. API versioning via RestApiVersion and compatibility headers (REST_API_COMPATIBILITY.md).
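A spec-driven route table can be sketched like this. The `SPEC` dict is a trimmed, assumed shape for one spec file, and `routes_from_spec` / `match_route` are hypothetical helpers. Real handlers are Java classes; here a handler is just the API name.

```python
# Assumed, trimmed shape of an API spec entry (illustrative only).
SPEC = {
    "get": {
        "url": {"paths": [{"path": "/{index}/_doc/{id}", "methods": ["GET"]}]}
    },
    "search": {
        "url": {"paths": [{"path": "/{index}/_search", "methods": ["GET", "POST"]}]}
    },
}


def routes_from_spec(spec):
    """Flatten the spec into (method, path template, api name) tuples."""
    routes = []
    for api_name, api in spec.items():
        for entry in api["url"]["paths"]:
            for method in entry["methods"]:
                routes.append((method, entry["path"], api_name))
    return routes


def match_route(routes, method, path):
    """Return (api name, path params) for the first matching route, or None."""
    segments = path.strip("/").split("/")
    for route_method, template, api_name in routes:
        template_segments = template.strip("/").split("/")
        if route_method != method or len(template_segments) != len(segments):
            continue
        params = {}
        for expected, actual in zip(template_segments, segments):
            if expected.startswith("{") and expected.endswith("}"):
                params[expected[1:-1]] = actual  # {name} captures the segment
            elif expected != actual:
                break  # literal segment mismatch
        else:
            return api_name, params
    return None
```

This version returns the first match in spec order, so it does not resolve ambiguity between overlapping templates.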
Transport (internal RPC)
Binary protocol over TCP (Netty). Actions addressed by name string. Used for inter-node replication, search fan-out, cluster state publication. Not a supported public API (Java transport client removed).
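The idea of addressing actions by name over a binary channel can be sketched as below. The frame layout (2-byte name length, name, 4-byte payload length, payload) is invented for this example and is not the actual wire format; `ToyTransport`, `encode_frame`, and `decode_frame` are hypothetical names.

```python
import struct


def encode_frame(action: str, payload: bytes) -> bytes:
    """Invented layout: [u16 name length][name][u32 payload length][payload]."""
    name = action.encode("utf-8")
    return (
        struct.pack(">H", len(name)) + name
        + struct.pack(">I", len(payload)) + payload
    )


def decode_frame(frame: bytes):
    """Inverse of encode_frame; returns (action name, payload bytes)."""
    (name_len,) = struct.unpack_from(">H", frame, 0)
    name = frame[2:2 + name_len].decode("utf-8")
    (payload_len,) = struct.unpack_from(">I", frame, 2 + name_len)
    start = 2 + name_len + 4
    return name, frame[start:start + payload_len]


class ToyTransport:
    """Registry that dispatches incoming frames by action name."""

    def __init__(self):
        self._handlers = {}

    def register(self, action, handler):
        if action in self._handlers:
            raise ValueError(f"action [{action}] already registered")
        self._handlers[action] = handler

    def handle_frame(self, frame):
        action, payload = decode_frame(frame)
        handler = self._handlers.get(action)
        if handler is None:
            raise LookupError(f"no handler for action [{action}]")
        return handler(payload)
```

The sending node only needs to know the action's name string; the receiving node looks up whatever handler it registered under that name.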
Java REST client
client/rest provides the low-level HTTP client: RestClient wraps the Apache HTTP async client. Sniffer discovers nodes.
Repository / snapshot API
RepositoriesService manages Repository implementations. Snapshots are incremental segment-level backups to blob stores.
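The incremental, segment-level behavior can be illustrated with a toy repository that uploads only files it does not already hold. This is a sketch under assumptions: `ToyRepository` is hypothetical, and files are identified by a SHA-256 of their content purely for simplicity, which is not a claim about how the real repository identifies segment files.

```python
import hashlib


class ToyRepository:
    """Toy blob store with per-snapshot manifests (hypothetical)."""

    def __init__(self):
        self.blobs = {}      # content digest -> file bytes
        self.snapshots = {}  # snapshot name -> {file name: digest}

    def snapshot(self, name, segment_files):
        """Store a snapshot; return the names of files actually uploaded."""
        manifest = {}
        uploaded = []
        for file_name, content in segment_files.items():
            digest = hashlib.sha256(content).hexdigest()
            if digest not in self.blobs:
                self.blobs[digest] = content  # new segment file: upload it
                uploaded.append(file_name)
            manifest[file_name] = digest      # unchanged files are referenced
        self.snapshots[name] = manifest
        return sorted(uploaded)

    def restore(self, name):
        """Rebuild the file set of a snapshot from its manifest."""
        manifest = self.snapshots[name]
        return {file_name: self.blobs[d] for file_name, d in manifest.items()}
```

Because segment files are written once and not modified afterwards, a second snapshot taken after more indexing mostly references blobs that the first snapshot already uploaded.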
Cross-cluster search/replication
Remote clusters configured in settings; RemoteTransportClient (x-pack) opens connections to remote coordinators.
System indices
Hidden indices (Kibana, ML, security) with descriptor metadata in SystemIndices. Access controlled via headers and security plugin.