Elasticsearch — Data & Interfaces

Persistence, serialization, external APIs

6. Data model and persistence

Domain entities

Entity | Representation | Storage
Index | IndexMetadata in cluster state | Persisted metadata + on-disk shard dirs
Shard | IndexShard + ShardRouting | {data}/nodes/{id}/indices/{uuid}/{shard_id}/
Document | Lucene docs + _source stored field | Segments under index/
Translog | Translog / TranslogWriter | translog/ directory per shard
Cluster state | ClusterState object | In memory on all nodes; metadata persisted on master-eligible nodes
Snapshot | RepositoryData in blob store | S3/GCS/fs repository

On-disk shard layout

{data.path}/nodes/{nodeId}/indices/{indexUUID}/{shardId}/
  translog/          ← write-ahead log (durability before flush)
  index/             ← Lucene segments (_0.si, _0.cfs, …)
  _state/            ← shard state metadata (primary flag, allocation id) and retention leases
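The directory layout composes mechanically from the data path. The sketch below shows this with java.nio.file.Path. ShardLayout and its methods are hypothetical: Elasticsearch resolves these paths in NodeEnvironment and ShardPath, and the exact layout can differ between versions. The sketch simply mirrors the layout shown above.

```java
import java.nio.file.Path;

/** Hypothetical helper mirroring the on-disk shard layout described above. */
final class ShardLayout {
    private ShardLayout() {}

    /** {data.path}/nodes/{nodeId}/indices/{indexUUID}/{shardId}/ */
    static Path shardDir(Path dataPath, String nodeId, String indexUuid, int shardId) {
        return dataPath.resolve("nodes")
                .resolve(nodeId)
                .resolve("indices")
                .resolve(indexUuid)
                .resolve(Integer.toString(shardId));
    }

    /** Write-ahead log directory of the shard. */
    static Path translogDir(Path shardDir) {
        return shardDir.resolve("translog");
    }

    /** Lucene segment files of the shard. */
    static Path luceneIndexDir(Path shardDir) {
        return shardDir.resolve("index");
    }
}
```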

Write path durability

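  1. Operation indexed into Lucene's in-memory buffer (IndexWriter); not yet searchable
  2. Operation appended to the translog. Durability is configurable via index.translog.durability: request (default) fsyncs the translog before the request is acknowledged; async fsyncs in the background every index.translog.sync_interval
  3. Refresh makes buffered documents visible to search (a new searcher is opened through Lucene's ReferenceManager)
  4. Flush performs a Lucene commit and starts a new translog generation; older generations are trimmed per the retention policy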
Source: Translog.java, InternalEngine.java
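To make the durability steps concrete, here is a toy in-memory model of one shard's write path. It is not InternalEngine. ToyShard and its methods are hypothetical, nothing is fsynced, and "Lucene" is a plain list. The model shows three invariants:
- An indexed operation is always recoverable from the last commit plus the translog.
- It becomes searchable only after a refresh.
- After a flush, the translog entries it covers are no longer needed. Real Elasticsearch may still keep translog generations under its retention policy.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of one shard's write path (hypothetical names, no real I/O):
 * ops go into an in-memory "IndexWriter" and a translog, become searchable
 * on refresh, and are covered by a commit on flush.
 */
final class ToyShard {
    private final List<String> writer = new ArrayList<>();   // everything indexed so far, in memory
    private List<String> searcher = List.of();               // point-in-time view from the last refresh
    private final List<String> translog = new ArrayList<>(); // ops not yet covered by a commit
    private List<String> commit = List.of();                 // state captured by the last flush

    void index(String doc) {
        writer.add(doc);   // into the indexing buffer
        translog.add(doc); // the real translog is fsynced before acking when durability=request
    }

    void refresh() {
        searcher = List.copyOf(writer); // new searcher sees everything indexed so far
    }

    void flush() {
        commit = List.copyOf(writer); // a commit persists all indexed docs, refreshed or not
        translog.clear();             // so these translog entries are no longer needed for recovery
    }

    /** After a crash: last commit plus translog replay. */
    List<String> recover() {
        List<String> state = new ArrayList<>(commit);
        state.addAll(translog);
        return state;
    }

    List<String> search() { return searcher; }

    int translogSize() { return translog.size(); }
}
```

For example, indexing "a", refreshing, and then indexing "b" leaves search() returning only "a". At that point recover() already yields both documents, replayed from the translog.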

Sequence numbers & versioning

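The primary assigns each write a monotonically increasing _seq_no, together with the current _primary_term (incremented whenever a new primary is promoted). Replicas may receive operations out of order. They use the sequence number to detect stale operations and to advance their local checkpoint once no gaps remain below it. The global checkpoint, tracked by the primary, is the highest sequence number that every in-sync copy has processed. Optimistic concurrency control uses the if_seq_no / if_primary_term request parameters: the write is rejected with a version conflict if the document's current values differ.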
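A simplified sketch of this bookkeeping follows. The local checkpoint only advances over a contiguous run of processed sequence numbers, so out-of-order arrivals leave a gap until it is filled. The global checkpoint is the minimum local checkpoint across in-sync copies. A conditional write is allowed only when both if_seq_no and if_primary_term match. Checkpoints and its methods are hypothetical; Elasticsearch's own tracker is LocalCheckpointTracker. Using -1 for "no operations yet" follows Elasticsearch's NO_OPS_PERFORMED convention.

```java
import java.util.Collection;
import java.util.TreeSet;

/** Simplified checkpoint and OCC bookkeeping; hypothetical, not Elasticsearch's classes. */
final class Checkpoints {
    static final long NO_OPS = -1L; // "no operations processed yet"

    private long localCheckpoint = NO_OPS;
    private final TreeSet<Long> processedAboveCheckpoint = new TreeSet<>();

    /** Mark an operation as processed; operations may arrive in any order. */
    void markProcessed(long seqNo) {
        if (seqNo <= localCheckpoint) {
            return; // already covered by the checkpoint
        }
        processedAboveCheckpoint.add(seqNo);
        // Advance only while the next sequence number has been processed (no gaps).
        while (processedAboveCheckpoint.remove(localCheckpoint + 1)) {
            localCheckpoint++;
        }
    }

    long localCheckpoint() { return localCheckpoint; }

    /** Every in-sync copy has processed all operations up to this number. */
    static long globalCheckpoint(Collection<Long> inSyncLocalCheckpoints) {
        return inSyncLocalCheckpoints.stream().mapToLong(Long::longValue).min().orElse(NO_OPS);
    }

    /** Optimistic concurrency: allow the write only if the caller saw the current seq_no and term. */
    static boolean conditionalWriteAllowed(long currentSeqNo, long currentPrimaryTerm,
                                           long ifSeqNo, long ifPrimaryTerm) {
        return currentSeqNo == ifSeqNo && currentPrimaryTerm == ifPrimaryTerm;
    }
}
```

Processing 0 and 2 leaves the local checkpoint at 0. Once 1 arrives, it jumps to 2.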

Cluster state serialization

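ClusterState implements Writeable (through Diffable), which defines its binary wire format for publication. The master sends either the full state or a diff against the previous version. Pluggable custom sections are NamedWriteables. Metadata is also serialized to XContent for APIs. PersistedClusterStateService stores metadata on disk in a Lucene index and writes incremental commits, rewriting only the metadata that changed.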
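Cluster state publication can send a diff against the previous version instead of the whole state. A node that does not hold that previous version needs the full state. The toy below illustrates the idea, with metadata reduced to a map from index name to an opaque metadata string. ToyState and ToyDiff are hypothetical; real diffs are structural, built from Diffable parts, and serialized with StreamOutput.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy "cluster state": a version plus index name -> metadata string. Hypothetical. */
record ToyState(long version, Map<String, String> indices) {}

/** Diff from one version to the next, expressed as upserts and deletions. */
record ToyDiff(long fromVersion, long toVersion, Map<String, String> upserts, Set<String> deletes) {

    static ToyDiff between(ToyState prev, ToyState next) {
        Map<String, String> upserts = new HashMap<>();
        for (Map.Entry<String, String> e : next.indices().entrySet()) {
            if (!e.getValue().equals(prev.indices().get(e.getKey()))) {
                upserts.put(e.getKey(), e.getValue()); // new or changed index metadata
            }
        }
        Set<String> deletes = new HashSet<>(prev.indices().keySet());
        deletes.removeAll(next.indices().keySet()); // indices that no longer exist
        return new ToyDiff(prev.version(), next.version(), upserts, deletes);
    }

    ToyState applyTo(ToyState base) {
        if (base.version() != fromVersion) {
            // A receiver without the expected base cannot use the diff; it needs the full state.
            throw new IllegalStateException(
                    "diff expects version " + fromVersion + " but base is " + base.version());
        }
        Map<String, String> indices = new HashMap<>(base.indices());
        indices.keySet().removeAll(deletes);
        indices.putAll(upserts);
        return new ToyState(toVersion, indices);
    }
}
```

The diff only carries what changed, which is why publishing it is cheaper than sending the full state when little changes between versions.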

Mapping & document parsing

JSON source → XContentParser → DocumentParser → ParsedDocument → Lucene Document. Field types define how values are indexed (inverted index, doc values, norms). Dynamic templates control how newly seen fields are auto-mapped.
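Dynamic mapping is essentially type inference over the parsed source. The sketch below walks an already-parsed document, represented as nested Map values the way a parser might hand it over. It assigns types following Elasticsearch's documented defaults for dynamic field mapping:
- Integers become long.
- Floating-point numbers become float.
- Booleans become boolean.
- Strings become text with a keyword sub-field.
- null adds no field.
- Objects are recursed into.

DynamicMappingSketch is hypothetical. Date detection, numeric detection, arrays, and dynamic templates are deliberately left out.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of default dynamic mapping; not Elasticsearch's DocumentParser. */
final class DynamicMappingSketch {
    private DynamicMappingSketch() {}

    /** Returns field path -> inferred type for every field in the document. */
    static Map<String, String> infer(Map<String, Object> source) {
        Map<String, String> mappings = new TreeMap<>();
        walk("", source, mappings);
        return mappings;
    }

    private static void walk(String prefix, Map<String, Object> object, Map<String, String> out) {
        for (Map.Entry<String, Object> e : object.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value == null) {
                continue; // null adds no field
            } else if (value instanceof Map<?, ?> nested) {
                out.put(path, "object");
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) nested;
                walk(path, child, out); // sub-fields get dotted paths
            } else if (value instanceof Boolean) {
                out.put(path, "boolean");
            } else if (value instanceof Integer || value instanceof Long) {
                out.put(path, "long");
            } else if (value instanceof Float || value instanceof Double) {
                out.put(path, "float");
            } else if (value instanceof String) {
                out.put(path, "text");
                out.put(path + ".keyword", "keyword"); // multi-field added by the default string mapping
            } else {
                throw new IllegalArgumentException("unsupported value at " + path + ": " + value);
            }
        }
    }
}
```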

Configuration

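Source | Format | Loaded into
config/elasticsearch.yml | YAML | Settings
config/jvm.options | JVM flags | ServerCli launcher
config/log4j2.properties | Log4j 2 properties | LogConfigurator
config/elasticsearch.keystore | Binary keystore (optionally password-protected) | SecureSettings
Environment variables | ${VAR} placeholders in elasticsearch.yml; ES_JAVA_OPTS, ES_PATH_CONF for the launcher | Settings (placeholders substituted at load time) / launcher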
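Setting values in elasticsearch.yml can reference environment variables with ${VAR} placeholders, for example node.name: ${HOSTNAME}. A minimal resolver is sketched below. EnvPlaceholders is hypothetical. The environment is passed in as a map so the behaviour is testable, and treating an unresolved placeholder as an error is an assumption of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical ${VAR} resolver for setting values; not Elasticsearch's implementation. */
final class EnvPlaceholders {
    // ${NAME} where NAME is a conventional environment-variable identifier
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    private EnvPlaceholders() {}

    static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = env.get(m.group(1));
            if (replacement == null) {
                throw new IllegalArgumentException("unresolved placeholder: " + m.group(1));
            }
            // quoteReplacement keeps '$' or '\' in the value from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In a real process you would pass System.getenv() as the map.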

7. External interfaces

REST API

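Spec-driven: rest-api-spec/src/main/resources/rest-api-spec/api/*.json defines paths, methods and params. Handlers implement RestHandler (usually by extending BaseRestHandler) and declare the Route objects they serve. API versioning uses RestApiVersion plus compatibility headers, for example Accept: application/vnd.elasticsearch+json;compatible-with=7 (see REST_API_COMPATIBILITY.md).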
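Each handler declares (method, path) routes, and a request is dispatched to the handler whose path template matches, with {placeholders} bound as request parameters. A simplified matcher is sketched below. Elasticsearch itself uses a path trie (PathTrie) in RestController. RouteTable, its records, and the handler names are hypothetical, and in this sketch the first matching route wins.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Simplified REST route matching; hypothetical, not RestController/PathTrie. */
final class RouteTable {
    record Route(String method, String pathTemplate, String handler) {}
    record Match(String handler, Map<String, String> params) {}

    private final List<Route> routes;

    RouteTable(List<Route> routes) { this.routes = routes; }

    Optional<Match> dispatch(String method, String path) {
        String[] parts = split(path);
        for (Route route : routes) {
            if (!route.method().equals(method)) continue;
            String[] template = split(route.pathTemplate());
            if (template.length != parts.length) continue;
            Map<String, String> params = new HashMap<>();
            boolean ok = true;
            for (int i = 0; i < template.length && ok; i++) {
                String t = template[i];
                if (t.startsWith("{") && t.endsWith("}")) {
                    params.put(t.substring(1, t.length() - 1), parts[i]); // bind placeholder
                } else {
                    ok = t.equals(parts[i]); // literal segment must match exactly
                }
            }
            if (ok) return Optional.of(new Match(route.handler(), params));
        }
        return Optional.empty();
    }

    private static String[] split(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/");
    }
}
```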

Transport (internal RPC)

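Binary protocol over TCP (Netty). Actions are addressed by name strings such as indices:data/read/search or cluster:monitor/health. The protocol is used for inter-node replication, search fan-out and cluster state publication. It is not a supported public API (the Java transport client was removed in 8.0).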
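Name-based addressing boils down to a registry from action name to handler. The toy below frames a request as an action name plus a length-prefixed payload using DataOutputStream, and dispatches it through a map. The frame layout and ToyTransport are hypothetical. Elasticsearch's real transport header carries more, such as a request id, status flags and the sender's version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy name-addressed RPC: frame = action name + length-prefixed payload. Hypothetical format. */
final class ToyTransport {
    private final Map<String, Function<byte[], byte[]>> handlers = new HashMap<>();

    void register(String action, Function<byte[], byte[]> handler) {
        if (handlers.putIfAbsent(action, handler) != null) {
            throw new IllegalArgumentException("action already registered: " + action);
        }
    }

    static byte[] frame(String action, byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(action);         // the action name addresses the handler
            out.writeInt(payload.length); // length prefix for the request body
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams should not fail
        }
        return bytes.toByteArray();
    }

    byte[] receive(byte[] frame) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            String action = in.readUTF();
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return dispatch(action, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. a truncated frame
        }
    }

    private byte[] dispatch(String action, byte[] payload) {
        Function<byte[], byte[]> handler = handlers.get(action);
        if (handler == null) {
            throw new IllegalArgumentException("no handler for action: " + action);
        }
        return handler.apply(payload);
    }
}
```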

Java REST client

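client/rest: the low-level HTTP client. RestClient wraps the Apache HttpAsyncClient. The optional Sniffer (client/sniffer) periodically fetches the cluster's node list and updates the client's hosts.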

Repository / snapshot API

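RepositoriesService manages Repository implementations (for example fs, S3, GCS). Snapshots are incremental: each snapshot copies only the Lucene segment files the repository does not already hold, so unchanged segments are shared between snapshots.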
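Incrementality works because Lucene segment files are immutable once written. A new snapshot uploads only the files the repository lacks and references the rest. Below is a toy version of that selection, identifying files by name, length and checksum. IncrementalSnapshot is hypothetical. The real logic in BlobStoreRepository also tracks per-shard generations and cleans up unreferenced blobs when snapshots are deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of incremental snapshot file selection. */
final class IncrementalSnapshot {
    /** A Lucene file identified by name, length and checksum. */
    record FileInfo(String name, long length, String checksum) {}

    private IncrementalSnapshot() {}

    /** Files in the shard's current commit that the repository does not already hold. */
    static List<FileInfo> filesToUpload(List<FileInfo> commitFiles, Set<FileInfo> alreadyInRepository) {
        List<FileInfo> toUpload = new ArrayList<>();
        for (FileInfo file : commitFiles) {
            if (!alreadyInRepository.contains(file)) {
                toUpload.add(file); // new segment or changed commit point: must be copied
            }
        }
        return toUpload;
    }
}
```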

Cross-cluster search/replication

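Remote clusters are configured in settings (cluster.remote.<alias>.seeds, or a proxy address in proxy mode). RemoteClusterService in the server's transport layer opens and maintains the connections. Cross-cluster replication (CCR) is implemented in x-pack on top of these connections.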
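Once a remote cluster is registered under an alias, requests address its indices as <alias>:<index>, for example my_remote:logs-*. The sketch groups a list of index expressions by cluster alias, with local indices under an empty-string key. ClusterIndexGrouping is hypothetical. Elasticsearch performs a similar grouping internally, but this version ignores wildcard cluster aliases and any escaping rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical grouping of "alias:index" expressions by cluster; simplified. */
final class ClusterIndexGrouping {
    static final String LOCAL = ""; // key used here for indices on the local cluster

    private ClusterIndexGrouping() {}

    static Map<String, List<String>> group(List<String> expressions) {
        Map<String, List<String>> byCluster = new LinkedHashMap<>();
        for (String expr : expressions) {
            int colon = expr.indexOf(':');
            String cluster = colon < 0 ? LOCAL : expr.substring(0, colon);
            String index = colon < 0 ? expr : expr.substring(colon + 1);
            byCluster.computeIfAbsent(cluster, k -> new ArrayList<>()).add(index);
        }
        return byCluster;
    }
}
```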

System indices

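System indices (used by Kibana, machine learning, security and similar features) are hidden indices described by SystemIndexDescriptor entries registered in SystemIndices. Access is controlled via request headers (products identify themselves with X-elastic-product-origin) and the security plugin.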

External references: REST APIs reference · Index API · Search API · Snapshot/restore
