# Docker Engine
- workload orchestrator
- redis pipeline
- user-space kernel for containers
- file proxy
runtime: CRI-O, rktnetes, containerd, dockershim
BoltDB Badger LevelDB etcd
Container Networking Interface (CNI)
- sdn toolkit
CAP v2: In a distributed system (a collection of interconnected nodes that share data), you can have only two of the following three guarantees across a write/read pair: Consistency, Availability, and Partition Tolerance. One of them must be sacrificed.
Consistency: A read is guaranteed to return the most recent write for a given client. Availability: A non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout). Partition Tolerance: The system will continue to function when network partitions occur.
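A toy sketch of the trade-off, not a real replication protocol: two replicas, one of which gets cut off by a partition. The `Replica` class, the `"CP"`/`"AP"` mode names, and the `write` helper are all made up for illustration, and only the read side of the isolated replica is modeled.

```python
class Replica:
    """Hypothetical replica that picks consistency (CP) or availability (AP)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True while cut off from its peer

    def apply(self, value):
        self.value = value

    def read(self):
        # CP: refuse to answer rather than risk returning a stale value.
        if self.partitioned and self.mode == "CP":
            raise TimeoutError("cannot confirm the most recent write")
        # AP: always answer, even if the local value may be stale.
        return self.value


def write(primary, secondary, value):
    """Apply a write on the primary and replicate it if the link is up."""
    primary.apply(value)
    if not secondary.partitioned:
        secondary.apply(value)
```

With an AP secondary, a read during the partition succeeds but may return the older write; with a CP secondary, the same read fails instead. Either way one guarantee is given up while the partition lasts.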
- write ahead log
one-liner: at-least-once delivery, written in Scala, batches messages addressed by logical offset, log segment files, 1 partition to 1 consumer (within a consumer group), relies on the system page cache (no in-memory cache), brokers, ZooKeeper instead of a master node, rebalance process, clients handle duplicates, CRC per message, monitoring events, Avro protocol
ActiveMQ, RabbitMQ, ZeroMQ, JMS spec
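A minimal sketch of the "logical offset + CRC" idea from the one-liner, assuming it describes a Kafka-style log. The `SegmentLog` class and its record layout (4-byte length, 4-byte CRC32, payload) are hypothetical and are not the real on-disk format of any broker.

```python
import zlib
from dataclasses import dataclass, field
from typing import Tuple

HEADER = 8  # 4-byte length + 4-byte CRC32 (layout assumed for illustration)


@dataclass
class SegmentLog:
    """Append-only log; a message is addressed by its byte offset."""
    data: bytearray = field(default_factory=bytearray)

    def append(self, payload: bytes) -> int:
        offset = len(self.data)  # logical offset of the new message
        crc = zlib.crc32(payload)
        self.data += len(payload).to_bytes(4, "big")
        self.data += crc.to_bytes(4, "big")
        self.data += payload
        return offset

    def read(self, offset: int) -> Tuple[bytes, int]:
        """Return (payload, offset of the next message)."""
        length = int.from_bytes(self.data[offset:offset + 4], "big")
        crc = int.from_bytes(self.data[offset + 4:offset + HEADER], "big")
        start = offset + HEADER
        payload = bytes(self.data[start:start + length])
        if zlib.crc32(payload) != crc:
            raise ValueError(f"corrupt message at offset {offset}")
        return payload, start + length
```

The consumer only has to remember the next offset it wants, which is why the broker can stay stateless about consumers and why a re-read after a crash can deliver a message more than once.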
# Spark SQL
one-liner: R-dataframe-like API, Catalyst as query optimizer, nested data model based on Hive, analyzes logical plans eagerly, evaluates RDDs lazily. Internally, it creates a logical data scan operator that points to the RDD. Columnar compression: dictionary encoding, run-length encoding.
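A small sketch of the two columnar compression schemes named above, in plain Python rather than Spark's internal columnar cache. The function names are illustrative only.

```python
from itertools import groupby


def dictionary_encode(column):
    """Replace each value with a small integer code plus a lookup table."""
    dictionary = {}
    codes = []
    for value in column:
        # New values get the next free code; repeats reuse their code.
        codes.append(dictionary.setdefault(value, len(dictionary)))
    return list(dictionary), codes  # dicts keep insertion order


def run_length_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def run_length_decode(runs):
    return [value for value, count in runs for _ in range(count)]
```

Dictionary encoding pays off on low-cardinality columns (country codes, enums); run-length encoding pays off when equal values sit next to each other, for example after sorting.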
logical optimizer: constant folding, predicate pushdown, projection pruning, null propagation, boolean expr simplification.
physical planning: pipeline projection
codegen: Scala quasiquotes, AST to code
user-defined types for ML
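A rough sketch of two of the logical-optimizer rules noted in this section (constant folding and boolean expression simplification), written as a bottom-up tree rewrite. Catalyst itself expresses rules with Scala pattern matching; the `Lit`, `Col`, `Add`, `And` node types here are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def is_bool(expr, expected):
    return isinstance(expr, Lit) and expr.value is expected


def optimize(expr):
    """Rewrite children first, then simplify the current node."""
    if isinstance(expr, Add):
        left, right = optimize(expr.left), optimize(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)  # constant folding
        return Add(left, right)
    if isinstance(expr, And):
        left, right = optimize(expr.left), optimize(expr.right)
        if is_bool(left, False) or is_bool(right, False):
            return Lit(False)  # x AND false => false
        if is_bool(left, True):
            return right       # true AND x => x
        if is_bool(right, True):
            return left        # x AND true => x
        return And(left, right)
    return expr  # literals and column references are already minimal
```

For example, `optimize(Add(Col("x"), Add(Lit(1), Lit(2))))` folds the inner addition and returns `Add(Col("x"), Lit(3))`.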
# dataflow stream model
MillWheel watermark: a lower bound (heuristically established) on event times processed by the pipeline
Kubernetes in Action
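One common way to establish such a bound heuristically is "largest event time seen so far, minus an assumed maximum delay". The sketch below uses that heuristic as an assumption; it is not MillWheel's actual mechanism, and `HeuristicWatermark` and `max_delay` are made-up names.

```python
class HeuristicWatermark:
    """Tracks a watermark as max observed event time minus a fixed delay."""

    def __init__(self, max_delay):
        self.max_delay = max_delay  # assumed bound on out-of-order arrival
        self.max_event_time = None

    def observe(self, event_time):
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time

    def current(self):
        """Events older than this value are assumed to have all arrived."""
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.max_delay

    def is_late(self, event_time):
        # The bound is only a guess, so late events can still show up.
        return event_time < self.current()
```

Because the maximum only grows, the watermark never moves backwards. A larger `max_delay` means fewer late events but slower results, since windows close later.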