# cloud

CAP v2: In a distributed system (a collection of interconnected nodes that share data), you can have only two of the following three guarantees across a write/read pair: Consistency, Availability, and Partition Tolerance. One of them must be sacrificed.

Consistency: A read is guaranteed to return the most recent write for a given client. Availability: A non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout). Partition Tolerance: The system will continue to function when network partitions occur.

raft, etcd

write ahead log
lead

Paxos

b+ tree

lsm tree

# toolchains

GCP tools

# docker

https://github.com/Yelp/dumb-init

# kafka

codebase

one-liner: at-least-once delivery (a message may arrive more than once), written in Scala, messages batched and addressed by logical offset, log segment files, one partition consumed by one consumer within a consumer group, relies on the OS page cache (no in-process cache), brokers, ZooKeeper for coordination instead of a master node, rebalance process, clients handle duplicates, CRC per message, monitoring events, Avro for serialization
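A minimal sketch of the "client handles duplicates" point, assuming the consumer tracks the highest offset it has applied per partition. `Message` and `DedupConsumer` are hypothetical names for illustration and are not part of any Kafka client API.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    partition: int
    offset: int   # logical offset within the partition
    payload: str


@dataclass
class DedupConsumer:
    # Highest offset already applied, keyed by partition (hypothetical bookkeeping).
    applied: dict = field(default_factory=dict)
    results: list = field(default_factory=list)

    def handle(self, msg: Message) -> bool:
        """Apply msg once; return False if it is a redelivered duplicate."""
        last = self.applied.get(msg.partition, -1)
        if msg.offset <= last:
            return False  # already processed, so skip it
        self.results.append(msg.payload)
        self.applied[msg.partition] = msg.offset
        return True


consumer = DedupConsumer()
stream = [
    Message(0, 0, "a"),
    Message(0, 1, "b"),
    Message(0, 1, "b"),  # redelivered, e.g. after a failed offset commit
    Message(0, 2, "c"),
]
outcomes = [consumer.handle(m) for m in stream]
```

This works only because offsets increase monotonically within a partition and a partition is read by a single consumer at a time, so "offset already seen" is enough to detect a replay.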

activeMQ, rabbitMQ, zeroMQ, JMS spec

# Spark SQL

one-liner: R-like DataFrame API, Catalyst as the query optimizer, nested data model based on Hive, logical plans analyzed eagerly, RDDs evaluated lazily. Internally, it creates a logical data scan operator that points to the RDD. Columnar compression: dictionary encoding, run-length encoding.
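The two columnar compression schemes can be sketched in a few lines. This is a toy illustration in plain Python, not how Spark SQL implements its in-memory columnar cache; the function names are made up for the example.

```python
def dict_encode(column):
    """Replace each value with a small integer code into a dictionary."""
    dictionary, index, codes = [], {}, []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


column = ["us", "us", "us", "de", "de", "us"]
dictionary, codes = dict_encode(column)
runs = rle_encode(codes)
restored = [dictionary[code] for code in rle_decode(runs)]
```

Dictionary encoding pays off on low-cardinality columns, and run-length encoding on top of it pays off when equal values sit next to each other, for example after sorting.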

logical optimizer: constant folding, predicate pushdown, projection pruning, null propagation, boolean expr simplification.
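As a rough illustration of two of these rules (constant folding and boolean simplification), here is a toy rewrite over a tiny expression tree. The node classes are hypothetical and much simpler than Catalyst's real tree and rule framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def is_lit(expr, value):
    # Identity check so that Lit(1) is not mistaken for Lit(True).
    return isinstance(expr, Lit) and expr.value is value


def optimize(expr):
    """Rewrite bottom-up: fold constant additions, simplify AND with literals."""
    if isinstance(expr, Add):
        left, right = optimize(expr.left), optimize(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)  # constant folding
        return Add(left, right)
    if isinstance(expr, And):
        left, right = optimize(expr.left), optimize(expr.right)
        if is_lit(left, False) or is_lit(right, False):
            return Lit(False)  # x AND false => false
        if is_lit(left, True):
            return right       # true AND x => x
        if is_lit(right, True):
            return left        # x AND true => x
        return And(left, right)
    return expr


folded = optimize(Add(Col("x"), Add(Lit(1), Lit(2))))
simplified = optimize(And(Lit(True), Col("p")))
```

Each rule is a local pattern match on the tree, and applying the rewrite bottom-up lets one simplification expose the next.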

physical planning: pipeline projection

codegen: scala quasiquote, AST to code

user-defined types (UDTs) for ML

# dataflow stream model

MillWheel watermark: a lower bound (often heuristic) on the event times processed by the pipeline
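One simple way to get such a heuristic bound is to assume events arrive at most a fixed amount out of order. The sketch below uses that assumption; it is not MillWheel's actual low-watermark computation, and `HeuristicWatermark` and `max_skew` are hypothetical names.

```python
class HeuristicWatermark:
    """Watermark = largest event time seen so far minus an assumed maximum skew."""

    def __init__(self, max_skew):
        self.max_skew = max_skew
        self.max_event_time = None

    def current(self):
        """Return the current watermark, or None before any event is seen."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_skew

    def observe(self, event_time):
        """Record an event; return True if it arrived behind the watermark (late)."""
        watermark = self.current()
        late = watermark is not None and event_time < watermark
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return late


wm = HeuristicWatermark(max_skew=5)
late_flags = [wm.observe(t) for t in [10, 12, 8, 20, 14, 16]]
```

Because the bound is a guess, some events can still arrive behind it (the event at time 14 here), which is why streaming systems need a policy for late data.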

# Kubernetes

Kubernetes in Action

https://google.qwiklabs.com/focuses/878?locale=en&parent=catalog&qlcampaign=77-18-gcpd-236&utm_source=gcp&utm_campaign=kubernetes&utm_medium=documentation

function as a service
