Tensor class
tensor.py:80
The Tensor class is the user-facing entry point. Every operation builds a lazy computation graph (a UOp DAG); nothing executes until .realize() is called, either directly or by a method that needs concrete data, such as .numpy() or .item().
Tensor inherits from OpMixin, which pulls in ElementwiseMixin, MovementMixin, and ReduceMixin.
Key attributes
uop
UOp: the computation graph node for this tensor; the root of the DAG
device
str: target device string, e.g. "CUDA", "CPU", "METAL"
dtype
DType: numeric type (dtypes.float32, dtypes.int32, etc.)
shape
tuple[sint,...]: dimension sizes; may contain symbolic ints
grad
Tensor|None: gradient tensor, populated after the backward pass
Constructor inputs
scalar/list
Python number or nested list → converted to UOp via CONST/BUFFER
tensor.py:94 (Tensor.__init__)
numpy array
ndarray → allocated Buffer, data copied in via copyin()
UOp
Wrap an existing UOp node directly (used internally)
None
Allocates an uninitialized BUFFER on the given device
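The four input kinds above amount to a type dispatch. The sketch below is a toy model, not tinygrad's constructor: ToyUOp, uop_from_input, and the choice to map scalars to CONST and lists to BUFFER are assumptions made for illustration, and the numpy path is left out so the example needs only the standard library.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyUOp:
    """Hypothetical stand-in for a graph node: an op name plus an argument."""
    op: str
    arg: Any = None


def flatten(data):
    """Yield the leaves of a nested list/tuple in row-major order."""
    for item in data:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def uop_from_input(data):
    """Pick a node kind from the Python type of the input (assumed mapping)."""
    if data is None:
        return ToyUOp("BUFFER")                 # uninitialized storage
    if isinstance(data, ToyUOp):
        return data                             # wrap an existing node as-is
    if isinstance(data, (int, float)):
        return ToyUOp("CONST", data)            # scalar becomes a constant
    if isinstance(data, (list, tuple)):
        return ToyUOp("BUFFER", tuple(flatten(data)))  # list data goes into a buffer
    raise TypeError(f"unsupported input type: {type(data).__name__}")


print(uop_from_input(3.0).op)
print(uop_from_input([[1, 2], [3, 4]]))
```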
Operation mixins
mixin/
Operations are split into three mixin classes. Each method eventually calls _apply_uop() to build a new UOp node.
ElementwiseMixin
Pointwise ops: add, sub, mul, div, neg, exp2, log2, sin, sqrt, cast, where, …
Ops: ADD, MUL, EXP2, LOG2, WHERE, CAST
mixin/elementwise.py:10
MovementMixin
Shape/layout ops: reshape, permute, expand, pad, shrink, flip, transpose, slice, …
Ops: RESHAPE, PERMUTE, EXPAND, PAD, SHRINK, FLIP
mixin/movement.py
ReduceMixin
Reduction ops: sum, max, min, mean, prod, std, argmax, argmin, …
Ops: REDUCE, ADD, MAX
mixin/reduce.py
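The mixin split can be pictured with a toy model. This is not tinygrad's code: ToyTensor, the tuple-based node layout, and the method bodies are illustrative assumptions. Only the mixin names and the idea that every method funnels into _apply_uop() come from the description above.

```python
class ElementwiseMixin:
    """Pointwise ops: each one delegates to _apply_uop."""
    def add(self, other): return self._apply_uop("ADD", other)
    def mul(self, other): return self._apply_uop("MUL", other)


class MovementMixin:
    """Shape/layout ops carry their parameters in the node's arg."""
    def reshape(self, *shape): return self._apply_uop("RESHAPE", arg=shape)


class ReduceMixin:
    """Reductions record the combining op and the axis."""
    def sum(self, axis): return self._apply_uop("REDUCE", arg=("ADD", axis))


class OpMixin(ElementwiseMixin, MovementMixin, ReduceMixin):
    """Bundles the three op families into one base class."""


class ToyTensor(OpMixin):
    def __init__(self, node):
        self.node = node  # assumed layout: (op, sources, arg)

    def _apply_uop(self, op, *others, arg=None):
        # Build a new graph node from this tensor's node and the operands'
        # nodes, then wrap it in a new tensor. Nothing is computed.
        srcs = (self.node,) + tuple(o.node for o in others)
        return ToyTensor((op, srcs, arg))


x = ToyTensor(("BUFFER", (), "x"))
y = ToyTensor(("BUFFER", (), "y"))
z = x.add(y).reshape(3, 1).sum(axis=0)
print(z.node[0])  # REDUCE
```

Keeping graph construction in a single method means the op families stay thin: a new op is one line in the relevant mixin.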
_apply_uop — lazy graph construction
tensor.py:147
Every op calls _apply_uop which wraps the UOp creation and returns a new Tensor object containing the new DAG node. No computation happens yet.
    from tinygrad import Tensor

    a = Tensor([1,2,3])   # BUFFER uop
    b = Tensor([4,5,6])   # BUFFER uop
    c = a + b             # creates ADD uop: src=(a.uop, b.uop)
    # c.uop = UOp(Ops.ADD, dtypes.int32, src=(a.uop, b.uop))
    # nothing executed yet, just a graph node
tensor.py:147 (_apply_uop)
The DAG is immutable and deduplicated: if two operations produce identical UOps (same op, dtype, src, arg), the metaclass returns the same cached node.
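Deduplication through a metaclass can be sketched in a few lines. The names InternMeta and Node are hypothetical, and the cache here holds strong references for simplicity; this illustrates the general technique (hash-consing), not tinygrad's actual metaclass.

```python
class InternMeta(type):
    """Metaclass that returns a cached instance for identical constructor arguments."""
    _cache = {}

    def __call__(cls, op, dtype, src=(), arg=None):
        # The key is the full structural identity of the node. Sources are
        # themselves interned, so their default identity-based hash is enough.
        key = (cls, op, dtype, src, arg)
        node = InternMeta._cache.get(key)
        if node is None:
            node = super().__call__(op, dtype, src, arg)
            InternMeta._cache[key] = node
        return node


class Node(metaclass=InternMeta):
    __slots__ = ("op", "dtype", "src", "arg")

    def __init__(self, op, dtype, src=(), arg=None):
        # Treated as immutable by convention: fields are set once here.
        self.op, self.dtype, self.src, self.arg = op, dtype, src, arg


a = Node("BUFFER", "int32", arg=0)
b = Node("BUFFER", "int32", arg=1)
first = Node("ADD", "int32", (a, b))
second = Node("ADD", "int32", (a, b))
print(first is second)                        # True
print(Node("ADD", "int32", (b, a)) is first)  # False
```

Because equal nodes are the same object, comparing two subgraphs reduces to an identity check. A long-lived version of this cache would need weak references so that unused nodes can be freed.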
realize() & linear_with_vars()
tensor.py:228–241
These are the two exits from the lazy graph. realize() runs the full pipeline from scheduling through execution; linear_with_vars() stops once the execution plan has been built.
1
realize(*lst)
Materializes this tensor (and optionally others). Calls linear_with_vars() then run_linear(). Returns self so it chains.
tensor.py:241 (realize())
2
linear_with_vars(*lst)
Builds a SINK UOp over all tensor outputs, then calls create_linear_with_vars() in the scheduler. Returns (linear_uop, var_vals): the compiled execution plan, without running it.
tensor.py:228 (linear_with_vars())
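The split between planning and running can be shown with a toy lazy value. Everything here is a simplified assumption: Lazy, linearize(), and the two-op interpreter are invented for the example, and the "plan" is just a dependency-ordered node list rather than a compiled schedule.

```python
import operator


class Lazy:
    """Toy lazy value: records the op and its sources, computes nothing."""

    def __init__(self, op, srcs=(), value=None):
        self.op, self.srcs, self.value = op, srcs, value
        self.result = None  # filled in only by realize()

    def __add__(self, other): return Lazy("ADD", (self, other))
    def __mul__(self, other): return Lazy("MUL", (self, other))

    def linearize(self):
        """Plan only: return the nodes in dependency order, without running them."""
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for src in node.srcs:
                visit(src)
            order.append(node)  # a node is emitted after all of its sources

        visit(self)
        return order

    def realize(self):
        """Plan, then execute the plan. Returns self so calls can be chained."""
        fns = {"ADD": operator.add, "MUL": operator.mul}
        for node in self.linearize():
            if node.op == "CONST":
                node.result = node.value
            else:
                node.result = fns[node.op](*(s.result for s in node.srcs))
        return self


a, b = Lazy("CONST", value=2), Lazy("CONST", value=3)
c = (a + b) * a
plan = c.linearize()
print([n.op for n in plan])  # ['CONST', 'CONST', 'ADD', 'MUL']
print(c.result)              # None: planning did not execute anything
print(c.realize().result)    # 10
```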
Example: a.matmul(b)
walkthrough
    from tinygrad import Tensor

    a = Tensor.randn(4, 8)     # BUFFER(shape=(4,8))
    b = Tensor.randn(8, 16)    # BUFFER(shape=(8,16))
    c = a.matmul(b)
    # Internally, matmul does roughly:
    #   a2  = a.reshape(4,1,8)                  → RESHAPE uop
    #   b2  = b.reshape(1,8,16).permute(0,2,1)  → RESHAPE + PERMUTE uop, shape (1,16,8)
    #   mul = (a2 * b2)                         → MUL uop (broadcast EXPAND implied), shape (4,16,8)
    #   c   = mul.sum(axis=-1)                  → REDUCE(ADD) uop over the shared axis, shape (4,16)
    #
    # No FLOPs yet, just a DAG:
    #
    #   BUFFER(a) ─ RESHAPE ───────────┐
    #                                  ├─ MUL ─ REDUCE(ADD) ── c.uop
    #   BUFFER(b) ─ RESHAPE ─ PERMUTE ─┘
    c.realize()   # ← triggers scheduling → codegen → execution
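The core idea of the walkthrough, that matmul lowers to a broadcast multiply followed by a sum over the shared axis, can be checked without any tensor library. The sketch below uses nested Python lists; the function names are made up for the example, and placing the shared axis last is an arbitrary choice.

```python
def matmul_via_broadcast(a, b):
    """Multiply an n x k matrix by a k x m matrix as 'multiply, then reduce'."""
    n, k, m = len(a), len(b), len(b[0])
    # Broadcast product with shape (n, m, k): prod[i][j][p] = a[i][p] * b[p][j].
    prod = [[[a[i][p] * b[p][j] for p in range(k)]
             for j in range(m)]
            for i in range(n)]
    # Reduce (sum) over the last axis, which is the shared dimension k.
    return [[sum(prod[i][j]) for j in range(m)] for i in range(n)]


def matmul_reference(a, b):
    """Textbook row-times-column definition, used as a cross-check."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1, 2, 3],
     [4, 5, 6]]
b = [[7, 8],
     [9, 10],
     [11, 12]]

print(matmul_via_broadcast(a, b))  # [[58, 64], [139, 154]]
print(matmul_via_broadcast(a, b) == matmul_reference(a, b))  # True
```

The intermediate product has n·m·k elements. A lazy graph can afford to describe it because a fused kernel never needs to store that intermediate in full.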