tinygrad — Tensor & Operations

Tensor class (tensor.py:80)

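The Tensor class is the user-facing entry point. Every operation builds a lazy computation graph (a UOp DAG). Nothing executes until the tensor is realized, either explicitly with .realize() or implicitly by methods that need the data, such as .numpy(), .tolist(), or .item().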

Tensor inherits from OpMixin, which combines ElementwiseMixin, MovementMixin, and ReduceMixin.

tensor.py:80 — class Tensor(OpMixin)

Key attributes

uop      UOp               the computation-graph node for this tensor (the root of its DAG)
device   str               target device string, e.g. "CUDA", "CPU", "METAL"
dtype    DType             numeric type (dtypes.float32, dtypes.int32, etc.)
shape    tuple[sint, ...]  dimension sizes; may contain symbolic ints
grad     Tensor | None     gradient tensor, populated after a backward pass

Constructor inputs

scalar/list   Python number or nested list → converted to a UOp via CONST/BUFFER
numpy array   ndarray → a Buffer is allocated and the data copied in via copyin()
UOp           wraps an existing UOp node directly (used internally)
None          allocates an uninitialized BUFFER on the given device

tensor.py:94 — Tensor.__init__
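The input kinds listed above amount to a dispatch on the argument's type. The sketch below shows that dispatch in plain Python, assuming a hypothetical `Node` class in place of tinygrad's real UOp and a hypothetical `classify_input` helper in place of Tensor.__init__. The split "scalar → CONST, list → BUFFER" is an assumption for illustration; the real constructor also handles dtype inference and device placement, which are omitted here.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-in for a graph node; tinygrad's real class is UOp.
@dataclass(frozen=True)
class Node:
    op: str
    arg: Any = None


def _flatten(x: Any) -> list:
    # nested lists/tuples -> flat list of scalars in row-major order
    if isinstance(x, (list, tuple)):
        return [y for item in x for y in _flatten(item)]
    return [x]


def classify_input(data: Any) -> Node:
    """Hypothetical dispatch mirroring the input table (simplified)."""
    if data is None:
        return Node("BUFFER")                                  # uninitialized storage
    if isinstance(data, Node):
        return data                                            # wrap an existing node directly
    if isinstance(data, (bool, int, float)):
        return Node("CONST", data)                             # scalar -> constant node
    if isinstance(data, (list, tuple)):
        return Node("BUFFER", tuple(_flatten(data)))           # nested list -> buffer contents
    if hasattr(data, "tolist"):
        return Node("BUFFER", tuple(_flatten(data.tolist())))  # array-like (e.g. numpy): copy data in
    raise TypeError(f"unsupported input type: {type(data).__name__}")
```

For example, `classify_input([[1, 2], [3, 4]])` yields a BUFFER node holding `(1, 2, 3, 4)`, while `classify_input(3)` yields a CONST node.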

Operation mixins (mixin/)

Operations are split into three mixin classes. Each method eventually calls _apply_uop() to build a new UOp node.
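A minimal sketch of this layout, assuming hypothetical `MiniTensor` and one-method-per-op mixins (not tinygrad's real classes): each mixin only knows how to call `self._apply_uop`, and the concrete class decides what that hook records. Because every method funnels through one hook, the mixins stay independent of how graph nodes are stored.

```python
# Hypothetical mini version of the mixin layout: each mixin only emits ops
# through self._apply_uop, and the concrete class decides what that does.
class ElementwiseMixin:
    def add(self, other): return self._apply_uop("ADD", other)
    def mul(self, other): return self._apply_uop("MUL", other)


class MovementMixin:
    def reshape(self, *shape): return self._apply_uop("RESHAPE", arg=shape)


class ReduceMixin:
    def sum(self, axis): return self._apply_uop("REDUCE_ADD", arg=axis)


class OpMixin(ElementwiseMixin, MovementMixin, ReduceMixin):
    pass


class MiniTensor(OpMixin):
    def __init__(self, op, src=(), arg=None):
        self.op, self.src, self.arg = op, src, arg

    def _apply_uop(self, op, *others, arg=None):
        # record a new graph node whose sources are self and any other operands
        return MiniTensor(op, (self, *others), arg)
```

Chaining `t.reshape(2, 2).add(t).sum(0)` on a `MiniTensor("BUFFER")` produces a three-node chain (RESHAPE, ADD, REDUCE_ADD) and computes nothing.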

ElementwiseMixin

Pointwise ops: add, sub, mul, div, neg, exp2, log2, sin, sqrt, cast, where, …

Underlying UOps include ADD, MUL, EXP2, LOG2, WHERE, CAST

mixin/elementwise.py:10
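The user-facing op list is longer than the UOp list because many ops can be written in terms of a few primitives. The sketch below illustrates that idea on plain Python floats, assuming a hypothetical primitive table (`PRIMS`); the NEG and RECIP entries are assumptions for illustration and do not appear in the UOp list above. Building exp and log from EXP2 and LOG2 uses the identities e^x = 2^(x / ln 2) and ln x = log2(x) · ln 2.

```python
import math

# Hypothetical primitive table: the only ops a backend would need to implement.
PRIMS = {
    "ADD": lambda x, y: x + y,
    "MUL": lambda x, y: x * y,
    "NEG": lambda x: -x,
    "RECIP": lambda x: 1.0 / x,
    "EXP2": lambda x: 2.0 ** x,
    "LOG2": lambda x: math.log2(x),
}


# Derived ops expressed only in terms of the primitives above.
def sub(x, y):
    return PRIMS["ADD"](x, PRIMS["NEG"](y))                  # x - y = x + (-y)


def div(x, y):
    return PRIMS["MUL"](x, PRIMS["RECIP"](y))                # x / y = x * (1/y)


def exp(x):
    return PRIMS["EXP2"](PRIMS["MUL"](x, 1 / math.log(2)))   # e^x = 2^(x / ln 2)


def log(x):
    return PRIMS["MUL"](PRIMS["LOG2"](x), math.log(2))       # ln x = log2(x) * ln 2
```

Keeping the primitive set small means each new backend only has to implement a handful of kernels.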

MovementMixin

Shape/layout ops: reshape, permute, expand, pad, shrink, flip, transpose, slice, …

Underlying UOps include RESHAPE, PERMUTE, EXPAND, PAD, SHRINK, FLIP

mixin/movement.py
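Movement ops such as permute and expand can be applied without touching the data at all by tracking a strided view of the buffer. The sketch below uses a hypothetical `View` class (shape, strides, offset) to show that idea; it is a generic illustration, not tinygrad's internal representation, and it omits pad, shrink, flip, and reshape of non-contiguous views.

```python
from dataclasses import dataclass


# Hypothetical strided view: movement ops rewrite (shape, strides, offset),
# never the underlying data.
@dataclass(frozen=True)
class View:
    shape: tuple
    strides: tuple
    offset: int = 0

    @staticmethod
    def contiguous(shape):
        # row-major strides: last dim steps by 1, earlier dims by the product of later sizes
        strides, acc = [], 1
        for s in reversed(shape):
            strides.append(acc)
            acc *= s
        return View(tuple(shape), tuple(reversed(strides)))

    def permute(self, order):
        # reorder dimensions by reordering shape and strides together
        return View(tuple(self.shape[i] for i in order),
                    tuple(self.strides[i] for i in order), self.offset)

    def expand(self, new_shape):
        # broadcast size-1 dims: stride 0 makes every index along them hit the same element
        assert all(s == n or s == 1 for s, n in zip(self.shape, new_shape))
        strides = tuple(0 if s != n else st
                        for s, n, st in zip(self.shape, new_shape, self.strides))
        return View(tuple(new_shape), strides, self.offset)

    def index(self, idx):
        # logical index -> flat offset into the buffer
        return self.offset + sum(i * st for i, st in zip(idx, self.strides))
```

Transposing a (2, 3) view gives shape (3, 2) with strides (1, 3), so element (2, 1) of the transpose reads the same flat offset (5) as element (1, 2) of the original.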

ReduceMixin

Reduction ops: sum, max, min, mean, prod, std, argmax, argmin, …

Underlying UOps include REDUCE, ADD, MAX

mixin/reduce.py
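A single REDUCE primitive parameterized by a combine op (ADD or MAX) covers the core reductions, and others like mean can be derived from it. The sketch below shows this on a 2-D nested list, assuming a hypothetical `REDUCE_OPS` table and `reduce_axis` helper; keeping the reduced axis with size 1 is an assumed convention here, so that a later reshape can drop it.

```python
import operator
from functools import reduce

# Hypothetical REDUCE table: each entry is (binary combine op, identity element).
REDUCE_OPS = {"ADD": (operator.add, 0), "MAX": (max, float("-inf"))}


def reduce_axis(rows, op, axis):
    """REDUCE over one axis of a 2-D nested list; the reduced dim is kept with size 1."""
    fn, init = REDUCE_OPS[op]
    if axis == 0:
        return [[reduce(fn, col, init) for col in zip(*rows)]]
    if axis == 1:
        return [[reduce(fn, row, init)] for row in rows]
    raise ValueError("axis must be 0 or 1")


def mean(rows, axis):
    # derived reduction: ADD-reduce, then divide by the number of reduced elements
    n = len(rows) if axis == 0 else len(rows[0])
    return [[v / n for v in r] for r in reduce_axis(rows, "ADD", axis)]
```

For `x = [[1, 5, 2], [4, 0, 6]]`, `reduce_axis(x, "ADD", 0)` gives `[[5, 5, 8]]` and `reduce_axis(x, "MAX", 1)` gives `[[5], [6]]`.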

_apply_uop: lazy graph construction (tensor.py:147)

Every op calls _apply_uop, which creates the new UOp and returns a new Tensor wrapping that DAG node. No computation happens at this point.

from tinygrad import Tensor
a = Tensor([1,2,3])   # data lives in a BUFFER uop
b = Tensor([4,5,6])   # data lives in a BUFFER uop
c = a + b             # creates an ADD uop with src=(a.uop, b.uop)
# roughly: c.uop == UOp(Ops.ADD, dtypes.int32, src=(a.uop, b.uop))
# nothing has executed yet; c is only a graph node

tensor.py:147 — _apply_uop
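The split between building and computing can be sketched in a few lines of plain Python. The `Lazy` class below is hypothetical: `+` only records an ADD node with its sources, and the arithmetic happens the first time `realize()` is called, after which the result is cached on the node.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Hypothetical lazy node: operators only record the graph; realize() does the math.
@dataclass(eq=False)
class Lazy:
    op: str
    src: tuple = ()
    value: Optional[list] = None  # set up front for BUFFER nodes, by realize() for others

    def __add__(self, other: Lazy) -> Lazy:
        return Lazy("ADD", (self, other))  # no arithmetic here, just a new node

    def realize(self) -> list:
        if self.value is None:
            inputs = [s.realize() for s in self.src]  # realize sources first
            if self.op == "ADD":
                self.value = [x + y for x, y in zip(*inputs)]
            else:
                raise NotImplementedError(self.op)
        return self.value
```

With `a = Lazy("BUFFER", value=[1, 2, 3])` and `b = Lazy("BUFFER", value=[4, 5, 6])`, `c = a + b` leaves `c.value` as `None` until `c.realize()` returns `[5, 7, 9]`.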

The DAG is immutable and deduplicated (hash-consed): the UOp metaclass caches nodes keyed on (op, dtype, src, arg), so two operations that produce identical UOps get back the same cached node object.
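Hash-consing through a metaclass can be sketched as follows. `CachedMeta` and `Node` are hypothetical names, not tinygrad's classes: the metaclass intercepts construction and returns an existing instance when the (op, dtype, src, arg) key has been seen before. Child nodes in `src` hash by identity, which is sound only because children are themselves deduplicated, so equal subgraphs are already the same object.

```python
# Hypothetical hash-consing metaclass: constructing a node with the same
# (op, dtype, src, arg) returns the already-existing instance.
class CachedMeta(type):
    def __init__(cls, name, bases, ns):
        super().__init__(name, bases, ns)
        cls._cache = {}  # one cache per class created with this metaclass

    def __call__(cls, op, dtype, src=(), arg=None):
        key = (op, dtype, src, arg)
        node = cls._cache.get(key)
        if node is None:
            node = super().__call__(op, dtype, src, arg)  # normal __new__ + __init__
            cls._cache[key] = node
        return node


class Node(metaclass=CachedMeta):
    __slots__ = ("op", "dtype", "src", "arg")

    def __init__(self, op, dtype, src=(), arg=None):
        self.op, self.dtype, self.src, self.arg = op, dtype, src, arg
```

This cache holds strong references, so nodes are never freed; a production version would likely hold weak references instead.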

realize() and linear_with_vars() (tensor.py:228–241)

These are the two exits from the lazy graph. realize() runs the full pipeline from scheduling through execution; linear_with_vars() stops after building the execution plan.

1. realize(*lst)

Materializes this tensor, plus any tensors passed in lst. It calls linear_with_vars() and then run_linear(), and returns self, so calls can be chained (e.g. c.realize().numpy()).

tensor.py:241 — realize()

2. linear_with_vars(*lst)

Builds a SINK UOp whose sources are the UOps of this tensor and every tensor in lst, then calls create_linear_with_vars() in the scheduler. Returns (linear_uop, var_vals): the compiled execution plan and the values bound to its symbolic variables, without running anything.

tensor.py:228 — linear_with_vars()
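The plan/run split can be illustrated with a toy scheduler. In the sketch below, the hypothetical `linearize` plays the role of building a plan (a SINK over all outputs, then a topological order with each shared node visited once), and the hypothetical `run` executes that plan. Symbolic variables (var_vals) are left out, and each node is one step here, whereas a real scheduler groups many ops into kernels.

```python
from dataclasses import dataclass


# Hypothetical graph node for the sketch (not tinygrad's UOp).
@dataclass(frozen=True)
class N:
    op: str
    src: tuple = ()
    arg: object = None


def linearize(*outputs):
    """Plan: wrap outputs in a SINK and return nodes in dependency order, SINK last."""
    sink = N("SINK", tuple(outputs))
    order, seen = [], set()

    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for s in n.src:
            visit(s)      # inputs are placed before their users
        order.append(n)

    visit(sink)
    return order


def run(order):
    """Execute a plan produced by linearize(); returns the SINK's tuple of outputs."""
    vals = {}
    for n in order:
        if n.op == "CONST":
            vals[n] = n.arg
        elif n.op == "ADD":
            vals[n] = vals[n.src[0]] + vals[n.src[1]]
        elif n.op == "MUL":
            vals[n] = vals[n.src[0]] * vals[n.src[1]]
        elif n.op == "SINK":
            vals[n] = tuple(vals[s] for s in n.src)
        else:
            raise NotImplementedError(n.op)
    return vals[order[-1]]
```

Calling `linearize(p, s)` alone corresponds to the linear_with_vars() stage; passing its result to `run` corresponds to the extra step realize() performs.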

Example: a.matmul(b) walkthrough

from tinygrad import Tensor

a = Tensor.randn(4, 8).realize()    # realized now, so a is backed by a (4,8) buffer
b = Tensor.randn(8, 16).realize()   # backed by an (8,16) buffer
c = a.matmul(b)                     # lazy: builds the graph below, computes nothing

# Internally, matmul does roughly:
#   a2 = a.reshape(4,1,8)                     → RESHAPE uop
#   b2 = b.reshape(1,8,16).transpose(-1,-2)   → RESHAPE + PERMUTE uops, shape (1,16,8)
#   mul = a2 * b2                             → EXPAND both to (4,16,8), then MUL uop
#   c = mul.sum(axis=-1)                      → REDUCE(ADD) over the last axis, RESHAPE to (4,16)
#
# No FLOPs have run yet; c is just a DAG:
#
#   BUFFER(a) ─ RESHAPE ─ EXPAND ───────────┐
#                                           ├─ MUL ─ REDUCE(ADD) ─ RESHAPE ─ c.uop
#   BUFFER(b) ─ RESHAPE ─ PERMUTE ─ EXPAND ─┘

c.realize()   # ← triggers scheduling → codegen → execution
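To check that the reshape / transpose / broadcast-multiply / sum decomposition really computes a matrix product, the sketch below compares it against the textbook triple loop using plain Python lists (no tinygrad needed). Both function names are hypothetical; the intermediate `prod` has the (M, N, K) shape that the broadcast MUL produces before the REDUCE(ADD) over the last axis.

```python
import random


def matmul_ref(a, b):
    # textbook definition: c[i][j] = sum over k of a[i][k] * b[k][j]
    K, N = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(K)) for j in range(N)]
            for i in range(len(a))]


def matmul_broadcast(a, b):
    # same result via the decomposition, using an (M, N, K) intermediate
    M, K, N = len(a), len(a[0]), len(b[0])
    bt = [[b[k][j] for k in range(K)] for j in range(N)]            # transpose b: (N, K)
    prod = [[[a[i][k] * bt[j][k] for k in range(K)]                  # broadcast multiply: (M, N, K)
             for j in range(N)] for i in range(M)]
    return [[sum(prod[i][j]) for j in range(N)] for i in range(M)]   # sum over last axis: (M, N)


random.seed(0)
a = [[random.randint(-5, 5) for _ in range(8)] for _ in range(4)]
b = [[random.randint(-5, 5) for _ in range(16)] for _ in range(8)]
c = matmul_broadcast(a, b)
print(len(c), len(c[0]))  # 4 16
```

The (M, N, K) intermediate is why a naive broadcast matmul would use far more memory than the output; lazy frameworks avoid materializing it by fusing the MUL into the reduction kernel.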