tinygrad — UOps (IR)

UOp dataclass uop/ops.py:128

UOp is the single IR node that represents every operation in the entire pipeline — from high-level tensor ops through scheduling to compiled kernels. All Tensor computation is a DAG of UOps.

@dataclass(eq=False, slots=True)
class UOp(OpMixin, metaclass=UOpMetaClass):
  op:       Ops                  # what this node does
  dtype:    DType  = dtypes.void # output type
  src:      tuple[UOp,...] = ()  # input nodes (the DAG edges)
  arg:      Any    = None        # op-specific extra data
  tag:      Any    = None        # optional tracing metadata
  metadata: Metadata = None      # user-supplied metadata

uop/ops.py:128 — class UOp

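Deduplication: UOpMetaClass (line 85) caches instances, so constructing two UOps with identical (op, dtype, src, arg) returns the same object. This keeps the DAG compact, and because the dataclass is declared with eq=False, equality is plain object identity, which makes comparisons and hashing O(1).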
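The caching behavior can be sketched in plain Python. This is a simplified, hypothetical stand-in, not tinygrad's code: CachingMeta and Node are invented names, and the sketch assumes the cache holds weak references so that nodes nobody uses can still be garbage-collected.

```python
import weakref
from dataclasses import dataclass
from typing import Any


class CachingMeta(type):
    """Hypothetical stand-in for UOpMetaClass: hands back the live instance
    already built from the same (op, dtype, src, arg) instead of a new one."""

    cache: dict = {}  # key -> weakref to the node built from that key

    def __call__(cls, op: str, dtype: str = "void", src: tuple = (), arg: Any = None):
        key = (op, dtype, src, arg)
        ref = CachingMeta.cache.get(key)
        if ref is not None:
            existing = ref()
            if existing is not None:
                return existing  # identical fields -> reuse the same object
        node = super().__call__(op, dtype, src, arg)
        CachingMeta.cache[key] = weakref.ref(node)  # weak: unused nodes can still be freed
        return node


@dataclass(eq=False)  # eq=False keeps identity-based __eq__ and __hash__
class Node(metaclass=CachingMeta):
    """Toy IR node; identical fields yield the identical object."""
    op: str
    dtype: str = "void"
    src: tuple = ()
    arg: Any = None


one_a = Node("CONST", "int", arg=1)
one_b = Node("CONST", "int", arg=1)
two = Node("CONST", "int", arg=2)
print(one_a is one_b, one_a is two)  # True False
```

Because src holds already-deduplicated nodes, hashing the key only needs identity hashes of the children, so structurally equal subgraphs collapse to one object.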

Key properties

toposort()           Returns an ordered dict of every UOp reachable through src (the UOp itself included), sources before their consumers; respects control-flow entry ordering.
_shape               Recursively infers the shape from the src nodes; may contain symbolic sint values.
ssimplify()          Simplifies a symbolic expression, constant-folding where possible.
sym_infer(var_vals)  Substitutes concrete ints for symbolic variables.

uop/ops.py:174 — toposort()
uop/ops.py:212 — _shape
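A minimal sketch of the kind of ordering toposort() produces, assuming the order places every source before the nodes that use it, and ignoring the control-flow entry ordering entirely. Node and toposort here are illustrative stand-ins, not tinygrad code. The sketch uses an explicit stack instead of recursion so deep graphs do not hit Python's recursion limit, and a dict as an insertion-ordered set.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity hashing, like UOp
class Node:
    op: str
    src: tuple["Node", ...] = ()


def toposort(root: Node) -> dict:
    """Every node reachable from root (root included), each placed after all of
    its sources. A dict serves as an insertion-ordered set."""
    order: dict = {}
    stack = [(root, False)]  # (node, sources_already_pushed)
    while stack:
        node, srcs_pushed = stack.pop()
        if node in order:
            continue  # already emitted via another path (diamond in the DAG)
        if srcs_pushed:
            order[node] = None  # all sources were emitted before we got back here
            continue
        stack.append((node, True))
        for s in reversed(node.src):  # reversed so sources are visited left to right
            if s not in order:
                stack.append((s, False))
    return order


a, b = Node("BUFFER"), Node("BUFFER")
mul = Node("MUL", (a, b))
add = Node("ADD", (mul, a))  # `a` is shared: reached along two paths
sink = Node("SINK", (add,))
print([n.op for n in toposort(sink)])  # ['BUFFER', 'BUFFER', 'MUL', 'ADD', 'SINK']
```

The shared node `a` appears once, at its first position in dependency order, which is what lets passes walk the DAG without revisiting subgraphs.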

Ops enum — all operation codes uop/__init__.py:13

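The Ops enum lists every operation a UOp can represent; the groups below cover the main ones (CONST, used in the PatternMatcher example, is one of the opcodes not shown). The declaration order controls toposort priority.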

Defines / Params

DEFINE_VAR BIND SPECIAL DEFINE_LOCAL DEFINE_REG

Program Structure

SINK LINEAR PROGRAM SOURCE BINARY PARAM FUNCTION CALL

Memory

BUFFER BUFFER_VIEW LOAD STORE INDEX COPY

Arithmetic (Unary)

CAST BITCAST EXP2 LOG2 SIN SQRT RECIPROCAL NEG TRUNC

Arithmetic (Binary)

ADD MUL SUB FDIV MAX SHL SHR XOR OR AND CMPLT CMPNE CMPEQ POW

Ternary & Reduce

WHERE MULACC REDUCE ALLREDUCE WMMA

Control Flow

RANGE END IF ENDIF BARRIER WAIT

Tensor-only (pre-schedule)

RESHAPE PERMUTE EXPAND PAD SHRINK FLIP CONTIGUOUS DEVICE

uop/__init__.py:13 — class Ops(FastEnum)
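The idea that declaration order controls toposort priority can be illustrated with a priority-queue variant of Kahn's algorithm: whenever several nodes are ready, the one whose opcode was declared earliest goes first. This is a hypothetical sketch of the idea, not tinygrad's implementation; the Op subset, the graph encoding and priority_toposort are invented for illustration.

```python
import heapq
from enum import IntEnum, auto


class Op(IntEnum):
    # Hypothetical mini opcode set; declaration order = scheduling priority.
    DEFINE_VAR = auto()
    CONST = auto()
    LOAD = auto()
    ADD = auto()
    STORE = auto()
    SINK = auto()


def priority_toposort(nodes: dict) -> list:
    """nodes maps name -> (Op, [source names]). Kahn's algorithm: among nodes
    whose sources are all emitted, emit the earliest-declared Op (ties by name)."""
    indegree = {name: len(srcs) for name, (_, srcs) in nodes.items()}
    users: dict = {name: [] for name in nodes}
    for name, (_, srcs) in nodes.items():
        for s in srcs:
            users[s].append(name)
    ready = [(nodes[n][0], n) for n, d in indegree.items() if d == 0]
    heapq.heapify(ready)  # IntEnum members compare by value
    out = []
    while ready:
        _, name = heapq.heappop(ready)
        out.append(name)
        for u in users[name]:
            indegree[u] -= 1
            if indegree[u] == 0:
                heapq.heappush(ready, (nodes[u][0], u))
    return out


graph = {
    "ld_a": (Op.LOAD, []),
    "n": (Op.DEFINE_VAR, []),
    "one": (Op.CONST, []),
    "add": (Op.ADD, ["ld_a", "one"]),
    "st": (Op.STORE, ["add", "n"]),
    "sink": (Op.SINK, ["st"]),
}
print(priority_toposort(graph))  # ['n', 'one', 'ld_a', 'add', 'st', 'sink']
```

Even though "ld_a" was inserted first, DEFINE_VAR and CONST are emitted ahead of it because they are declared earlier in the enum; dependencies still always win over priority.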

PatternMatcher — AST rewriting uop/ops.py:1299

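All optimization and lowering passes are expressed as PatternMatcher rules. Each rule pairs a UPat, which describes the UOp subgraph to match, with a function that receives the named captures and returns a replacement UOp, or None to leave the node unchanged.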

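from tinygrad.uop import Ops
from tinygrad.uop.ops import UOp, UPat, PatternMatcher, graph_rewrite

pm = PatternMatcher([
  # constant folding: ADD(CONST a, CONST b) → CONST(a+b)
  (UPat(Ops.ADD, src=(UPat(Ops.CONST, name="a"),
                      UPat(Ops.CONST, name="b"))),
   lambda a, b: UOp.const(a.dtype, a.arg + b.arg)),
  # ... further rules
])

# uop: root of the graph to rewrite (e.g. a SINK)
new_uop = graph_rewrite(uop, pm)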

uop/ops.py:1299 — class PatternMatcher

Bottom-up rewriting: graph_rewrite applies the pattern matcher bottom-up (a node's sources are rewritten before the node itself) and keeps re-applying rules until none match. All codegen, scheduling, and optimization passes are built from these rules.
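A toy model of that rewrite loop, written without tinygrad so it runs on its own. Nodes are frozen dataclasses compared structurally, and rules are plain functions that return a replacement or None; N, fold_add, add_zero and toy_graph_rewrite are hypothetical names, and the real UPat matching machinery is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)  # structural equality and hashing, handy for memoization
class N:
    op: str
    src: tuple = ()
    arg: object = None


Rule = Callable[[N], Optional[N]]


def fold_add(n: N) -> Optional[N]:
    """ADD(CONST a, CONST b) -> CONST(a + b)."""
    if n.op == "ADD" and len(n.src) == 2 and all(s.op == "CONST" for s in n.src):
        return N("CONST", arg=n.src[0].arg + n.src[1].arg)
    return None  # None means "rule does not apply"


def add_zero(n: N) -> Optional[N]:
    """ADD(x, CONST 0) -> x."""
    if n.op == "ADD" and len(n.src) == 2 and n.src[1] == N("CONST", arg=0):
        return n.src[0]
    return None


def toy_graph_rewrite(root: N, rules: list[Rule]) -> N:
    memo: dict = {}

    def rewrite(n: N) -> N:
        if n in memo:
            return memo[n]
        # bottom-up: rebuild the node from its already-rewritten sources first
        cur = N(n.op, tuple(rewrite(s) for s in n.src), n.arg)
        for rule in rules:
            new = rule(cur)
            if new is not None:
                # the replacement is itself rewritten until no rule fires
                # (rules must return None, not an unchanged node, or this never ends)
                cur = rewrite(new)
                break
        memo[n] = cur
        return cur

    return rewrite(root)


x = N("LOAD", arg="x")
g = N("ADD", (x, N("ADD", (N("CONST", arg=2), N("CONST", arg=-2)))))
print(toy_graph_rewrite(g, [fold_add, add_zero]))  # N(op='LOAD', src=(), arg='x')
```

The example shows why order matters: the inner ADD folds to CONST 0 first, and only then does the outer ADD match add_zero.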

Kernel-related UOp containers uop/ops.py:1037–1097

Three frozen dataclasses wrap kernel-level information and are stored in UOp arg fields.

KernelInfo   Local sizes, dont_use_locals, tensor_core config. Stored in the PROGRAM arg.
ProgramInfo  Device, name, global/local shapes, op counts. Attached to PROGRAM after compilation.
CallInfo     Argument buffer list for a CALL node: which Buffers the kernel reads/writes and their roles.

uop/ops.py:1037 — KernelInfo
uop/ops.py:1050 — ProgramInfo
uop/ops.py:1097 — CallInfo
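One plausible reason these containers are frozen: the dedup cache described earlier keys on fields that include arg, so whatever sits in arg must be hashable and must not change after the node is built. A frozen dataclass gives both, and edits go through dataclasses.replace, which builds a new object. The sketch assumes this reasoning; KernelInfoSketch, MutableInfo, hashable and intern_node are hypothetical names, and the fields are only loosely modeled on the KernelInfo description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen -> immutable and hashable
class KernelInfoSketch:
    # hypothetical fields, loosely modeled on the KernelInfo description above
    name: str = "k"
    local_size: tuple = (1, 1, 1)
    dont_use_locals: bool = False


@dataclass  # eq=True and frozen=False -> __hash__ is None, so unhashable
class MutableInfo:
    name: str = "k"


def hashable(obj) -> bool:
    try:
        hash(obj)
        return True
    except TypeError:
        return False


# A dict keyed on (op, arg) stands in for the UOp dedup cache.
dedup: dict = {}


def intern_node(op: str, arg) -> int:
    """Stable id for (op, arg): equal args map to the same id."""
    return dedup.setdefault((op, arg), len(dedup))


p1 = intern_node("PROGRAM", KernelInfoSketch(local_size=(8, 1, 1)))
p2 = intern_node("PROGRAM", KernelInfoSketch(local_size=(8, 1, 1)))
p3 = intern_node("PROGRAM", replace(KernelInfoSketch(), dont_use_locals=True))  # change by copying
print(p1 == p2, p1 == p3)  # True False
print(hashable(KernelInfoSketch()), hashable(MutableInfo()))  # True False
```

Two separately constructed but equal infos land on the same cache entry, while a plain mutable dataclass could not be used as part of the key at all.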

UOp lifecycle through the pipeline stages

# Stage 1 — Tensor graph (high-level, movement ops present)
BUFFER ─ RESHAPE ─ PERMUTE ─ EXPAND ─┐
                                      ├─ MUL ─ REDUCE ─ SINK
BUFFER ─────────────────────────────┘

# Stage 2 — After scheduling (movement ops eliminated, CALL tree)
LINEAR(
  CALL(PROGRAM(kernel_ast), buf_a, buf_b, buf_out)
  CALL(COPY, buf_src, buf_dst)
  ...
)

# Stage 3 — After codegen (lowered to indexed loops)
SINK(
  RANGE(0..N) ─┐
               ├─ STORE(INDEX(buf_out, i), LOAD(INDEX(buf_a, i)) * LOAD(INDEX(buf_b, i)))
               └─ END
)

# Stage 4 — After render (device source string)
SOURCE("kernel void k(float* a, float* b, float* c) { ... }")

# Stage 5 — After compile
BINARY(bytes_of_ptx_or_metallib)
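To make the step from Stage 3 to Stage 4 concrete, here is a toy renderer that turns a single-loop elementwise STORE into C-like source. It is a hypothetical sketch, far simpler than tinygrad's renderers: the node class U, the op spellings, storing the loop bound in the RANGE arg, and the expr/buffers/render helpers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class U:
    op: str
    src: tuple = ()
    arg: object = None


def expr(u: U) -> str:
    """Render a value-producing toy node as a C expression."""
    if u.op == "BUFFER":
        return u.arg  # buffer name
    if u.op == "RANGE":
        return u.arg[0]  # loop variable name; arg is (name, bound) in this toy
    if u.op == "INDEX":
        return f"{expr(u.src[0])}[{expr(u.src[1])}]"
    if u.op == "LOAD":
        return expr(u.src[0])  # LOAD(INDEX(buf, i)) -> buf[i]
    if u.op == "MUL":
        return f"({expr(u.src[0])} * {expr(u.src[1])})"
    raise NotImplementedError(u.op)


def buffers(u: U, seen: dict) -> dict:
    """Collect BUFFER names in first-visit order (a dict as an ordered set)."""
    if u.op == "BUFFER":
        seen[u.arg] = None
    for s in u.src:
        buffers(s, seen)
    return seen


def render(name: str, store: U) -> str:
    """Render STORE(INDEX(out, RANGE), value) as a single-loop C function."""
    idx, value = store.src
    var, n = idx.src[1].arg
    params = ", ".join(f"float* {b}" for b in buffers(store, {}))
    return (f"void {name}({params}) {{\n"
            f"  for (int {var} = 0; {var} < {n}; {var}++) {{\n"
            f"    {expr(idx)} = {expr(value)};\n"
            f"  }}\n}}")


i = U("RANGE", arg=("i", 16))
a, b, out = U("BUFFER", arg="a"), U("BUFFER", arg="b"), U("BUFFER", arg="out")
kernel = U("STORE", (U("INDEX", (out, i)),
                     U("MUL", (U("LOAD", (U("INDEX", (a, i)),)),
                               U("LOAD", (U("INDEX", (b, i)),))))))
source = render("k", kernel)
print(source)
```

Running it prints a function with the signature `void k(float* out, float* a, float* b)` and a loop body `out[i] = (a[i] * b[i]);`. A real backend would add address-space qualifiers, dtypes and thread indexing before handing the string to a compiler.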