Stage 2 — Parser
parse.y → parse.c — LEMON LALR(1) grammar that builds the AST
LEMON Parser Generator

SQLite uses LEMON, its own LALR(1) parser generator (similar to yacc/bison but with a cleaner interface and thread-safe output). The grammar is written in parse.y. The LEMON tool compiles this into parse.c (a large generated C file) which is then compiled into SQLite.

LEMON generates a table-driven shift-reduce parser. Grammar rule actions in parse.y are C code fragments that fire when a rule is completely matched. These actions construct AST nodes and — critically — also call code-generation functions directly.

Key insight: Unlike many compilers, SQLite's parser does not build a complete AST then hand it off. Grammar rule actions call code-gen functions (like sqlite3Select()) incrementally as rules reduce. The "AST" is short-lived.
Grammar Structure (parse.y)

The grammar file has roughly 2,100 lines and covers all of SQL that SQLite supports. Rules are written in the Lemon format: result ::= components . { action }

parse.y — top-level structure
/* Top-level rule */
input ::= cmdlist.
cmdlist ::= cmdlist ecmd.
cmdlist ::= ecmd.
ecmd ::= SEMI.
ecmd ::= cmdx SEMI.

/* All statement types */
cmdx ::= cmd.
cmd ::= BEGIN transtype trans_opt. { sqlite3BeginTransaction(pParse, ...); }
cmd ::= COMMIT|END trans_opt.      { sqlite3CommitTransaction(pParse); }
cmd ::= ROLLBACK trans_opt.        { sqlite3RollbackTransaction(pParse); }
cmd ::= select(X). {
  SelectDest dest = {SRT_Output, 0, 0, 0, 0, 0, 0};
  sqlite3Select(pParse, X, &dest);   /* ← code generator called here */
  sqlite3SelectDelete(pParse->db, X);
}
cmd ::= createkw TABLE ...   /* DDL */
cmd ::= insert_stmt(X).  { sqlite3Insert(pParse, ...); }
cmd ::= update_stmt(X).  { sqlite3Update(pParse, ...); }
cmd ::= delete_stmt(X).  { sqlite3DeleteFrom(pParse, ...); }
parse.y — SELECT rule (simplified)
select(A) ::= WITH wqlist(W) selectnowith(X). {
  A = attachWithToSelect(pParse, X, W);
}

select(A) ::= selectnowith(A).

selectnowith(A) ::= selectnowith(X) multiselect_op(Y) oneselect(Z). {
  /* Build linked list for UNION / INTERSECT / EXCEPT */
  A = sqlite3SelectNew(pParse, 0, 0, 0, 0, 0, 0, Y, 0);
  A->pPrior = X;
  A->pNext = Z;
}

oneselect(A) ::= SELECT distinct(D) selcollist(W) from(X)
                 where_opt(Y) groupby_opt(P) having_opt(Q)
                 orderby_opt(Z) limit_opt(L). {
  A = sqlite3SelectNew(pParse, W, X, Y, P, Q, Z, D, L);
}
parse.y — expression rules (simplified)
expr(A) ::= expr(X) AND expr(Y).  { A = sqlite3ExprAnd(pParse,X,Y); }
expr(A) ::= expr(X) OR  expr(Y).  { BINARY(TK_OR,  X, Y); }
expr(A) ::= expr(X) EQ  expr(Y).  { BINARY(TK_EQ,  X, Y); }
expr(A) ::= expr(X) NE  expr(Y).  { BINARY(TK_NE,  X, Y); }
expr(A) ::= expr(X) LT  expr(Y).  { BINARY(TK_LT,  X, Y); }
expr(A) ::= nm(X) DOT nm(Y).      { A = sqlite3PExpr(pParse, TK_DOT, X, Y); }
expr(A) ::= nm(X) LP exprlist(Y) RP. {   /* function call */
  A = sqlite3ExprFunction(pParse, Y, &X, 0);
}
AST Node Types

Grammar actions build lightweight structures that live on the heap and are freed promptly after code generation. The key ones:

StructureCreated byPurpose
Select sqlite3SelectNew() Represents one SELECT statement; linked list for compound queries (UNION etc.)
Expr sqlite3PExpr() Binary tree for expressions; each node carries op code, optional left/right children, and literal value
ExprList sqlite3ExprListAppend() Dynamic array of Expr* for SELECT column list, ORDER BY, etc.
SrcList sqlite3SrcListAppend() FROM clause: list of tables/subqueries with aliases
IdList sqlite3IdListAppend() List of bare identifiers (e.g. column names in INSERT)
Token tokenizer Lightweight: just pointer + length into the original SQL string (no copy)
sqliteInt.h — Expr node (simplified)
struct Expr {
  u8 op;           /* TK_AND, TK_EQ, TK_INTEGER, TK_COLUMN … */
  u8 affExpr;      /* column affinity */
  u32 flags;       /* EP_* flags */
  union {
    char *zToken;  /* TK_STRING, TK_ID, TK_VARIABLE */
    int iValue;    /* TK_INTEGER when EP_IntValue is set */
  } u;
  Expr *pLeft;     /* left subtree */
  Expr *pRight;    /* right subtree */
  union {
    ExprList *pList; /* function arguments / IN operator list */
    Select   *pSelect; /* subquery */
  } x;
  int iTable;      /* TK_COLUMN: cursor number of source table */
  int iColumn;     /* TK_COLUMN: column index */
  ...
};
Name Resolution (resolve.c)

After the AST is built, a name-resolution pass in resolve.c walks every Expr node and converts symbolic column references (like users.id) into table-cursor numbers and column indices. This is where "column not found" errors are raised.

The key function is sqlite3ResolveExprNames() which is called from within the parser actions for WHERE, ORDER BY, GROUP BY, and HAVING clauses.

Compilation Context: the Parse struct

The Parse struct threads through the entire tokenize → parse → codegen pipeline. It carries:

struct Parse {
  sqlite3 *db;        /* database connection */
  Vdbe *pVdbe;        /* the VDBE being built */
  int nErr;           /* error count */
  char *zErrMsg;      /* error message */
  Table *pNewTable;   /* table being CREATE'd */
  int nTab;           /* next available cursor number */
  int nMem;           /* next available register number */
  int nLabel;         /* next available label number */
  ...
};

When the parser fires a grammar action and calls, say, sqlite3Select(pParse, ...), the code generator uses pParse->pVdbe to emit opcodes and pParse->nMem to allocate registers.

Next Stage

Grammar actions directly invoke the code generator. sqlite3Select(), sqlite3Insert(), etc. walk the AST structures and emit VDBE opcodes.