What the VFS Is
The VFS (Virtual File System) is SQLite's portability layer. All file I/O — opening, reading, writing, locking, fsync — goes through a two-level vtable structure instead of calling OS functions directly:
- sqlite3_vfs: represents a VFS implementation. Its methods open, delete, and check access to files, and provide OS-level services (randomness, time, sleep).
- sqlite3_file: represents one open file handle. Its methods (via sqlite3_io_methods) handle read, write, sync, and lock operations on that specific file.
VFS Vtable Structures
/* sqlite3.h — VFS interface (simplified) */
struct sqlite3_vfs {
  int iVersion;            /* 1, 2, or 3 */
  int szOsFile;            /* size in bytes of the sqlite3_file object */
  int mxPathname;          /* max pathname length */
  sqlite3_vfs *pNext;      /* linked list of registered VFS impls */
  const char *zName;       /* VFS name, e.g. "unix", "win32" */
  void *pAppData;          /* opaque pointer for the implementation */
  /* Methods */
  int (*xOpen)(sqlite3_vfs*, const char *zName, sqlite3_file*,
               int flags, int *pOutFlags);
  int (*xDelete)(sqlite3_vfs*, const char *zName, int syncDir);
  int (*xAccess)(sqlite3_vfs*, const char *zName,
                 int flags, int *pResOut);
  int (*xFullPathname)(sqlite3_vfs*, const char *zName,
                       int nOut, char *zOut);
  int (*xRandomness)(sqlite3_vfs*, int nByte, char *zOut);
  int (*xSleep)(sqlite3_vfs*, int microseconds);
  int (*xCurrentTime)(sqlite3_vfs*, double*);
  ...
};
struct sqlite3_io_methods {
  int iVersion;
  int (*xClose)(sqlite3_file*);
  int (*xRead)(sqlite3_file*, void*, int iAmt, sqlite3_int64 iOfst);
  int (*xWrite)(sqlite3_file*, const void*, int iAmt, sqlite3_int64 iOfst);
  int (*xTruncate)(sqlite3_file*, sqlite3_int64 size);
  int (*xSync)(sqlite3_file*, int flags);
  int (*xFileSize)(sqlite3_file*, sqlite3_int64 *pSize);
  int (*xLock)(sqlite3_file*, int);
  int (*xUnlock)(sqlite3_file*, int);
  int (*xCheckReservedLock)(sqlite3_file*, int *pResOut);
  ...
};
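To make the two levels concrete, here is a minimal sketch in plain C. It does not use the real sqlite3.h types: toy_vfs, toy_file, toy_io_methods, and the in-memory "file" are hypothetical stand-ins invented for illustration. The point it tries to show is that xOpen fills in a caller-allocated file object (sized by szOsFile) and installs its pMethods pointer, after which all per-file I/O dispatches through that pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for sqlite3_file / sqlite3_io_methods / sqlite3_vfs. */
typedef struct toy_file toy_file;
typedef struct toy_vfs toy_vfs;

typedef struct toy_io_methods {
  int (*xClose)(toy_file*);
  int (*xRead)(toy_file*, void *pBuf, int amt, long offset);
} toy_io_methods;

/* Base "class": every open file starts with a pointer to its method table. */
struct toy_file {
  const toy_io_methods *pMethods;
};

struct toy_vfs {
  int szOsFile;                       /* bytes the caller allocates per file */
  const char *zName;
  int (*xOpen)(toy_vfs*, const char *zName, toy_file*);
};

/* Subclass used by this toy VFS: base struct first, private state after. */
typedef struct mem_file {
  toy_file base;
  const char *data;
  long size;
} mem_file;

static int memClose(toy_file *id){
  id->pMethods = 0;                   /* mark the handle as closed */
  return 0;
}

static int memRead(toy_file *id, void *pBuf, int amt, long offset){
  mem_file *p = (mem_file*)id;
  if( offset<0 || amt<0 || offset+amt>p->size ) return 1;  /* out of range */
  memcpy(pBuf, p->data+offset, (size_t)amt);
  return 0;
}

static const toy_io_methods memIoMethods = { memClose, memRead };

/* Level 1: the VFS opens the file and installs the level-2 method table. */
static int memOpen(toy_vfs *pVfs, const char *zName, toy_file *id){
  mem_file *p = (mem_file*)id;
  (void)pVfs;
  p->data = zName;                    /* toy: the "file" contains its own name */
  p->size = (long)strlen(zName);
  p->base.pMethods = &memIoMethods;
  return 0;
}

static toy_vfs memVfs = { (int)sizeof(mem_file), "toymem", memOpen };

/* Caller side: allocate szOsFile bytes, open via the VFS, read via the file. */
static int toy_read_first_bytes(const char *zName, char *out, int n){
  toy_file *f = malloc((size_t)memVfs.szOsFile);
  int rc;
  assert( f!=0 );
  rc = memVfs.xOpen(&memVfs, zName, f);
  if( rc==0 ){
    rc = f->pMethods->xRead(f, out, n, 0);
    f->pMethods->xClose(f);
  }
  free(f);
  return rc;
}
```

Calling toy_read_first_bytes("test.db", buf, 4) copies the first four bytes of the name into buf. The caller never names memRead directly, which is the property that lets a different VFS be swapped in without touching calling code.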
os.c — Thin Dispatch Wrappers
os.c provides thin wrapper functions that call through to the vtable methods. The Pager calls these wrappers and never invokes the VFS methods directly, so OS-level instrumentation, error injection, and testing hooks can be added in one place.
/* os.c:88 — read wrapper */
int sqlite3OsRead(sqlite3_file *id, void *pBuf, int amt, i64 offset){
  DO_OS_MALLOC_TEST(id);
  return id->pMethods->xRead(id, pBuf, amt, offset);
}
/* os.c:92 — write wrapper */
int sqlite3OsWrite(sqlite3_file *id, const void *pBuf, int amt, i64 offset){
  DO_OS_MALLOC_TEST(id);
  return id->pMethods->xWrite(id, pBuf, amt, offset);
}
/* os.c:215, open wrapper: the caller supplies the VFS; this calls its xOpen */
int sqlite3OsOpen(
  sqlite3_vfs *pVfs,
  const char *zPath,
  sqlite3_file *pFile,
  int flags,
  int *pFlagsOut
){
  int rc;
  DO_OS_MALLOC_TEST(0);
  /* flags is a bitmask: SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | ... */
  rc = pVfs->xOpen(pVfs, zPath, pFile, flags, pFlagsOut);
  return rc;
}
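Because every read funnels through one wrapper, a test harness only has to touch that wrapper to simulate I/O failures. The sketch below illustrates the idea with hypothetical names (toyOsRead, toy_fail_after, TOY_IOERR). It is not SQLite's actual test instrumentation, just a countdown-based fault injector placed in front of a method-table call.

```c
#include <assert.h>
#include <string.h>

#define TOY_OK     0
#define TOY_IOERR 10   /* hypothetical error code */

typedef struct toy_file toy_file;
typedef struct toy_io_methods {
  int (*xRead)(toy_file*, void *pBuf, int amt, long offset);
} toy_io_methods;
struct toy_file { const toy_io_methods *pMethods; };

/* Fault-injection state: let N more calls succeed, then fail exactly once.
** A negative value disables injection. */
static int toy_fail_after = -1;
static int toy_read_calls = 0;

/* The single choke point: count, maybe inject a failure, else dispatch. */
static int toyOsRead(toy_file *id, void *pBuf, int amt, long offset){
  assert( id!=0 && id->pMethods!=0 );
  toy_read_calls++;
  if( toy_fail_after>=0 && (toy_fail_after--)==0 ){
    return TOY_IOERR;                 /* simulated I/O error */
  }
  return id->pMethods->xRead(id, pBuf, amt, offset);
}

/* A trivial backend that always succeeds and returns zero bytes. */
static int zeroRead(toy_file *id, void *pBuf, int amt, long offset){
  (void)id; (void)offset;
  memset(pBuf, 0, (size_t)amt);
  return TOY_OK;
}
static const toy_io_methods zeroMethods = { zeroRead };
```

With toy_fail_after set to 1, the first toyOsRead call reaches zeroRead, the second returns TOY_IOERR without touching the backend, and later calls succeed again. The backend needs no changes to be tested this way, which is the benefit of keeping the dispatch in one place.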
Unix Implementation (os_unix.c)
os_unix.c (~6,500 lines) is the default VFS on Linux, macOS, and other POSIX systems. It handles the full complexity of POSIX file locking semantics, which differ between operating systems and even between NFS versions.
/* os_unix.c:3512 — unixRead: pread() with short-read handling */
static int unixRead(
  sqlite3_file *id,
  void *pBuf,
  int amt,
  sqlite3_int64 offset
){
  unixFile *pFile = (unixFile *)id;
  int got;
  got = osPread(pFile->h, pBuf, amt, offset);  /* pread(2) syscall */
  if( got==amt ){
    return SQLITE_OK;
  }else if( got<0 ){
    return SQLITE_IOERR_READ;
  }else{
    /* short read: zero-fill the remainder */
    memset(&((char*)pBuf)[got], 0, amt-got);
    return SQLITE_IOERR_SHORT_READ;
  }
}
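The zero-fill rule is easy to try outside SQLite. The following sketch uses plain POSIX pread(2) on a file descriptor. The names read_exact_or_zero_fill and demo_short_read, and the TOY_* codes, are hypothetical and not SQLite identifiers; the sketch assumes a POSIX system with a writable /tmp.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_OK          0
#define TOY_IOERR       1   /* hypothetical stand-in for SQLITE_IOERR_READ */
#define TOY_SHORT_READ  2   /* hypothetical stand-in for SQLITE_IOERR_SHORT_READ */

/* Read amt bytes at offset. On a short read, zero the unread tail so the
** caller never sees uninitialized bytes, and report the short read. */
static int read_exact_or_zero_fill(int fd, void *pBuf, int amt, off_t offset){
  ssize_t got;
  assert( amt>=0 );
  got = pread(fd, pBuf, (size_t)amt, offset);
  if( got==amt ) return TOY_OK;
  if( got<0 ) return TOY_IOERR;
  memset((char*)pBuf + got, 0, (size_t)(amt - got));
  return TOY_SHORT_READ;
}

/* Demo: write 5 bytes to a scratch file, then ask for 8. */
static int demo_short_read(char out[8]){
  char path[] = "/tmp/toy_short_read_XXXXXX";
  int fd = mkstemp(path);
  int rc;
  if( fd<0 ) return -1;
  unlink(path);                       /* file lives until fd is closed */
  if( write(fd, "hello", 5)!=5 ){ close(fd); return -1; }
  memset(out, 0xAA, 8);               /* poison so the zero-fill is observable */
  rc = read_exact_or_zero_fill(fd, out, 8, 0);
  close(fd);
  return rc;
}
```

In the demo, the buffer ends up holding "hello" followed by three zero bytes, and the function reports the short read instead of silently succeeding.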
/* os_unix.c:3643 — unixWrite: pwrite() with retry on EINTR */
static int unixWrite(
  sqlite3_file *id,
  const void *pBuf,
  int amt,
  sqlite3_int64 offset
){
  unixFile *pFile = (unixFile *)id;
  int rc;
  do {
    rc = osPwrite(pFile->h, pBuf, amt, offset);
    if( rc<0 ){
      if( errno==EINTR ) continue;  /* retry interrupted writes */
      return SQLITE_IOERR_WRITE;
    }
    if( rc==0 ) return SQLITE_FULL; /* no progress: report full, do not spin */
    /* partial write: advance past the bytes already written */
    amt -= rc;
    offset += rc;
    pBuf = &((char*)pBuf)[rc];
  } while( amt > 0 );
  return SQLITE_OK;
}
File Locking
SQLite uses a 5-level locking protocol to coordinate concurrent access. The Unix VFS implements the levels with advisory POSIX byte-range locks (fcntl(F_SETLK)) placed on specific byte ranges of the database file.
| Lock Level | Held by | Effect |
|---|---|---|
| UNLOCKED | idle | No locks held |
| SHARED | readers | Multiple allowed; allows reads; blocks pending EXCLUSIVE |
| RESERVED | one writer | Signals intent to write; readers still allowed |
| PENDING | one writer | Waits for existing readers to finish; blocks new SHARED |
| EXCLUSIVE | one writer | No other readers or writers; full write access |
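The table can be read as a compatibility check: given the strongest lock held by any other connection, may this connection move to a requested level? The sketch below is a toy model of that rule with hypothetical names (lock_level, can_acquire). It ignores the byte-range mechanics and the requirement that a connection step through the levels in order.

```c
#include <assert.h>

typedef enum { UNLOCKED, SHARED, RESERVED, PENDING, EXCLUSIVE } lock_level;

/* others_max: the strongest lock currently held by any OTHER connection.
** Returns 1 if the requested level is compatible with it, else 0. */
static int can_acquire(lock_level want, lock_level others_max){
  assert( others_max<=EXCLUSIVE );
  switch( want ){
    case UNLOCKED:  return 1;                     /* releasing always works */
    case SHARED:    return others_max<PENDING;    /* PENDING blocks new readers */
    case RESERVED:                                /* only one writer at a time, */
    case PENDING:   return others_max<=SHARED;    /* but readers may remain */
    case EXCLUSIVE: return others_max==UNLOCKED;  /* all readers must be gone */
  }
  return 0;
}
```

For example, can_acquire(SHARED, RESERVED) is 1 because readers are still admitted while a writer only holds RESERVED, whereas can_acquire(SHARED, PENDING) is 0 because PENDING exists to stop new readers from arriving.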
In WAL mode, locking is handled differently: readers and the writer use separate WAL-index lock slots, so a writer never needs to wait for readers to finish.
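Outside WAL mode, the byte-range mechanism mentioned at the start of this section can be sketched with plain fcntl(). The offsets below follow the layout defined in SQLite's source (a pending byte at 0x40000000, a reserved byte right after it, then a 510-byte shared range). The names toy_range_lock, toy_take_shared, and demo_shared_lock and the simplified error handling are assumptions of this sketch; the real os_unix.c does considerably more bookkeeping.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define TOY_PENDING_BYTE  0x40000000
#define TOY_SHARED_FIRST  (TOY_PENDING_BYTE+2)
#define TOY_SHARED_SIZE   510

/* Apply one non-blocking advisory lock operation to [start, start+len). */
static int toy_range_lock(int fd, short type, off_t start, off_t len){
  struct flock lk;
  memset(&lk, 0, sizeof(lk));
  lk.l_type = type;                   /* F_RDLCK, F_WRLCK, or F_UNLCK */
  lk.l_whence = SEEK_SET;
  lk.l_start = start;
  lk.l_len = len;
  return fcntl(fd, F_SETLK, &lk);     /* -1 if a conflicting lock exists */
}

/* Take a SHARED lock: briefly read-lock the pending byte (this fails if a
** writer holds PENDING), read-lock the shared range, then drop the
** pending byte again. */
static int toy_take_shared(int fd){
  int rc;
  assert( fd>=0 );
  if( toy_range_lock(fd, F_RDLCK, TOY_PENDING_BYTE, 1)!=0 ) return -1;
  rc = toy_range_lock(fd, F_RDLCK, TOY_SHARED_FIRST, TOY_SHARED_SIZE);
  toy_range_lock(fd, F_UNLCK, TOY_PENDING_BYTE, 1);
  return rc;
}

/* Demo: lock a scratch file; returns 0 on success. */
static int demo_shared_lock(void){
  char path[] = "/tmp/toy_lock_XXXXXX";
  int fd = mkstemp(path);
  int rc;
  if( fd<0 ) return -1;
  unlink(path);
  rc = toy_take_shared(fd);
  close(fd);                          /* closing the fd releases its locks */
  return rc;
}
```

The locked bytes sit far beyond the end of a small database file. POSIX permits locking ranges past end-of-file, so the locks never collide with page data.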
Registering a Custom VFS
An application can register a custom VFS with sqlite3_vfs_register(). Built-in examples beyond the OS defaults include the memdb VFS, which holds database content in memory, and the kvvfs in os_kv.c, which stores pages in a key-value store.
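The pNext field of sqlite3_vfs and the makeDflt argument of sqlite3_vfs_register() hint at how lookup works: registered VFSes form a linked list, the head of the list is the default, and sqlite3_vfs_find() walks the list by name. The sketch below models that behavior with hypothetical toy_* names. It illustrates the list discipline only; it is not SQLite's code, and it omits the mutex that protects the real list.

```c
#include <assert.h>
#include <string.h>

typedef struct toy_vfs toy_vfs;
struct toy_vfs {
  toy_vfs *pNext;
  const char *zName;
};

static toy_vfs *toy_vfs_list = 0;     /* head of the list == default VFS */

/* Remove pVfs from the list if present, so re-registering is harmless. */
static void toy_vfs_unlink(toy_vfs *pVfs){
  toy_vfs **pp = &toy_vfs_list;
  while( *pp && *pp!=pVfs ) pp = &(*pp)->pNext;
  if( *pp ) *pp = pVfs->pNext;
}

/* makeDflt!=0 puts the VFS at the head; otherwise it goes right after it. */
static void toy_vfs_register(toy_vfs *pVfs, int makeDflt){
  assert( pVfs!=0 && pVfs->zName!=0 );
  toy_vfs_unlink(pVfs);
  if( makeDflt || toy_vfs_list==0 ){
    pVfs->pNext = toy_vfs_list;
    toy_vfs_list = pVfs;
  }else{
    pVfs->pNext = toy_vfs_list->pNext;
    toy_vfs_list->pNext = pVfs;
  }
}

/* A NULL name asks for the default; otherwise search by exact name. */
static toy_vfs *toy_vfs_find(const char *zName){
  toy_vfs *p;
  for(p=toy_vfs_list; p; p=p->pNext){
    if( zName==0 || strcmp(zName, p->zName)==0 ) break;
  }
  return p;
}
```

Registering a second VFS with makeDflt set to 0 leaves the existing default untouched, so only connections that ask for the new VFS by name are affected.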
/* Register a custom VFS */
sqlite3_vfs myVfs = {
  3,                /* iVersion */
  sizeof(MyFile),   /* szOsFile */
  MAX_PATH,         /* mxPathname */
  0,                /* pNext (filled by SQLite) */
  "myvfs",          /* zName */
  0,                /* pAppData */
  myVfsOpen,
  myVfsDelete,
  myVfsAccess,
  ...
};
sqlite3_vfs_register(&myVfs, 0 /* not default */);
/* Then open a database using it */
sqlite3_open_v2("path", &db, SQLITE_OPEN_READWRITE,
                "myvfs" /* VFS name */);
Back to Overview
That's all seven stages of the SQLite data flow — from SQL text to bytes on disk. Return to the overview for a summary diagram.