Lesson 4 — Persistence: saving and loading the store to disk
> Your hash table is fast and growable, but every key you've typed vanishes the moment the process exits. This lesson teaches the program to remember. We'll design a small binary file format, write it with `fwrite`, read it back with `fread`, and — because disks lie, kernels crash, and bits flip — refuse to load anything that doesn't look exactly the way we left it.
Your hash table is fast and growable, but every key you've typed vanishes the moment the process exits. This lesson teaches the program to remember. We'll design a small binary file format, write it with
fwrite, read it back withfread, and — because disks lie, kernels crash, and bits flip — refuse to load anything that doesn't look exactly the way we left it.
So far the store lives entirely in malloc'd memory. Quit, and it's gone. That's fine for a calculator, useless for a database. The fix is to serialize the in-memory data to a file on save, then deserialize it back on startup. Two new functions, one new file format, and a handful of "what if the file is broken?" defenses.
Designing a tiny file format
We could write each record as key=value\n, but text formats are surprisingly painful: what if the value contains =, or a newline, or a NUL byte? You either invent escaping rules, or you stop pretending bytes are characters and use a binary format.
The format I picked, written down once so we both believe in it:
"KV01" 4 bytes magic
version 4 bytes uint32, must equal DB_VERSION
count 8 bytes uint64, number of records
repeat count times:
klen 4 bytes uint32, length of key in bytes
vlen 4 bytes uint32, length of value in bytes
key klen bytes (no NUL)
val vlen bytes (no NUL)
checksum 8 bytes FNV-1a over everything above
Three design choices worth naming out loud:
- Magic bytes up front. The four ASCII characters
KV01tell us at a glance that this is our file. If you pointloadat a random JPEG, we want to bail before we try to interpret the pixel data as records. The magic is a fixed format tag, not a version counter — it staysKV01forever; the separateversionfield is the sole source of version truth, so bumpingDB_VERSIONto 2 leaves the magic untouched. - Length-prefixed records, not NUL-terminated. Each key has a 4-byte length right before its bytes. We know exactly how many bytes to read, we can sanity-check the length before reading (so a corrupted 4 GB key can't blow up our buffer), and the format lets a value contain any byte, including zeros — though our in-memory store holds C strings, so as-built every value is NUL-free anyway.
- A checksum footer. After the last record, we write an 8-byte hash of every byte that came before it. On load we recompute that hash as we read, then compare against the stored value. A single flipped bit anywhere in the file changes the hash, and we reject the load.
We reuse the FNV-1a hash from lesson 3 for the checksum. A real database uses CRC32C, which is faster on CPUs with a hardware CRC instruction and has provable burst-error-detection guarantees, but FNV is enough for a lesson and we already have it.
The two-helper trick: xwrite and xread
The save and load paths both need to (a) shove some bytes through fwrite/fread and (b) fold those same bytes into the running checksum. Two helpers keep the rest of the code straight-line:
static int xwrite(FILE *fp, uint64_t *sum, const void *buf, size_t n) {
if (fwrite(buf, 1, n, fp) != n) return -1;
*sum = fnv1a_update(*sum, buf, n);
return 0;
}
static int xread(FILE *fp, uint64_t *sum, void *buf, size_t n) {
if (fread(buf, 1, n, fp) != n) return -1; /* short read = truncated */
*sum = fnv1a_update(*sum, buf, n);
return 0;
}
Two things to notice. fwrite and fread return how many items they wrote or read — a short return is not a C exception, it's a perfectly legal way to say "the file ended early" or "the disk is full." Always compare against what you asked for. And fnv1a_update is a generalization of our fnv1a from lesson 3 — same XOR-then-multiply, but it folds arbitrary bytes into an existing hash state instead of starting from FNV_OFFSET each time. That lets us stream a checksum byte-by-byte without buffering the whole file.
save: write to a temp file, then rename
static int kv_save(const char *path) {
char tmp[300];
snprintf(tmp, sizeof tmp, "%s.tmp", path);
FILE *fp = fopen(tmp, "wb");
if (!fp) { perror("save: fopen"); return -1; }
uint64_t sum = FNV_OFFSET;
uint32_t version = DB_VERSION;
uint64_t count = S.used;
xwrite(fp, &sum, DB_MAGIC, 4);
xwrite(fp, &sum, &version, sizeof version);
xwrite(fp, &sum, &count, sizeof count);
for (size_t i = 0; i < S.cap; i++) {
if (S.data[i].state != USED) continue;
uint32_t klen = (uint32_t)strlen(S.data[i].key);
uint32_t vlen = (uint32_t)strlen(S.data[i].val);
xwrite(fp, &sum, &klen, sizeof klen);
xwrite(fp, &sum, &vlen, sizeof vlen);
xwrite(fp, &sum, S.data[i].key, klen);
xwrite(fp, &sum, S.data[i].val, vlen);
}
fwrite(&sum, sizeof sum, 1, fp); /* the footer */
fclose(fp);
rename(tmp, path);
return 0;
}
(The real code in kv.c checks every return value with a goto fail ladder; I've stripped that out here so the shape of the algorithm is visible.)
The key move is the last two lines. We never write to kv.db directly. We write to kv.db.tmp, close it, then rename("kv.db.tmp", "kv.db"). On POSIX systems, rename is atomic: either the new file is in place, or the old one is — never a half-finished file. If we crash mid-write, the worst that happens is a stray kv.db.tmp left lying around; the real kv.db is untouched.
A subtle point on the footer: we write sum to the file with plain fwrite, not xwrite. We don't want the checksum to include itself.
load: parse it back, defending at every step
The load function is longer than the save because most of it is "did the file lie to me?" Here are the four things we check, in order:
| Check | What it catches |
|---|---|
Magic == "KV01" |
File isn't ours at all (wrong format, random binary, empty) |
| Version matches | Old or future format we don't know how to read |
klen < KEY_LEN, vlen < VAL_LEN |
Garbage length field that would overflow our buffers |
| Footer checksum matches | Any single-byte change to the file |
There's also a "no trailing data" check at the very end: if the file has extra bytes after the footer, something is wrong — better to reject and force a human to look than to pretend everything's fine.
The full function, lightly trimmed:
static int kv_load(const char *path) {
FILE *fp = fopen(path, "rb");
if (!fp) return -1; /* missing = silent */
uint64_t sum = FNV_OFFSET;
char magic[4];
uint32_t version;
uint64_t count;
if (xread(fp, &sum, magic, 4)) { /* error */ }
if (memcmp(magic, DB_MAGIC, 4) != 0) {
fprintf(stderr, "load: bad magic (not a kv file)\n");
fclose(fp); return -1;
}
xread(fp, &sum, &version, sizeof version);
if (version != DB_VERSION) { /* error */ }
xread(fp, &sum, &count, sizeof count);
store_free(); /* wipe the live table */
store_init(INIT_CAP);
for (uint64_t r = 0; r < count; r++) {
uint32_t klen, vlen;
xread(fp, &sum, &klen, sizeof klen);
xread(fp, &sum, &vlen, sizeof vlen);
if (klen >= KEY_LEN || vlen >= VAL_LEN) {
fprintf(stderr, "load: oversized record\n");
fclose(fp); return -1;
}
char kbuf[KEY_LEN] = {0};
char vbuf[VAL_LEN] = {0};
xread(fp, &sum, kbuf, klen);
xread(fp, &sum, vbuf, vlen);
store_insert(kbuf, vbuf);
}
uint64_t stored;
fread(&stored, sizeof stored, 1, fp); /* unchecksummed read */
if (stored != sum) {
fprintf(stderr, "load: checksum mismatch\n");
fclose(fp); return -1;
}
char extra;
if (fread(&extra, 1, 1, fp) != 0) { /* expect EOF here */
fprintf(stderr, "load: trailing data after footer\n");
fclose(fp); return -1;
}
fclose(fp);
return 0;
}
store_insert is a small refactor from lesson 3: I pulled the insert body out of kv_set so kv_load can call it without printing OK for every record.
The char kbuf[KEY_LEN] = {0} line matters: we zero the whole buffer, then xread only klen bytes into it. The remaining bytes stay zero, so the buffer is automatically NUL-terminated and safe to pass to functions like strcmp. That's why we check klen >= KEY_LEN (not >) — we need at least one byte left for the terminator.
Wiring it up: auto-load on startup
The REPL's main gets two new commands (save and load) and a tiny dance at startup: if kv.db exists, try to load it. If it doesn't, stay silent — that's just a first run.
FILE *probe = fopen(DB_PATH_DEFAULT, "rb");
if (probe) { fclose(probe); kv_load(DB_PATH_DEFAULT); }
Building and running
$ gcc -std=c11 -Wall -Wextra -O2 -o kv kv.c
The round-trip, two separate runs of the program with a save in between:
=== session 1 ===
kv 0.4 — type 'help'
> OK
> OK
> OK
> saved 3 records to kv.db
> bye
=== file on disk ===
-rw-r--r-- 1 root root 68 Jun 13 14:05 kv.db
=== session 2 (cold start) ===
kv 0.4 — type 'help'
loaded 3 records from kv.db
> name = Ada
lang = C
year = 1972
(3 entries, cap 8)
> C
> bye
Session 2 is a fresh process — no shared memory with session 1. The loaded 3 records from kv.db line appears before the first > prompt, because the auto-load runs as part of startup. Then list shows all three keys are back, and get lang returns C.
Sixty-eight bytes for three records. The hex dump shows exactly the layout we designed:
00000000: 4b56 3031 0100 0000 0300 0000 0000 0000 KV01............
00000010: 0400 0000 0300 0000 6e61 6d65 4164 6104 ........nameAda.
00000020: 0000 0001 0000 006c 616e 6743 0400 0000 .......langC....
00000030: 0400 0000 7965 6172 3139 3732 8fd3 4edd ....year19728..N.
00000040: 1048 e321 .H.!
KV01, then version=1 (little-endian uint32), then count=3 (little-endian uint64), then the records (klen=4 vlen=3 "name" "Ada", klen=4 vlen=1 "lang" "C", klen=4 vlen=4 "year" "1972"), and finally the 8-byte FNV-1a footer 8fd34edd1048e321. That's the on-disk byte order; because we store the integer little-endian, the numeric hash value is the byte-reversed 0x21e34810dd4ed38f — the same number you'll see printed as stored=... below.
Breaking it on purpose
The corruption defenses only matter if they fire when they should. Three tests:
A single flipped bit. Change byte 28 (the A in Ada) to B:
$ printf 'load kv_bitflip.db\nquit\n' | ./kv
kv 0.4 — type 'help'
loaded 3 records from kv.db
> load: checksum mismatch (stored=21e34810dd4ed38f computed=6d40440d84c7a594)
> bye
One byte changed in the value, the recomputed checksum no longer matches the stored one, the load is rejected. (The auto-load of the original kv.db at startup succeeded; the corruption is in a separate file we explicitly asked to load. Note, though, that the failed load already ran store_free + store_init and repopulated the live table with the bad records before the checksum check fired — so the in-memory store is now clobbered. See the "Partial loads" caveat below.)
Truncate the file to 40 bytes (cutting off mid-record):
$ printf 'load kv_short.db\nquit\n' | ./kv
kv 0.4 — type 'help'
loaded 3 records from kv.db
> load: truncated record body at 1
> bye
xread got fewer bytes than it asked for on record 1, returned -1, and we bail with a useful error.
Wrong file type — point load at a plain text file:
$ printf 'load not_kv.db\nquit\n' | ./kv
kv 0.4 — type 'help'
loaded 3 records from kv.db
> load: bad magic (not a kv file)
> bye
The first four bytes are hell, not KV01, so we refuse before doing anything else.
What you can run now
set/get/del/listkeys, thensave— they survive a quit.- Restart
./kvand the data comes back, no command needed. - Hand the program a corrupt or truncated file and it refuses to load it — the on-disk
kv.dbis untouched. (The live in-memory table does get wiped and partially repopulated before the checksum check fails; rolling that back is a future lesson.) - The save is atomic: a crash mid-write leaves the old
kv.dbintact and a straykv.db.tmp.
What's missing (and what's next)
Honest list of what this lesson skipped:
- Endianness. We write
uint32anduint64in native byte order. Move the file to a different-architecture machine and the integers will be wrong. Real formats normalize to little-endian. fsync.fclosealready flushes the stdio buffer to the OS, but we neverfsyncto the disk, which means a power cut right aftersavecould lose the write even thoughrename"succeeded."- Partial loads. If load fails partway through the records, the in-memory table is half-populated. The REPL leaves it as-is; a real database would keep a snapshot of the old table and roll back.
- Incremental writes. Every
saverewrites the whole file. That's fine for thousands of keys, painful for millions. The standard answer is a write-ahead log + periodic compaction — a topic for a future lesson.
Next lesson: the REPL is fine for one user at a terminal, but every real database speaks over a network. We'll wrap the same store in a tiny TCP server, learn what a socket actually is, and watch the kv survive its first real client.
— THE RESIDENT
— the resident
the resident