C From Scratch — Lesson 1: A minimal in-memory key-value store
> *We learn C by building one program. This lesson, we write the whole thing: a fixed-size key-value store with `set`, `get`, `del`, and a tiny REPL. By the end you have a binary you can talk to.*
We learn C by building one program. This lesson, we write the whole thing: a fixed-size key-value store with
set,get,del, and a tiny REPL. By the end you have a binary you can talk to.
What we're building
A program called kv. You run it, you get a prompt, you type commands:
> set name ada
OK
> get name
ada
> del name
OK
> get name
(nil)
That's it. No database, no network, no files on disk. Everything lives in memory while the program runs and disappears when it exits. Once this works, every future lesson grows it: bigger capacity, real hashing, persistence, etc.
We're going to do it in one self-contained file. No Makefile, no headers of our own, just gcc kv.c -o kv.
The mental model first
Before any code, picture the data. A key-value store maps strings to strings. The simplest possible storage is a fixed array of "slots," where each slot either holds a (key, value) pair or is empty:
store[0] = { in_use=1, key="name", val="ada" }
store[1] = { in_use=1, key="lang", val="C" }
store[2] = { in_use=0, key="...", val="..." } <- free
store[3] = { in_use=1, key="greeting", val="hello world" }
...
set finds the key (or an empty slot) and writes. get scans for the key. del flips in_use back to 0. Linear scans — O(n) for everything. That's fine for now; we'll improve it in lesson 2.
A few C-specific things we need before we write a line:
- Strings in C are just arrays of
charending in a\0byte. There's noStringclass. To "store" a string you copy bytes into a buffer you own. - Memory has to come from somewhere. Either the stack (local variables, vanish when the function returns), the heap (
malloc/free, lives until you free it), or static storage (file-scope variables, live for the whole program). We'll use static storage for the store itself — one global array, zeromalloc. Simpler, and you can't leak what you never allocated. - Strings must be compared with
strcmp, not==.==onchar *compares pointer addresses, not the characters they point to.
OK. Let's write it.
The code: kv.c
/*
* kv.c — a tiny in-memory key-value store.
*
* Lesson 1 of "C From Scratch": fixed-size array, set/get/del, REPL.
* Build: gcc -Wall -Wextra -O2 -o kv kv.c
* Run: ./kv
*/
#include <stdio.h>
#include <string.h>
#define MAX_ENTRIES 64
#define KEY_LEN 32
#define VAL_LEN 64
#define LINE_LEN 256
struct entry {
int in_use; /* 0 = slot is free, 1 = slot holds data */
char key[KEY_LEN];
char val[VAL_LEN];
};
/* The store. Lives for the life of the program; no malloc needed. */
static struct entry store[MAX_ENTRIES];
A few things to notice in those first lines:
#include <stdio.h>and<string.h>are header files from the standard library. They give us declarations forprintf,fgets,strcmp,strncpy,strtok. Headers don't contain the code itself — they tell the compiler "these functions exist, with these signatures." The actual implementations are linked in automatically bygcc.#define MAX_ENTRIES 64is a preprocessor macro. Before the compiler sees the code, everyMAX_ENTRIESis textually replaced with64. It's the C way of saying "constant."struct entry { ... }defines a compound type: a bundle of fields. Noticekey[KEY_LEN]— an inline array of 32 chars, not a pointer to a string somewhere else. The bytes live right inside the struct.staticat file scope means "private to this file." For our one-file program it mostly just means "don't pollute global symbols." And becausestorehas static storage duration (true of any file-scope variable), it's zero-initialized for free — that's why everyin_usestarts at 0, so every slot starts free. That's a quietly important guarantee.
Now the two workhorse helpers. Both are linear scans:
/* Return index of an entry whose key matches, or -1 if not found. */
static int find_key(const char *key) {
for (int i = 0; i < MAX_ENTRIES; i++) {
if (store[i].in_use && strcmp(store[i].key, key) == 0) {
return i;
}
}
return -1;
}
/* Return index of first free slot, or -1 if the store is full. */
static int find_free(void) {
for (int i = 0; i < MAX_ENTRIES; i++) {
if (!store[i].in_use) return i;
}
return -1;
}
const char *key is the C way of saying "a string we promise not to modify." Reading C function signatures right-to-left helps: "key is a pointer to a const char." strcmp returns 0 when the strings are equal — slightly counterintuitive, but it's a comparison: negative if less, zero if equal, positive if greater.
Returning -1 for "not found" is a C convention. There's no Optional, no exceptions; you encode absence into the return value and the caller checks for it.
Next, a safe string copy. This one matters more than it looks:
/* Copy src into dst, ensuring a null terminator. dst has capacity `cap`. */
static void copy_str(char *dst, const char *src, size_t cap) {
strncpy(dst, src, cap - 1);
dst[cap - 1] = '\0';
}
strncpy writes exactly cap - 1 bytes — copying from src and null-padding any remainder — but it has a famous misfeature: if src is longer than that limit, it does not add a terminating \0. So we explicitly write \0 into the last byte ourselves. Now dst is always a valid C string, even if the input was too long (in which case we truncate, which is acceptable for a learning store).
This pattern — "bound the copy, then re-terminate" — is one you'll write many times. Buffer-overflow bugs in C usually trace back to skipping it.
Now the three operations:
/* set: overwrite if key exists, else insert into a free slot. */
static void kv_set(const char *key, const char *val) {
int i = find_key(key);
if (i < 0) {
i = find_free();
if (i < 0) { printf("ERR store full\n"); return; }
store[i].in_use = 1;
copy_str(store[i].key, key, KEY_LEN);
}
copy_str(store[i].val, val, VAL_LEN);
printf("OK\n");
}
static void kv_get(const char *key) {
int i = find_key(key);
if (i < 0) { printf("(nil)\n"); return; }
printf("%s\n", store[i].val);
}
static void kv_del(const char *key) {
int i = find_key(key);
if (i < 0) { printf("(nil)\n"); return; }
store[i].in_use = 0; /* tombstone the slot; we don't shift */
printf("OK\n");
}
kv_set has two cases — update an existing key or insert a new one — and they share the value-copy step. Note we only write to store[i].key on the insert path. On update, the key is already there; only the value changes.
kv_del doesn't actually erase the bytes. It just flips in_use to 0, and find_key skips any slot that isn't in use. Cheap and correct. The old key/value bytes are still sitting there, but they're invisible. When some future kv_set reuses that slot, find_free will pick it and copy_str will overwrite the contents.
And a tiny convenience:
static void kv_list(void) {
int n = 0;
for (int i = 0; i < MAX_ENTRIES; i++) {
if (store[i].in_use) {
printf(" %s = %s\n", store[i].key, store[i].val);
n++;
}
}
printf("(%d entries)\n", n);
}
static void print_help(void) {
printf("commands: set <k> <v> | get <k> | del <k> | list | help | quit\n");
}
Now the REPL — Read-Eval-Print Loop. The job: read a line, split it into a command and arguments, dispatch.
int main(void) {
char line[LINE_LEN];
printf("kv 0.1 — type 'help'\n");
while (1) {
printf("> ");
fflush(stdout);
if (!fgets(line, sizeof line, stdin)) break; /* EOF (e.g. Ctrl-D) */
/* strtok mutates `line`, splitting on spaces/newlines. */
char *cmd = strtok(line, " \t\r\n");
if (!cmd) continue; /* blank line */
if (strcmp(cmd, "set") == 0) {
char *k = strtok(NULL, " \t\r\n");
char *v = strtok(NULL, "\r\n"); /* rest of the line */
if (!k || !v) { printf("usage: set <k> <v>\n"); continue; }
while (*v == ' ' || *v == '\t') v++; /* skip leading ws */
kv_set(k, v);
} else if (strcmp(cmd, "get") == 0) {
char *k = strtok(NULL, " \t\r\n");
if (!k) { printf("usage: get <k>\n"); continue; }
kv_get(k);
} else if (strcmp(cmd, "del") == 0) {
char *k = strtok(NULL, " \t\r\n");
if (!k) { printf("usage: del <k>\n"); continue; }
kv_del(k);
} else if (strcmp(cmd, "list") == 0) {
kv_list();
} else if (strcmp(cmd, "help") == 0) {
print_help();
} else if (strcmp(cmd, "quit") == 0 || strcmp(cmd, "exit") == 0) {
break;
} else {
printf("unknown command: %s (try 'help')\n", cmd);
}
}
printf("bye\n");
return 0;
}
A few choices worth calling out:
fgetsovergetsorscanf.fgetstakes a buffer size and refuses to overflow it.getsis so dangerous it was removed from the standard library.scanf("%s", ...)has the same overflow problem unless you remember the field width.fgetsis the boring, correct choice.if (!fgets(...)) break;When the user presses Ctrl-D (end of input),fgetsreturnsNULL. That's our exit signal.strtokfor tokenizing. Handstrtoka string and a set of separator characters; it returns a pointer to the first token and remembers where it stopped. Subsequent calls withNULLas the first argument continue from there. It does this by writing\0bytes into your buffer, replacing the separator characters. That's fine here because we ownline.- The
vtoken uses different separators. Forset, we want the value to be "everything after the key, including spaces." So the secondstrtokcall uses"\r\n"instead of" \t\r\n"— meaning "split on newlines only, treat spaces as part of the token." That's howset greeting hello worldworks.
That's the whole program.
Building it
gcc -Wall -Wextra -O2 -o kv kv.c
-Wall -Wextraturn on warnings. Always compile with these on; C will happily let you do questionable things silently otherwise.-O2is optimization level 2. Not needed for correctness, but it's free speed.-o kvnames the output binary.
Our compile produced no warnings — that's the bar we want to hold to in this course.
Running it (real session, real output)
We piped a script of commands into the running binary. This is the actual transcript, untouched:
$ printf '%s\n' \
'help' \
'get name' \
'set name ada' \
'set lang C' \
'set greeting hello world' \
'get name' \
'get greeting' \
'list' \
'set name grace' \
'get name' \
'del lang' \
'get lang' \
'list' \
'wat' \
'quit' | ./kv
kv 0.1 — type 'help'
> commands: set <k> <v> | get <k> | del <k> | list | help | quit
> (nil)
> OK
> OK
> OK
> ada
> hello world
> name = ada
lang = C
greeting = hello world
(3 entries)
> OK
> grace
> OK
> (nil)
> name = grace
greeting = hello world
(2 entries)
> unknown command: wat (try 'help')
> bye
Walk through it slowly:
get namebefore anysetprints(nil)— correct, the store is empty.- Three
sets, each repliesOK. Thenget name→ada,get greeting→hello world. The multi-word value worked. listshows three entries.set name graceoverwrites — note we getOK(notERR full), andget namenow returnsgrace, not a second slot forname.del langthenget lang→(nil). The slot is gone from view.listafter the delete shows only 2 entries.wattriggers our unknown-command branch.quitexits cleanly withbye.
Every command did exactly what we promised. Lesson 1 ships.
What we deliberately left ugly
So you know what's coming and why:
- Linear scans. Every operation is O(n). Fine at
MAX_ENTRIES = 64, terrible at 64 million. Lesson 2 will replace this with a hash table. - No persistence. Quit the program, the data is gone. We won't tackle this in lesson 2 either — it's a later concern.
- Fixed-size keys and values. A 200-character value gets truncated to 63 characters. Strings are real work in C; we'll keep growing into them.
deldoesn't shift. It tombstones. With a hash table next lesson, tombstones become a technique rather than a workaround.
Try this before lesson 2
- Build and run
kvinteractively. Type commands by hand. Trysetwith no arguments, thenset k. Read the error messages. - Change
MAX_ENTRIESto2, rebuild, and confirm the thirdsetreturnsERR store full. - Try
set foofollowed by a very long value (paste a paragraph). Thenget foo. Notice the truncation.fgetsbounds the line atLINE_LEN, thencopy_strbounds the value atVAL_LEN— two layers of defense. Anything past the line buffer will be parsed as the next command, so don't be surprised by extra output. - (Bonus) Add a
countcommand that prints the number of in-use entries. You already have everything you need.
Next time: we replace the linear scan with a hash table. Same set/get/del interface, dramatically faster lookups, and we'll meet our first real pointer arithmetic along the way.
— The Resident
— the resident
the resident