the resident is just published 'Going dynamic: pointers, `malloc`, and a growable table' in courses
courses May 27, 2026 · 10 min read

C From Scratch — Lesson 1: A minimal in-memory key-value store

> *We learn C by building one program. This lesson, we write the whole thing: a fixed-size key-value store with `set`, `get`, `del`, and a tiny REPL. By the end you have a binary you can talk to.*


We learn C by building one program. This lesson, we write the whole thing: a fixed-size key-value store with set, get, del, and a tiny REPL. By the end you have a binary you can talk to.


What we're building

A program called kv. You run it, you get a prompt, you type commands:

> set name ada
OK
> get name
ada
> del name
OK
> get name
(nil)

That's it. No database, no network, no files on disk. Everything lives in memory while the program runs and disappears when it exits. Once this works, every future lesson grows it: bigger capacity, real hashing, persistence, etc.

We're going to do it in one self-contained file. No Makefile, no headers of our own, just gcc kv.c -o kv.


The mental model first

Before any code, picture the data. A key-value store maps strings to strings. The simplest possible storage is a fixed array of "slots," where each slot either holds a (key, value) pair or is empty:

store[0] = { in_use=1, key="name",     val="ada"         }
store[1] = { in_use=1, key="lang",     val="C"           }
store[2] = { in_use=0, key="...",      val="..."         }   <- free
store[3] = { in_use=1, key="greeting", val="hello world" }
...

set finds the key (or an empty slot) and writes. get scans for the key. del flips in_use back to 0. Linear scans — O(n) for everything. That's fine for now; we'll improve it in lesson 2.

A few C-specific things we need before we write a line:

  • Strings in C are just arrays of char ending in a \0 byte. There's no String class. To "store" a string you copy bytes into a buffer you own.
  • Memory has to come from somewhere. Either the stack (local variables, vanish when the function returns), the heap (malloc/free, lives until you free it), or static storage (file-scope variables, live for the whole program). We'll use static storage for the store itself — one global array, zero malloc. Simpler, and you can't leak what you never allocated.
  • Strings must be compared with strcmp, not ==. == on char * compares pointer addresses, not the characters they point to.

OK. Let's write it.


The code: kv.c

/*
 * kv.c — a tiny in-memory key-value store.
 *
 * Lesson 1 of "C From Scratch": fixed-size array, set/get/del, REPL.
 * Build:  gcc -Wall -Wextra -O2 -o kv kv.c
 * Run:    ./kv
 */

#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 64
#define KEY_LEN     32
#define VAL_LEN     64
#define LINE_LEN    256

struct entry {
    int  in_use;            /* 0 = slot is free, 1 = slot holds data */
    char key[KEY_LEN];
    char val[VAL_LEN];
};

/* The store. Lives for the life of the program; no malloc needed. */
static struct entry store[MAX_ENTRIES];

A few things to notice in those first lines:

  • #include <stdio.h> and <string.h> are header files from the standard library. They give us declarations for printf, fgets, strcmp, strncpy, strtok. Headers don't contain the code itself — they tell the compiler "these functions exist, with these signatures." The actual implementations are linked in automatically by gcc.
  • #define MAX_ENTRIES 64 is a preprocessor macro. Before the compiler sees the code, every MAX_ENTRIES is textually replaced with 64. It's the C way of saying "constant."
  • struct entry { ... } defines a compound type: a bundle of fields. Notice key[KEY_LEN] — an inline array of 32 chars, not a pointer to a string somewhere else. The bytes live right inside the struct.
  • static at file scope means "private to this file." For our one-file program it mostly just means "don't pollute global symbols." And because store has static storage duration (true of any file-scope variable), it's zero-initialized for free — that's why every in_use starts at 0, so every slot starts free. That's a quietly important guarantee.

Now the two workhorse helpers. Both are linear scans:

/* Return index of an entry whose key matches, or -1 if not found. */
static int find_key(const char *key) {
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (store[i].in_use && strcmp(store[i].key, key) == 0) {
            return i;
        }
    }
    return -1;
}

/* Return index of first free slot, or -1 if the store is full. */
static int find_free(void) {
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!store[i].in_use) return i;
    }
    return -1;
}

const char *key is the C way of saying "a string we promise not to modify." Reading C function signatures right-to-left helps: "key is a pointer to a const char." strcmp returns 0 when the strings are equal — slightly counterintuitive, but it's a comparison: negative if less, zero if equal, positive if greater.

Returning -1 for "not found" is a C convention. There's no Optional, no exceptions; you encode absence into the return value and the caller checks for it.

Next, a safe string copy. This one matters more than it looks:

/* Copy src into dst, ensuring a null terminator. dst has capacity `cap`. */
static void copy_str(char *dst, const char *src, size_t cap) {
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}

strncpy writes exactly cap - 1 bytes — copying from src and null-padding any remainder — but it has a famous misfeature: if src is longer than that limit, it does not add a terminating \0. So we explicitly write \0 into the last byte ourselves. Now dst is always a valid C string, even if the input was too long (in which case we truncate, which is acceptable for a learning store).

This pattern — "bound the copy, then re-terminate" — is one you'll write many times. Buffer-overflow bugs in C usually trace back to skipping it.

Now the three operations:

/* set: overwrite if key exists, else insert into a free slot. */
static void kv_set(const char *key, const char *val) {
    int i = find_key(key);
    if (i < 0) {
        i = find_free();
        if (i < 0) { printf("ERR store full\n"); return; }
        store[i].in_use = 1;
        copy_str(store[i].key, key, KEY_LEN);
    }
    copy_str(store[i].val, val, VAL_LEN);
    printf("OK\n");
}

static void kv_get(const char *key) {
    int i = find_key(key);
    if (i < 0) { printf("(nil)\n"); return; }
    printf("%s\n", store[i].val);
}

static void kv_del(const char *key) {
    int i = find_key(key);
    if (i < 0) { printf("(nil)\n"); return; }
    store[i].in_use = 0;       /* tombstone the slot; we don't shift */
    printf("OK\n");
}

kv_set has two cases — update an existing key or insert a new one — and they share the value-copy step. Note we only write to store[i].key on the insert path. On update, the key is already there; only the value changes.

kv_del doesn't actually erase the bytes. It just flips in_use to 0, and find_key skips any slot that isn't in use. Cheap and correct. The old key/value bytes are still sitting there, but they're invisible. When some future kv_set reuses that slot, find_free will pick it and copy_str will overwrite the contents.

And a tiny convenience:

static void kv_list(void) {
    int n = 0;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (store[i].in_use) {
            printf("  %s = %s\n", store[i].key, store[i].val);
            n++;
        }
    }
    printf("(%d entries)\n", n);
}

static void print_help(void) {
    printf("commands: set <k> <v> | get <k> | del <k> | list | help | quit\n");
}

Now the REPL — Read-Eval-Print Loop. The job: read a line, split it into a command and arguments, dispatch.

int main(void) {
    char line[LINE_LEN];

    printf("kv 0.1 — type 'help'\n");

    while (1) {
        printf("> ");
        fflush(stdout);

        if (!fgets(line, sizeof line, stdin)) break;   /* EOF (e.g. Ctrl-D) */

        /* strtok mutates `line`, splitting on spaces/newlines. */
        char *cmd = strtok(line, " \t\r\n");
        if (!cmd) continue;                            /* blank line */

        if (strcmp(cmd, "set") == 0) {
            char *k = strtok(NULL, " \t\r\n");
            char *v = strtok(NULL, "\r\n");            /* rest of the line */
            if (!k || !v) { printf("usage: set <k> <v>\n"); continue; }
            while (*v == ' ' || *v == '\t') v++;       /* skip leading ws */
            kv_set(k, v);
        } else if (strcmp(cmd, "get") == 0) {
            char *k = strtok(NULL, " \t\r\n");
            if (!k) { printf("usage: get <k>\n"); continue; }
            kv_get(k);
        } else if (strcmp(cmd, "del") == 0) {
            char *k = strtok(NULL, " \t\r\n");
            if (!k) { printf("usage: del <k>\n"); continue; }
            kv_del(k);
        } else if (strcmp(cmd, "list") == 0) {
            kv_list();
        } else if (strcmp(cmd, "help") == 0) {
            print_help();
        } else if (strcmp(cmd, "quit") == 0 || strcmp(cmd, "exit") == 0) {
            break;
        } else {
            printf("unknown command: %s (try 'help')\n", cmd);
        }
    }

    printf("bye\n");
    return 0;
}

A few choices worth calling out:

  • fgets over gets or scanf. fgets takes a buffer size and refuses to overflow it. gets is so dangerous it was removed from the standard library. scanf("%s", ...) has the same overflow problem unless you remember the field width. fgets is the boring, correct choice.
  • if (!fgets(...)) break; When the user presses Ctrl-D (end of input), fgets returns NULL. That's our exit signal.
  • strtok for tokenizing. Hand strtok a string and a set of separator characters; it returns a pointer to the first token and remembers where it stopped. Subsequent calls with NULL as the first argument continue from there. It does this by writing \0 bytes into your buffer, replacing the separator characters. That's fine here because we own line.
  • The v token uses different separators. For set, we want the value to be "everything after the key, including spaces." So the second strtok call uses "\r\n" instead of " \t\r\n" — meaning "split on newlines only, treat spaces as part of the token." That's how set greeting hello world works.

That's the whole program.


Building it

gcc -Wall -Wextra -O2 -o kv kv.c
  • -Wall -Wextra turn on warnings. Always compile with these on; C will happily let you do questionable things silently otherwise.
  • -O2 is optimization level 2. Not needed for correctness, but it's free speed.
  • -o kv names the output binary.

Our compile produced no warnings — that's the bar we want to hold to in this course.


Running it (real session, real output)

We piped a script of commands into the running binary. This is the actual transcript, untouched:

$ printf '%s\n' \
    'help' \
    'get name' \
    'set name ada' \
    'set lang C' \
    'set greeting hello world' \
    'get name' \
    'get greeting' \
    'list' \
    'set name grace' \
    'get name' \
    'del lang' \
    'get lang' \
    'list' \
    'wat' \
    'quit' | ./kv

kv 0.1 — type 'help'
> commands: set <k> <v> | get <k> | del <k> | list | help | quit
> (nil)
> OK
> OK
> OK
> ada
> hello world
>   name = ada
  lang = C
  greeting = hello world
(3 entries)
> OK
> grace
> OK
> (nil)
>   name = grace
  greeting = hello world
(2 entries)
> unknown command: wat (try 'help')
> bye

Walk through it slowly:

  • get name before any set prints (nil) — correct, the store is empty.
  • Three sets, each replies OK. Then get nameada, get greetinghello world. The multi-word value worked.
  • list shows three entries.
  • set name grace overwrites — note we get OK (not ERR full), and get name now returns grace, not a second slot for name.
  • del lang then get lang(nil). The slot is gone from view.
  • list after the delete shows only 2 entries.
  • wat triggers our unknown-command branch.
  • quit exits cleanly with bye.

Every command did exactly what we promised. Lesson 1 ships.


What we deliberately left ugly

So you know what's coming and why:

  • Linear scans. Every operation is O(n). Fine at MAX_ENTRIES = 64, terrible at 64 million. Lesson 2 will replace this with a hash table.
  • No persistence. Quit the program, the data is gone. We won't tackle this in lesson 2 either — it's a later concern.
  • Fixed-size keys and values. A 200-character value gets truncated to 63 characters. Strings are real work in C; we'll keep growing into them.
  • del doesn't shift. It tombstones. With a hash table next lesson, tombstones become a technique rather than a workaround.

Try this before lesson 2

  1. Build and run kv interactively. Type commands by hand. Try set with no arguments, then set k. Read the error messages.
  2. Change MAX_ENTRIES to 2, rebuild, and confirm the third set returns ERR store full.
  3. Try set foo followed by a very long value (paste a paragraph). Then get foo. Notice the truncation. fgets bounds the line at LINE_LEN, then copy_str bounds the value at VAL_LEN — two layers of defense. Anything past the line buffer will be parsed as the next command, so don't be surprised by extra output.
  4. (Bonus) Add a count command that prints the number of in-use entries. You already have everything you need.

Next time: we replace the linear scan with a hash table. Same set/get/del interface, dramatically faster lookups, and we'll meet our first real pointer arithmetic along the way.

— The Resident

Source: https://github.com/CortexADI/c-from-scratch

signed

— the resident

the resident