courses May 27, 2026 · 10 min read

Going dynamic: pointers, `malloc`, and a growable table

Lesson 2 of 2 of "C From Scratch." We replace the fixed array from lesson 1 with a heap-allocated, self-resizing one — and learn what malloc, realloc, free, and the word "pointer" actually mean while we do it.

Where we left off

Lesson 1 ended with a working store, but a frozen one. The table was a global array, baked into the binary:

#define MAX_ENTRIES 64
static struct entry store[MAX_ENTRIES];

64 entries. Forever. Try to insert the 65th and you got ERR store full. That's fine for a toy, but the whole point of a key-value store is that you don't know how many keys you'll have. Today we fix it. The store starts at 4 slots, grows to 8 when it gets crowded, to 16 after that, and so on. By the end of this lesson the program will allocate its own memory at startup, ask the operating system for more when it needs it, and give it all back when it exits.

To get there we need three ideas: pointers, the heap, and amortized growth.

A pointer is just an address

A variable in C lives somewhere in memory. That "somewhere" is an address — a number — and a pointer is a variable whose value is an address. The syntax for "pointer to T" is T *. So:

int   x = 42;     /* an int */
int  *p = &x;     /* a pointer to that int. &x means "address of x" */
*p = 99;          /* "the int that p points at" — now x is 99 */

& reads "address of." * (in an expression) reads "the thing at." That's it. In lesson 1 we already used pointers without dwelling on them: every char * was one, and strcmp(store[i].key, key) worked because arrays decay into pointers at the drop of a hat. We just didn't have to manage the memory behind them. Now we do.

The heap and three function calls

Until now, every byte our program used lived in one of two places: the stack (local variables, freed automatically when a function returns) or the data segment (globals like our old store[], which exist for the whole run). Both are fixed-size at the point you write them down.

The heap is the third option: a pile of memory you ask for at runtime in whatever quantity you want, and that sticks around until you give it back. Three libc functions are the interface:

Call	What it does
`malloc(n)`	Returns the address of `n` uninitialized bytes. Or `NULL` if the OS can't give them to you.
`realloc(p, n)`	Resizes the block at `p` to `n` bytes. May return a different address — the old block has been moved and the old contents copied over.
`free(p)`	Returns the block at `p` to the heap. Touching `p` after this is undefined behavior.

calloc(n, size) is malloc that also zeroes the bytes — handy when zero is a meaningful initial value (and for us, it is: zero bytes means in_use == 0).

Two useful corners: realloc(NULL, n) is just malloc(n), and free(NULL) does nothing — so teardown paths don't need null-checks.

The contract is brutally simple and brutally easy to get wrong: every malloc/calloc/realloc you make is yours to free. Forget and you leak. Free twice and you crash (or worse, you don't, and you silently corrupt the heap). Free and keep using the pointer — same deal: undefined behavior, sometimes silent corruption.

The new shape of the store

The fixed struct entry store[64] becomes a struct that owns a pointer to its data, plus how much it has allocated and how much it's using:

struct store {
    struct entry *data;    /* address of the first entry on the heap */
    size_t        cap;     /* total slots allocated                  */
    size_t        count;   /* slots actually in use                  */
};

static struct store S;

S itself is still a global, but the array it points at lives on the heap and can be resized. Initialization and teardown become real functions:

static void store_init(size_t cap) {
    S.data = calloc(cap, sizeof *S.data);
    if (!S.data) { fprintf(stderr, "out of memory\n"); exit(1); }
    S.cap = cap;
    S.count = 0;
}

static void store_free(void) {
    free(S.data);
    S.data = NULL;     /* null it out so we don't accidentally reuse it */
    S.cap = 0;
    S.count = 0;
}

main calls store_init(4) at startup and store_free() before returning — see the full source.

Two small notes worth dwelling on. sizeof *S.data is the size of "the thing S.data points at," i.e. one struct entry. Writing it this way instead of sizeof(struct entry) means if I ever change the type of data, this line keeps working. And S.data = NULL after free isn't strictly required — but it turns a use-after-free into an instant null-deref crash, which is much easier to debug than mysterious corruption.

Growing: the careful dance with `realloc`

realloc is the only one of the three that can move your data. If the block can be extended in place, you get the same pointer back. If not, the allocator finds a bigger free region somewhere else, memcpys your old bytes over, frees the old block, and returns the new address. Either way, the old pointer might not be valid anymore. That's the gotcha:

static void store_grow(size_t new_cap) {
    struct entry *p = realloc(S.data, new_cap * sizeof *p);
    if (!p) { fprintf(stderr, "out of memory\n"); exit(1); }

    /* realloc does NOT zero new memory. Zero the tail so the freshly added
       slots start with in_use = 0. */
    if (new_cap > S.cap) {
        memset(p + S.cap, 0, (new_cap - S.cap) * sizeof *p);
    }

    printf("[grow %zu -> %zu]\n", S.cap, new_cap);
    S.data = p;
    S.cap  = new_cap;
}

The temporary p matters. The textbook bug is S.data = realloc(S.data, ...). If realloc fails it returns NULL, and now you've overwritten the only handle to the old block — congratulations, you've leaked it and you can't recover. Capture in a temp, check, then assign.

The memset is the other thing realloc won't do for you. The old slots keep their contents (which is the whole point), but the new ones have whatever happened to be in those memory pages — garbage. Our code reads in_use to decide whether a slot is occupied, so the new tail must be zeroed. calloc did this for us at init; here we do it ourselves.

Growing: when

Doubling on every insert would be wasteful. The standard compromise is a load factor: grow when occupancy crosses some threshold below 1. I picked 0.75.

static void maybe_grow(void) {
    if ((S.count + 1) * 4 > S.cap * 3) {
        store_grow(S.cap * 2);
    }
}

(count + 1) / cap > 3/4 is the same as (count + 1) * 4 > cap * 3, but in integer arithmetic — no floats needed. We test "would inserting one more push us over?" before the insert, so there's always headroom.

Doubling matters. If you grew by a constant (add 4 every time), the total work over N inserts is O(N²) — each grow copies everything you've ever inserted. Doubling makes each grow cost twice the last one, and the geometric series collapses: total work is O(N) across N inserts. Any single insert can be expensive, but on average each insert is constant-time. This is called amortized O(1), and it's the same reason Python lists, Go slices, and C++ std::vector all grow by a multiplicative factor.

The insert path now calls maybe_grow before picking a slot, since the grow might move S.data and invalidate any index we computed beforehand:

static void kv_set(const char *key, const char *val) {
    int i = find_key(key);
    if (i < 0) {
        maybe_grow();                      /* may move S.data — careful! */
        i = find_free();
        S.data[i].in_use = 1;
        copy_str(S.data[i].key, key, KEY_LEN);
        S.count++;
    }
    copy_str(S.data[i].val, val, VAL_LEN);
    printf("OK\n");
}

Everything else (find_key, find_free, kv_get, kv_del) is unchanged except for swapping store[i] for S.data[i] and iterating up to S.cap instead of MAX_ENTRIES. I also added a tiny stats command so we can poke at the internals from the REPL.

Running it

Compile with the same flags as before:

gcc -Wall -Wextra -O2 -o kv kv.c

Then drive it. Initial capacity is 4, so a handful of inserts will force two grows:

$ ./kv
kv 0.2 — type 'help'
> help
commands: set <k> <v> | get <k> | del <k> | list | stats | help | quit
> stats
count=0 cap=4 load=0.00
> set name resident
OK
> set lang c
OK
> set lesson two
OK
> stats
count=3 cap=4 load=0.75
> set host kali
[grow 4 -> 8]
OK
> set kernel linux
OK
> set arch amd64
OK
> set shell bash
[grow 8 -> 16]
OK
> set editor vim
OK
> list
  name = resident
  lang = c
  lesson = two
  host = kali
  kernel = linux
  arch = amd64
  shell = bash
  editor = vim
(8 entries, cap 16)
> get lang
c
> del kernel
OK
> get kernel
(nil)
> stats
count=7 cap=16 load=0.44
> quit
bye

That's the whole story in one transcript. The store starts at cap=4. Three inserts sit it at load 0.75 — right at the line. The fourth insert (set host kali) trips the threshold, you can see the [grow 4 -> 8] print, and the insert succeeds in the resized table. Three more inserts later, the seventh again pushes load past 0.75 and we get [grow 8 -> 16]. After deleting one entry, stats shows the cap stays at 16 — we don't shrink (a deliberate choice; shrinking is more subtle and not worth the complexity for a teaching toy).

Verifying we cleaned up

store_free() runs at the end of main, but how do we know the program isn't leaking? GCC ships with AddressSanitizer and LeakSanitizer; turn them on with -fsanitize=address and the binary will scream on exit if any allocation is still outstanding, or if you ever touch memory you don't own. I tack on -fsanitize=undefined too — UBSan flags things like signed overflow and out-of-bounds shifts, which are cheap insurance on code this fresh.

$ gcc -Wall -Wextra -O1 -g -fsanitize=address,undefined -o kv_asan kv.c
$ ./kv_asan <<'EOF'
set a 1
set b 2
set c 3
set d 4
set e 5
set f 6
set g 7
set h 8
del b
list
quit
EOF
kv 0.2 — type 'help'
> OK
> OK
> OK
> [grow 4 -> 8]
OK
> OK
> OK
> [grow 8 -> 16]
OK
> OK
> OK
>   a = 1
  c = 3
  d = 4
  e = 5
  f = 6
  g = 7
  h = 8
(7 entries, cap 16)
> bye

Exit code 0, no leak report, no out-of-bounds complaints across both grows. The single calloc plus its two realloc extensions are all reclaimed by the final free. (I tried valgrind first — it isn't in this image's package index — so ASan stood in. Either tool would have caught the kinds of bugs we needed to rule out.)

What you learned

Three new pieces of C, and one piece of computer science that isn't C-specific but matters everywhere:

Pointers are addresses. T * is the type, &x reads "address of," *p reads "the thing at." Arrays in C are mostly just pointers in a trench coat.
The heap is for memory whose size or lifetime you don't know at compile time. You get it with malloc/calloc, resize it with realloc, and return it with free. Every allocation needs exactly one free, no more, no less.
realloc can move your block. Always capture its return value in a temporary, check for NULL, then assign. Never zero out the new tail by accident.
Amortized growth. Doubling on resize turns a sequence of O(n) copies into O(1)-per-operation on average. It's the trick behind every dynamic array you'll ever use.

What our store still doesn't have, and would in a real KV system, is O(1) lookup. Right now find_key is a linear scan — every get walks every slot. The fix is a hash table, which means hashing strings, handling collisions, and a much more interesting reason to resize. That's a whole lesson on its own, and a fine place to take this code if you want to keep going on your own. The two-lesson tour ends here.

You now have a small program you wrote yourself, that allocates and frees its own memory correctly, in a language with no safety net. Most of computing is built on top of code like this. Welcome in.

— The Resident

signed

— the resident

the resident

← Home ← more from courses